Debugging ARM without a Debugger 2: Abort Handlers
This is my second post in the series Debugging ARM without a Debugger.
This is an excerpt from my debugging techniques document for Real-time Programming. These techniques are written in the context of writing a QNX-like real-time microkernel and a model train controller on an ARMv4 (ARM920T, Technologic TS-7200). The source code is located here. My teammate (Pavel Bakhilau) and I are the authors of the code.
It is useful to have a simple abort handler early on, before working on anything complex like a context switch. The default abort handlers that come with the bootloader spew out minimal information for gdb if you are lucky, and often they just hang with no message. (In fact, I am now very grateful that I am able to see kernel panic messages at all when things are gravely wrong with my computer.) By installing an abort handler, you will be able to see what went wrong in case the asserts were not good enough to catch problems earlier.
Installation
There are three interrupt vectors that need to be intercepted: undefined instruction (0x4), prefetch abort (0xc) and data abort (0x10). We can re-use one abort handler because the abort type can be determined from the mode bits of the cpsr. The one exception is that instruction fetch (prefetch) aborts and data fetch aborts share the same processor mode. We can work around this by passing a flag to the C abort handler. The following is sample code:
// c prototype of the abort handler
void handle_abort(int fp, int dataabort);
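// the abort handler in assembly that calls the C handler
.global asm_handle_dabort
asm_handle_dabort:
    mov r1, #1          // second argument: dataabort = 1
    b abort
.global asm_handle_abort
asm_handle_abort:
    mov r1, #0          // second argument: dataabort = 0
abort:
    ldr sp, =0x2000000  // banked sp is uninitialized, so point it at the end of RAM
    mov r0, fp          // first argument: frame pointer of the aborted code
    bl handle_abort
dead:
    b dead              // spin forever if the C handler returns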
Because ARM has a separate set of banked registers for the abort and undefined modes, the stack pointer in those modes is uninitialized. Since I wanted to use a C handler to print out messages, I needed to set up a stack. In this code, I manually set the stack pointer to the end of physical memory (our board had 32MB of RAM in total, so 0x2000000 is the end of the memory). For convenience, I also pass the current frame pointer in case I want to examine the stack of the abort-causing code.
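The sample above does not show how the three vectors get pointed at the assembly entry points. A minimal sketch of one common approach follows, assuming the vector table sits at address 0 and that each vector holds an `ldr pc, [pc, #0x18]` instruction that loads its target from the word 0x20 bytes further on. The helper name `install_vector` and the macro names are hypothetical, and your bootloader may already provide the `ldr` instructions, in which case only the second store is needed.

```c
#include <stdint.h>

/* Exception vector offsets named in the text above. */
#define VEC_UNDEFINED      0x04u
#define VEC_PREFETCH_ABORT 0x0cu
#define VEC_DATA_ABORT     0x10u

/* Encoding of "ldr pc, [pc, #0x18]". The pc reads as the address of the
 * current instruction + 8, so the load fetches the word at vector + 0x20. */
#define LDR_PC_PC_0X18 0xe59ff018u

/* Hypothetical helper. table_base is the start of the vector table:
 * address 0 on the board (an assumption), or a plain array on a host. */
static void install_vector(volatile uint32_t *table_base,
                           uint32_t vector_offset, uint32_t handler_addr)
{
    table_base[vector_offset / 4] = LDR_PC_PC_0X18;          /* the jump   */
    table_base[(vector_offset + 0x20u) / 4] = handler_addr;  /* its target */
}
```

On the board this might be called as `install_vector((volatile uint32_t *)0, VEC_DATA_ABORT, (uint32_t)asm_handle_dabort);`, with `asm_handle_abort` used for the other two vectors. If the instruction cache is enabled, it may need to be invalidated after the vectors are rewritten.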
When dealing with register values directly in C, it is convenient to have the following macro to read register values:
#define READ_REGISTER(var) \
__asm volatile("mov %[" #var "], " #var "\n\t" : [var] "=r" (var))
// usage: int lr; READ_REGISTER(lr);
#define READ_CPSR(var) \
__asm volatile("mrs %[mode], cpsr" "\n\t" "and %[mode], %[mode], #0x1f" "\n\t" \
: [mode] "=r" (var))
// usage: int cpsr; READ_CPSR(cpsr);
In the C abort handler, by reading the cpsr, you should be able to figure out the current mode. Refer to ARM Reference Manual section A2.2.
The following is a brief summary of the abort environment and its interpretation. The precise information can be found in chapter A2 of the reference manual. You should read the manual to understand the process better.
An important thing to remember is that you should do your best to ensure that your abort handler does not cause another abort itself. In particular, be very conservative when dereferencing pointers.
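One way to stay conservative is to validate every address before the handler dereferences it, for example when walking the stack from the frame pointer that was passed in. The sketch below assumes RAM occupies a single contiguous range ending at 0x2000000, as on the 32MB board described above; the function name and the bounds are illustrative, not taken from the original handler.

```c
#include <stdint.h>

/* Assumed RAM layout: one contiguous 32MB block starting at address 0. */
#define RAM_END 0x02000000u

/* Returns 1 if a 32-bit load from addr would be word-aligned and lie
 * entirely inside RAM, 0 otherwise. */
static int is_safe_word_addr(uint32_t addr)
{
    if ((addr & 0x3u) != 0) {
        return 0;               /* misaligned: would raise an alignment fault */
    }
    return addr <= RAM_END - 4u; /* the whole word must fit below RAM_END */
}
```

Inside the handler, a stack word would then be read only after a check such as `if (is_safe_word_addr(fp)) { value = *(volatile uint32_t *)fp; }`, and skipped (or reported as invalid) otherwise.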
Interpretation
Read all the values from the registers first, and then print. Otherwise, there is a chance some registers might get overwritten.
cpsr
dataabort refers to the second parameter passed into the C abort handler.
The lower 5 bits of cpsr | Interpretation
0x13 | You are in svc mode. It probably means your abort handler caused another abort inside. Fix it.
0x17 (dataabort = 0) | Instruction fetch abort
0x17 (dataabort = 1) | Data fetch abort
0x1B | Undefined instruction
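The table above translates directly into a small classification function. This is a sketch only; the enum and function names are made up for illustration, and the `dataabort` argument is the flag set by the assembly entry points.

```c
/* Hypothetical names for the cases listed in the table above. */
enum abort_kind {
    ABORT_NESTED_SVC,   /* 0x13: the handler itself probably aborted */
    ABORT_PREFETCH,     /* 0x17 with dataabort = 0 */
    ABORT_DATA,         /* 0x17 with dataabort = 1 */
    ABORT_UNDEFINED,    /* 0x1B */
    ABORT_UNKNOWN       /* any other mode */
};

/* Classify an abort from the cpsr value and the dataabort flag. */
static enum abort_kind classify_abort(unsigned int cpsr, int dataabort)
{
    switch (cpsr & 0x1fu) {     /* keep only the five mode bits */
    case 0x13: return ABORT_NESTED_SVC;
    case 0x17: return dataabort ? ABORT_DATA : ABORT_PREFETCH;
    case 0x1b: return ABORT_UNDEFINED;
    default:   return ABORT_UNKNOWN;
    }
}
```

Masking inside the function means it accepts either the raw cpsr or the already-masked value produced by READ_CPSR.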
lr
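The link register (lr) normally contains the address of the instruction after the one that called the current function. On entry to an exception handler, it is instead set relative to the instruction that caused the exception: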
Current mode | Interpretation
Data fetch abort | The abort was caused by the instruction at lr - 8
Instruction fetch abort | The abort was caused by the instruction at lr - 4
Undefined instruction | The exception was caused by the instruction at lr - 4 (in ARM state, lr holds the address of the instruction after the undefined one)
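The lr adjustment can be wrapped in a helper so the handler prints the address of the offending instruction directly. The sketch below assumes the processor was in ARM (not Thumb) state; per the ARM Architecture Reference Manual, lr on an undefined instruction exception holds the address of the instruction following the undefined one, so 4 is subtracted in that case. The type and function names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical labels for the three exceptions discussed here. */
enum exception_type { EXC_DATA_ABORT, EXC_PREFETCH_ABORT, EXC_UNDEFINED };

/* Address of the instruction that raised the exception, given the lr
 * value seen on entry to the handler (ARM state assumed). */
static uint32_t faulting_instruction(uint32_t lr, enum exception_type type)
{
    switch (type) {
    case EXC_DATA_ABORT:     return lr - 8u;
    case EXC_PREFETCH_ABORT: return lr - 4u;
    case EXC_UNDEFINED:      return lr - 4u;
    }
    return lr; /* not reached for valid enum values */
}
```

The resulting address can be looked up in the linker map or a disassembly of the kernel image to find the offending source line.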
Fault type (in case of data/instr. fetch abort)
Read the fault type using the following code:
volatile unsigned int faulttype;
__asm volatile ("mrc p15, 0, %[ft], c5, c0, 0\n\t" : [ft] "=r" (faulttype));
faulttype &= 0xf;
Fault type value | Interpretation
(faulttype >> 0x2) == 0 | misaligned memory access
0x5 | translation
0x8 | external abort on noncacheable
0x9 | domain
0xD | permission
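For printing, the fault type table can be turned into a decoder. The sketch below mirrors only the values listed above and reports everything else as "other"; the technical reference manual defines additional status codes that are not covered here. All names are illustrative.

```c
/* Hypothetical labels for the fault types in the table above. */
enum fault_kind {
    FAULT_ALIGNMENT,
    FAULT_TRANSLATION,
    FAULT_EXTERNAL,
    FAULT_DOMAIN,
    FAULT_PERMISSION,
    FAULT_OTHER         /* a code not listed in the table above */
};

/* Decode the low four bits of the fault status register value. */
static enum fault_kind decode_fault(unsigned int faulttype)
{
    faulttype &= 0xfu;
    if ((faulttype >> 2) == 0) {
        return FAULT_ALIGNMENT;     /* misaligned memory access */
    }
    switch (faulttype) {
    case 0x5: return FAULT_TRANSLATION;
    case 0x8: return FAULT_EXTERNAL;    /* external abort on noncacheable */
    case 0x9: return FAULT_DOMAIN;
    case 0xd: return FAULT_PERMISSION;
    default:  return FAULT_OTHER;
    }
}
```

Returning an enum rather than printing immediately fits the advice to read every register first and print afterwards.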
To see the big picture of how fault checking works (other than for misaligned memory access), read section 3.7 of the ARM920T Technical Reference Manual. In short, unless you are making use of memory protection, you will never get domain and permission faults.
Data fault address (only applicable to a data abort)
This is the address the code tried to access, which caused the data fetch abort. Read it using the following code:
volatile unsigned int datafaultaddr;
__asm volatile ("mrc p15, 0, %[dfa], c6, c0, 0\n\t" : [dfa] "=r" (datafaultaddr));
Our actual abort handling code is located here.
Summary
It is very convenient to have a bullet-proof abort handler. It gives you a lot more information about the problem than a hang does. Also, don’t forget that most DRAM content is not erased after a hard reset, so you can use RedBoot’s dump (x) command to examine the memory if really needed. With some effort, one can also set up the MMU to implement a very simple write-protection of the code region. Such protection could be useful for catching the most insidious kind of bugs (luckily, we did not have to deal with such bugs).