Debugging ARM without a Debugger 2: Abort Handlers
This is my second post in the series Debugging ARM without a Debugger.
This is an excerpt from my debugging techniques document for Real-time Programming. These techniques are written in the context of writing a QNX-like real-time microkernel and a model train controller on an ARMv4 (ARM920T, Technologic TS-7200). The source code is located here. My teammate (Pavel Bakhilau) and I are the authors of the code.
It is useful to have a simple abort handler early on, before working on anything complex like context switching. The default abort handlers that come with the bootloader spew out minimal information for gdb if you are lucky, and often they just hang with no message (in fact, I am now very grateful that I get to see kernel panic messages at all when things go gravely wrong with my computer). By installing an abort handler, you will be able to see what went wrong in case the asserts were not good enough to catch problems earlier.
Installation
There are three exception vectors that need to be intercepted: undefined instruction (0x4), prefetch abort (0xc) and data abort (0x10). We can re-use one abort handler because the abort type can be read from the cpsr. The one exception is that instruction fetch aborts and data fetch aborts share the same processor mode, so the cpsr alone cannot tell them apart. We can work around this by passing a flag to the C abort handler. The following is sample code:
// c prototype of the abort handler
void handle_abort(int fp, int dataabort);
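// the abort handler in assembly that calls the C handler
.global asm_handle_dabort
asm_handle_dabort:
    mov r1, #1          // second argument: dataabort = 1
    b abort
.global asm_handle_abort
asm_handle_abort:
    mov r1, #0          // second argument: dataabort = 0
abort:
    ldr sp, =0x2000000  // the banked sp is uninitialized; use the end of RAM
    mov r0, fp          // first argument: frame pointer of the aborting code
    bl handle_abort
dead:
    b dead              // never return; hang here once the handler is done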
Because ARM has a separate set of banked registers for abort modes, the stack pointer is uninitialized. Since I wanted to use a C handler to print out messages, I need to set up a stack. In this code, I manually set the stack pointer to be the end of the physical memory (our board had 32MB RAM in total so 0x2000000 is the end of the memory). For convenience, I also pass the current frame pointer in case I want to examine the stack of the abort-causing code.
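The sample above does not show how the three vectors get redirected to the assembly entry points. Here is a minimal sketch of one common way to do it, under assumptions that are not taken from our code: each vector holds the instruction `ldr pc, [pc, #0x18]`, which loads the pc from the word 0x20 bytes after the vector, and the table is writable. The helper names `install_vector` and `install_abort_handlers` are hypothetical. On the real board `table` would be address 0x0; here it is a parameter so the logic can be tried on a plain array.

```c
#include <stdint.h>

/* Encoding of "ldr pc, [pc, #0x18]". In ARM state the pc reads as the address
 * of the current instruction + 8, so this loads pc from (vector + 0x20). */
#define LDR_PC_PC_0x18 0xe59ff018u

/* Hypothetical helper: point one exception vector at `handler`.
 * `vector` is the byte offset of the vector (0x4, 0xc or 0x10). */
static void install_vector(volatile uint32_t *table, uint32_t vector,
                           uint32_t handler)
{
    table[vector / 4] = LDR_PC_PC_0x18;      /* instruction at the vector */
    table[(vector + 0x20) / 4] = handler;    /* address that it jumps to */
}

/* Hypothetical helper: route the three vectors described above.
 * Undefined instruction and prefetch abort share asm_handle_abort;
 * data abort goes to asm_handle_dabort so the C handler gets dataabort = 1. */
static void install_abort_handlers(volatile uint32_t *table,
                                   uint32_t asm_handle_abort_addr,
                                   uint32_t asm_handle_dabort_addr)
{
    install_vector(table, 0x04, asm_handle_abort_addr);
    install_vector(table, 0x0c, asm_handle_abort_addr);
    install_vector(table, 0x10, asm_handle_dabort_addr);
}
```

If the bootloader already uses this layout, writing only the address words is enough. On real hardware the caches may also need attention after patching the table; that part is left out of this sketch.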
When dealing with register values directly in C, it is convenient to have the following macro to read register values:
#define READ_REGISTER(var) \
__asm volatile("mov %[" #var "], " #var "\n\t" : [var] "=r" (var))
// usage: int lr; READ_REGISTER(lr);
#define READ_CPSR(var) \
__asm volatile("mrs %[mode], cpsr" "\n\t" "and %[mode], %[mode], #0x1f" "\n\t" \
: [mode] "=r" (var))
// usage: int cpsr; READ_CPSR(cpsr);
In the C abort handler, by reading the cpsr, you should be able to figure out the current mode. Refer to ARM Reference Manual section A2.2.
The following is a brief summary of the abort environment and its interpretation. The precise information can be found in chapter A2 of the reference manual. You should read the manual to understand the process better.
An important thing to remember is that you should do your best to ensure that your abort handler does not cause another abort inside. Again, be very conservative when dereferencing pointers.
Interpretation
Read all the values from the registers first, and then print. Otherwise, there is a chance some registers might get overwritten.
cpsr
dataabort refers to the second parameter passed into the C abort handler.
The lower 5 bits of cpsr | Interpretation
0x13 | You are in svc mode. It probably means your abort handler caused another abort inside. Fix it.
0x17 (dataabort = 0) | Instruction fetch abort
0x17 (dataabort = 1) | Data fetch abort
0x1B | Undefined instruction
lr
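The link register (lr) normally contains the address of the instruction after the one that called the current function. On entry to an abort handler, it instead points just past the instruction that caused the abort, and the offset depends on the abort type.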
Current mode | Interpretation
Data fetch abort | The abort was caused by the instruction at lr - 8
Instruction fetch abort | The abort was caused by the instruction at lr - 4
Undefined instruction | The abort was caused by the instruction at lr - 4 (in ARM state, lr holds the address of the instruction after the undefined one)
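The two lookups above (mode bits and lr offset) are easy to get wrong under pressure, so it can help to keep them in small pure functions. The following is only a sketch, not our actual handler; `abort_kind`, `classify_abort` and `faulting_instruction` are names made up for illustration. For the undefined instruction case it uses lr - 4, following the ARM Architecture Reference Manual's description that lr holds the address of the instruction after the undefined one in ARM state.

```c
#include <stdint.h>

/* Hypothetical classification of why the handler was entered. */
typedef enum {
    ABORT_NESTED_SVC,   /* 0x13: probably an abort inside the abort handler */
    ABORT_PREFETCH,     /* 0x17 with dataabort == 0 */
    ABORT_DATA,         /* 0x17 with dataabort == 1 */
    ABORT_UNDEFINED,    /* 0x1b */
    ABORT_UNKNOWN
} abort_kind;

/* Decode the lower 5 bits of the cpsr plus the flag passed from assembly. */
static abort_kind classify_abort(uint32_t cpsr, int dataabort)
{
    switch (cpsr & 0x1f) {
    case 0x13: return ABORT_NESTED_SVC;
    case 0x17: return dataabort ? ABORT_DATA : ABORT_PREFETCH;
    case 0x1b: return ABORT_UNDEFINED;
    default:   return ABORT_UNKNOWN;
    }
}

/* Address of the instruction that caused the abort (ARM state only). */
static uint32_t faulting_instruction(abort_kind kind, uint32_t lr)
{
    switch (kind) {
    case ABORT_DATA:      return lr - 8;
    case ABORT_PREFETCH:  return lr - 4;
    case ABORT_UNDEFINED: return lr - 4;
    default:              return lr;   /* no adjustment known; report lr as-is */
    }
}
```

Because neither function touches memory, they cannot cause a second abort, which fits the advice about keeping the handler conservative.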
Fault type (in case of data/instr. fetch abort)
Read the fault type using the following code:
volatile unsigned int faulttype;
__asm volatile ("mrc p15, 0, %[ft], c5, c0, 0\n\t" : [ft] "=r" (faulttype));
faulttype &= 0xf;
Fault type value | Interpretation
(faulttype >> 0x2) == 0 | misaligned memory access
0x5 | translation
0x8 | external abort on noncacheable
0x9 | domain
0xD | permission
To see the big picture of how fault checking works (other than for misaligned memory access), read section 3.7 of the ARM920T Technical Reference Manual. In short, unless you are making use of memory protection, you will never get domain or permission faults.
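To print something readable instead of a raw number, the fault type table can be turned into a small lookup. This is a sketch that covers only the values listed in the table; `describe_fault` is a hypothetical name, and any value not in the table is reported as unknown rather than guessed.

```c
#include <stdint.h>

/* Map the masked fault status value to the interpretation from the table.
 * Returns a string literal, so no memory is allocated or dereferenced. */
static const char *describe_fault(uint32_t faulttype)
{
    faulttype &= 0xf;                      /* same mask as in the read code */
    if ((faulttype >> 2) == 0)
        return "misaligned memory access";
    switch (faulttype) {
    case 0x5: return "translation";
    case 0x8: return "external abort on noncacheable";
    case 0x9: return "domain";
    case 0xd: return "permission";
    default:  return "unknown (check the TRM)";
    }
}
```

In the handler, the result could be passed straight to a printf-like function, for example `bwprintf(1, "fault: %s\n", describe_fault(faulttype));`.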
Data fault address (only applicable to a data abort)
This is the address the code tried to access, which caused the data fetch abort. Read it using the following code:
volatile unsigned int datafaultaddr;
__asm volatile ("mrc p15, 0, %[dfa], c6, c0, 0\n\t" : [dfa] "=r" (datafaultaddr));
Our actual abort handling code is located here.
Summary
It is very convenient to have a bullet-proof abort handler. It really gives you a lot more information about the problem than a hang. As well, don’t forget that most DRAM content is not erased after a hard reset, so you can use RedBoot’s dump (x) command to examine the memory, if really needed. With some effort, one can also set up the MMU to implement a very simple write-protection of the code region. Such protection could be useful to prevent the most insidious kind of bugs from occurring (Luckily, we did not have to deal with such bugs).
Debugging ARM without a Debugger 1: Use of Asserts
This is my first post in the series Debugging ARM without a Debugger.
This is an excerpt from my debugging techniques document for Real-time Programming. These techniques are written in the context of writing a QNX-like real-time microkernel and a model train controller on an ARMv4 (ARM920T, Technologic TS-7200). The source code is located here. My teammate (Pavel Bakhilau) and I are the authors of the code.
Failing fast is an extremely useful property when programming in C. For example, problems with pointers are much easier to debug if you know exactly when an invalid pointer value is passed into a function. Here are a few tips for asserting effectively:
There is no such thing as too many asserts.
CPU power used for asserts will almost never cause a critical performance issue [in this course]. You can disable them when you know your code is perfect. Verify pointers at every pointer dereference.
Assert pointers more aggressively.
Do not just check for NULLs. We know more about the pointer addresses. We know that the pointer address is limited by the size of the memory. As well, from the linker script, we can even deduce more information. For example, we know that normally, we would not want to dereference anything below the address 0x218000 because that is where the kernel is loaded. Similarly, we can figure out what memory region is text and data.
Remove all uncertainties.
Turn off interrupts as soon as possible in the assert macro. When things go wrong, you want to stop the program execution (and the trains) right away. If you do not turn off interrupts, a context switch to another task might occur, and you might never get back to stop and display what went wrong.
Print as much information as possible.
Make an assert macro that resembles printf and print as much contextual information as possible. When you have no debugger, rebooting and reproducing can be really time-consuming. 1.5 months is a very short time to build an operating system from scratch so use it wisely.
e.g. ASSERT(condition, "oops! var1:%d, var2:%x, var3:%s", var1, var2, var3);
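As an illustration of the "assert pointers more aggressively" tip, here is a minimal sketch of a range check that goes beyond NULL. The lower bound 0x218000 is the kernel load address mentioned above; the upper bound 0x2000000 assumes the 32MB board; the alignment check and the names `is_plausible_pointer`, `KERNEL_LOAD_ADDR` and `RAM_END` are additions for illustration, not part of our code.

```c
#include <stdint.h>

#define KERNEL_LOAD_ADDR 0x218000u   /* from the linker script, per the text */
#define RAM_END          0x2000000u  /* assumption: 32MB of RAM starting at 0 */

/* Returns 1 if addr could be a valid data pointer, 0 otherwise.
 * alignment is the required alignment in bytes (e.g. 4 for an int). */
static int is_plausible_pointer(uintptr_t addr, uintptr_t alignment)
{
    if (addr < KERNEL_LOAD_ADDR)
        return 0;                    /* also rejects NULL */
    if (addr >= RAM_END)
        return 0;                    /* past the end of physical memory */
    if (alignment > 1 && (addr % alignment) != 0)
        return 0;                    /* would be a misaligned access */
    return 1;
}
```

It could then be combined with a printf-like assert, for example `ASSERT(is_plausible_pointer((uintptr_t)p, 4), "bad pointer p:%x", (int)p);` where `p` stands for whatever pointer is about to be dereferenced.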
Example
Here’s a short snippet of the ASSERT macro. It has evolved over 3 months and looks really dirty, but it works. (source)
typedef uint volatile * volatile vmemptr;
#define VMEM(x) (*(vmemptr)(x))
void bwprintf(int channel, char *fmt, ...);
#define READ_REGISTER(var) __asm volatile("mov %[" TOSTRING(var) "], " TOSTRING(var) "\n\t" : [var] "=r" (var))
#define READ_CPSR(var) __asm("mrs %[mode], cpsr" "\n\t" "and %[mode], %[mode], #0x1f" "\n\t" : [mode] "=r" (var))
void print_stack_trace(uint fp, int clearscreen);
void td_print_crash_dump();
int MyTid();
#if ASSERT_ENABLED
#define ASSERT(X, ...) { \
if (!(X)) { \
VMEM(VIC1 + INTENCLR_OFFSET) = ~0; /* turn off the vectored interrupt controllers */ \
VMEM(VIC2 + INTENCLR_OFFSET) = ~0; \
int cpsr; READ_CPSR(cpsr); \
int inusermode = ((cpsr & 0x1f) == 0x10); int tid = inusermode ? MyTid() : -1; \
bwprintf(0, "%c", 0x61); /* emergency shutdown of the train */ \
int fp, lr, pc; READ_REGISTER(fp); READ_REGISTER(lr); READ_REGISTER(pc); \
bwprintf(1, "\x1B[1;1H" "\x1B[1K"); \
bwprintf(1, "assertion failed in file " __FILE__ " line:" TOSTRING(__LINE__) " lr: %x pc: %x, tid: %d" CRLF, lr, pc, tid); \
bwprintf(1, "[%s] ", __func__); \
bwprintf(1, __VA_ARGS__); \
bwprintf(1, "\n"); /* if in usermode ask kernel for crashdump, otherwise print it directly */ \
if (inusermode) { __asm("swi 12\n\t");} else { td_print_crash_dump(); } \
bwprintf(1, "\x1B[1K"); \
print_stack_trace(fp, 0); \
die(); \
} \
}
#else
#define ASSERT(X, ...)
#endif
That’s it for today.
Working wpa_supplicant.conf configuration for the network uw-secure at UWaterloo for Xperia X10 (1.6)
While Sony Ericsson has promised to update the X10 to a moderately recent version (2.1) of the Android operating system by Q4 2010, those of us who are stuck with Android 1.6 cannot normally connect to most wireless networks that use WPA-EAP, including uw-secure at the University of Waterloo. Apparently, the reason is that while Android 1.6 does support WPA-EAP, there is no user interface (!) for editing these network configurations.
Fortunately, X10 (including X10a sold in Canada by Rogers) has been rooted very recently by the people at xda-developers.com. You can follow the guidelines here (For X10a users, it is important to install stuff in the post #5 as well).
After rooting the phone, you can edit the file wpa_supplicant.conf in the /data/misc/wifi directory. I made a copy before making changes, just in case. It is important that the owner and permissions of the file remain exactly the same (owner: system, group: wifi, permission: 660).
Using your favourite method, append the following to the file:
network={
ssid="uw-secure"
scan_ssid=1
proto=WPA
key_mgmt=WPA-EAP
eap=PEAP
identity="UWDirID"
password="UWDirPASSWORD"
phase1="peaplabel=0"
phase2="auth=MSCHAPV2"
}
I’ve assembled the configuration from this post at the Arch Linux Forum by vogt. The two modifications I made are that I removed the line specifying ca_cert and added the line proto=WPA. For whatever reason, the phone will ignore the configuration if there is no proto=WPA line.
About Me
My name is Woongbin Kang.
I graduated from the University of Waterloo with a Bachelor of Computer Science with the Software Engineering Option and the Cognitive Science Option (here is the list of courses that I finished at the University of Waterloo during my undergrad career).
The views expressed on this site are my own and do not reflect those of my employer or its clients.
Contact
LinkedIn: profile
Twitter: @selasonic