Debugging ARM without a Debugger 3: Printing Stack Trace

This is the last post in the series Debugging ARM without a Debugger.

This is an excerpt from my debugging techniques document for Real-time Programming. These techniques are written in the context of writing a QNX-like real-time microkernel and a model train controller on an ARMv4 (ARM920T, Technologic TS-7200). The source code is located here. Mby teammate (Pavel Bakhilau) and I are the authors of the code.


A stack trace is the ultimate tool that can help you tell exactly where a problem is occurring when used in conjunction with asserts (e.g. in my code, an assert failure triggers the stack trace dump. I also wired the ESC key to an assert failure).

It is particularly useful when you have a complex applications with deep call stacks. For example, if an assert has failed in a utility function such as stack_push in a complex application, it is practically impossible to figure out what happened where without putting print statements everywhere.

With a stack trace, we can re-construct the run-time call hierarchy and find out what is happening. At the end of this article, I will present an example of sophisticated stack trace output that can help us diagnose complex concurrency issues.

Stack Frame Structure

We can deduce the exact stack frame structure from the assembly code generated by the compiler (GCC-arm in my case). Here is an example of a typical function header:

func:
mov ip, sp
stmfd   sp!, {(other optional registers), sl, fp, ip, lr, pc}
sub fp, ip, #4
; function body continues...

The compiler will save the registers pc, lr, ip, fp, sl into the stack in that order. Additionally, the compiler may save any other scratch register used in the function. Important registers for printing a stack trace are pc, lr and fp.

Note that if any compiler optimization is turned on (e.g. -O flag), you need to pass the extra argument -fno-omit-frame-pointer. Otherwise, GCC will optimize out the code that saves the frame pointer.

pc (program counter)

Reading the saved pc gives us the address of the entry point of the function plus 16 bytes. This is because pc is always 2 instructions ahead in ARM when we save it.

lr (link register)

The lr register is the address to return when the current function returns. An instruction before lr would give us the exact code address of the caller of the current function.

fp (frame pointer)

This is the frame pointer of the previous function. We will need to read this in order to “crawl up” the call graph.

Stack Trace Crawler

Here is the pseudocode (or the actual code) for printing the stack trace:

// a poorly written macro for reading an int at the specified memory address x.
#define VMEM(x) (*(unsigned int volatile * volatile)(x))

lr = 0; depth = 0;
do {
   pc = VMEM(fp) - 16;

   // print here: the calling code is at lr, the current function addr is pc

   if (lr is invalid) break;

   lr = VMEM(fp - 4);
   fp = VMEM(fp - 12);

   if (fp is not a valid memory or depth too big) break;

   depth++;

} while (the current function is not a top-level function && depth is < some threshold);

Here's an example code for reading the frame pointer which is required to start printing the stack trace:

#define STRINGIFY(x) #x
#define TOSTRING(x) STRINGIFY(x)
// reads the register reg to the variable var
#define READ_REGISTER(var,reg) __asm volatile("mov %[result], " TOSTRING(reg) "\n\t" : [result] "=r" (var))

int fp; READ_REGISTER(fp, fp);

The most important thing here is that you want this code not to fail. Here are common things that can happen that you don't want:

  • Abort inside another abort (or, an abort inception; install a good abort handler to find out why)
  • Invalid pointer dereference (e.g. outside the physical memory, or outside .text region)
  • Stack overflow which will lead to another abort (by getting stuck in an infinite loop of crazy corrupt frame pointers)

Finding out the corresponding C code

Use the command objdump -SD executable | less to figure out what the C code is at a given address. Passing the compiler flag -ggdb enables objdump to print the C source code next to the disaseembled code. It may not always work with higher optimization level.

Printing the function name

The debugging process can be much faster if you can see the function names in a stack trace right away when the program crashed, instead of running objdump every time manually.

The proper way to do it is to read the debugging information from the .debug section of the memory. I did not have time to do that, so instead I built my own symbol table array using a shell script hooked up to the makefile.

This symbol table does not need to be sophisticated. A simple array of a function address and its name is good enough. This is because the performance is not a concern when you are printing the stack trace of a crashed system. Another reason is that we want this code to work all the time. It is pretty hard to mess up a linear search.

The symbol array is built off the exported symbols. The method I have used is simple. After compiling all the source code into assembly files, I run a shell script to search for the string “.global” in all the assembly files to generate the exported symbol table. Then I compile the generated code of exported symbols as well, and then link it all together at the end. The following is a sample code how to do it:

funcmap.h (funcmap provides the interface to find function names given an address)

typedef struct _tag_funcinfo {
   unsigned int fn;
   char *name;
} funcinfo;

/* call this before calling find_function_name */
void __init_funclist();
funcinfo *__getfunclist();
/* call this function to find the name of the function */
static inline char* find_function_name(uint pc) {
 funcinfo* fl = __getfunclist();
 int i = 0;

 while (fl[i].fn != 0) {
    if (fl[i].fn == pc) return fl[i].name;
    i++;
 }

 return "[unknown function]";
}
funcmap.c (generated by a shell script)
#include <./task.h> // include ALL the header files

static funcinfo __funclist[1]; // the length of this array is also generated

void __init_funclist() {
   int i = 0;
   __funclist[i].fn=(unsigned int)some_func;
   __funclist[i++].name="some_func";
   // .. more
   __funclist[i].fn=0; // null terminated

}
funcinfo* __getfunclist() { return __funclist; }

Lastly, this is how I read all the function names from assembly files in the shell script (the actual script):

FUNCTION_COUNT=`find . -name '*.S' -o -name '*.s' | xargs grep .global | awk '{print $3}' | grep -v '^$' | grep -v '^__' | sort | uniq | wc -l`
FUNCTIONS=`find . -name '*.S' -o -name '*.s' | xargs grep .global | awk '{print $3}' | egrep -v '(^$|^__|PLT|GOT|,)' | sort | uniq`

Putting it all together (Example)

Combining the stack trace with task information can be even more powerful than what basic C debuggers offer.

The following is an example of a stack trace output for multiple tasks. It prints two lines per task.

Task 0 {noname} (p:31, pc:0x2180b8, sp0x1edfe34, lr:0x2506d8, WAITING4SEND):
nameserver @ 0x2505e8+0,
Task 1 {noname} (p:0, pc:0x24c55c, sp0x1eafff0, lr:0x21809c, READY):
kernel_idleserver @ 0x24c550+0,
Task 3 TIMESERVER (p:31, pc:0x2180b8, sp0x1e4ff80, lr:0x21d1cc, WAITING4SEND):
timeserver @ 0x21d074+0,
Task 4 {noname} (p:31, pc:0x2180d0, sp0x1e1ffe0, lr:0x21f818, WAITING4EVENT):
eventnotifier @ 0x21f7c4+0,
Task 5 IOSERVER_COM1 (p:31, pc:0x2180b8, sp0x1deff04, lr:0x21eab8, WAITING4SEND):
ioserver @ 0x21e82c+0,
Task 6 {noname} (p:30, pc:0x2180d0, sp0x1dbffe0, lr:0x21f818, WAITING4EVENT):
eventnotifier @ 0x21f7c4+0,
Task 7 IOSERVER_COM2 (p:31, pc:0x2180e0, sp0x1d8fe6c, lr:0x21e104, RUNNING):
[unknown function] @ 0x21df94+0, ioserver @ 0x21e82c+253,
Task 8 {noname} (p:30, pc:0x2180b0, sp0x1d5ffe0, lr:0x21f830, WAITING4REPLY, last_receiver: 7):
eventnotifier @ 0x21f7c4+0,
Task 9 {noname} (p:2, pc:0x2180b0, sp0x1d2f878, lr:0x22006c, WAITING4RECEIVE):
uiserver_move @ 0x220018+0, timedisplay_update @ 0x23bed4+49, dumbbus_dispatch @ 0x21a5a8+15, a0 @ 0x234c88+646,

Task status code:

  • WAITING4SEND means the task is waiting for another task to send a message.
  • WAITING4RECEIVE means the task has sent a message but the receiver has not received the message yet.
  • WAITING4REPLY means the task has sent a message and someone received it but has not replied yet.
  • last_receiver tells us the last task that received the message from this task.
  • WAITING4EVENT means the task is waiting for a kernel event (e.g. IO).
  • READY means the task is ready to run next as soon as this task becomes the top priority task.
  • RUNNING means the task is currently running.

The first line displays the task number, name, priority, registers, task status and the task synchronization information. The second line displays the stack trace with the offsets from the address of the function.

Why is this powerful? We can use this to solve really complex synchronization issues with wait chains & priorities that is otherwise nearly impossible to without this information. At the end, we had more than 40 tasks interacting with each other and my life would have been much harder without this information.

Limitations

The major limitation of this method is that it can't print the names of static functions. This is because the symbols for static functions are not exported globally. This is not a huge problem because you can still see the names from the output of objdump.

Catergorized under: programming / cs452

Published: 2012-12-10T16:47:00.000
Last modified: 2019-02-18T19:56:27.167708
Permalink

Debugging ARM without a Debugger 2: Abort Handlers

This is my second post in the series Debugging ARM without a Debugger.

This is an excerpt from my debugging techniques document for Real-time Programming. These techniques are written in the context of writing a QNX-like real-time microkernel and a model train controller on a ARMv4 (ARM920T, Technologic TS-7200). The source code is located here. My teammate (Pavel Bakhilau) and I are the authors of the code.



It is useful to have a simple abort handler early on before working on anything complex, like context switch. The default abort handlers that come with the bootloader spew out minimal information for gdb if lucky, or often they just hang with no message (In fact, I am now very grateful that I am able to kernel panic messages at all when things are gravely wrong with my computer). By installing an abort handler, you will be able to what went wrong in case the asserts were not good enough to catch problems earlier.

Installation

There are three interrupt vectors that need to be intercepted: undefined instruction (0x4), prefetch abort (0xc) and data abort (0x10). We can re-use one abort handler because the abort type can be read from the cpsr. One exception is that both instruction fetch abort and data fetch abort share the same processor mode. We can work around this by passing a flag to the C abort handler. The following is a sample code:

// c prototype of the abort handler
void handle_abort(int fp, int dataabort);

// the abort handler in assembly that calls the C handler
.global asm_handle_dabort
asm_handle_dabort:
    mov r1, #1
    b abort

.global asm_handle_abort
asm_handle_abort:
    mov r1, #0
    abort:
    ldr sp, =0x2000000
    mov r0, fp
    bl handle_abort
    dead:
    b dead

Because ARM has a separate set of banked registers for abort modes, the stack pointer is uninitialized. Since I wanted to use a C handler to print out messages, I need to set up a stack. In this code, I manually set the stack pointer to be the end of the physical memory (our board had 32MB RAM in total so 0x2000000 is the end of the memory). For convenience, I also pass the current frame pointer in case I want to examine the stack of the abort-causing code.

When dealing with register values directly in C, it is convenient to have the following macro to read register values:

#define READ_REGISTER(var) \
__asm volatile("mov %[" #var "], " #var "\n\t" : [var] "=r" (var))
// usage: int lr; READ_REGISTER(lr);
#define READ_CPSR(var) \
__asm volatile("mrs %[mode], cpsr" "\n\t" "and %[mode], %[mode], #0x1f" "\n\t" \
: [mode] "=r" (var))
// usage: int cpsr; READ_CPSR(cpsr);

In the C abort handler, by reading the cpsr, you should be able to figure out the current mode. Refer to ARM Reference Manual section A2.2.

The following a brief summary of the abort environment and their interpretation. The precise information can be found in the reference manual chapter A2. You should read the manual to understand the process better.

An important thing to remember is that you should do your best to ensure that your abort handler does not cause another abort inside. Again, be very conservative when dereferencing pointers.

Interpretation

Read all the values from the registers first, and then print. Otherwise, there is a chance some registers might get overwritten.

cpsr

dabort refers to the second parameter passed into the C abort handler.

The lower 5 bits of cpsr

Interpretation

0x13

You are in svc mode. It probably means your abort handler caused another abort inside. Fix it.

0x17 (dataabort = 0)

Instruction fetch abort

0x17 (dataabort = 1)

Data fetch abort

0x1B

Undefined instruction

lr

Link Register normally contains the address to one instruction after the instruction that called the current function.

Current mode

Interpretation

Data fetch abort

The abort was caused by the instruction at lr - 8

Instruction fetch abort

The abort was caused by the instruction at lr - 4

Undefined instruction

The abort was caused by the instruction at lr

Fault type (in case of data/instr. fetch abort)

Read the fault type using the following code:

volatile unsigned int faulttype;
__asm volatile ("mrc p15, 0, %[ft], c5, c0, 0\n\t" : [ft] "=r" (faulttype));
faulttype &= 0xf;

Fault type value

Interpretation

(faulttype >> 0x2) == 0

misaligned memory access

0x5

translation

0x8

external abort on noncacheable

0x9

domain

0xD

permission

To see a big picture of how the fault checking works (other than misaligned memory access), you are advised to read the section 3.7 of ARM920T Technical Reference Manual. In short, unless you are making use of memory protection, you will never get domain and permission faults.

Data fault address (only applicable to a data abort)

This is the address the code tried to access, which caused the data fetch abort. Read it using the following code:

volatile unsigned int datafaultaddr;
__asm volatile ("mrc p15, 0, %[dfa], c6, c0, 0\n\t" : [dfa] "=r" (datafaultaddr));

Our actual abort handling code is located here.

Summary

It is very convenient to have a bullet-proof abort handler. It really gives you a lot more information about the problem than a hang. As well, don’t forget that most DRAM content is not erased after a hard reset, so you can use RedBoot’s dump (x) command to examine the memory, if really needed. With some effort, one can also set up the MMU to implement a very simple write-protection of the code region. Such protection could be useful to prevent the most insidious kind of bugs from occurring (Luckily, we did not have to deal with such bugs). 

Catergorized under: programming / cs452

Published: 2012-01-27T05:18:00.000
Last modified: 2019-02-18T19:56:27.167708
Permalink

Debugging ARM without a Debugger 1: Use of Asserts

This is my first post in the series Debugging ARM without a Debugger.

This is an excerpt from my debugging techniques document for Real-time Programming. These techniques are written in the context of writing a QNX-like real-time microkernel and a model train controller on a ARMv4 (ARM920T, Technologic TS-7200). The source code is located here. My teammate (Pavel Bakhilau) and I are the authors of the code.


Failing fast is an extremely useful property when programming in C. For example, problems with pointers are much easier to debug if you know exactly when an invalid pointer value is passed into a function. Here are few tips for asserting effectively:

There is no such thing as putting too much asserts.

CPU power used for asserts will almost never cause a critical performance issue [in this course]. You can disable them when you know your code is perfect. Verify pointers every pointer dereference.

Assert pointers more aggressively.

Do not just check for NULLs. We know more about the pointer addresses. We know that the pointer address is limited by the size of the memory. As well, from the linker script, we can even deduce more information. For example, we know that normally, we would not want to dereference anything below the address 0x218000 because that is where the kernel is loaded. Similarly, we can figure out what memory region is text and data.

Remove all uncertainties.

Turn off interrupts as soon as possible in the assert macro. When things go wrong, you want to stop the program execution (and the trains) right away. If you do not turn off interrupts, a context switch might occur to other task and you might not be able to come back ever to stop and display what went wrong.

Print as much information as possible.

Make an assert macro that resembles printf and print as much contextual information as possible. When you have no debugger, rebooting and reproducing can be really time-consuming. 1.5 months is a very short time to build an operating system from scratch so use it wisely.

e.g. ASSERT(condition, “oops! var1:%d, var2:%x, var3:%s”, var1, var2, var3);

Example

Here’s a short snippet of ASSERT macro. It has evolved over 3 months and it looks really dirty but it works. (source)

typedef uint volatile * volatile vmemptr;

#define VMEM(x) (*(vmemptr)(x))
void bwprintf(int channel, char *fmt, ...);
#define READ_REGISTER(var) __asm volatile("mov %[" TOSTRING(var) "], " TOSTRING(var) "\n\t" : [var] "=r" (var))
#define READ_CPSR(var) __asm("mrs %[mode], cpsr" "\n\t" "and %[mode], %[mode], #0x1f" "\n\t" : [mode] "=r" (var))
void print_stack_trace(uint fp, int clearscreen);
void td_print_crash_dump();
int MyTid();

#if ASSERT_ENABLED
#define ASSERT(X, ...) { \
        if (!(X)) { \
                VMEM(VIC1 + INTENCLR_OFFSET) = ~0; /* turn off the vectored interrupt controllers */ \
                VMEM(VIC2 + INTENCLR_OFFSET) = ~0; \
                int cpsr; READ_CPSR(cpsr); \
                int inusermode = ((cpsr & 0x1f) == 0x10); int tid = inusermode ? MyTid() : -1; \
                bwprintf(0, "%c", 0x61); /* emergency shutdown of the train */ \
                int fp, lr, pc; READ_REGISTER(fp); READ_REGISTER(lr); READ_REGISTER(pc); \
                bwprintf(1, "\x1B[1;1H" "\x1B[1K"); \
                bwprintf(1, "assertion failed in file " __FILE__ " line:" TOSTRING(__LINE__) " lr: %x pc: %x, tid: %d" CRLF, lr, pc, tid); \
                bwprintf(1, "[%s] ", __func__); \
                bwprintf(1, __VA_ARGS__); \
                bwprintf(1, "\n"); /* if in usermode ask kernel for crashdump, otherwise print it directly */ \
                if (inusermode) { __asm("swi 12\n\t");} else { td_print_crash_dump(); } \
                bwprintf(1, "\x1B[1K"); \
                print_stack_trace(fp, 0); \
                die(); \
        } \
}
#else
#define ASSERT(X, ...)
#endif

That’s it for today.

Catergorized under: programming / cs452

Published: 2012-01-25T05:04:00.000
Last modified: 2019-02-18T19:56:27.167708
Permalink