--------------------------------------------------------------- Software Logic Analyzer Handles Tough Timing Bugs 1.0 Introduction Many of us in the embedded software world grew up on hardware logic analyzers as a great tool for finding those really hard timing problems that can crop up. These analyzers have the ability to capture address and data bus cycles and store them to memory. They also have complex trigger and "store only" features that allow you to find some of the most complex timing problems. With the advent of on-chip cache where the processor might be fetching instructions but never applying signals to an external bus, the logic analyzer has outlived it's usefulness, at least in the software world. Many have turned to JTAG (Joint Test Action Group) debuggers as the tool of choice. While these are great, they lack many of the features for finding timing bugs that a logic analyzer can find. Peppering your code with printf's is also another tried and true debugging technique but its bandwidth is very limited and it's sometimes prohibited from interrupts. 2.0 To the rescue Logic analyzers are very expensive, what we need is a "poor man's logic analyzer" (pmla). Let's keep it in software and make it free! Each storage event represents one subroutine call to the software logic analyzer. Just insert subroutine calls to either "save" or "trigger" in your kernel driver or anywhere in the kernel's code space (including interrupt space). The subroutine calls can not be done from user space. For example: --------------------------------------------------------------- pmla_save(42, parm1, parm2); --------------------------------------------------------------- will save one trace event with an identification number, (id "42" above ) two parameters that you pass in (or just pass 0), and a time-stamp. The identification number is arbitrary but unique and is assigned by you to identify the location in your code that you are tracing. Here is an example that will cause a trigger point to be in the center of the finished trace. --------------------------------------------------------------- pmla_trig_center(40, priv->cur_rx->cbd_sc, priv->last_tx_transmission_status); --------------------------------------------------------------- To output the current trace, you dump a /proc entry. The trace buffer is 190 deep, here is what the middle of the trace might look like: --------------------------------------------------------------- ID LABEL PARAM1 PARAM2 REL TIME .... id42 = int entry, irq, pid, state, das_pid , 0x00000001, 0xef9000b3, 4.43 MS id42 = int entry, irq, pid, state, das_pid , 0x405490fc, 0xbebff510, 1.0 us id10 = rxint timeo: p_pack, pack seq num , 0xc2f43464, 0x3bb20200, 30.5 us id31 = do ana entry: gsp_tag, length , 0x0000efbe, 0x00000080, 1.0 us id48 = int timout: PC before int, pid , 0x401d9754, 0x00000044, 30.5 us Cid40= int getting behind, state, pid , 0x00000000, 0x00000044, 1.0 us id42 = int entry, irq, pid, state, das_pid , 0x00000001, 0xef9000b3, 1.0 us id42 = int entry, irq, pid, state, das_pid , 0x405490fc, 0xbebff510, 1.0 us id 5 = rxint pg 1: us_pr, us_cr , 0x22ef7c00, 0x00000400, 30.5 us id49 = vector_swi: USER addr, syscall , 0x40196ce0, 0xef900077, 1.92 MS id49 = vector_swi: USER addr, syscall , 0x00289680, 0xef900066, 1.25 MS id49 = vector_swi: USER addr, syscall , 0x0036f57c, 0xef900036, 213 us .... --------------------------------------------------------------- The REL TIME timing on the right is gotten from the time-stamp register of the CPU. The text LABEL is gotten from a perl script that you enter into a table. The ID "Cid40" is the trigger point, the "C" in the ID field is for Center. You can pass anything you want for the two parameters (PARAM1 and PARAM2). Some examples might be hardware registers, loop counters, or state information. The software analyzer is good for finding timing problems that are periodic in nature, like packet communications or timer interrupts. It's not very useful for testing whether something happens once, a printf is much easier in that case. 3.0 Details. The software analyzer is not Symetric Multi-Processing safe since it doesn't have a per-CPU database. It works by storing a time-stamp (see figure 1 below) and two parameters in a circular buffer. The calls to save a time-stamp and two parameters take very little code so they are not very intrusive to kernel timing. However, you can cause problems if you put calls in the kernel scheduler itself without regard to how often they are being called. The trace info is extracted using a C program and a perl script to format the output. --------------------------------------------------------------- cat /proc/driver/pmla_raw > pmla.out scripts/raw_pmla -f pmla.out | scripts/pmla.pl > trace.lst --------------------------------------------------------------- The act of doing the "cat /proc/driver/pmla_raw" causes the trigger to reset and clears the trace buffer. -- Figure1.txt goes here --------- 4.0 Installation The software analyzer consists of a loadable driver that contains the subroutines that you call to save a time-stamp. Before you build the driver, you need to get your kernel and build it. This is outside the scope of this article but here is a very terse set of commands you need to do in the case of linux-2.6.9 or linux-2.4.27 (make sure you really are running linux-2.6.9 or linux-2.4.27 by doing "uname -a"): --------------------------------------------------------------- cd /usr/src wget http://www.kernel.org/pub/linux/kernel/v2.6/linux-2.6.9.tar.bz2 tar jxf linux-2.6.9.tar.bz2; ln -s linux-2.6.9 linux; cd linux cp /boot/config-2.6.9 .config; ln -s asm-i386 include/asm make oldconfig; make bzImage; make modules; sudo make modules_install # for 2.4.27 change 2.6.9 => 2.4.27, v2.6 => v.24, # ln -s asm-i386 include/asm => make dep --------------------------------------------------------------- To build the driver do: --------------------------------------------------------------- tar zxf pmla-1.0.tar.gz cd pmla-1.0 make clock # set the clock speed NOTE: Don't do if cross compiling # NOTE: if you are cross compiling you will need to edit # PMLA_CLOCK_KHz in both pmla.h and scripts/pmla.pl # and set it to your processor speed. make sudo make load --------------------------------------------------------------- 5.0 Testing All tests were performed with Knoppix 3.7 (Debian based) on both kernel versions linux-2.6.9 and linux-2.4.27. A version of the i386 rtc (real time clock) driver has been included in the distribution so that you can easily try the pmla on your system. First we need to remove any rtc module currently running. If your kernel has the rtc driver built in, then this won't work. You can test this by doing: lsmod | grep rtc If this shows no output and the command "hwclock" works, then the rtc driver is built into your version of the kernel. --------------------------------------------------------------- make rtc # build modified rtc sudo make rtc_unload # unload old rtc sudo make rtc_load # load modified rtc --------------------------------------------------------------- Now if you do: --------------------------------------------------------------- make test make dump --------------------------------------------------------------- You will see: --------------------------------------------------------------- ID LABEL PARAM1 PARAM2 REL TIME id 1 = pmla test, rtc int, begin : int, dev, 0x00000008, 0x00000000, 15.62 MS id 2 = pmla test, rtc int, end : int, dev, 0x00000008, 0x00000000, 6.5 us id 1 = pmla test, rtc int, begin : int, dev, 0x00000008, 0x00000000, 15.62 MS id 2 = pmla test, rtc int, end : int, dev, 0x00000008, 0x00000000, 6.2 us id 1 = pmla test, rtc int, begin : int, dev, 0x00000008, 0x00000000, 15.62 MS id 2 = pmla test, rtc int, end : int, dev, 0x00000008, 0x00000000, 6.4 us id 1 = pmla test, rtc int, begin : int, dev, 0x00000008, 0x00000000, 15.62 MS .... --------------------------------------------------------------- The line in test/test.c: --------------------------------------------------------------- res = ioctl (fd, RTC_IRQP_SET, RTC_HZ ); /* Enable periodic RTC interrupts */ --------------------------------------------------------------- sets the rtc periodic interrupt rate to 64 Hz. "id 1" is at the beginning of the rtc interrupt, "id 2" is at the end. The trace above shows that the approximate time for the rtc interrupt is 6.2 microseconds and that it runs every at 64 Hz (or 15.62 milliseconds). At this rate, the rtc interrupt is using 0.04% (6.4us / 15.62ms) of the processor. 6.0 Example Let's see if we can find a reported bug using the pmla. The reported bug goes like this - An ioctl call to the rtc driver with the RTC_AIE_OFF (Mask alarm interrupt enable bit) option causes the periodic rtc interrupt interval time to change. The rtc.c driver in the pmla-1.0 distribution has this bug in it. We start by putting a call to pmla_save() in the beginning and end of the rtc interrupt and a call to pmla_trig_center() in the RTC_AIE_OFF section of the ioctl code to verify this is really what's causing the timing to go bad. Looking at the middle of the trace done above (in the Testing section), you will see: --------------------------------------------------------------- ID LABEL PARAM1 PARAM2 REL TIME .... id 1 = pmla test, rtc int, begin : int, dev, 0x00000008, 0x00000000, 15.62 MS id 2 = pmla test, rtc int, end : int, dev, 0x00000008, 0x00000000, 6.2 us id 1 = pmla test, rtc int, begin : int, dev, 0x00000008, 0x00000000, 15.62 MS id 2 = pmla test, rtc int, end : int, dev, 0x00000008, 0x00000000, 6.4 us id 1 = pmla test, rtc int, begin : int, dev, 0x00000008, 0x00000000, 15.62 MS id 2 = pmla test, rtc int, end : int, dev, 0x00000008, 0x00000000, 7.1 us Cid 3= pmla test, rtc RTC_AIE_OFF : , 0x00000000, 0x00000000, 12.18 MS id 1 = pmla test, rtc int, begin : int, dev, 0x00000008, 0x00000000, 18.8 us id 2 = pmla test, rtc int, end : int, dev, 0x00000008, 0x00000000, 6.6 us id 1 = pmla test, rtc int, begin : int, dev, 0x00000008, 0x00000000, 114 us id 2 = pmla test, rtc int, end : int, dev, 0x00000008, 0x00000000, 6.4 us id 1 = pmla test, rtc int, begin : int, dev, 0x00000008, 0x00000000, 115 us id 2 = pmla test, rtc int, end : int, dev, 0x00000008, 0x00000000, 6.0 us .... --------------------------------------------------------------- This shows that there is indeed a change in the frequency of the interrupt after the RTC_AIE_OFF ioctl was called. Before the "Cid 3" line, the periodic interval was 15.62 ms or 64 Hz. After "Cid 3", it went to 122 us or 8192 Hz. Separately, notice also that after the bug the rtc interrupt is using 5.2% (6.4 us / 122 us) of the processor. This trace shows us two things, the RTC_AIE_OFF ioctl code has something to do with the rate change and that the rtc interrupt is using quite a bit of processor time which could explain a system slowdown. Looking at the code around the pmla_trig_center() call, we find the culprit: CMOS_WRITE((CMOS_READ(RTC_FREQ_SELECT) & 0xf0) | 3, RTC_FREQ_SELECT); /* BUG!, (who put this code here?) */ Like many contrived examples, this one is pretty silly, but it does illustrate the technique for finding one example of timing problems. To add more trace points, put calls to pmla_save() in your code with unique id numbers, then add the corresponding id line to scripts/pmla.pl. For example, if we add this line to the ioctl RTC_PIE_OFF in rtc.c: pmla_save(4, 0, 0); /* debug only */ We would add the following line to pmla.pl: --------------------------------------------------------------- $id_lut[4] = "pmla test, rtc RTC_PIE_OFF : "; # table of id strings --------------------------------------------------------------- 7.0 Conclusion Don't forget to unload the modified rtc and reload your original one: --------------------------------------------------------------- sudo make rtc_unload sudo modprobe rtc --------------------------------------------------------------- While this software logic analyzer won't help you if your driver crashes the kernel, it's a great addition to your toolbox of debug tools. It's especially suited for finding event timing relationships and the bugs they sometimes cause. ----- Bio1.txt goes here -----