In your previous programming experience, you may have managed without using a debugger. You might have been able to find the mistakes in your programs by printing things on the screen or simply reading through your code. Beware, however, that OS/161 is a large and complex body of code, much more so than you may have worked on in the past. To make matters worse, much of it was written by someone other than you. A debugger is an essential tool in this environment. You should, therefore, take the time to learn GDB and make it your best friend. This guide will explain to you how to get started debugging OS/161, describe the most common GDB commands, and suggest some helpful debugging techniques.
To debug OS/161, you should have finished building your toolchain. The toolchain has five parts: binutils, gcc, gdb, System/161, and bmake. The gdb built as part of the toolchain, mips-harvard-os161-gdb, is the debugger you should use, because this copy of GDB has been configured for the MIPS architecture and has been patched to be able to talk to System/161. The difference between debugging a regular program and debugging an OS/161 kernel is that the kernel is running in a machine simulator. You want to debug the kernel; running the debugger on the machine simulator is not very illuminating and we hope it will not be necessary. If you were to type:
% mips-harvard-os161-gdb sys161
you would be attempting to debug the simulator. This will not work, because the simulator is not compiled for MIPS. (If you do need to debug the simulator at some point, you would use the regular system copy of GDB.) So you must type this:
% mips-harvard-os161-gdb kernel
You will find, however, that having done this, telling GDB to run the kernel does not work, because the kernel has to be run on System/161.
Instead, what you need to do is start your kernel running in System/161; then run mips-harvard-os161-gdb on the same kernel and tell it to attach to the copy you started running. To do this you have to tell GDB to talk to System/161's debugger port.
This requires two windows, one to run the kernel in and one to run GDB in. These two windows must be logged into the same machine. It will not work if they are not.
Be aware that you may be using a cluster of machines; that is, there are actually several computers, and when you log in you may end up logged in to any one of them. After you log your first window in, check which actual machine you got, like this:
% hostname
guardian.it.mtu.edu
The response tells you which actual computer you are logged into. When logging in your second window (and any others that may need to talk to your System/161 processes) you should log into this machine directly:
% ssh guardian.it.mtu.edu
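The same-machine check can be scripted. The following is only a sketch: the function names and the file ~/.sys161-host are hypothetical, and it assumes your home directory is shared between the machines in the cluster. Run record_host in the run window and check_host in the debug window.

```shell
# Hypothetical helper, not part of OS/161 or System/161.
# Assumes $HOME is shared across the cluster machines.
SYS161_HOSTFILE="${SYS161_HOSTFILE:-$HOME/.sys161-host}"

record_host() {
    # uname -n prints the node name, the same value hostname reports.
    uname -n > "$SYS161_HOSTFILE"
}

check_host() {
    # Succeeds only if this shell is on the recorded machine.
    [ -f "$SYS161_HOSTFILE" ] && [ "$(cat "$SYS161_HOSTFILE")" = "$(uname -n)" ]
}
```

If check_host fails, log in again with ssh to the machine named in the file before starting GDB.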
Now you are ready to debug. In one window (the run window), boot OS/161 on System/161. Use the -w option to tell System/161 to wait for a debugger connection:
% cd ~/os161/root
% sys161 -w kernel
Next, in the other window (your debug window), run mips-harvard-os161-gdb on the same kernel (if you run it on a different kernel by accident, you'll get bizarre results) and tell GDB to connect to System/161:
% cd ~/os161/root
% mips-harvard-os161-gdb kernel
(gdb) dir ../src/kern/compile/ASST0
(gdb) target remote unix:.sockets/gdb
Be careful to type mips-harvard-os161-gdb and not just gdb.
The dir command tells GDB where to start when looking for source files. The last component, "ASST0", should correspond to the configuration name of the kernel you are debugging; it will be "ASST1" for the first assignment. The target remote command tells GDB how to connect to the running sys161 process so you can debug the kernel.
GDB will connect up and it will tell you that the program is stopped somewhere in start.S. It is waiting at the very first instruction of your kernel, as if you'd run it from GDB and put a breakpoint there. At this point, you can use GDB to debug your kernel as you would debug any other program. In particular, you can set breakpoints to make execution halt at a particular point in the kernel. When you are ready to go, you can give the GDB cont command:
(gdb) c
This should allow your kernel to run until it terminates or hits a breakpoint that you've set.
When you are done debugging, you can disconnect the debugger from System/161 (and thus the running kernel) using the detach command:
(gdb) detach
You can also, instead, tell GDB to kill the process it's debugging. This will cause System/161 to exit unceremoniously, much as if you'd gone to its window and typed ^C:
(gdb) kill
Note that you do not necessarily need to attach GDB to System/161 at startup. You can attach it at any time. However, for reasons we do not presently understand, connecting does not always work properly unless System/161 is stopped waiting for a debugger connection. You can put it into this state at any time by typing ^G into its window. This can be useful if your kernel is looping or deadlocked.
(gdb) l 101
will display line 101 in your source file. If you have more than one source file, precede the line number by the file name and a colon:
(gdb) l os.c:101
Instead of specifying a line number, you can give a function name, in which case the listing will begin at the top of that function.
(gdb) b 18
means that your program will stop every time it executes a statement on line 18. As with the "list" command, you can specify to break at a function, e.g.:
(gdb) b main
If you type
(gdb) d 1
you will delete the breakpoint number "1". GDB displays the number of a breakpoint when you set that breakpoint. Typing "d" without arguments will cause the deletion of all breakpoints.
Type
(gdb) c
if you want your program to continue the execution until the next breakpoint.
(gdb) s
will execute the next line of code. If the next line is a function call, the debugger will step into this function.
(gdb) display x
will print the value of a variable "x" every time the program hits a breakpoint. If you want to print the value in hex, type:
(gdb) display /x x
The "printf" command allows you to specify the formatting of the output, just like you do with a C library printf() function. For example, you can type:
(gdb) printf "X = %d, Y = %d\n",X,Y
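You can also attach a list of commands to a breakpoint, so that they run every time that breakpoint is hit. For example, for breakpoint number 2:
(gdb) command 2
> printf "theString = %s\n", theString
> print /x x
> end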
(gdb) set variable x = 15
will change the value of "x" to 15.
If you get tired of typing
(gdb) target remote unix:.sockets/gdb
all the time, you could do:
(gdb) define db
Type commands for definition of "db".
End with a line saying just "end".
> target remote unix:.sockets/gdb
> end
Then you could invoke it just by typing "db". (If you put this or other commands in a file called .gdbinit, GDB will execute them automatically at startup time.)
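As a sketch of the .gdbinit approach, the shell commands below write such a file into the directory where you start GDB (for example ~/os161/root). The GDBINIT variable is a hypothetical convenience for choosing the file name, and ASST0 must be changed to your own configuration name. Some newer GDB versions may decline to load a .gdbinit from the current directory unless the auto-load safe-path setting allows it; check whether this applies to your copy.

```shell
# Sketch: create a .gdbinit in the current directory.
# GDBINIT is a hypothetical variable; it defaults to ./.gdbinit.
GDBINIT="${GDBINIT:-.gdbinit}"

cat > "$GDBINIT" <<'EOF'
# Where to look for source files; change ASST0 to your configuration.
dir ../src/kern/compile/ASST0

# "db" connects to the System/161 debugger port.
define db
target remote unix:.sockets/gdb
end
EOF
```

With this file in place, starting mips-harvard-os161-gdb kernel and typing "db" should be enough to connect.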
To list the breakpoints that are currently set, along with their numbers, type:
(gdb) info breakpoints
To see the current state of the hardware machine registers, type:
(gdb) info registers
(gdb) help
When you are prompted with
Run gdb (like this): gdb
replace "gdb" with
mips-harvard-os161-gdb kernel
So in the end you should have:
Run gdb (like this): mips-harvard-os161-gdb kernel
displayed in the control window.
More subtly, if you are debugging a multi-threaded program, such as a kernel, the order in which the instructions are executed depends on how your threads are scheduled, and some bugs may or may not manifest themselves under a particular execution scenario. Because printf outputs to the console, and the console in System/161 is a serial device that isn't extraordinarily fast, an extra call to printf may alter the timing and scheduling considerably. This can make bugs hide or appear to come and go, which makes your debugging job much more difficult.
To help address this problem, System/161 provides a simple debug output facility as part of its trace control device. One of the trace control device's registers, when written to, prints a notice in the System/161 output including the value that was written. In OS/161, provided your System/161 has been configured to include the trace control device, you can access this feature by calling trace_debug(), which is defined in dev/lamebus/ltrace.h. While this is less disruptive than calling printf, it is still not instant and can still alter the timing of execution. By contrast, the System/161 debugger interface is completely invisible; as far as your kernel is concerned, time is stopped while you are working in the debugger.
The OS/161 toolchain now tells the assembler to emit line number information for assembly files, so in theory you should at least be able to see the file you're working on. (If GDB can't find the file, you can use the dir command to tell it where to look.)
It is also sometimes helpful to disassemble the kernel; type
% objdump --disassemble kernel | less
in another window and page or search through it as needed.
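Paging through the whole disassembly can be slow when you only care about one function. The shell function below is a sketch of one way to cut the output down. The name disas_func is hypothetical, and it assumes the usual GNU objdump layout, in which each function starts with a line of the form "address <name>:" and is followed by a blank line.

```shell
# Print only the disassembly of the function named in $1,
# reading "objdump --disassemble" output from standard input.
disas_func() {
    awk -v fn="$1" '
        $2 == ("<" fn ">:") { show = 1 }   # header line of the function
        show && NF == 0     { exit }       # a blank line ends the function
        show                { print }
    '
}
```

For example, with kmain standing in for whichever function you are interested in:
% objdump --disassemble kernel | disas_func kmain | less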
To single step through assembler, use the nexti and stepi commands, which are like next and step but move by one instruction at a time.
The command x /i (examine as instructions) is useful for disassembling regions from inside GDB.
Use the command info registers to see the values that are being handled. Unfortunately, you can't print only one register.
One of the perhaps more interesting trace options is to have System/161 report every machine instruction that is executed, either at user level, at kernel level, or both. Because this setting generates vast volumes of output, it's generally not a good idea to turn it on from the command line. (It is sometimes useful, however, in the early stages of debugging assignment 2 or 3, to log all user-mode instructions.) However, the trace options can be turned on and off under software control using the System/161 trace control device. It can be extremely useful to turn instruction logging on for short intervals in places you suspect something strange is happening. See dev/lamebus/ltrace.h for further information.
Another trick with a stopped thread is to cast thread->pcb.pcb_savestack to struct switchframe *; this will let you inspect its saved register values.
When you get a stack backtrace and it reaches an exception frame, GDB can sometimes now trace through the exception frame, but it doesn't always work very well. Sometimes it only gets one function past the exception, and sometimes it skips one function. (This is a result of properties of the MIPS architecture and the way GDB is implemented and doesn't appear readily fixable.) Always check the tf_epc field of the trap frame to see exactly where the exception happened, and if in doubt, cross-check it against a disassembly or have GDB disassemble the address.