Interactive Bootstraps “Numbers in detail” → And the implementation

018cd6683e50d842e20577659cf19dd236e8cd8a
🐶
Add initial hexdump implementation
It is inspired by `xxd` but it doesn't yet do ASCII, only hexadecimal.
It is also not yet very interactive, requiring changing the source,
recompiling and re-running. But it does compile fast.

🐶

I recommend keeping a terminal open with `gcc hexdump.c -o hexdump && ./hexdump` typed in, to compile and run it in one go. Then when you change the file, you can just switch over, press enter, and use the up arrow to bring the command back for next time. If that is too much effort, you could try using entr; something like `echo hexdump.c | entr sh -c 'gcc hexdump.c -o hexdump && ./hexdump'` should work.

⟨hexdump.c⟩≡

@@ -0,0 +1,43 @@
+#include "hexdump.h"

🐱

This presumably includes our header. What is the difference between the #include "..." and #include <...> lines? We have used the former on our own header and the latter on a system header, is that it?

🐶

Yes, that’s the idea. You can add paths to either the user or the system header locations, but by default the system locations cover the standard library and any installed software that ships headers, and the directory containing the source file is always searched for user headers.

+
+#include <stddef.h>
+#include <stdint.h>

🐶

This includes the fixed-width integer types. Previously we have used int, which is meant to be the most natural integer type for the platform. Usually it is 32 bits, but that is not guaranteed. stdint.h, on the other hand, includes types like int8_t, int16_t, int32_t and int64_t, which are defined to be exactly 8, 16, 32 and 64 bits. There are also unsigned counterparts uint8_t, uint16_t, uint32_t and uint64_t.

Unsigned means not negative, so zero or higher.

🐱

And I recall a bit being either 0 or 1, so an 8-bit integer is 0000 0000, 0000 0001, 0000 0010, 0000 0011, …, 1111 1110, 1111 1111. Just a second, that’s from zero to 2^7 + 2^6 + 2^5 + 2^4 + 2^3 + 2^2 + 2^1 + 2^0. Evaluating the powers, it is 128 + 64 + 32 + 16 + 8 + 4 + 2 + 1, which adds up to 255. So that is 256 possible values counting zero, and if we have signed integers then some of those values are going to be negative, I assume.

Also, I recall that a byte is another word for an 8-bit value.

+#include <stdio.h>
+
+void hexdump(const void *data, size_t len) {

🐶

For reference, if you run the program, you get the following output:

x
00000000: 0100 0000
y
00000000: 3412 0000
z
00000000: ffff ffff
s
00000000: 4865 6c6c 6f2c 2077 6f72 6c64 00
t
00000000: 1a20 4000 0000 0000

Here x, y, z, s and t are just the variables we are dumping, it is the lines below each that are the outputs of this hexdump function.

+	const uint8_t *bytes = data;

🐱

Ah yes, you mentioned that a pointer to void can be converted to a pointer to a different type. And bytes make a natural unit of length, as they are the smallest integer type. Handily, two hexadecimal digits make up a byte, with a single hexadecimal digit (4 bits) being called a nibble.

🐘

Though historically, for computers where the word size was a multiple of 3 bits, the octal number format was more handy.

One example of such is the PDP-8, which came in a wonderful 1960s orange look (its successor the PDP-12 had a green colour scheme). Not to be confused with the purple PDP-11, which was the computer the original C compiler was written for, and which also had the first release of Unix.

See, not all old computers looked like beige bricks. The Cray machines had a very recognisable style, and the Cray-2 even had a waterfall for the coolant. And the Connection Machine was suitably full of red LEDs, with a black case that would fit in to any Sci-Fi.

+	for (size_t i = 0; i < len; i++) {
+		if ((i & 15) == 0) {
+			printf("%08zX:", i);
+		}

🐶

Here we print the address/offset, but only if the number is divisible by 16 – we use 16 because that is 0x10 in hexadecimal so it’s a round number.

The way we check that a number is divisible by 16 is by checking the last hexadecimal digit (nibble), and making sure it is zero. Compare with checking that a decimal number is divisible by 10: if it ends in a zero, it is.

The way we extract the last nibble is by using the bitwise operator AND “&”, which goes over each bit in both numbers, and applies the AND operation to it. Let’s have a quick reminder of the bitwise operators, also including OR “|” and XOR (exclusive-or) “^”.

A  B  A & B  A | B  A ^ B
0  0    0      0      0
0  1    0      1      1
1  0    0      1      1
1  1    1      1      0

Some notes here: for a single bit, the bitwise AND is the same as multiplication. And XOR is addition ignoring the carry, while AND tells us whether there is a carry bit.

Another way we could have written “is divisible by 16”, is using the remainder operator, i % 16 == 0, but it is common to use bitwise AND for powers of two.

One last thing: you may note we have actually written (i & 15) == 0, rather than just i & 15 == 0. This is because in C the & actually binds less tightly than ==, so the latter would be parsed as i & (15 == 0). The comparison 15 == 0 is false, which in C is the integer 0, so we would end up with i & 0, which is always zero, and the condition would never be true. The order of precedence in C mostly behaves how you expect it (if you remember BIDMAS/BODMAS/BEDMAS/PEMDAS, whichever acronym you learnt at school), but the bitwise operators do bind less tightly than equality comparisons, so they should be parenthesised. If you aren’t sure, you can always add parentheses.

+		if ((i & 1) == 0) {
+			printf(" ");
+		}

🐱

This then prints a space if the number is divisible by two, or in other words, even. Given we start at zero, it prints the initial space after the address/offset too.

+		printf("%02x", bytes[i]);
+		if ((i & 15) == 15 || i == len - 1) {
+			printf("\n");
+		}

🐶

And finally printing the actual byte – the %02x prints it in lower-case hexadecimal (0-9, a-f), and pads it to two nibbles using zero if needed.

Then if we have reached the end of a run of sixteen bytes, or we are on the last byte, we print a new line too. The OR operator “||” is logical OR, not bitwise (remember the bitwise OR is “|”). The difference is two-fold. First, the expression “a || b” only cares whether a or b is true (and in C, any non-zero integer, or non-NULL pointer, is considered to be true), and it evaluates to either 0 or 1. Second, the expression “true || b” does not actually evaluate “b”, as it already knows the result is going to be true.

Don’t worry, I have covered a lot here. I will keep pointing things out as we go, so you don’t have to worry about remembering it all the first time.

+	}
+}
+
+int main(int argc, char **argv) {
+	printf("x\n");
+	const uint32_t x = 1;
+	hexdump(&x, sizeof(x));

🐱

Ah finally, some examples for me to play around with.

🐶

Please do play around. If you want some things to look at, you can look at the order of bytes, figure out how negative values work, and figure out a bit about how characters are represented.

🐱

But I’ll also go for a coffee break and have some biscuits while I’m digesting all of this information. I have decaf to not over caffeinate.

+
+	printf("y\n");
+	const uint32_t y = 0x1234;
+	hexdump(&y, sizeof(y));
+
+	printf("z\n");
+	const uint32_t z = -1;
+	hexdump(&z, sizeof(z));
+
+	printf("s\n");
+	const char s[] = "Hello, world";
+	hexdump(&s, sizeof(s));
+
+	printf("t\n");
+	const char *t = "Hello, world";
+	hexdump(&t, sizeof(t));
+}