I’m assuming you already know what fastbin poison is. If you don’t you can check this out: https://github.com/shellphish/how2heap/blob/master/glibc_2.35/fastbin_dup.c , or search for some source which explains it in more detail.

Fastbin exploitation isn’t that common nowadays since you can usually just attack the tcache instead, and arbitrary write is cleaner with tcache because you can allocate almost wherever you want while with fastbins you have to allocate somewhere with an appropriate fake chunk size. That being said it can come up from time to time, I encountered it “recently” in 2024/tfc/mcback2dabasics and vspm.

Edit: To be clear, unless you are exploiting an old glibc version or one where tcache is disabled, fastbin poison chains are obsolete as performing fastbin poison will activate tcache stashing, and you will always be allocating the AAW chunk from the tcache, without the size restriction explained below.

fastbin poison

Let’s take a quick refresher on some points that aren’t commonly discussed.

When performing the fastbin attack the most important point is that on allocation our fake chunk needs to have a chunk size which belongs to its bin. Chunk size is of type size_t which is 64 bits. Sizes are rounded down, so a size of 0x7f belongs to the 0x70 bin. As libc is a common target, a common bin to look out for is the 0x70 bin as it will accept libc pointers next to a zero in memory. Importantly the chunks don’t need to be 0x10 aligned.

Another place we can fail the poison is this check at the very end of __libc_malloc that is always run (source, single thread source):

//...
#define IS_MMAPPED 0x2
#define chunk_is_mmapped(p) ((p)->mchunk_size & IS_MMAPPED)

#define NON_MAIN_ARENA 0x4
#define chunk_main_arena(p) \
	(((p)->mchunk_size & NON_MAIN_ARENA) == 0)
//...
void* __libc_malloc (size_t bytes)
{
	// ...
	assert (!victim || chunk_is_mmapped (mem2chunk (victim)) ||
          ar_ptr == arena_for_chunk (mem2chunk (victim)));
  return victim;
}

If we enter arena_for_chunk (source) and are not in main_arena (or any other arena) we will segfault.

/* find the heap and corresponding arena for a given ptr */

static inline heap_info *
heap_for_ptr (void *ptr)
{
  size_t max_size = heap_max_size ();
  return PTR_ALIGN_DOWN (ptr, max_size);
}

static inline struct malloc_state *
arena_for_chunk (mchunkptr ptr)
{
  return chunk_main_arena (ptr) ? 
		  &main_arena : heap_for_ptr (ptr)->ar_ptr;
}

So we want our chunk size to either have its 2nd LSB set (IS_MMAPPED) pretending to be mmaped or 3rd LSB unset (NON_MAIN_ARENA) pretending to be main_arena. Thus, it’s only problematic if the chunk size looks like 0b...xxx10x, which will happen 1/4 of the time due to ASLR (note that this isn’t the LSByte).

Using pwndbg’s find_fake_fast makes finding suitable addresses easier. One heuristic for manual searching is searching memory for values whos first byte is semi-predictable (e.g. libc pointers), followed by a null (or most lower bytes are zero) qword.

Common places of interest are __malloc_hook (-0x23), _IO_2_1_stdout_ (-0x43) for FSOP (arbitrary read and RCE), _IO_2_1_stdin_ (-0x13) (actual arbitrary write), __dso_handle (-0x3) all of which have suitable fake sizes above them.

Most often __malloc_hook with onegadget is the play (though glibc <= 2.33 obviously). For an example of this attack via __malloc_hook see nightmare/0ctf/baby_heap. Use pwndbg’s onegadget (uses --level=100 by default) to find all satisfiable gadgets instantly. If they still don’t work try to make the conditions true by doing something, it is always somehow doable from my experience.

fastbin poison chains

If you really can’t get any onegadgets to work, or you have glibc > 2.33 you may want to try to use fastbin poison chains. Also just knowing about them is nice since they are pretty much guaranteed to work (unlike onegadget for example), you can always have them in the back of your mind as a last resort.

The idea is that you have some target address, but a suitable fake size is very high above it (lower address values). You allocate a fake chunk on the suitable size, and at the very end of your chunk you write a new fake size (usually 0x81 to minimize the number of allocations), you then keep allocating fake chunks and writing fake sizes down until you reach your target address.

Though I haven’t seen people talk about this idea, its quite natural to think about “okay if I don’t have a suitable fake size in place, is there a way for me to get it there?” so I’m sure people have thought of this. It probably isn’t discussed much since it’s relatively outdated and onegadget usually works.

With this method it’s especially important to check that all of your fake chunks have fd=NULL since you will be using the same fastbin over and again you can’t affort to corrupt it.

Since pwndbg’s find_fake_fast doesn’t work in this case, we can do something like:

pwndbg> search -p 0x7f libc.so.6 -w
libc.so.6    0x7ffff7bc333d 0x7f
libc.so.6    0x7ffff7bc3375 0x7f
[snip]
libc.so.6    0x7ffff7bc46c5 0x7f
libc.so.6    0x7ffff7bc471d 0x7f
pwndbg>

and then just find the address above our target which is closest. The output is sorted and not too big so it’s easy. We can also search completely manually, by incrementally printing memory above our target. Note that x/100gx is 0x320 bytes so we can do something like this x/100gx 0x7ebe1adc57a8 - (0x320 * 1) and increase the multiplier (1) as we check. Also, use the raw address, putting a symbol like &__free_hook messes stuff up.

So yeah, the question comes up: “What is our actual target for the arbitrary write?” and __free_hook is a well known idea (leveraging a free("/bin/sh")). There are a few problems with this though. For one, it’s pretty far from any predictable address which we can use to start the chain, and, maybe more importantly, it doesn’t work (for glibc >= 3.34).

Looking at various ways to get RCE from libc arbitrary address write and suitable values above lead me to the good old exploit of overwriting initial (more commonly referred to as __exit_funcs source). For more info on how that exploit works see http://binholic.blogspot.com/2017/05/notes-on-abusing-exit-handlers.html and https://ctftime.org/writeup/34804 .

The starting point we can use for our poison chain is a nice and consistent symbol called DW.ref.__gcc_personality_v0. It’s located in the libc writable section just below the stderr/stdout/stdin pointers, which point to, and are right below the _IO_2_1_stderr_/_IO_2_1_stdout_/_IO_2_1_stdin_ structures. Since the DW.ref.__gcc_personality_v0 symbol contains dots to print it you need to use single quotes like this: p/x (uint64_t)'DW.ref.__gcc_personality_v0'. The value of the symbol is the address of __gcc_personality_v0, i.e. it contains a libc pointer, and is in fact the closest pointer above both initial and __free_hook.

To perform the initial exit handler exploit though, we need an arbitrary read primitive. We get this easily from stdout FSOP because there is always a suitable fake free chunk size directly above the stdout struct (_IO_2_1_stdout_ (-0x43)). You can get the payload from pwntools using the FileStructure class, there are many resources online on how it works.

One obstacle we may encounter is that our exploit takes too many allocations, as heap CTF challenges usually place a restriction on how many times we can allocate by keeping a counter of our allocations. How to get around this? The most obvious solution is the best, we will simply overwrite the counter to zero from time to time. Most often the counter is stored in .bss as it’s usually zero at the beginning of the program (denoting zero allocations) so it is pretty much guaranteed to be after the .data section. That is really good to know as .data will contain predictable values which we can use for our fake fast chunk size. Even if the program doesn’t do anything special and has a really small .data section, we can always target the stderr/stdin/stdout pointers or __dso_handle as they are located in the programs mappings.

So, in this exploit, before we start making a fake fast chunk chain from DW.ref.__gcc_personality_v0 to initial we make one from (for example) stderr (which is in the program’s memory, not the one in libc) to the chunk counter. This will usually only take a few allocations, and so we won’t trigger the allocation limit by trying to clear it. After we do this once, depending on the program specifics we might be able to just clear the counter in one allocation from now on (if the fake chunk size we set in the .bss is not overwritten by the program logic), and if not it doesn’t matter as we can just rerun the chain each time. Of course be careful not to overwrite any critical variables.

Once we have a counter clearing primitive, the exploit itself is pretty straight forward, here is a snippet from how I solved 2024/tfc/mcback2dabasics during the competition (i forgor one_gadget has level flag). Here I was targetting __free_hook because it was an old enough libc for it to work, and the distance to the target (from DW.ref.__gcc_personality_v0) doesn’t really matter if you have a counter clearing primitive.

print("==== walking to __free_hook ====")

chain_start = libc.address + 0x3C26F5
addr = chain_start
data = b"\x00" * 0x66 + b"\x81"

# arb_write(chunk user size, (internal) chunk address, user data)
# performs a single fastbin poison
arb_write(0x68, addr, data) 
addr += 0x6E
data = b"\x00" * 0x6F + b"\x81"

for i in range(34):
    arb_write(0x71, addr, data)
    addr += 0x77
    if i % 5 == 0:
        clear_counter()    # guess what this function does

data = b"\x00" * 0x47 + pack(libc.sym["system"])
arb_write(0x71, addr, data)

# a chunk allocated at the start of the program with the /bin/sh string
free(binsh)

There are many program/challenge specifics that can make fastbin poison chains easier to pull off (high allocation limit, presence of IO pointers in the program, non-volatile .bss section) but even if everything “goes wrong” the idea is still pretty much guaranteed to work.

Quite neat no?