Exploiting FreeBSD-SA-19:02.fd

Posted on July 8, 2019 by K³

Introduction

In February 2019 the FreeBSD project issued an advisory about a possible vulnerability in the handling of file descriptors.

UNIX-like systems such as FreeBSD allow file descriptors to be sent to other processes via UNIX-domain sockets. This can be used, for example, to pass file access privileges to the receiving process.

Inside the kernel, file descriptors are used to indirectly reference a C struct which stores the relevant information about the file object. This could for instance include a reference to a vnode which describes the file for the file system, the file type, or the access privileges.

What really happens when a UNIX-domain socket is used to send a file descriptor to another process is that the kernel creates, for the receiving process, a new reference to this struct. As the new file descriptor references the same file object, all information is inherited. For instance, this makes it possible to give another process write access to a file on the drive even if the process owner is normally not able to open the file for writing.
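To make the mechanism concrete, the following is a minimal sketch of passing a descriptor as SCM_RIGHTS control data over a UNIX-domain socket. The helper names send_fd() and recv_fd() are illustrative assumptions and are not taken from the PoC code; error handling is reduced to a return value.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>

/* Send one file descriptor over a UNIX-domain socket; returns 0 on success. */
static int send_fd(int sock, int fd)
{
	char byte = 'x';	/* one data byte travels with the control data */
	struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
	union {
		struct cmsghdr hdr;	/* forces suitable alignment */
		char buf[CMSG_SPACE(sizeof(int))];
	} ctl;
	struct msghdr msg;
	struct cmsghdr *cmsg;

	assert(sock >= 0 && fd >= 0);
	memset(&ctl, 0, sizeof(ctl));
	memset(&msg, 0, sizeof(msg));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = ctl.buf;
	msg.msg_controllen = sizeof(ctl.buf);

	cmsg = CMSG_FIRSTHDR(&msg);
	cmsg->cmsg_level = SOL_SOCKET;
	cmsg->cmsg_type = SCM_RIGHTS;
	cmsg->cmsg_len = CMSG_LEN(sizeof(int));
	memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

	return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive one file descriptor; returns the new descriptor or -1. */
static int recv_fd(int sock)
{
	char byte;
	struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
	union {
		struct cmsghdr hdr;
		char buf[CMSG_SPACE(sizeof(int))];
	} ctl;
	struct msghdr msg;
	struct cmsghdr *cmsg;
	int fd;

	memset(&msg, 0, sizeof(msg));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = ctl.buf;
	msg.msg_controllen = sizeof(ctl.buf);

	if (recvmsg(sock, &msg, 0) != 1)
		return -1;
	cmsg = CMSG_FIRSTHDR(&msg);
	if (cmsg == NULL || cmsg->cmsg_level != SOL_SOCKET ||
	    cmsg->cmsg_type != SCM_RIGHTS ||
	    cmsg->cmsg_len != CMSG_LEN(sizeof(int)))
		return -1;
	memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
	return fd;
}
```

The descriptor number returned by recv_fd() usually differs from the one that was sent, but both refer to the same file object inside the kernel, which is why access rights carry over.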

The advisory describes that FreeBSD 12.0 introduced a bug in this mechanism. As the file descriptor information is sent via a socket, the sender and the receiver have to allocate buffers for the procedure. If the receiving buffer is not large enough, the FreeBSD kernel attempts to close the received file descriptors to prevent a leak of these to the sender. However, while the responsible function closes the file descriptor, it fails to release the reference from the file descriptor to the file object. This could cause the reference counter to wrap.

The advisory further states that the impact of this bug is possibly a local privilege escalation to gain root privileges or a jail escape. However, no proof-of-concept was provided by the advisory authors.

This blog post catches up on that and describes Secfault Security’s research to exploit the bug in order to obtain a privilege escalation to root.

In the next section, the bug itself is analyzed to make a statement about the bug class and a guess about a possible exploitation primitive.

After that, the bug trigger is addressed.

This is followed by a discussion of three conceivable exploitation strategies, including why two of these approaches failed.

In the penultimate section, the working exploit primitive is discussed. It introduces a (at least to the author’s knowledge) new exploitation technique for this kind of vulnerability in FreeBSD. The stabilization of the exploit is addressed, too.

The last section wraps everything up in a conclusion and points out further steps and challenges.

Furthermore, there is an appendix, which describes the test setup and kernel patches to accelerate the exploit conditions for testing.

It should be mentioned that the vulnerability was backported to the FreeBSD 11 development branch. However, the vulnerability was fixed in this branch, too, so it will not be present in the 11.3-release.

The issue has an assigned CVE which is CVE-2019-5596.

NB: All references to code lines are referring to the vulnerable source tree which was shipped with the initial FreeBSD 12 release. The source code can be found here if not otherwise mentioned.

All PoC code can be downloaded here.

Bug Analysis

To get a first hint about the origin of the bug, a look into the patch in revision r343790 of the FreeBSD 12 release engineering branch is a good start. This revision was mentioned in the advisory. It can be found in FreeBSD’s Phabricator instance. The fix from this revision is the following:

  1578  void
  1579  m_dispose_extcontrolm(struct mbuf *m)
  1580  {
  ...
  1606          while (nfd-- > 0) {
  1607              fd = *fds++;
  1608              error = fget(td, fd, &cap_no_rights,
  1609                  &fp);
- 1610              if (error == 0)
+ 1610              if (error == 0) {
  1611                  fdclose(td, fp, fd);
+ 1612                  fdrop(fp, td);
+ 1613              }
  1614          }
  ...
  1621  }

Only a single call was added to the function m_dispose_extcontrolm() in the file uipc_syscalls.c. The bug therefore existed because the function lacked a call to the macro fdrop(). So the natural question is: What is the purpose of this macro?

fdrop() has two arguments, fp and td. The latter is just a kernel-pointer to the current thread. The former is a pointer to a struct file object, which is defined in line 170 of sys/sys/file.h.

170  struct file {
171  	void		*f_data;	/* file descriptor specific data */
172  	struct fileops	*f_ops;		/* File operations */
173  	struct ucred	*f_cred;	/* associated credentials. */
174  	struct vnode 	*f_vnode;	/* NULL or applicable vnode */
175  	short		f_type;		/* descriptor type */
176  	short		f_vnread_flags; /* (f) Sleep lock for f_offset */
177  	volatile u_int	f_flag;		/* see fcntl.h */
178  	volatile u_int 	f_count;	/* reference count */
179  	/*
180  	 *  DTYPE_VNODE specific fields.
181  	 */
182  	int		f_seqcount;	/* (a) Count of sequential accesses. */
183  	off_t		f_nextoff;	/* next expected read/write offset. */
184  	union {
185  		struct cdev_privdata *fvn_cdevpriv;
186  					/* (d) Private data for the cdev. */
187  		struct fadvise_info *fvn_advice;
188  	} f_vnun;
189  	/*
190  	 *  DFLAG_SEEKABLE specific fields
191  	 */
192  	off_t		f_offset;
193  	/*
194  	 * Mandatory Access control information.
195  	 */
196  	void		*f_label;	/* Place-holder for MAC label. */
197  };

fdrop() itself is a macro, which first calls refcount_release(). This function atomically decrements f_count (line 178 of the struct definition) by 1 and returns 1 if f_count was less than or equal to 1 before the function was called, or 0 otherwise. If the return value is indeed 1, the macro calls _fdrop() with all its arguments.
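The behaviour just described can be modelled in user space. The following is a simplified sketch, not the kernel code: the toy_ names are hypothetical, atomicity is ignored, and the kernel panic is represented by a flag.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Toy stand-in for struct file; only the reference counter is modelled. */
struct toy_file {
	unsigned int f_count;
	int freed;	/* set once the model "frees" the object */
	int panicked;	/* set where the kernel would panic */
};

/* Models refcount_release(): decrement, report whether the old value was <= 1. */
static int toy_refcount_release(unsigned int *count)
{
	unsigned int old = (*count)--;	/* done atomically in the kernel */

	return old <= 1;
}

/* Models _fdrop(): insists that the counter is exactly 0 before freeing. */
static void toy_fdrop_last(struct toy_file *fp)
{
	if (fp->f_count != 0) {
		fp->panicked = 1;
		return;
	}
	fp->freed = 1;
}

/* Models the fdrop() macro. */
static void toy_fdrop(struct toy_file *fp)
{
	assert(fp != NULL);
	if (toy_refcount_release(&fp->f_count))
		toy_fdrop_last(fp);
}
```

In this model, dropping a counter of 2 leaves the object alive, dropping a counter of 1 frees it, and dropping a counter of 0 wraps the unsigned value and ends in the panic branch.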

_fdrop() itself is defined in sys/kern/kern_descrip.c.

2943  int __noinline
2944  _fdrop(struct file *fp, struct thread *td)
2945  {
2946  	int error;
2947  
2948  	if (fp->f_count != 0)
2949  		panic("fdrop: count %d", fp->f_count);
2950  	error = fo_close(fp, td);
2951  	atomic_subtract_int(&openfiles, 1);
2952  	crfree(fp->f_cred);
2953  	free(fp->f_advice, M_FADVISE);
2954  	uma_zfree(file_zone, fp);
2955  
2956  	return (error);
2957  }

The interesting line is 2954. Here, uma_zfree() is called. This is an internal kernel function which frees an allocated chunk on the heap. It is beyond the scope of this document to discuss the details of the kernel’s heap management. The inner workings of the kernel allocator are described by argp and karl in Phrack #0x42, Phile #0x08. Another great resource is the book “The Design and Implementation of the FreeBSD Operating System” by McKusick et al.

For the purpose of this blog post, the following knowledge about the kernel heap is sufficient: The FreeBSD kernel’s heap allocator allows “zones” to be defined. Each zone is used to manage page chunks of a specific size by creating “buckets”. To allocate a chunk on the kernel’s heap, uma_zalloc() has to be called with a parameter which specifies the zone. The function returns a pointer to a chunk which is taken from a bucket of that zone. If the bucket is empty, a new page is allocated for that zone and chopped into the zone’s chunk size.

For example, the zone socket is used to allocate chunks of 872 bytes, which the kernel uses as heap space for socket objects. There are anonymous zones, too, like 256, which are used by calls to malloc() in the kernel.

It is possible to view all available zones including their stats with the command vmstat -z.

When a heap chunk is freed via uma_zfree() by the kernel, it will be put back into the zone’s bucket. Subsequent calls to malloc() or uma_zalloc() will attempt to take chunks from such buckets in a LIFO fashion.

For the moment, the interesting insights are:

uma_zfree() is called with file_zone
file_zone refers to a special zone called Files for the struct file type
If another function allocates a struct file from Files, it will eventually receive the pointer that was freed by _fdrop()
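The LIFO reuse behind the last point can be illustrated with a toy allocator. This is a deliberately simplified sketch under the assumption of a single free stack; the real allocator uses per-CPU caches and buckets, which is why the text above says "eventually". All toy_ names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_CHUNKS 4

/* Toy "zone": fixed-size chunks plus a stack of free chunks. */
struct toy_zone {
	char chunks[TOY_CHUNKS][64];
	void *free_stack[TOY_CHUNKS];
	int nfree;
};

/* Put every chunk on the free stack. */
static void toy_zone_init(struct toy_zone *z)
{
	z->nfree = 0;
	for (int i = TOY_CHUNKS - 1; i >= 0; i--)
		z->free_stack[z->nfree++] = z->chunks[i];
}

/* Pop the most recently freed chunk, or NULL if the zone is exhausted. */
static void *toy_zalloc(struct toy_zone *z)
{
	return z->nfree > 0 ? z->free_stack[--z->nfree] : NULL;
}

/* Push a chunk back; it becomes the next one handed out. */
static void toy_zfree(struct toy_zone *z, void *chunk)
{
	assert(z->nfree < TOY_CHUNKS);
	z->free_stack[z->nfree++] = chunk;
}
```

Freeing a chunk and allocating again returns the same address, so a stale pointer to the freed chunk ends up pointing at whatever object is allocated next.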

A last observation is the following: fdclose() does call fdrop() in line 2384 of its implementation in sys/kern/kern_descrip.c, too.

2376  fdclose(struct thread *td, struct file *fp, int idx)
2377  {
2378  	struct filedesc *fdp = td->td_proc->p_fd;
2379  
2380  	FILEDESC_XLOCK(fdp);
2381  	if (fdp->fd_ofiles[idx].fde_file == fp) {
2382  		fdfree(fdp, idx);
2383  		FILEDESC_XUNLOCK(fdp);
2384  		fdrop(fp, td);
2385  	} else
2386  		FILEDESC_XUNLOCK(fdp);
2387  }

As the description of the bug mentions the wrapping of a reference counter, one can see that the lack of a second fdrop() leads to an overflow of the reference counter f_count in the respective struct file. This could ultimately lead to a use-after-free vulnerability¹.

To clarify this, we’ll investigate the exact purpose of m_dispose_extcontrolm() and the vulnerable path.

The Vulnerable Path

m_dispose_extcontrolm() was introduced for a single purpose as is described here: It is possible to send file descriptors via a UNIX-domain socket. This is done by the function sendmsg(), which allows so-called control data to be put into the message². File descriptors are sent by using the control message type SCM_RIGHTS.

However, it was observed quite a while ago that file descriptors do leak if the receiver’s buffer is too small. The bug entry’s comments state that if the receiving buffer is not big enough, the receiving process has to close the already opened file descriptors, because not all descriptors are received. However, as the already received file descriptors are processed and the references to the file objects are created (as mentioned in the introduction), these file descriptors are leaked if the close does not happen.

Therefore, the new function m_dispose_extcontrolm() was introduced to resolve this issue. Reviewing line 998 ff. of kern_recvit() in sys/kern/uipc_syscalls.c shows the call in line 1033.

902  int
903  kern_recvit(struct thread *td, int s, struct msghdr *mp, enum uio_seg fromseg,
904      struct mbuf **controlp)
905  {
...
998  	if (mp->msg_control && controlp == NULL) {
999  #ifdef COMPAT_OLDSOCK
...
1018  #endif
1019  		ctlbuf = mp->msg_control;
1020  		len = mp->msg_controllen;
1021  		mp->msg_controllen = 0;
1022  		for (m = control; m != NULL && len >= m->m_len; m = m->m_next) {
1023  			if ((error = copyout(mtod(m, caddr_t), ctlbuf,
1024  			    m->m_len)) != 0)
1025  				goto out;
1026  
1027  			ctlbuf += m->m_len;
1028  			len -= m->m_len;
1029  			mp->msg_controllen += m->m_len;
1030  		}
1031  		if (m != NULL) {
1032  			mp->msg_flags |= MSG_CTRUNC;
1033  			m_dispose_extcontrolm(m);
1034  		}
1035  	}

Ultimately, this path is followed if a message containing file descriptors in the control part is sent, but the recvmsg() used to receive this message does not expect any control messages at all. This will trigger m_dispose_extcontrolm(), as len in line 1020 will be 0 but m->m_len will be bigger. Hence, the loop in line 1022 is not executed, but m is not NULL because there is a control message in the sent message.

That is, to call m_dispose_extcontrolm(), the following has to be done:

Create a UNIX-domain socket
Allocate a buffer which is big enough to send a message via the socket with a contained file descriptor
Allocate a buffer which is not big enough to receive a message via the socket which contains a file descriptor
Send the message with sendmsg()
Receive the message with recvmsg()
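The steps above can be sketched as follows. This is a minimal illustration that assumes an already connected UNIX-domain socket pair; the helper names are hypothetical and not taken from trigger_uaf.c. Note that line 998 of kern_recvit() requires msg_control to be non-NULL, so the sketch passes a control buffer that is merely too small. On a patched system this only demonstrates the truncation flag.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>

/* Send fd as SCM_RIGHTS control data together with one data byte. */
static int send_one_fd(int sock, int fd)
{
	char byte = 'x';
	struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
	union {
		struct cmsghdr hdr;	/* forces suitable alignment */
		char buf[CMSG_SPACE(sizeof(int))];
	} ctl;
	struct msghdr msg;
	struct cmsghdr *cmsg;

	assert(sock >= 0 && fd >= 0);
	memset(&ctl, 0, sizeof(ctl));
	memset(&msg, 0, sizeof(msg));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = ctl.buf;
	msg.msg_controllen = sizeof(ctl.buf);

	cmsg = CMSG_FIRSTHDR(&msg);
	cmsg->cmsg_level = SOL_SOCKET;
	cmsg->cmsg_type = SCM_RIGHTS;
	cmsg->cmsg_len = CMSG_LEN(sizeof(int));
	memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

	return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/*
 * Receive the data byte with a control buffer that cannot hold any
 * control message. Returns msg_flags, or -1 on error.
 */
static int recv_with_tiny_control(int sock)
{
	char byte;
	char tiny[1];	/* non-NULL, but smaller than a struct cmsghdr */
	struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
	struct msghdr msg;

	memset(&msg, 0, sizeof(msg));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = tiny;
	msg.msg_controllen = sizeof(tiny);

	if (recvmsg(sock, &msg, 0) != 1)
		return -1;
	return msg.msg_flags;
}
```

If the returned flags contain MSG_CTRUNC, the control data was discarded by the kernel, which on FreeBSD 12.0 is the branch in lines 1031 to 1034 shown above.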

Now, the bug itself can be reviewed. For this, line 1606 ff. of m_dispose_extcontrolm() has to be taken into view again.

1578  void
1579  m_dispose_extcontrolm(struct mbuf *m)
1580  {
...
1606  				while (nfd-- > 0) {
1607  					fd = *fds++;
1608  					error = fget(td, fd, &cap_no_rights,
1609  					    &fp);
1610  					if (error == 0)
1611  						fdclose(td, fp, fd);
1612  				}

The interesting part is the call to fget(). This function eventually results in a call to fget_unlocked(), which extracts the pointer to a struct file object from the file descriptor fd, saves the address in the pointer fp and increments the reference counter f_count of the struct file object by 1. However, sending a file descriptor already increases the reference counter, because the receiver holds, through its own file descriptor, a reference to the same struct file object. In total, the reference counter is therefore increased by 2.

In the end, fdclose() in line 1611 closes the receiver’s file descriptor and therefore removes a reference to the struct, but does not take into account that f_count was increased by 2 in total due to the call to fget().

Therefore, the function results in a primitive that can increase f_count by 1. Because f_count is of the type u_int, which is 32 bits wide, an overflow of this variable is possible in a reasonable time. This is done by simply repeating the primitive until the result would need more than 32 bits to be represented in memory. All bits above the 32nd are then dropped and the variable wraps back to 0 again.

If this is used to overflow f_count to 1, then it is possible to create a use-after-free scenario. To achieve this, the sending process needs to hold two file descriptors to the same struct file, which can be done by duplicating one file descriptor with dup(). Now, if one of the file descriptors is closed after the overflow, fdrop() is called. As this call will only see that f_count is 1, the struct file object will be freed as explained above. However, it is still possible to reference the freed chunk via the second, still opened file descriptor.
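The arithmetic behind the wrap can be written down directly. The sketch below assumes a 32-bit unsigned counter that starts at 2 (one reference per descriptor) and has to end at 1; the function names are illustrative only.

```c
#include <assert.h>	/* used by the checks in the usage note */
#include <stdint.h>

/* Number of +1 steps that take a 32-bit counter from `from` to `to`. */
static uint32_t increments_needed(uint32_t from, uint32_t to)
{
	return to - from;	/* unsigned subtraction wraps modulo 2^32 */
}

/* Apply `steps` increments the way a 32-bit unsigned counter would. */
static uint32_t apply_increments(uint32_t count, uint64_t steps)
{
	return (uint32_t)(count + steps);	/* bits above the 32nd are dropped */
}
```

For example, assert(increments_needed(2, 1) == 4294967295u) holds, that is, 2^32 - 1 applications of the primitive are needed, and assert(apply_increments(2, 4294967295u) == 1) confirms the resulting value.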

Now that the general idea is clarified and because this part is so important, it should be described in more detail how to trigger the vulnerability as there are some wrinkles to address.

First, a file is opened with open() and the file descriptor is duplicated with dup(). This results in two file descriptors, which reference the same struct file object. Hence, f_count is 2.

Next, m_dispose_extcontrolm() is triggered repeatedly, as described above, to overflow f_count until it finally reaches 1 again. This has some quirks, though. For one, it is not possible to send an arbitrary number of file descriptors. This is the case because each file descriptor needs some space in the message to be sent, but the kernel internally only has a limited number of so-called mbufs available, which build up the message for further processing.

Luckily enough, sending the file descriptor more than once via the socket will increase the reference counter multiple times before m_dispose_extcontrolm() is called to decrement the counter.

Due to this, the preparation of the use-after-free needs some time. On the virtual machine used for this research, this took around 20 minutes³.

Moreover, it is important to note that the overflow has to result in f_count == 1 and not 0, because if fdrop() is called with f_count == 0 it will result in a kernel panic. This is because refcount_release() only checks if the decremented counter is 0 or less to signal the free but _fdrop() asserts that f_count is exactly 0. If the counter is negative, _fdrop() triggers the panic⁴.

After wrapping f_count to the value 1, one of the file descriptors is “manually” closed. This will trigger the free because f_count will decrement to 0, thus _fdrop() is called. The result is that the struct file object is released to the Files zone’s bucket. Due to the former dup() the second file descriptor still references the freed struct file object but is now invalid as the object itself is marked as invalid.

Here is another quirk: Other syscalls like read() or write() which use a struct file will result in a call to fget_unlocked(). This function will check that the used file descriptor is not greater than the greatest valid one, though. Therefore, the first file descriptor has to be closed because the duplicated one is typically greater.

So far we have obtained a file descriptor that references an invalid struct file. Once the memory of this struct file is re-used by another call to open(), our “dangling” file descriptor will actually become valid again. However, it will then point to the newly opened file. The dangling file descriptor can be used for the same operations as the newly opened one. Because they share the same struct file all rights of the newly opened file descriptor are inherited by the dangling one. E.g., if the first file was opened read-only but the second one writable, the dangling file descriptor can be used to write to the newly opened file.

How this could be exploited will be described in the next section and the one after that.

To sum the trigger strategy up:

Open a file with open()
Duplicate the file descriptor with dup()
Overflow f_count to 1 by calling m_dispose_extcontrolm()
Close the first file descriptor to trigger the free
Call open() again to allocate another struct file from the Files zone, resulting in a dangling file descriptor to that object

A first proof-of-concept can be found in trigger_uaf.c. It is notable that it will result in a kernel panic when the program exits. This will be addressed in the final exploit. The interested reader can try to figure out the solution before reading on.

Exploitation Strategies

This section will discuss three possible exploitation strategies, two of which failed during the research.

Utilizing a suid Program

One of the simplest strategies is the following: Is it possible to trigger the use-after-free and then execute some suid program like passwd, resulting in a dangling file descriptor to a file owned by root, including all capabilities?

An exploit would need to place the struct file object exactly into the Files zone bucket where needed. If the suid program opens a file owned by root like master.passwd or libmap.conf writable, it could be possible to write to this file from the user context.

In theory this strategy works. A proof-of-concept can be found in the files setuid_test_client.c and setuid_test_server.c (NB: The compiled setuid_test_server.c has to be root-owned and suid, of course).

However, finding such a program turned out to be trickier than expected. Most utilities in the standard installation open interesting files read-only or close them too fast.

Therefore this approach was dropped early.

Memory Corruption

Another typical approach would be to find some way to corrupt the kernel’s memory in order to simply execute user-provided code. This could mean overwriting, e.g., a function pointer inside the struct file object or in another object which is indirectly referenced by the struct file object.

Indeed, fail0verflow used a very similar technique to exploit a vulnerability in the PlayStation 4 operating system, which is based on FreeBSD 9. They opened a kqueue, which is a FreeBSD mechanism to monitor kernel events.

kqueue-files utilize the f_data pointer of struct file to manage the kqueue. The heap space is allocated from one of the anonymous zones. Therefore it is possible to spray into that zone via ioctl() and overwrite a function pointer as described in the blog post.

There is a significant difference between the vulnerability used by fail0verflow and the one researched in this work, though: The former allowed an arbitrary free, e.g., of the f_data pointer. The latter only allows a struct file to be freed. As the kernel cleans up all interesting struct fields and pointers, it is not possible to use fail0verflow’s primitives.

During the research, all other possible objects which rely on struct file were investigated, but none allowed the vulnerability to be exploited via memory corruption, for the reason just described.

Exchange the File Object During write()

This is a variation of our first strategy.

The FreeBSD kernel checks if a file is opened writable as soon as a user tries to write via a file descriptor. If the check is passed, the write operation is prepared and executed. This basically creates a Time-of-Check-Time-of-Use (TOCTOU) scenario.

The idea is the following: The time window between the passed check and the write operation itself creates a race condition. It should be possible to free the struct file object via the described vulnerability and immediately open another file which is normally read-only for the user. Because the write-check is already done, the write operation will be executed but happen on the read-only file because the struct file was exchanged.
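The idea can be illustrated with a purely user-space toy model. It is an assumption-laden sketch rather than kernel code: the toy_ names and the TOY_FWRITE value are hypothetical, and "exchanging the struct file" is modelled by re-initialising the same memory for a different, read-only object between check and use.

```c
#include <assert.h>
#include <string.h>

#define TOY_FWRITE 0x2	/* illustrative flag value */

/* Toy stand-in for struct file. */
struct toy_file {
	int f_flag;	/* access mode bits */
	char *f_data;	/* backing storage of the "file" */
};

/* Time of check: succeeds only if the object is currently writable. */
static int toy_check_writable(const struct toy_file *fp)
{
	return (fp->f_flag & TOY_FWRITE) != 0;
}

/* Time of use: writes through fp without looking at f_flag again. */
static void toy_do_write(struct toy_file *fp, const char *s, size_t cap)
{
	assert(cap > 0);
	strncpy(fp->f_data, s, cap - 1);
	fp->f_data[cap - 1] = '\0';
}

/* Reuse the same memory for a different, read-only "file". */
static void toy_reuse(struct toy_file *slot, char *new_backing)
{
	slot->f_flag = 0;
	slot->f_data = new_backing;
}
```

If toy_reuse() runs after toy_check_writable() has succeeded but before toy_do_write() is called, the write lands in the backing storage of the read-only object, although a fresh check on that object would fail.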

As this strategy was finally successfully used to exploit the use-after-free to gain root access, it is described in detail in the next section.

It should be mentioned that Jann Horn of Google’s Project Zero exploited a similar vulnerability in Linux and used a similar approach. This is described in a posting from 2016 on the Project Zero bug tracker.

Escalate to root

As described in the last section, the strategy is basically a Time-of-Check-Time-of-Use (TOCTOU) attack: write() is called for a file which is writable by the user. The syscall will first check if the file referenced by the file descriptor is indeed writable by the user or result in an error otherwise. After the check is passed, the use-after-free vulnerability is triggered and a root-owned read-only file is opened by the user right after that.

The check works the following way: the write() syscall results in a call of the kernel function sys_write() in sys/kern/sys_generic.c. Subsequently, kern_writev() in the same file is called, which then calls fget_write(). The latter is defined in sys/kern/kern_descrip.c. This function is used to retrieve the struct file object for the file descriptor and checks for write capability.

To do this, the function calls _fget() in the same file.

2716  static __inline int
2717  _fget(struct thread *td, int fd, struct file **fpp, int flags,
2718      cap_rights_t *needrightsp, seq_t *seqp)
2719  {
2720  	struct filedesc *fdp;
2721  	struct file *fp;
2722  	int error;
2723  
2724  	*fpp = NULL;
2725  	fdp = td->td_proc->p_fd;
2726  	error = fget_unlocked(fdp, fd, needrightsp, &fp, seqp);
...
2738  	switch (flags) {
2739  	case FREAD:
2740  	case FWRITE:
2741  		if ((fp->f_flag & flags) == 0)
2742  			error = EBADF;
2743  		break;
...
2753  	}
2754  
2755  	if (error != 0) {
2756  		fdrop(fp, td);
2757  		return (error);
2758  	}
2759  
2760  	*fpp = fp;
2761  	return (0);
2762  }

The check can be seen in line 2741. f_flag is a field in struct file, set by the first call to open() for the file. A call to write() results in a check that the FWRITE bit is set. This bit is only set if the file was opened for writing, which in turn requires that the user has write privileges for the file.
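The effect of this check is visible from user space: a write() through a descriptor that was opened read-only fails with EBADF, regardless of the file's permissions. The following small sketch shows this; the function name is illustrative.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Open path read-only, attempt a 1-byte write and return the resulting
 * errno value (0 if the write unexpectedly succeeded).
 */
static int write_via_readonly_fd(const char *path)
{
	int fd, err = 0;

	assert(path != NULL);
	fd = open(path, O_RDONLY);
	if (fd < 0)
		return errno;
	if (write(fd, "x", 1) != 1)
		err = errno;	/* saved before close() can change errno */
	close(fd);
	return err;
}
```

For an existing, readable file the function is expected to return EBADF, which corresponds to the error assigned in line 2742.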

The following should be noted: If it is assumed that the use-after-free is already prepared at that moment, that is, the f_count variable was overflowed to 1, the call to fget_unlocked() does increase f_count to 2. The counter will be decreased to 1 again at the end of the write() operation.

However, if the check is successfully passed, it is possible to exchange the file object via the use-after-free plus another call to open() to a read-only file. The write will then happen to the second file. A first proof-of-concept can be found in test_rd_only_write.c (note that a kernel patch is needed for that test as noted in the comment at the beginning of the file; see the appendix for more information).

After the check is passed, kern_writev() will call dofilewrite() in the same file.

545  static int
546  dofilewrite(struct thread *td, int fd, struct file *fp, struct uio *auio,
547      off_t offset, int flags)
548  {
...
564  	if (fp->f_type == DTYPE_VNODE &&
565  	    (fp->f_vnread_flags & FDEVFS_VNODE) == 0)
566  		bwillwrite();
567  	if ((error = fo_write(fp, auio, td->td_ucred, flags, td))) {
...
576  		}
577  	}
...
587  }

This function has two interesting function calls. The first one is to fo_write() in line 567. This function will eventually result in the write operation. That is, it is safe to assume that the file exchange has to happen before this function is called⁵.

This race is quite tight, though, and not likely to win without any other primitive. Moreover, the race has to be won at the first try because otherwise the kernel will panic.

This is the case because it is assumed, as mentioned above, that f_count is 2 at the moment of the passed check. Therefore, close() has to be called twice. But if the check is not yet passed or write() has already finished, f_count is still 1. _fdrop() will assert that the reference counter is exactly 0, though, but two closes would result in a negative reference counter because the function interprets the counter as a signed integer. The panic would be the result of the failed assertion.

Jann Horn could delay the Linux kernel operations by writing a FUSE file system which delays write operations to increase the success rate to win this race condition. This is not possible in a standard FreeBSD installation because FUSE cannot be assumed to be loaded (and to be available to the respective user).

Here, the call to bwillwrite() in line 566 comes to help. This function and the function buf_dirty_count_severe() are defined in sys/kern/vfs_bio.c.

2564  bwillwrite(void)
2565  {
2566  
2567  	if (buf_dirty_count_severe()) {
2568  		mtx_lock(&bdirtylock);
2569  		while (buf_dirty_count_severe()) {
2570  			bdirtywait = 1;
2571  			msleep(&bdirtywait, &bdirtylock, (PRIBIO + 4),
2572  			    "flswai", 0);
2573  		}
2574  		mtx_unlock(&bdirtylock);
2575  	}
2576  }
2577  
2578  /*
2579   * Return true if we have too many dirty buffers.
2580   */
2581  int
2582  buf_dirty_count_severe(void)
2583  {
2584  
2585  	return (!BIT_EMPTY(BUF_DOMAINS, &bdhidirty));
2586  }

The line that directly strikes the eye is line 2571, which calls msleep(). As can be read in the man page of that function⁶, this will put the thread to sleep without a timeout. A check of the wakeup channel bdirtywait shows that the thread will be woken up by bdirtywakeup(), which itself is called by buf_daemon() from vfs_bio.c.

The condition for the call to msleep() and the purpose of buf_daemon() are connected: If the kernel holds too many dirty buffers (i.e., buffers that are waiting for their write operation), buf_dirty_count_severe() returns 1 and buf_daemon() is woken up to flush these in order to reduce the count. After that, bdirtywakeup() is called to wake up all write operations that are waiting at line 2571.

There are two kernel variables used as water marks to monitor whether there are too many dirty buffers: lodirtybuffers and hidirtybuffers. These are set at boot time and depend on the available RAM⁷. If the number of dirty buffers is greater than hidirtybuffers, the bit bdhidirty (line 2585) is set and buf_dirty_count_severe() returns 1, thus resulting in the msleep() call.

Therefore, a technique is needed to raise the number of dirty buffers quickly. After some experimenting, an easy way was identified: The exploit needs to open a vast number of file streams, which all refer to the same file. However, after a file stream is opened with fopen(), the corresponding file is unlinked before the next call to fopen(). The number of dirty buffers will increase if the exploit attempts to write to all these file streams in parallel.
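The open-and-unlink pattern can be sketched as follows. This is not the code from test_dirty.c; the function name is hypothetical, and the parallel writes that actually generate the dirty buffers are left out to keep the sketch short.

```c
#include <assert.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Open up to n streams on the same path, unlinking the path after each
 * fopen() so that the next fopen() creates a fresh file under that name.
 * Returns the number of streams that were opened.
 */
static int open_unlinked_streams(const char *path, FILE **out, int n)
{
	int i;

	assert(path != NULL && out != NULL);
	for (i = 0; i < n; i++) {
		out[i] = fopen(path, "w");
		if (out[i] == NULL)
			break;	/* e.g. descriptor limit reached */
		unlink(path);	/* the open stream keeps the file alive */
	}
	return i;
}
```

Each returned stream is backed by its own, already unlinked file, so the files disappear automatically once the streams are closed. How many streams and how much data are needed to reach hidirtybuffers depends on the machine and is not specified here.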

A demo for this technique can be found in test_dirty.c.

The exploit now works as follows: Once the number of dirty buffers is high enough, another thread gets a signal to attempt the write to a previously opened, random file writable by the user. Simultaneously, yet another thread is signalled to trigger the use-after-free and open a file that is read-only for the user. The use-after-free is triggered by closing both the file descriptor used to prepare the use-after-free scenario and the duplicated one (remember that f_count equals 2 now). Opening the read-only file will then result in the exchange of the struct file object.

To increase the chance that the race is won by the exploit, the use-after-free trigger thread and the write thread run on different cores, if possible, and the use-after-free is delayed by some microseconds. If the latter is not done, the exploit will likely result in a kernel panic, because the use-after-free happens too fast and thus results in the panic triggered by _fdrop() as described above.

A proof-of-concept for the arbitrary write can be found in arbitary_file_write. It should be noted that the hammer threads are synchronized. Otherwise, the combined load of “hammering”, still creating threads and triggering the vulnerability renders the exploit unreliable. Moreover, a synchronous write triggers the msleep() condition much faster.

There are still two challenges open for the final exploit: Gaining root and preventing the kernel panic.

The latter can be achieved by using the primitive which resulted in the reference counter overflow. The reason for the panic is that the program attempts to close all file descriptors it assumes to be open. Due to the use-after-free, the reference counter is too small, though. Hence, the aforementioned assertion in _fdrop() fails.

Because f_count of the exploited struct file is 1 at the time of the write operation due to the use-after-free and the following open(), write() will free the file object again because it calls fdrop() at the end. It is therefore necessary to call open() a third time to get a file descriptor to the corrupt struct file. After that, the reference counter increase primitive can be used to increase f_count, which will prevent the kernel panic.

The privilege escalation is now a piece of cake thanks to a technique used by kingcope, who published a FreeBSD root exploit in 2005, which writes to the file /etc/libmap.conf. This configuration file can be used to hook the loading of dynamic libraries when a program is started. The exploit therefore creates a dynamic library, which copies /bin/sh to another file and sets the suid bit for the copy. The hooked library is libutil, which is, for instance, used by su. Therefore, a call to su by the user will afterwards result in a suid copy of /bin/sh.

The final exploit can be found in heavy_cyber_weapon.sh⁸.

Conclusion and Further Steps

This blog post showed the exploitation of a simple yet easy to overlook vulnerability in FreeBSD.

While the use-after-free trigger was quite simple to develop, the research to find a way to exploit the vulnerability needed a lot more effort. The main reasons for this seem to be the reasonably good engineering of the kernel code and the scarce landscape of write-ups for FreeBSD vulnerabilities. The author hopes that this blog post contributes to the latter or at least helps someone else during their own research.

To the best knowledge of the author, this blog post is the first to describe a working exploit for this vulnerability and the first that describes the needed technique for that.

The exploitation technique itself should come in handy in similar situations which trigger a use-after-free on the Files zone as well. Without major changes to the kernel code, it should remain applicable.

Moreover, the elegance of logic exploits shines when this exploit is used without modifications on other installations and even other CPU architectures. E.g., the exploit was successfully tested on an ARM processor.

At the end, some further steps should be mentioned:

At the moment the exploit technique only works for UFS. While this was the standard file system in the past, nowadays ZFS is widely adopted for FreeBSD installations. However, the technique does not work with ZFS, because the dirty buffer mechanism seems to work differently (if it exists at all). Therefore, msleep() is not triggered, which renders the race condition unreliable. The chances are high that another delay mechanism has to be found.
The code to render the race condition reliable via doing a lot of parallel writes to create dirty buffers seems inelegant. Maybe there is another way to create a write delay but this was the quickest way to do it.
Last but not least, there are maybe completely different ways to exploit the vulnerability. For example, one could think about tricking a suid program to read from a file it does not intend to read from (such as, tricking su into reading from a user-provided file instead of from /etc/pam.d/su).

Appendix: Test Setup

To test and debug the exploit and bugs, a setup with VirtualBox was chosen. Moreover, some kernel patches can be applied to accelerate the preparation of the use-after-free condition.

VirtualBox Setup

There are already some guides to create a test setup for FreeBSD kernel debugging, e.g., by argp.

This section updates these guides and explains the setup step by step.

Get the FreeBSD-12-RELEASE disc1 from here
Install everything into a new VirtualBox VM

Use UFS
Do not apply any hardening for the moment
Install sources (/usr/src), SSH and debug features

Reboot after installation, do not forget to ‘eject’ the disc image
Configure SSH, users etc.
Install gdb with pkg: pkg install gdb
Compile a custom kernel the following way

cd /usr/src/sys/amd64/conf
Create a new file called DEBUG and add the following contents

include		GENERIC
ident		DEBUG

makeoptions	DEBUG=-g

options		DDB
options		GDB
options		KDB

Create /etc/make.conf with CFLAGS=-pipe -O0 for the content
Change all -O2 to -O0 in /usr/src/sys/conf/kern.pre.mk
Execute cd /usr/src
Execute make buildkernel -j8 KERNCONF=DEBUG
Execute make installkernel KERNCONF=DEBUG
Execute reboot

Execute sysctl debug.kdb.enter=1 to check if it is possible to enter debug mode

Note that the debugger starts in the VM window and not the SSH session

Clone the machine; use expert mode to create a linked clone and leave the remaining settings unchanged
The clone is the target, while the original is the debugger host
For the target, activate in the VM settings the serial port COM1 as a host pipe

Do not check the box for the connection to a pipe
Path could be /tmp/fbsdpipe

Do the same for the debugger

Check the box for the connection to a pipe, use the same path as for the target

Boot the target first
Change the hostname if wanted
Change hint.uart.0.flags to 0x90 in /boot/device.hints
Reboot the target
Boot the debugger
Change to debug mode in the target and execute gdb in the debugger session
In the debugger execute:
- cd /usr/obj/usr/src/amd64.amd64/sys/DEBUG
- kgdb -b 38400 kernel.debug
- kgdb> target remote /dev/cuau0
A kgdb debugger session for the target should now be available

Kernel Patches

To ease exploit testing and to avoid a long wait on each run, m_dispose_extcontrolm() can be patched. The following line needs to be added to the end of the function:

  }
  if (fp->f_count == 1234) fp->f_count = 0xfffffff2;
}

With this approach, the for-loop that prepares the use-after-free must also be replaced in all code files with the following one (basically reducing the upper limit for i and sending just one file descriptor at a time):

for (i = 0; i < 1232; i++)
	send_recv(fd, sv, 0x1);
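
The arithmetic behind this shortcut can be illustrated in user space. The following is a minimal sketch, assuming that f_count behaves like a 32-bit unsigned counter (as the constant 0xfffffff2 suggests). The helper name increments_until_wrap() is made up for this illustration and is neither kernel code nor part of the exploit.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (not kernel code): return how many increments a
 * 32-bit unsigned reference counter needs, starting at `start`, until
 * it wraps around to 0.
 */
uint32_t
increments_until_wrap(uint32_t start)
{
	/* A counter that is already 0 is outside the scope of this sketch. */
	assert(start != 0);

	/*
	 * Unsigned arithmetic is performed modulo 2^32, so 0 - start
	 * yields exactly the distance from `start` to the wrap.
	 */
	return ((uint32_t)(0u - start));
}
```

Under this assumption, a counter at 1234 is still 4294966062 increments away from the wrap, while a counter at 0xfffffff2 is only 14 increments away, which is why the patched kernel shortens each test run so drastically.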

During the research, another kernel patch was used to create a very long delay after bwillwrite() was called. Add the following if-condition after the function call in sys/kern/sys_generic.c:

if (fp->f_type == DTYPE_VNODE &&
    (fp->f_vnread_flags & FDEVFS_VNODE) == 0) {
    bwillwrite();
    if (fd == 16000)
        pause("", 100);
}

This patch requires changing the call to open() that opens a temporary file, in all code files, to the following:

do {
  if ((fd = open_tmp()) == -1) {
    perror("open_tmp");
    exit(1);
  }
} while(fd != 16000);
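
The following is a minimal sketch of what such a helper and loop might look like. The function bodies, the /tmp path template and the name open_until() are assumptions for illustration and are not taken from the exploit files. The sketch also assumes that the per-process descriptor limit is high enough to reach the target number.

```c
#include <assert.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Hypothetical stand-in for open_tmp(): create a temporary file,
 * unlink it right away and return its descriptor (-1 on error).
 * The path template is an assumption.
 */
int
open_tmp(void)
{
	char path[] = "/tmp/fdtest.XXXXXX";
	int fd;

	fd = mkstemp(path);
	if (fd != -1)
		unlink(path);	/* the descriptor stays valid after unlink */
	return (fd);
}

/*
 * Keep opening temporary files until the kernel hands out descriptor
 * number `target`. The intermediate descriptors are deliberately left
 * open, as in the loop above. Returns `target` on success, or -1 if a
 * file could not be opened or the numbering ran past the target.
 */
int
open_until(int target)
{
	int fd;

	assert(target >= 0);
	do {
		fd = open_tmp();
		if (fd == -1)
			return (-1);
		if (fd > target) {
			/* The target number was already in use; give up. */
			close(fd);
			return (-1);
		}
	} while (fd != target);
	return (fd);
}
```

With such a helper, the loop above would correspond to a call like open_until(16000). The sketch relies on the kernel handing out the lowest free descriptor number for each new file.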

The last kernel patch is needed for test_rd_only_write.c. In sys/kern/sys_generic.c, the following patch is needed in the function kern_writev() after fget_write() is called:

if (fd == 22)
  __asm__("int3");

This results in a breakpoint that allows the struct file object to be exchanged as described in the top comment of that proof-of-concept.

¹ See, e.g., https://ruxcon.org.au/assets/2016/slides/ruxcon2016-Vitaly.pdf by Vitaly Nikolenko for an example of similar bugs.
² Refer to the man page unix(4) for details.
³ With 2 GB of RAM and 2 cores of a modern Intel notebook CPU.
⁴ One could use this behaviour as a first proof of concept by increasing the reference counter to 0xffffffff = -1 (two’s complement…) because before the decrement to 0xffffffff, f_count will be 0.
⁵ Or later in the subsequent function calls inside fo_write(), but this is not needed.
⁶ sleep(9)
⁷ Call sysctl vfs.hidirtybuffers and sysctl vfs.lodirtybuffers to show the values for the specific system.
⁸ The constant numbers of forks, threads and files may need adjustment for other test systems. The numbers presented in this exploit are for a VM with 2 GB RAM and 2 cores on a modern Intel notebook CPU.