More on FreeBSD Refcount Overflows

Posted on August 30, 2019 by K³


In the last blogpost it was shown how a reference counter overflow of file objects inside the FreeBSD kernel space could be exploited in order to gain an arbitrary file write primitive to escalate privileges to root. After the blogpost, the FreeBSD project issued a couple of advisories and patches that fixed similar vulnerabilities.

This blog post catches up on the research and provides a trigger for CVE-2019-5603, which is described in FreeBSD-SA-19:15.mqueuefs - a reference counter overflow in the mqueuefs subsystem.

This trigger can be directly plugged into the exploit provided with the previous blog post in order to gain root privileges on an unpatched FreeBSD system.

Moreover, this blogpost will give a short note on how the research led to the discovery of an unpatched vulnerability in the FreeBSD mqueuefs subsystem. The vulnerability has been reported to (and fixed by) the FreeBSD project before publishing this blogpost.

Last but not least, mitigations for reference counter overflows in future FreeBSD releases are discussed.

For those who are not familiar with the topic, there is a short recap in the next section.


The last blog post contained a discussion of the full exploitation path to elevate privileges to root by exploiting the vulnerability described in FreeBSD-SA-19:02.fd.

The vulnerability was a reference counter overflow. Every file object, represented in the kernel by a variable of the type struct file, has a reference counter field f_count that is of type uint32_t. This counter describes how many file descriptors reference that particular file object. If all file descriptors are closed for that file, the object is freed in kernel space.

A free() in the context of the FreeBSD kernel means that a pointer to that heap chunk is put into the bucket of a heap zone as shown in argp’s and karl’s Phrack article from 2009. If a new file is opened, the kernel will first allocate file objects from that bucket in a LIFO style.

Using the reference counter overflow, it was possible to wrap the reference counter to the value 1 while a userspace application had more than one file descriptor open to the same file. Closing one of these file descriptors led to freeing the file object in kernel space while its heap chunk was still referenced by an open file descriptor.

Due to the LIFO behaviour of the kernel heap allocator, the next call to open() would eventually allocate the just freed heap chunk for the file object. Because the unclosed file descriptor still referenced that heap chunk it was possible to interact via both file descriptors with the newly opened file and not only via the one returned by open().
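
The reuse of a just freed chunk can be illustrated with a toy model. The following is only a sketch under stated assumptions: the pool, its size and all names (pool_alloc(), pool_free(), lifo_reuse_demo()) are hypothetical and have nothing to do with the real kernel allocator code. It merely shows that with a LIFO free list a stale pointer and the next allocation end up referring to the same chunk.

```c
#include <stddef.h>

/*
 * Toy model only: a fixed pool of equally sized chunks with a LIFO
 * free list. All names are hypothetical; this is not FreeBSD code.
 */
#define POOL_CHUNKS 4

struct chunk {
	struct chunk *next_free;	/* link used while the chunk is free */
	int payload;			/* stands in for the file object data */
};

static struct chunk pool[POOL_CHUNKS];
static struct chunk *free_head;

/* Put every chunk of the pool on the free list. */
void
pool_init(void)
{
	free_head = NULL;
	for (size_t i = 0; i < POOL_CHUNKS; i++) {
		pool[i].next_free = free_head;
		free_head = &pool[i];
	}
}

/* Hand out the chunk that was freed most recently (LIFO). */
struct chunk *
pool_alloc(void)
{
	struct chunk *c = free_head;

	if (c != NULL)
		free_head = c->next_free;
	return (c);
}

/* Push the chunk onto the head of the free list. */
void
pool_free(struct chunk *c)
{
	c->next_free = free_head;
	free_head = c;
}

/*
 * Returns 1 if a stale pointer to a freed chunk aliases the chunk
 * returned by the next allocation, 0 otherwise.
 */
int
lifo_reuse_demo(void)
{
	struct chunk *first, *stale, *second;

	pool_init();
	first = pool_alloc();
	stale = first;		/* plays the role of the unclosed descriptor */
	pool_free(first);	/* the object is "freed" too early */
	second = pool_alloc();	/* plays the role of the next open() */
	second->payload = 42;
	return (stale == second && stale->payload == 42);
}
```

In this model lifo_reuse_demo() reports that data written through the new allocation is visible through the stale pointer, which is the aliasing the exploit relies on.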

The exploit then leveraged a race condition in the write() syscall. The kernel first checks that the file descriptor was opened with write privileges, and a small race window exists between passing that check and the actual write operation. It is possible to extend this window by creating a lot of dirty buffers in the file system. The result is that the write() syscall will defer the actual write operation until these dirty buffers are removed.

This leaves enough time to trigger a free() on the file object and then open a file that the attacking user has only read access to. With proper timing, the write operation will go into this read-only file. By writing to /etc/libmap.conf it is possible to exploit this behaviour to gain root privileges.

Triggering the Vulnerability in mqueuefs

While the technique to exploit such a reference counter overflow to gain root privileges is quite universal, the trigger to create an exploitable situation is of course different from vulnerability to vulnerability.

FreeBSD-SA-19:15.mqueuefs describes that such a vulnerability existed in the mqueuefs subsystem. While this attack vector is not as severe as the one from FreeBSD-SA-19:02.fd because mqueuefs is not loaded in a default installation of FreeBSD, it was still interesting to check whether the developed exploit technique is really that universal.

One vulnerable function [1] was sys_kmq_timedsend() in sys/kern/uipc_mqueue.c:

2289 	int
2290 	sys_kmq_timedsend(struct thread *td, struct kmq_timedsend_args *uap)
2291 	{
2292 	        struct mqueue *mq;
2293 	        struct file *fp;
2294 	        struct timespec *abs_timeout, ets;
2295 	        int error, waitok;
2297 	        AUDIT_ARG_FD(uap->mqd);
2298 	        error = getmq_write(td, uap->mqd, &fp, NULL, &mq);
2299 	        if (error)
2300 	                return (error);
2301 	        if (uap->abs_timeout != NULL) {
2302 	                error = copyin(uap->abs_timeout, &ets, sizeof(ets));
2303 	                if (error != 0)
2304 	                        return (error);
2311 	        fdrop(fp, td);
2312 	        return (error);
2313 	}

The vulnerability is present in line 2304. The call to getmq_write() in line 2298 will eventually result in a call to fget_unlocked() in sys/kern/kern_descrip.c. fget_unlocked() will increase the reference counter f_count by one because fp will reference the file object until the end of the function call. However, if copyin() fails in line 2302, the syscall will return in line 2304 without releasing the reference in fp. Hence, f_count is not decreased; this would have been the responsibility of fdrop() in line 2311.

Therefore, if it is possible to reach line 2302 but let copyin() fail, it should be possible to trigger the reference counter increase and thus eventually a reference counter overflow.

The patch for the vulnerability confirms that explanation:

--- releng/12.0/sys/kern/uipc_mqueue.c	2018/11/27 17:58:25	341085
+++ releng/12.0/sys/kern/uipc_mqueue.c	2019/07/24 12:55:16	350284
@@ -2301,13 +2302,14 @@
 	if (uap->abs_timeout != NULL) {
 		error = copyin(uap->abs_timeout, &ets, sizeof(ets));
 		if (error != 0)
-			return (error);
+			goto out;
 		abs_timeout = &ets;
 	} else
 		abs_timeout = NULL;
 	waitok = !(fp->f_flag & O_NONBLOCK);
 	error = mqueue_send(mq, uap->msg_ptr, uap->msg_len,
 		uap->msg_prio, waitok, abs_timeout);
 	fdrop(fp, td);
 	return (error);
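
The difference between the early return and the single exit path can be modeled in user space. This is a minimal sketch and not kernel code: struct obj, obj_hold(), obj_drop(), op_leaky() and op_fixed() are hypothetical stand-ins for the file object, the reference taken via getmq_write(), fdrop() and the two versions of the syscall.

```c
#include <errno.h>

/* Hypothetical stand-in for an object with a reference counter. */
struct obj {
	unsigned int refs;
};

static void
obj_hold(struct obj *o)
{
	o->refs++;
}

static void
obj_drop(struct obj *o)
{
	o->refs--;
}

/* Vulnerable shape: the early return skips the drop. */
int
op_leaky(struct obj *o, int step_fails)
{
	int error = 0;

	obj_hold(o);
	if (step_fails)
		return (EFAULT);	/* the reference is leaked here */
	/* the real work would happen here */
	obj_drop(o);
	return (error);
}

/* Patched shape: every path leaves through the same cleanup code. */
int
op_fixed(struct obj *o, int step_fails)
{
	int error = 0;

	obj_hold(o);
	if (step_fails) {
		error = EFAULT;
		goto out;
	}
	/* the real work would happen here */
out:
	obj_drop(o);
	return (error);
}
```

Each failing call of op_leaky() leaves refs one higher than before, while op_fixed() always returns with the counter unchanged.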

It is quite easy to let copyin() fail. The man page [2] of copyin() states that this function can be used to copy data from user space to kernel space. In the present call in line 2302, uap->abs_timeout has to provide a valid user space address while &ets has to provide a valid kernel address.

Moreover, the man page [3] of mq_timedsend(), which is the userspace caller for the vulnerable syscall, shows that the function expects a pointer parameter abs_timeout. The syscall can access its parameters via the struct uap. Therefore, in line 2302 uap->abs_timeout references the userspace address that is provided by the abs_timeout parameter.

Last but not least the man page of copyin() states that copyin() will fail if a bad address is encountered. For example, the address 0x1 will result in such an error.

Indeed, the following call will trigger the reference counter increase:

mq_timedsend(fd, NULL, 0, 0, (const struct timespec *)0x1);

fd is a message queue descriptor of type mqd_t, a special kind of file descriptor provided by the librt library.
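
The bad address failure itself is not specific to mqueuefs and can be observed with any syscall that copies data from a user supplied pointer. The following sketch uses write() on a pipe instead of the message queue API. It assumes that the kernel reports EFAULT for an unmapped user buffer, which is the usual behaviour on FreeBSD and Linux but is not guaranteed by POSIX. The function name bad_address_errno() is made up for this example.

```c
#include <errno.h>
#include <stdint.h>
#include <unistd.h>

/*
 * Calls write() with the unmapped user space address 0x1 as buffer.
 * Returns the resulting errno, 0 if the call unexpectedly succeeds,
 * or -1 if the pipe could not be created.
 */
int
bad_address_errno(void)
{
	/* volatile keeps the compiler from reasoning about the bogus pointer */
	volatile uintptr_t bad = 0x1;
	int fds[2];
	int saved;
	ssize_t n;

	if (pipe(fds) != 0)
		return (-1);
	n = write(fds[1], (const void *)bad, 8);
	saved = (n == -1) ? errno : 0;
	close(fds[0]);
	close(fds[1]);
	return (saved);
}
```

The kernel cannot copy from address 0x1, so the syscall fails early, just as copyin() does in line 2302 of the listing above.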

The only obstacle is that the vulnerable path alone cannot wrap the counter. Each call of that path increases f_count by 1 via fget_unlocked(), and, as mentioned in the former blog post, fget_unlocked() will end up in an infinite loop once a wrap from a value of 0xffffffff to 0 happens. There is an easy solution, though. Simply calling dup() will increase f_count by 1 because the new file descriptor creates a reference to the file object in the kernel. The increase does not happen via fget_unlocked() but via fhold(). And this will happen without any further check.

Hence, the exploit needs to call the mq_timedsend() function 0xfffffffe times to increase f_count to 0xffffffff. Two subsequent calls to dup() can then be used to create two more file descriptors that are needed later in the exploit and to wrap f_count to 1.
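
The arithmetic behind these numbers can be checked with a few lines of C. This is a pure model of a 32-bit counter and does not touch the kernel at all. The function name model_f_count() is hypothetical, and the starting value of 1 (one open descriptor) is an assumption implied by the numbers above.

```c
#include <stdint.h>

/*
 * Model of an unchecked 32-bit reference counter.
 * start:        counter value before the attack (1 for one descriptor)
 * leaked_refs:  number of references leaked via the vulnerable path
 * dups:         number of additional unchecked increments (dup() calls)
 */
uint32_t
model_f_count(uint32_t start, uint32_t leaked_refs, unsigned int dups)
{
	uint32_t f_count = start;

	/* closed form of leaked_refs single increments */
	f_count += leaked_refs;
	/* each unchecked increment wraps modulo 2^32 */
	for (unsigned int i = 0; i < dups; i++)
		f_count++;
	return (f_count);
}
```

With start = 1 and leaked_refs = 0xfffffffe the model yields 0xffffffff. The first additional increment wraps it to 0 and the second one leaves it at 1, while three descriptors exist in user space.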

However, the exploit should write to a regular file but the file descriptors reference a special “mqueue file”. A property of FreeBSD and other Unix-like operating systems comes here to help: The FreeBSD kernel uses the struct file type to describe any file and not only regular ones. For example, it is used for device files or sockets, too. Because any of these is described by a struct file, the heap chunk for the file object is always allocated from the same kernel heap zone.

Moreover, file descriptors are agnostic about the file type that they reference.

Hence, the exploit must close one of the file descriptors in an additional step to free the file object. If a regular file is opened next, the file object will use the heap chunk of the “mqueue file” to store the file information. Because the other two file descriptors reference this heap chunk, too, the result is three file descriptors that reference a regular file.

By substituting the function prepare() in heavy_cyber_weapon.c from the old exploit with the just described technique, the exact same situation is created that is needed to gain root privileges with the TOCTOU technique from the former blogpost [4].

The final exploit can be found in mqueuefs_exploit.tbz.

The Mentioned 0-Day

While researching the above n-day, it was observed that the published patch missed one vulnerable path.

Many FreeBSD syscalls have a “sibling” syscall for compatibility with 32-bit applications on 64-bit installations. The one for sys_kmq_timedsend() is called freebsd32_kmq_timedsend(), which is defined in the same file.

The following shows this compatibility syscall from the SVN revision 350261 [5]:

2783 	int
2784 	freebsd32_kmq_timedsend(struct thread *td,
2785 	    struct freebsd32_kmq_timedsend_args *uap)
2786 	{
2787 	        struct mqueue *mq;
2788 	        struct file *fp;
2789 	        struct timespec32 ets32;
2790 	        struct timespec *abs_timeout, ets;
2791 	        int error;
2792 	        int waitok;
2794 	        AUDIT_ARG_FD(uap->mqd);
2795 	        error = getmq_write(td, uap->mqd, &fp, NULL, &mq);
2796 	        if (error)
2797 	                return (error);
2798 	        if (uap->abs_timeout != NULL) {
2799 	                error = copyin(uap->abs_timeout, &ets32, sizeof(ets32));
2800 	                if (error != 0)
2801 	                        return (error);
2810 	        fdrop(fp, td);
2811 	        return (error);
2812 	}

It is clearly visible by comparison with the vulnerable revision of sys_kmq_timedsend() that the patch was just not applied.

Therefore, on a system with the patch applied, it is still possible to exploit the vulnerability on 64-bit installations. All that has to be done is to compile the exploit with the -m32 option for cc. If the 32-bit libraries are not installed on the target, it is still possible to compile the exploit with the -static option on a similar FreeBSD system that has these installed and copy the resulting binary to the target.

The vulnerability was patched with FreeBSD-SA-19:24.mqueuefs.

Future Mitigations

With revision 350199 a mitigation for reference counter overflows in file objects was pushed to FreeBSD-HEAD. Thus, it will be published in a future release of FreeBSD, probably FreeBSD 13.0.

The mitigation introduces a new function in sys/sys/refcount.h, called refcount_acquire_checked():

static __inline __result_use_check bool
refcount_acquire_checked(volatile u_int *count)
{
	u_int lcount;

	for (lcount = *count;;) {
		if (__predict_false(lcount + 1 < lcount))
			return (false);
		if (__predict_true(atomic_fcmpset_int(count, &lcount,
		    lcount + 1) == 1))
			return (true);
	}
}

refcount_acquire_checked() will first ensure that an increase of the reference counter does not overflow the counter (which would decrease the counter effectively). After that, an attempt to increase the reference counter with atomic_fcmpset_int() is executed.

For the amd64 architecture, this function is defined in /usr/src/sys/amd64/include/atomic.h:

181  #define	ATOMIC_CMPSET(TYPE)				\
200  static __inline int					\
201  atomic_fcmpset_##TYPE(volatile u_##TYPE *dst, u_##TYPE *expect, u_##TYPE src) \
202  {							\
203  	u_char res;					\
204  							\
205  	__asm __volatile(				\
206  	"	" MPLOCKED "		"		\
207  	"	cmpxchg %3,%1 ;		"		\
208  	"	sete	%0 ;		"		\
209  	"# atomic_fcmpset_" #TYPE "	"		\
210  	: "=q" (res),			/* 0 */		\
211  	  "+m" (*dst),			/* 1 */		\
212  	  "+a" (*expect)		/* 2 */		\
213  	: "r" (src)			/* 3 */		\
214  	: "memory", "cc");				\
215  	return (res);					\
216  }

The increase is done by cmpxchg in line 207. %3 will be src, i.e. lcount + 1, while %1 will be *dst, i.e. *count, the memory location of the reference counter. expect is &lcount, so *expect is lcount, and MPLOCKED is a macro that fills in a lock instruction prefix.

This achieves the following: The reference counter will be set to lcount + 1 if and only if *count equals lcount. Otherwise, cmpxchg loads the current value of *count into lcount and the loop in refcount_acquire_checked() is executed again. The lock instruction prefix ensures exclusive access to the memory location until cmpxchg has finished. This prevents a race condition in which many threads attempt to increase the reference counter and *count equals lcount for all of them because the execution of cmpxchg has not finished yet in the other threads.

Therefore, only one increase is done at a time and it is not possible to overflow f_count anymore.
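
The same idea can be tried out in user space with C11 atomics. The following is only an analogue under stated assumptions and not the kernel function: checked_acquire() is a hypothetical name, atomic_compare_exchange_weak() takes the role of atomic_fcmpset_int(), and the overflow test is written as a comparison against UINT_MAX, which is equivalent to lcount + 1 < lcount for unsigned integers.

```c
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>

/*
 * User space analogue of a checked reference counter increment.
 * Returns false instead of wrapping the counter around to 0.
 */
bool
checked_acquire(atomic_uint *count)
{
	unsigned int seen = atomic_load(count);

	for (;;) {
		/* seen + 1 would wrap to 0, so refuse the increment */
		if (seen == UINT_MAX)
			return (false);
		/*
		 * Store seen + 1 only if *count still equals seen.
		 * On failure, seen is updated with the current value
		 * and the loop runs again.
		 */
		if (atomic_compare_exchange_weak(count, &seen, seen + 1))
			return (true);
	}
}
```

A caller has to check the return value and bail out on false, which is also what the __result_use_check annotation of the kernel function asks for.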

Double decreases of f_count by calling fdrop() two times for one increase are still possible, though, and could be exploited using the same exploit technique shown here and in the former blogpost.

Appendix: Timeline

Timeline for the 0-Day:

  1. More functions than the one discussed were vulnerable but the exploit path is all the same.



  4. Note that during the fork in fire() three calls to close() are needed due to the two calls to dup() here.