This article explores the root causes of the CVE-2015-7547 vulnerability in the glibc library and the conditions required to exploit it. A successful attack requires three critical elements, but in all scenarios a DNS reply larger than 2048 bytes must be delivered to the victim and a retry condition must be triggered for the buffer overflow to occur. Therefore, in the rare cases where immediately patching glibc is not a viable option, any one of the following mitigation steps is effective (though each may have adverse side effects on legitimate DNS traffic):

  • Blocking UDP replies > 2048 bytes and all TCP replies
  • Blocking UDP replies > 2048 bytes and TCP replies > 2048 bytes
  • Configuring the glibc resolver with only one nameserver entry and blocking UDP replies > 1024 bytes
  • Configuring the glibc resolver with only one nameserver entry and enabling the use-vc option to force use of TCP for all queries (glibc 2.14 and later)

Of these, the last two options are more attractive since they don't impose any length restrictions on replies received over TCP. Thus, they have a much smaller chance of having an adverse impact on legitimate DNS functionality.
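As a rough illustration of the last option, the C sketch below scans resolv.conf-style text and reports whether it contains exactly one nameserver entry together with the use-vc option. The helper names and the sample addresses used with it are hypothetical and not part of glibc, and the real resolver's parsing rules are more involved than this.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical summary of a resolv.conf-style text (not a glibc type). */
struct resolv_summary {
    int nameservers;  /* number of "nameserver" lines that carry an address */
    bool use_vc;      /* true if an "options" line lists "use-vc" */
};

/* Scan the text line by line; lines starting with '#' or ';' are skipped. */
static struct resolv_summary summarize_resolv_conf(const char *text)
{
    struct resolv_summary s = {0, false};
    assert(text != NULL);

    while (*text != '\0') {
        /* Copy one line into a scratch buffer (overlong lines are cut). */
        size_t len = strcspn(text, "\n");
        char line[256];
        size_t n = len < sizeof line - 1 ? len : sizeof line - 1;
        memcpy(line, text, n);
        line[n] = '\0';
        text += len;
        if (*text == '\n')
            text++;

        char *keyword = strtok(line, " \t\r");
        if (keyword == NULL || keyword[0] == '#' || keyword[0] == ';')
            continue;

        if (strcmp(keyword, "nameserver") == 0) {
            if (strtok(NULL, " \t\r") != NULL)
                s.nameservers++;
        } else if (strcmp(keyword, "options") == 0) {
            for (char *opt = strtok(NULL, " \t\r"); opt != NULL;
                 opt = strtok(NULL, " \t\r")) {
                if (strcmp(opt, "use-vc") == 0)
                    s.use_vc = true;
            }
        }
    }
    return s;
}

/* True when the text matches the "one nameserver plus use-vc" mitigation. */
static bool matches_use_vc_mitigation(const char *text)
{
    struct resolv_summary s = summarize_resolv_conf(text);
    return s.nameservers == 1 && s.use_vc;
}
```

Calling matches_use_vc_mitigation() on the contents of /etc/resolv.conf would give a quick sanity check, but it is no substitute for verifying how the installed glibc version actually interprets the file.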

The following mitigation steps are not effective in isolation (i.e. when implemented one at a time, not together):

  • Blocking all TCP replies
  • Blocking TCP replies > 2048 bytes
  • Blocking UDP replies > 2048 bytes
  • Blocking UDP replies with the TC flag set
  • Blocking invalid DNS replies
  • Blocking type AAAA replies
  • Disabling EDNS0

Introduction

When the CVE-2015-7547 buffer overflow vulnerability in the DNS stub resolver portion of the GNU C library was first announced, there was a lot of uncertainty about its exploitability under real-world conditions and the impact of some of the mitigations that were being proposed. In this article, we aim to shed more light on these issues by taking a closer look at the actual glibc code. In particular, we outline the critical stages of a successful attack, and the triggering conditions that must be satisfied in each stage.

At the outset, it bears mentioning that for the majority of this article, we assume that the attacker can interact with the victim's glibc resolver directly. This would be the case in a man-in-the-middle scenario where the attacker is intercepting DNS queries from the victim, and replying with specially-crafted responses that are delivered to the victim without any modifications. Such circumstances can arise in the real world (e.g. via malicious wireless access points), but in the majority of cases, there will be one or more caching DNS servers between the victim and the attacker. We discuss DNS cache traversal at the end of the article.

We have published proof-of-concept code demonstrating the attack variants discussed herein.

Cracking the Code

For a quick tour of the relevant code, we'll be looking at the glibc 2.21 release – for no particular reason other than it was the version that happened to be running on our Ubuntu 15.04 workstation at the time the vulnerability was announced.

To narrow down the amount of code we need to analyze, here's the chain of function calls that takes place when an application invokes getaddrinfo() to resolve a host name.

Diagram1

Our focus will be on the last two functions, send_vc() and send_dg(). The former handles DNS queries over TCP, while the latter handles UDP-based queries. Most parameters are passed to these functions by address, which is a recipe for unforeseen side effects. It will be crucial to keep track of which stack frames the parameters reside on as we proceed with our analysis.

Descent into Pointer Hell

The root cause of the vulnerability is the fact that the send_vc() and send_dg() functions can be coerced into leaving their parameters in an inconsistent state – this can be achieved by replying to their queries with appropriately crafted DNS responses. Since the glibc resolver usually sends the queries over UDP first (except if the use-vc option is set in /etc/resolv.conf), let's start with the initial state of the parameters on the stack when send_dg() is invoked for the first time:

Diagram2

Note that most of send_dg()'s parameters are pointing into the stack frame of gethostbyname4_r(), with the notable exception of ansp and anssizp, which are pointing into __libc_res_nsend()'s stack frame instead. These latter two parameters will be the targets of the attacker's manipulation attempts. In the initial state shown in the diagram, ansp indirectly points to a stack-allocated buffer that is 2048 bytes in size, while anssizp points to an integer variable that holds the size of that buffer.

The overall plan is simple. The attacker's objective will be to force a mismatch between these two parameters so that ansp is still pointing to a 2048-byte buffer while anssizp is indicating a much larger buffer size. Then send_dg() or send_vc() will be invoked again with the two parameters in an inconsistent state, and a response larger than 2048 bytes will be sent to the function, triggering a buffer overflow. There are several variations on the attack, but they all rely on three critical elements:

  • Oversized DNS replies are used to force the function to allocate a new buffer. When an incoming reply is too large to fit into the stack-allocated 2048-byte buffer, the function will allocate a new 64k buffer on the heap. When this happens, the anssiz parameter is updated to reflect the new buffer's size, while the ans parameter is left pointing to the old 2048-byte stack-allocated buffer. This is the bug that makes the attack possible.
  • A retry condition is triggered that causes the function to return back to the caller – only to be invoked again – but this time with the inconsistent buffer parameters. There are several ways of achieving this.
  • With the buffer parameter state now primed for exploitation, two additional DNS replies are sent – the second of which will overflow the 2048-byte buffer and smash the stack.

Lost in Space

Let's trace through what happens to send_dg()'s parameter state when a response arrives that is larger than the 2048-byte initial buffer size. First, the function sets up some local variables:

  int *thisanssizp;
  u_char **thisansp;
  int *thisresplenp;
   
  if ((recvresp1 | recvresp2) == 0 || buf2 == NULL) {
      thisanssizp = anssizp;
      thisansp = anscp ?: ansp;

Since this is the first incoming reply, the recvresp1 and recvresp2 flags are both zero, so the if-condition is true. The integer pointer thisanssizp will be set to point to the anssiz parameter on __libc_res_nsend()'s stack frame, which contains a value of 2048 as the buffer size. However, the thisansp pointer will be set differently depending on whether send_dg() is called with a non-NULL anscp parameter, which is the case in our scenario. So instead of being set to point to the ans parameter on __libc_res_nsend()'s stack frame as expected, it will be set to point to the host_buffer variable on gethostbyname4_r()'s stack frame. For now, host_buffer and ans are still pointing to the same location in memory, so everything is fine.

Next, the function tests whether the current buffer size is less than MAXPACKET bytes (this test appears to be an imperfect attempt to check whether the default buffer passed in by the caller is still being used) and whether the incoming message will fit into the current buffer. Since the incoming message is larger than 2048 bytes, the if-condition is true, and the function will allocate a new MAXPACKET-sized buffer on the heap (MAXPACKET is defined as 64k bytes):

  if (*thisanssizp < MAXPACKET
      /* Yes, we test ANSCP here. If we have two buffers
         both will be allocatable. */
      && anscp
  #ifdef FIONREAD
      && (ioctl (pfd[0].fd, FIONREAD, thisresplenp) < 0
          || *thisanssizp < *thisresplenp)
  #endif
      ) {
      u_char *newp = malloc (MAXPACKET);
      if (newp != NULL) {
          *anssizp = MAXPACKET;
          *thisansp = ans = newp;
          if (thisansp == ansp2)
              *ansp2_malloced = 1;
      }
  }

Notice what is happening here. The anssiz buffer size on __libc_res_nsend()'s stack frame is being updated to MAXPACKET as expected, and the host_buffer variable on gethostbyname4_r()'s stack frame is being updated to point to the new buffer. But the ans pointer that is also being updated is a local one that will disappear after the send_dg() function returns. The ans parameter on __libc_res_nsend()'s stack frame is left pointing to the old 2048-byte buffer, and its value will be preserved after the send_dg() function returns. The resulting parameter state is illustrated in the diagram below. The two parameters on __libc_res_nsend()'s stack frame are left in an inconsistent state, and the first critical element of the attack has been achieved. Note that since the send_vc() function uses the same buffer allocation logic and the same buffer parameters, it is vulnerable in exactly the same manner and this stage of the attack would proceed similarly when initiated over TCP.

Diagram3
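
The pointer bookkeeping described above can be reproduced in a small standalone model. The sketch below is not glibc code: model_send() is a hypothetical stand-in that keeps only the three parameters relevant to the bug, and run_model() plays the role of the two caller stack frames. It shows that after an oversized reply the caller's size variable says MAXPACKET while the caller's ans pointer still refers to the 2048-byte buffer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

enum { STACK_BUF_SIZE = 2048, MAXPACKET_SIZE = 65536 };

/* Hypothetical, heavily simplified stand-in for send_dg()/send_vc(). */
static void model_send(unsigned char **ansp, int *anssizp,
                       unsigned char **anscp, int incoming_len)
{
    assert(ansp != NULL && anssizp != NULL);

    unsigned char *ans = *ansp;                        /* local copy only */
    unsigned char **thisansp = anscp ? anscp : ansp;   /* prefers anscp */
    int *thisanssizp = anssizp;

    if (*thisanssizp < MAXPACKET_SIZE && anscp != NULL
        && *thisanssizp < incoming_len) {
        unsigned char *newp = malloc(MAXPACKET_SIZE);
        if (newp != NULL) {
            *anssizp = MAXPACKET_SIZE;  /* caller's size is updated */
            *thisansp = ans = newp;     /* *anscp and the local are updated, */
                                        /* but *ansp is never written        */
        }
    }
    (void)ans;
}

/* What the callers observe after one call to model_send(). */
struct model_result {
    int anssiz;                /* size the caller now believes it has */
    bool ans_is_stack_buffer;  /* caller's ans still points at the 2048-byte buffer */
    bool host_buffer_moved;    /* host_buffer now points at the heap buffer */
};

static struct model_result run_model(int incoming_len)
{
    unsigned char stack_buf[STACK_BUF_SIZE];
    unsigned char *ans = stack_buf;          /* models __libc_res_nsend()'s frame */
    int anssiz = STACK_BUF_SIZE;
    unsigned char *host_buffer = stack_buf;  /* models gethostbyname4_r()'s frame */

    model_send(&ans, &anssiz, &host_buffer, incoming_len);

    struct model_result r = { anssiz, ans == stack_buf,
                              host_buffer != stack_buf };
    if (host_buffer != stack_buf)
        free(host_buffer);
    return r;
}
```

For example, run_model(3000) reports a size of 65536 with ans still on the 2048-byte buffer, which is the inconsistent state the rest of the attack relies on, while run_model(512) leaves everything unchanged.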

Would blocking DNS responses greater than 2048 bytes at the firewall help? No, because there is another way to force the buffer allocation using two smaller messages. After the first reply has been received and successfully stored in the 2048-byte initial buffer, the function will attempt to store the second reply in that buffer as well, but if there is not enough space, it will allocate a new MAXPACKET-sized buffer for it. As long as the sizes of the two replies taken together add up to more than 2048 bytes, a buffer allocation will be triggered, and the parameters on __libc_res_nsend()'s stack frame will be left in an inconsistent state. The only effective way to prevent the buffer allocation from happening is to limit the size of all responses to 1024 bytes or less. However, the ability to deliver DNS responses greater than 2048 bytes to the victim will still be required later on.
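
The size arithmetic behind the two-reply variant can be written down as a small predicate. This is a simplified model with a hypothetical function name; it ignores any alignment padding the real resolver may insert between the two replies, so boundary cases in glibc itself could differ slightly.

```c
#include <assert.h>
#include <stdbool.h>

enum { STACK_BUF_SIZE = 2048 };

/* Returns true if, in this simplified model, storing two replies of the
   given sizes would force a heap buffer allocation. */
static bool replies_trigger_allocation(int first_len, int second_len)
{
    assert(first_len >= 0 && second_len >= 0);

    /* The first reply alone is already too large for the stack buffer. */
    if (first_len > STACK_BUF_SIZE)
        return true;

    /* The second reply must fit into whatever the first one left over. */
    int remaining = STACK_BUF_SIZE - first_len;
    return second_len > remaining;
}
```

With a 1024-byte cap on every reply, the worst case is two 1024-byte replies, which exactly fill the buffer without exceeding it. That is why the 1024-byte limit closes this path while a 2048-byte limit does not.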

If At First You Don't Succeed

With the first critical element of the attack now successfully achieved, the second objective is to trigger a retry condition such that:

  • the current invocation of send_dg() or send_vc() returns while the buffer parameters are in an inconsistent state, and
  • the return values from the current invocation are such that the __libc_res_nsend() function invokes send_dg() or send_vc() again with the same parameters

Here, the options available to the attacker vary between send_dg() and send_vc(). In send_dg(), the easiest way to trigger the retry is to send a reply with the TC flag set. When send_dg() receives such a reply, it will set the v_circuit flag to 1 and return immediately:

  if (!(statp->options & RES_IGNTC) && anhp->tc) {
      /*
       * To get the rest of answer,
       * use TCP with same server.
       */
      Dprint(statp->options & RES_DEBUG,
             (stdout, ";; truncated answer"));
      *v_circuit = 1;
      __res_iclose(statp, false);
      // XXX if we have received one reply we could
      // XXX use it and not repeat it over TCP...
      return (1);
  }

With the v_circuit flag set, the __libc_res_nsend() function will retry the same request over TCP by invoking send_vc() with the same parameters, thus successfully completing this stage of the attack. Note that the TC flag is tested only after the incoming reply successfully passes other validity checks: it must have a valid DNS header with the ID and question matching one of the original queries. However, the answer, authority and additional sections are not checked for validity and can be omitted or filled with garbage data.
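
To make the header checks concrete, the sketch below inspects the fixed 12-byte DNS header of a raw reply and reports whether it carries the expected ID with the TC bit set. The function names are hypothetical and the code is not taken from glibc; in particular, it omits the question-section comparison that the real resolver also performs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum {
    DNS_HEADER_LEN = 12,   /* fixed-size DNS header */
    DNS_FLAG_TC = 0x02     /* TC bit within the first flags byte (offset 2) */
};

/* The ID occupies the first two bytes of the header, in network byte order. */
static uint16_t dns_header_id(const uint8_t *msg)
{
    return (uint16_t)((msg[0] << 8) | msg[1]);
}

/* Returns true if the reply has the expected ID and the TC flag set.
   The question section is deliberately not examined in this sketch. */
static bool is_matching_truncated_reply(const uint8_t *msg, size_t len,
                                        uint16_t expected_id)
{
    assert(msg != NULL);

    if (len < DNS_HEADER_LEN)
        return false;                      /* too short to hold a header */
    if (dns_header_id(msg) != expected_id)
        return false;                      /* not an answer to our query */
    return (msg[2] & DNS_FLAG_TC) != 0;
}
```

A filter that drops replies for which this predicate is true corresponds to the "blocking UDP replies with the TC flag set" mitigation, which closes only this one retry path and leaves the others open.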

Sending a valid first reply that is exactly 2048 bytes in size to send_dg() is another way to trigger a retry. This reply will exactly fill the stack-allocated buffer leaving 0 bytes of room left, and when the second reply arrives next, send_dg() will allocate a new buffer for it -- but won't update the buffer sizes correctly. This will result in a recvfrom() call on the UDP socket with a buffer size of 0, which will error out and trigger a retry:

  *thisresplenp = recvfrom(pfd[0].fd, (char*)*thisansp,
                           *thisanssizp, 0,
                           (struct sockaddr *)&from, &fromlen);
  if (__glibc_unlikely (*thisresplenp <= 0)) {
      if (errno == EINTR || errno == EAGAIN) {
          need_recompute = 1;
          goto wait;
      }
      Perror(statp, stderr, "recvfrom", errno);
      goto err_out;
  }

Yet another way to trigger the retry in send_dg() is to let it time out without having received a single valid response. If send_dg() does not receive any network data within the polling timeout, it will check whether any valid replies were received on previous polls. If there were none, it will return with a value of 0:

  if (n == 0) {
      Dprint(statp->options & RES_DEBUG, (stdout, ";; timeout"));
      if (resplen > 1 && (recvresp1 || (buf2 != NULL && recvresp2)))
      {
          /* There are quite a few broken name servers out
             there which don't handle two outstanding
             requests from the same source. There are also
             broken firewall settings. If we time out after
             having received one answer switch to the mode
             where we send the second request only once we
             have received the first answer. */
          if (!single_request)
          {
              statp->options |= RES_SNGLKUP;
              single_request = true;
              *gotsomewhere = save_gotsomewhere;
              goto retry;
          }
          else if (!single_request_reopen)
          {
              statp->options |= RES_SNGLKUPREOP;
              single_request_reopen = true;
              *gotsomewhere = save_gotsomewhere;
              __res_iclose (statp, false);
              goto retry_reopen;
          }

          *resplen2 = 1;
          return resplen;
      }

      *gotsomewhere = 1;
      return (0);
  }

The __libc_res_nsend() function will then loop around and invoke send_dg() with the same parameters against the next name server listed in the glibc resolver's configuration (/etc/resolv.conf); if there is only one name server in the list, it will retry the same name server. This method of triggering the retry may appear incompatible with the earlier requirement of sending oversized replies to trigger buffer allocation, but it is not. The way around the apparent contradiction is to send a reply that is larger than 2048 bytes but has a non-matching ID or question in the DNS header. It will trigger the buffer allocation due to its size, but will not be counted as a valid response by send_dg(). The attacker can then let the function time out on the next poll and still satisfy the critical conditions for a successful exploitation entirely over UDP.

In send_vc(), triggering a retry (e.g. by sending a reply of length 0) is feasible only if there are multiple name servers listed in /etc/resolv.conf, since __libc_res_nsend() will only call this function once per server and there is no way to force a switch to UDP. If the attacker is able to intercept the victim's DNS requests, a successful TCP-only exploit may still be within the realm of possibility, but only if there are multiple name servers listed in the glibc resolver's configuration.

The Payoff

The final step of the attack involves sending two replies, the second of which overflows the 2048-byte buffer and smashes the stack. When the send_dg() or send_vc() function is invoked during the retry, the initial state of the stack parameters is primed for exploitation as shown in the “after buffer allocation” diagram. Crucially, the function will save the inconsistent buffer parameters into local variables that will be referenced later:

  u_char *ans = *ansp;
  int orig_anssizp = *anssizp;

Notice that the local ans pointer is now referring to the stack-allocated 2048-byte buffer, while the orig_anssizp value is set to MAXPACKET. When the first reply arrives, it will be stored into the MAXPACKET-sized buffer allocated in the previous stage of the attack. After passing the validity checks, the first response will be marked as received.

Then, the second reply arrives, and the function will detect that the heap buffer was used for the first response. It will (disastrously) attempt to revert back to the original stack-allocated buffer for the second response:

          /* The first reply did not fit into the
             user-provided buffer. Maybe the second
             answer will. */
          *anssizp2 = orig_anssizp;
          *ansp2 = *ansp;
      }

      thisanssizp = anssizp2;
      thisansp = ansp2;
      thisresplenp = resplen2;
  }

Since the original buffer size has been inconsistently set to MAXPACKET in the previous stages of the attack, and the original buffer pointer is still referring to the 2048-byte stack-based buffer, the function will now attempt to store up to MAXPACKET bytes from the second reply into the 2048-byte buffer on gethostbyname4_r()'s stack frame. These bytes are completely within the attacker's control, and a carefully crafted payload may allow the attacker to remotely execute code on the victim's system, provided it can bypass modern anti-exploit mechanisms like ASLR and NX.
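
The size of the resulting overflow follows directly from the mismatch between the real and the claimed buffer size. The helper below is a hypothetical illustration of that arithmetic, not glibc code: it assumes the write is capped at the claimed size and computes how many bytes would land beyond the real buffer.

```c
#include <assert.h>
#include <stddef.h>

/* Number of bytes written past a buffer of real_capacity bytes when the
   code believes it holds claimed_capacity bytes and a reply of reply_len
   bytes arrives. Simplified model: the write is capped at the claimed size. */
static size_t bytes_past_buffer(size_t real_capacity, size_t claimed_capacity,
                                size_t reply_len)
{
    assert(real_capacity <= claimed_capacity);

    size_t written = reply_len < claimed_capacity ? reply_len
                                                  : claimed_capacity;
    return written > real_capacity ? written - real_capacity : 0;
}
```

With a real capacity of 2048 and a claimed capacity of 65536, a 3000-byte second reply overruns the buffer by 952 bytes, while any reply of 2048 bytes or less stays within bounds.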

The stack frame layout of the gethostbyname4_r() function is illustrated below. Note that the layout will vary slightly between glibc versions.

Diagram4

It is trivial for the attacker to overwrite the function's return address, since there is no stack canary protecting it. However, gaining control of code execution is complicated somewhat by the characteristics of the surrounding code path. First, in order to overwrite the return address, the attacker must also overwrite the local variables below it, including the two buffer pointers host_buffer and ans2p. Before the function returns, both of these pointers are checked for NULL values with an assert() condition, and passed as arguments to free(). At either of these checkpoints, invalid pointer values will cause the program to abort before execution control is gained. Second, since the buffer overflow is within shared library code, the code and data will be loaded at a random address range due to ASLR, even with non-PIE application binaries.

None of these challenges is insurmountable, as attested to by the success of the vulnerability's original discoverers in creating a working exploit (which to date has not been publicly released). What it does mean is that any exploit is likely to be tightly coupled to a specific version of the target application, or rely on additional vulnerabilities to leak information about the memory layout of the process.

Our (non-public) internal PoC code includes a sample RCE exploit for the Python 2.7.8-1 binary in conjunction with glibc-2.19, as distributed in Ubuntu 14.10. It takes advantage of the fact that the Python binary is non-PIE, which allows its code segment to serve as a reliable source of ROP gadgets, and its GOT as a reliable mechanism to invoke dynamically linked library code. For simplicity, it also relies on the leak of a pointer from Python's heap (which is located at a randomized address range) in order to have a valid pointer value to pass to the free() calls before execution control is gained. This may not be strictly necessary if one can find a reliable address in Python's data segment that can pass the validity checks implemented in glibc's memory allocator.

DNS Cache Traversal

From the preceding discussion, it should be clear that inserting a simple DNS forwarder between the attacker and the victim will do nothing to mitigate this attack, assuming that the attacker's replies are passed through to the victim unaltered.
However, once a caching DNS resolver enters the picture, things quickly get complicated. Several general questions must be answered for each scenario involving a particular caching resolver:

  • Is it possible to force a > 2048-byte reply to be sent to the victim?
  • Is it possible to force a victim to retry the DNS query with inconsistent buffer parameters?
  • Which types of reply payloads survive the scrubbing logic of the caching resolver?
  • Is it possible to encode a weaponized payload in such a way that it survives the scrubbing logic?

The answers to these questions will obviously vary for each DNS caching resolver implementation. A well-behaved resolver would not send UDP replies greater than 512 bytes to the target unless EDNS0 is enabled (it is disabled by default in glibc). Instead, it would send truncated replies with the TC flag set, immediately triggering a retry over TCP. Thus, it would not be possible to trigger the heap buffer allocation bug over UDP prior to the retry. What about targets that have EDNS0 turned on? Fortunately for them, the __libc_res_nsend() function uses a value of half the stack-allocated buffer length for the advertised EDNS0 buffer size -- which works out to 1024 bytes. This is just small enough to prevent the heap buffer allocation bug from being triggered over UDP in this scenario as well. So the attacker would be left trying to carry out the attack entirely over TCP, where triggering the heap buffer allocation bug would be easy, but triggering the retry would be the main challenge. Moreover, if the target had only one nameserver entry in /etc/resolv.conf, the attacker would be completely out of luck -- it would not be possible to trigger the TCP retry at all in that case.
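
The reply-size reasoning above reduces to a short calculation. The sketch below uses hypothetical helper names and assumes, as the text does, that a well-behaved resolver honours the 512-byte plain-DNS limit and the EDNS0 size advertised by the client; it then checks whether two maximum-size UDP replies could exceed the 2048-byte stack buffer.

```c
#include <assert.h>
#include <stdbool.h>

enum { STACK_BUF_SIZE = 2048, PLAIN_DNS_UDP_MAX = 512 };

/* Largest UDP reply a well-behaved resolver would send to the client:
   512 bytes without EDNS0, or the advertised size (half of the stack
   buffer, per the discussion above) when EDNS0 is enabled. */
static int max_wellbehaved_udp_reply(bool edns0_enabled)
{
    return edns0_enabled ? STACK_BUF_SIZE / 2 : PLAIN_DNS_UDP_MAX;
}

/* True if two maximum-size replies could add up to more than the stack
   buffer and therefore trigger the heap allocation over UDP. */
static bool udp_allocation_possible(bool edns0_enabled)
{
    int max_reply = max_wellbehaved_udp_reply(edns0_enabled);
    assert(max_reply > 0);
    return 2 * max_reply > STACK_BUF_SIZE;
}
```

Both udp_allocation_possible(false) and udp_allocation_possible(true) evaluate to false: two 1024-byte replies total exactly 2048 bytes, which fills the buffer but does not exceed it.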

As a real-world example, default desktop installs of Ubuntu starting with version 12.04 are configured to use a local dnsmasq forwarder for DNS resolution. This means that /etc/resolv.conf will contain the local loopback address of the dnsmasq forwarder as the only nameserver entry -- this arrangement provides protection from TCP-based cache-traversal attacks. Assuming that the next-hop resolvers are all well-behaved (i.e. they cannot be coerced into sending UDP replies greater than 1024 bytes to the target), default Ubuntu desktop installs should be largely safe from cache-traversal attacks.

The foregoing discussion posited that the target was behind well-behaved resolvers. It is not difficult to imagine less well-behaved implementations through which it would be possible to crash a victim process. Remote execution of code on a victim's system is a more far-fetched scenario, but even that is not out of the realm of possibility. After all, some implementations may have exploitable gaps in their response scrubbing logic. It would be better to err on the side of caution and assume that cache-traversal attacks are possible with some implementations at least some of the time. The good news is that any preventive measures that are effective against man-in-the-middle attacks will also be effective against cache-traversal attacks.
