Guide 05 in Sarah’s Welcome to Web Security Series #

In this series, Sarah discusses some common vulnerability classes found in web security and how you can find and exploit them. Today’s focus is path traversal, a simple vulnerability that can lead to serious leaks of critical files.

Introduction #

Path traversal, also known as directory traversal, dot-dot-slash, directory climbing or backtracking, is a style of attack that targets files and folders outside of the web root folder. Attackers will often target sensitive files such as source code, credentials or critical system files and use these to further exploit the system. Generally speaking, path traversal attacks tend to affect Unix-based systems and Windows more than other operating systems.

Finding Path Traversal Vulnerabilities #

Theoretically, a path traversal vulnerability can exist whenever a web page retrieves some information from the filesystem. This could be an image that has been loaded from the filesystem or some other user-supplied file. Some examples of HTML code and URLs that could be vulnerable include:

<img src="/loadImage?filename=photo.png">
http://example.com/index.php?file=content
http://example.com/main.cgi?home=index.html

Once we have found a potentially vulnerable endpoint, we need to test whether it has any defences. The easiest way to do this is to replace the original filename with ../ (the origin of the dot-dot-slash name), which is a valid directory traversal sequence on both Unix-based and Windows systems (Windows also accepts ..\). If the web page responds unusually, for example with an error message or unexpected content, this suggests that the site may be vulnerable to a path traversal attack.
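
The probing step can be sketched in a few lines of Python. This is only an illustration: the helper name probe_urls and its depth parameter are made up for this example, and the URL is the placeholder from the list above. It builds the test URLs without sending any requests.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

def probe_urls(url, param, depth=3):
    """Return copies of `url` where `param` is replaced by 1..depth '../' sequences."""
    parts = urlsplit(url)
    query = parse_qsl(parts.query, keep_blank_values=True)
    probes = []
    for n in range(1, depth + 1):
        payload = "../" * n
        new_query = [(key, payload if key == param else value) for key, value in query]
        # safe="/" keeps the slashes literal so the probe is sent un-encoded
        probes.append(urlunsplit(parts._replace(query=urlencode(new_query, safe="/"))))
    return probes

for probe in probe_urls("http://example.com/index.php?file=content", "file", depth=2):
    print(probe)
# http://example.com/index.php?file=../
# http://example.com/index.php?file=../../
```

Each generated URL can then be requested with whatever client you prefer and compared against the normal response.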

Exploiting Path Traversal Vulnerabilities #

Path traversal vulnerabilities are fairly simple to exploit: the underlying principle is straightforward to understand, and some of the simpler defences can be bypassed fairly quickly. The most complicated part is figuring out where exactly in the filesystem you start and where you want to go. If you are targeting a Unix-based system, /etc/passwd is a common target, while on Windows it is \windows\win.ini.

Consider a web page that loads images from the folder /var/www/images. If we wish to access /etc/passwd we need to return to the root (‘move backwards three steps’) and then ‘move forwards’ into /etc/passwd. Recalling that ../ is the equivalent of ‘stepping back one’ (or ‘moving up a level’, depending on how you like to visualise filesystems), we would want to replace the original filename with ../../../etc/passwd. When the application passes this traversal sequence to the filesystem, rather than loading an image, it will follow that path to the file and return its contents (assuming no additional defences).
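
To make the ‘three steps back’ arithmetic concrete, here is a small Python sketch. It assumes a naive handler that simply joins the user-supplied filename onto the image folder; the function names resolve and traversal_payload are hypothetical and only model the path handling, without touching any real files.

```python
import posixpath

IMAGE_DIR = "/var/www/images"  # the folder from the example above

def resolve(filename):
    """Model a naive handler: join the user-supplied name onto the image folder."""
    return posixpath.normpath(posixpath.join(IMAGE_DIR, filename))

def traversal_payload(target, start=IMAGE_DIR):
    """Work out the relative path that leads from `start` to `target`."""
    return posixpath.relpath(target, start)

print(resolve("photo.png"))               # /var/www/images/photo.png
print(traversal_payload("/etc/passwd"))   # ../../../etc/passwd
print(resolve("../../../etc/passwd"))     # /etc/passwd
```

Each ../ cancels one component of /var/www/images, which is why exactly three are needed here. Adding extra ../ sequences is harmless on Unix-based systems, since the parent of the root is the root itself.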

However, it is rare that there are no defences in place. Web applications will often implement some form of encoding, strip certain sequences or only accept a list of expected file types. Nevertheless, these defences can often be circumvented by using URL encoding to bypass sanitisation, nested sequences (such as ....//) to survive stripping and/or a null byte to fake an expected file type. One or a combination of these methods may be needed to successfully access the target file/folder.
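
The following Python sketch shows why these simple defences fall short. The filters strip_once and has_image_extension are hypothetical stand-ins for the kind of checks described above, not code from any particular application.

```python
from urllib.parse import quote, unquote

def strip_once(value):
    """Hypothetical naive filter: remove every '../' in a single pass."""
    return value.replace("../", "")

def has_image_extension(value):
    """Hypothetical file-type check that only looks at the end of the name."""
    return value.endswith(".png")

# 1. Nested sequences: removing the inner '../' leaves a new '../' behind.
nested = "....//....//....//etc/passwd"
print(strip_once(nested))               # ../../../etc/passwd

# 2. URL encoding: a filter that runs before decoding never sees '../'.
encoded = "..%2F..%2F..%2Fetc%2Fpasswd"
print(strip_once(encoded) == encoded)   # True, nothing was stripped
print(unquote(encoded))                 # ../../../etc/passwd

# Double encoding survives one round of decoding and needs a second.
double = quote(encoded, safe="")
print(unquote(double) == encoded)       # True

# 3. Null byte: the name still ends in .png, so the extension check passes.
with_null = unquote("../../../etc/passwd%00.png")
print(has_image_extension(with_null))   # True
print(with_null.split("\x00")[0])       # ../../../etc/passwd
```

The last line shows what a C-style file API would see if it treated the null byte as the end of the string. Many modern runtimes reject paths containing a null byte outright (Python's open raises ValueError, for instance), so this trick mainly applies to older platforms.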

Defending Against Path Traversal #

Considering how simple it is to bypass some defences, one would hope that there are more effective defence mechanisms. The best defence is simply not to allow user-supplied input to reach the filesystem API(s) at all. The majority of applications can be rewritten to do this. If this is not possible, then validating user input or using verified canonicalised paths are also good ways to protect against path traversal.¹ Sanitisation can also be used, but it can be bypassed if an attacker is patient enough; a fixed whitelist (a list of accepted values) is preferable to sanitisation alone.
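
As a rough illustration of the last two options, here is a minimal Python sketch. The base folder, the whitelist contents and the function names are assumptions made for this example; a real application would adapt them to its own layout.

```python
import os

BASE_DIR = "/var/www/images"               # assumed image folder
ALLOWED_FILES = {"photo.png", "logo.png"}  # hypothetical fixed whitelist

def safe_path(filename, base_dir=BASE_DIR):
    """Canonicalise the requested path and verify it is still inside base_dir."""
    base = os.path.realpath(base_dir)
    candidate = os.path.realpath(os.path.join(base, filename))
    # commonpath compares whole path components, so /var/www/images-old
    # is not mistaken for something inside /var/www/images
    if os.path.commonpath([base, candidate]) != base:
        raise ValueError("path escapes the base directory")
    return candidate

def whitelisted_path(filename, base_dir=BASE_DIR):
    """Only accept names that appear in the fixed whitelist."""
    if filename not in ALLOWED_FILES:
        raise ValueError("file is not on the whitelist")
    return os.path.join(base_dir, filename)

for name in ("photo.png", "../../../etc/passwd", "/etc/passwd"):
    try:
        print("allowed:", safe_path(name))
    except ValueError as error:
        print("rejected:", name, "-", error)
```

The check happens after canonicalisation, so encoded or nested sequences that eventually resolve outside the base folder are rejected regardless of how they were written. The whitelist variant is stricter still, since the user input is only ever compared against known values.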

Conclusion #

Path traversal on its own is neither a particularly complex nor, in its basic form, a particularly dangerous attack, as it is typically only used to read information. However, this information can then be used to perform devastating attacks on the system, as it is always easier to exploit a system that you understand inside and out.

If you are interested in learning more about path traversal and web security, the WebSecurity Academy is a fantastic resource. You can create a free account and explore their labs here.

¹ A canonical path is basically a sanitised, properly formatted path. It is essentially the shortest absolute path (though this is technically untrue). Consider the path /etc/../etc/passwd. Its canonical form would be /etc/passwd as it goes to the same place but in fewer ‘steps’.