XXE Attacks Explained

spectnullbyte, December 12, 2021

Of the many attacks that threaten web applications today, XXE is among the least discussed. Although it gets far less attention than XSS or SQL injection, it carries its own risks and should not be taken lightly.

In this guide, I will try to explain what XXE is, why it is dangerous, and how to protect against it. But before we can learn about this attack, we first need to understand a few things about XML.

Introduction to XML

XML (eXtensible Markup Language) is a tag-based language that applications use for transferring data. Unlike other tag-based languages (such as HTML), XML does not have pre-defined tags. Instead, these are defined by the user.

Here is an example of an XML document:

<email>
<sender>John</sender>
<recipient>Peter</recipient>
<subject>Hi</subject>
<message>Hi Peter, How are you doing?</message>
</email>

In the above code, the email tag contains 4 child tags: sender, recipient, subject, and message. Each of these tags encloses a string of characters, referred to in XML as parsed character data (or PCDATA).
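
To see this structure from a program's point of view, here is a minimal Python sketch that parses the same document with the standard library's xml.etree.ElementTree module and lists the child tags. The variable names are illustrative only.

```python
import xml.etree.ElementTree as ET

EMAIL_XML = """<email>
<sender>John</sender>
<recipient>Peter</recipient>
<subject>Hi</subject>
<message>Hi Peter, How are you doing?</message>
</email>"""

root = ET.fromstring(EMAIL_XML)

# The root element is <email>; iterating over it yields its child elements.
child_tags = [child.tag for child in root]
print(root.tag, child_tags)

# Each child's character data is exposed through the .text attribute.
print(root.find("message").text)
```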

XML File Declaration

An XML file should start with an XML declaration. The declaration must include the version of XML that the file uses. It can also include the encoding and standalone attributes, both of which are optional.

Here is what an XML file declaration looks like:

<?xml version="1.1" encoding="UTF-8" standalone="yes"?>

As you can see, there are three different attributes:

version : This can either be 1.0 or 1.1. If you do not write an XML declaration, then the version defaults to XML 1.0.
encoding : In most cases, you will be using UTF-8. However, depending on the characters used, you can specify UTF-16 for this attribute.
standalone : This attribute can have either yes or no values. It indicates whether the document is self-contained or relies on markup declarations stored outside of it, such as an external DTD.
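
As a rough illustration of how a parser reads these three attributes, the sketch below uses Python's xml.parsers.expat module, whose XmlDeclHandler callback receives the values of the declaration. The document and the names declaration and on_xml_declaration are made up for this example, and the sketch uses version 1.0 on the assumption that Expat targets XML 1.0.

```python
import xml.parsers.expat

DOCUMENT = b'<?xml version="1.0" encoding="UTF-8" standalone="yes"?><email/>'

declaration = {}

def on_xml_declaration(version, encoding, standalone):
    # Expat reports standalone as 1 ("yes"), 0 ("no"), or -1 (omitted).
    declaration["version"] = version
    declaration["encoding"] = encoding
    declaration["standalone"] = standalone

parser = xml.parsers.expat.ParserCreate()
parser.XmlDeclHandler = on_xml_declaration
parser.Parse(DOCUMENT, True)

print(declaration)
```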

Document Type Definition (DTD)

DTD (Document Type Definition) defines the structure of an XML document so that different people can agree on the same elements and attributes to use.

There are two different types of DTDs:

Internal DTD

When an XML document includes the definition of its own structure, that definition is what we refer to as an internal DTD. It is contained in the tag <!DOCTYPE> that is written at the beginning of the file, just after the XML declaration.

Here is an example of an internal DTD:

<!DOCTYPE email [
<!ELEMENT email (sender,recipient,subject,message)>
<!ELEMENT sender (#PCDATA)>
<!ELEMENT recipient (#PCDATA)>
<!ELEMENT subject (#PCDATA)>
<!ELEMENT message (#PCDATA)>
]>

With !DOCTYPE email, we define email as the root element of the XML document.

The second line specifies that the email element should contain four child elements: sender, recipient, subject, and message.

After that, we specify that each of these child elements should contain parsed character data (PCDATA).

You have probably noticed that this DTD defines the same structure that we’ve seen in the previous XML code example.
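
Python's standard library parses a document that carries an internal DTD, but it does not validate the document against it. The sketch below is therefore only a hand-written approximation of what this particular DTD requires: an email root, the four children in order, and text-only content. The function name matches_email_structure is hypothetical.

```python
import xml.etree.ElementTree as ET

DOCUMENT = """<?xml version="1.0"?>
<!DOCTYPE email [
<!ELEMENT email (sender,recipient,subject,message)>
<!ELEMENT sender (#PCDATA)>
<!ELEMENT recipient (#PCDATA)>
<!ELEMENT subject (#PCDATA)>
<!ELEMENT message (#PCDATA)>
]>
<email>
<sender>John</sender>
<recipient>Peter</recipient>
<subject>Hi</subject>
<message>Hi Peter, How are you doing?</message>
</email>"""

EXPECTED_CHILDREN = ["sender", "recipient", "subject", "message"]

def matches_email_structure(xml_text):
    """Manually check the rules that the email DTD expresses."""
    root = ET.fromstring(xml_text)
    if root.tag != "email":
        return False
    # The content model (sender,recipient,subject,message) fixes the order.
    if [child.tag for child in root] != EXPECTED_CHILDREN:
        return False
    # (#PCDATA) children hold text only, so they must have no sub-elements.
    return all(len(child) == 0 for child in root)

print(matches_email_structure(DOCUMENT))
```

A validating parser would perform this kind of check automatically from the DTD itself, for any document type.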

External DTD

Now for this second type, we define the XML structure in an external file. In this case, the <!DOCTYPE> tag should contain the URL to the DTD file using the SYSTEM keyword.

We need to add the following line to the beginning of our XML document, just after the XML declaration line.

<!DOCTYPE email SYSTEM "email.dtd">

And here is what the external DTD file “email.dtd” contains:

<!ELEMENT email (sender,recipient,subject,message)>
<!ELEMENT sender (#PCDATA)>
<!ELEMENT recipient (#PCDATA)>
<!ELEMENT subject (#PCDATA)>
<!ELEMENT message (#PCDATA)>

Here again, the DTD defines the same structure as in the examples we’ve seen before.

XML Entity

An XML entity is a string of characters that the XML parser replaces with another value when encountered in the document. This is similar to what variables are in programming languages.

An entity is written in the form: Ampersand (&) + name of entity + semi-colon (;).

In addition to user-defined entities, XML has five predefined entities. Two examples are &lt; and &gt;, which get replaced with the less-than (<) and greater-than (>) characters respectively.
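
The following short Python sketch shows the replacement in both directions, assuming the standard library's ElementTree parser and the escape helper from xml.sax.saxutils. The sample text is invented for the example.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Parsing: the parser replaces predefined entities with their characters.
element = ET.fromstring("<message>1 &lt; 2 &amp; 3 &gt; 2</message>")
decoded = element.text
print(decoded)

# Writing: escape() turns the special characters back into entities.
encoded = escape("1 < 2 & 3 > 2")
print(encoded)
```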

Similar to DTDs, there are two types of entities: Internal and external.

Internal Entities

An internal entity is defined in the following form:

<!ENTITY name "value">

Whenever there is a &name; in the file, the XML parser replaces it with value.
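
Here is a small Python sketch of that substitution, assuming the standard library's ElementTree parser. The entity name signature and its value are invented for this example.

```python
import xml.etree.ElementTree as ET

DOCUMENT = """<!DOCTYPE email [
<!ENTITY signature "Best regards, John">
]>
<email>
<message>See you soon. &signature;</message>
</email>"""

root = ET.fromstring(DOCUMENT)

# The parser has already replaced &signature; with the declared value.
message = root.find("message").text
print(message)
```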

External Entities

On the other hand, instead of providing a value, an external entity refers to a URL using the SYSTEM keyword.

<!ENTITY name SYSTEM "URL">

Similar to an internal entity, whenever a parser encounters a &name; in the XML file, it replaces it with the content of the URL that the external entity declaration refers to.
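
To make this step visible without actually loading anything, the sketch below uses Python's xml.parsers.expat module. Its ExternalEntityRefHandler callback is invoked when the parser meets a reference to an external entity, and here it only records the URL it was asked for. The URL http://example.com/footer.txt and the entity name footer are placeholders chosen for illustration.

```python
import xml.parsers.expat

DOCUMENT = b"""<!DOCTYPE email [
<!ENTITY footer SYSTEM "http://example.com/footer.txt">
]>
<email><message>&footer;</message></email>"""

requested = []

def on_external_entity(context, base, system_id, public_id):
    # Record the URL the parser was asked to load, but do not fetch it.
    requested.append(system_id)
    return 1  # a non-zero value tells Expat to carry on parsing

parser = xml.parsers.expat.ParserCreate()
parser.ExternalEntityRefHandler = on_external_entity
parser.Parse(DOCUMENT, True)

print(requested)
```

A parser that does resolve external entities would fetch the content at this point and splice it into the document in place of the reference.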

Introduction to XXE

XXE (XML eXternal Entity) is a type of attack that takes advantage of external entities in XML files.

Some websites rely on XML for transferring data between the browser and the webserver. When it receives data in XML, the webserver transmits it to an XML parser which processes this data.

As we’ve seen in the previous section, external entities are not a vulnerability on their own. Like any other XML feature, they are an inherent part of the language. A parser that is configured to support them will therefore resolve them and load the external content they point to, and many parsers have historically done so by default. This can be prevented with some secure configuration practices, which we will cover later in this article.

A malicious user can take advantage of this feature by defining external entities that point to sensitive files on the server, tricking a vulnerable XML parser into returning their content.

Let’s see how this works with a simple example.

XXE Attack Example

To demonstrate the impact of an XXE attack, we are going to use an example taken from the Mustacchio room on TryHackMe.

As shown in the image below, we have at our disposal a form input to add a comment on the website.

With the proxy interception enabled on Burp Suite, I have typed Hello and submitted the form.

On Burp Suite, I have intercepted the following request.

We can see that the web application stores the “Hello” value in a variable named xml. This hints at the possibility that it can accept XML code as an input.

After doing some enumeration, I have managed to get the correct XML structure for adding a comment, which is as follows:

<comment>
<name>Name</name>
<author>Author</author>
<com>Comment</com>
</comment>

So, with that in mind, let’s replace the value of the xml parameter with the following code:

<?xml version="1.0"?>
<!DOCTYPE payload [ 
<!ENTITY malicious SYSTEM "file:///etc/passwd">
]>
<comment>
<name>&malicious;</name>
<author>Barry</author>
<com>The comment</com>
</comment>

The payload shouldn’t be difficult to understand. We have defined malicious as an external entity whose value is the content of the file /etc/passwd, referenced through the URL file:///etc/passwd.

When the XML parser arrives at &malicious;, it will load at its place the content of /etc/passwd, thus revealing sensitive information from the webserver, as the following image shows.
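
From the defender's side, the same payload can be inspected without resolving anything. The sketch below is one possible approach, assuming Python's xml.parsers.expat module: its EntityDeclHandler callback reports every entity declaration, and external ones can be recognised by their system identifier. The function name find_external_entities is hypothetical.

```python
import xml.parsers.expat

PAYLOAD = b"""<?xml version="1.0"?>
<!DOCTYPE payload [
<!ENTITY malicious SYSTEM "file:///etc/passwd">
]>
<comment>
<name>&malicious;</name>
<author>Barry</author>
<com>The comment</com>
</comment>"""

def find_external_entities(xml_bytes):
    """Return (name, URL) pairs for every external entity declared."""
    found = []

    def on_entity_decl(name, is_parameter, value, base,
                       system_id, public_id, notation):
        # Internal entities carry a value; external ones carry a
        # system identifier (the URL) instead.
        if system_id is not None:
            found.append((name, system_id))

    parser = xml.parsers.expat.ParserCreate()
    parser.EntityDeclHandler = on_entity_decl
    parser.Parse(xml_bytes, True)
    return found

print(find_external_entities(PAYLOAD))
```

No external entity handler is installed in this sketch, so the file is never read. The function only lists what the payload tries to reference.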

How to Prevent XXE Attacks

Fortunately, XXE attacks aren’t always effective. They only succeed when the XML parser on the target website is misconfigured. So, by making sure that our websites don’t have these insecure settings, we can mitigate the risk of XXE attacks.

Here are some of the good practices that we can implement to achieve this.

Disable DTDs and external entities on the XML parser.
Validate and sanitize all user-provided input before processing it.
Continually patch and update XML parsers.
Scan the web application using SAST and DAST tools.
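
As an illustration of the first practice, here is a minimal Python sketch that refuses any document containing a DOCTYPE before handing it to the real parser. It assumes the standard library's xml.parsers.expat and ElementTree modules, and the function name parse_untrusted is hypothetical. It is a sketch of the idea, not a complete defence.

```python
import xml.etree.ElementTree as ET
import xml.parsers.expat

def parse_untrusted(xml_bytes):
    """Parse XML from an untrusted source, rejecting any DOCTYPE."""

    def reject_doctype(name, system_id, public_id, has_internal_subset):
        # Without a DOCTYPE there is no DTD, so no entity can be declared.
        raise ValueError("DOCTYPE declarations are not allowed")

    # First pass: only look for a DOCTYPE; the exception raised in the
    # handler propagates out of Parse().
    checker = xml.parsers.expat.ParserCreate()
    checker.StartDoctypeDeclHandler = reject_doctype
    checker.Parse(xml_bytes, True)

    # Second pass: the document carries no DTD, so parse it normally.
    return ET.fromstring(xml_bytes)

comment = parse_untrusted(b"<comment><com>Hello</com></comment>")
print(comment.find("com").text)
```

In practice, a maintained library such as defusedxml, or the configuration options of the parser in use, is likely a better choice than a hand-rolled check like this one.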

Conclusion

So, we have reached the end of this article. We started by learning about XML, DTDs, and entities. We then learned about XXE attacks and how they are performed, and walked through an example using a TryHackMe room. Finally, we listed some good practices to follow to prevent these attacks.

With all that, we’ve barely scratched the surface of XXE attacks and their impact. There are many other variants of this attack, from denial of service to sensitive information disclosure, so there is still a lot to learn. I invite you to keep reading about the subject, and OWASP is a good next step for learning more about it.
