Secure Code Review: Finding XML vulnerabilities in Code [1/2]

It’s been a while…

It’s been a while since my last post, for various reasons. Over the past few months my passion has shifted towards code security, and security code review in particular. Being able to read code and find vulnerabilities in it is something ethical hackers, other folks in cybersecurity, and developers should all have in their toolkit.

Many people beginning their careers in cyber security focus only on bug bounties and ways to earn quick money. There’s also a perception (among a lot of folks in the community) that security code reviews are difficult and require a lot of coding experience. While it is true that you need to know the basics and have some experience in programming, security code reviews are definitely not difficult, and the skill can be polished over time with practice.

This series aims to take a hacker with little to no knowledge of security code reviews to the point of being comfortable finding well-known vulnerabilities in code. This skill is useful in the cyber security job market as well. Make sure you have subscribed to the mailing list so you don’t miss any blog posts!

If you’ve read my previous posts, you would know I like to document and share my learnings with the cybersecurity community. We will be reviewing C++ code during this series and looking at different vulnerabilities that arise while coding in it. We will also look at the mitigation for these vulnerabilities.

This brings me to the first part (1/2) of this post, which marks the beginning of a “Security Code Review Series” based on my learning. Just a heads up: as said above, these blog posts contain information drawn from my own learning and practical knowledge. Hence, if there are any mistakes, I welcome people from the community to point them out and I will be happy to correct them 🙂

Finally, before starting this series, a disclaimer for everyone reading this article: use the techniques taught in this series ONLY on code that either belongs to you or that you have permission to review.

There’s also a special announcement that I am thrilled to share with all you hackers out there!

Exciting Announcement: New YouTube Channel!

Before starting this post, I am excited to announce that I am launching my brand-new YouTube channel!

Subscribe to my YouTube – [https://www.youtube.com/@muqsitbaig]

I will be covering everything related to Cyber Security: Security Code Review tutorials for beginners, bug bounty specials, different concepts in Cyber Security, securing your assets (websites, phone, etc.) and yourself, and much more!

So, before we begin this series, please subscribe to my channel. It does not matter if you are a beginner or you’ve been in the hacking scene for some time; I will be making videos along with writing these blog posts, about cybersecurity concepts in minute detail.

Introduction to XML

In case you’re not already aware, XML (Extensible Markup Language) is a markup language similar to HTML, but without predefined tags to use. Instead, you define your own tags designed specifically for your needs. This is a powerful way to store data in a format that can be stored, searched, and shared.
– Mozilla

In short, XML was designed only to store and transport data. But this won’t stop us from learning more about this markup language and finding different ways to exploit it. Since XML is widely used across the internet, we need to understand the vulnerabilities that can arise when the code handling it is not properly written or configured.

This code review series aims to do exactly that – finding vulnerabilities in code and improving your security code review skills as a beginner. Excited? Let’s jump straight into it!

Common XML Vulnerabilities in the wild

Now that we’ve understood what XML is, we jump to the part everyone clicked on this article for:

Finding different XML vulnerabilities in code during a security code review!

If the source code you are reviewing uses an XML parser, it is important to review how that parser is configured and used, to make sure it is safe from XML vulnerabilities.

What is an XML parser, you ask?

An XML parser is a software library/package that provides an interface for client applications to work with an XML document. It reads the XML and exposes its contents to the program, for example as a tree of nodes (DOM) or as a stream of events (SAX).

XML parsers check that the document is well-formed, and validating parsers can additionally check it against a DTD or schema.
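To give a feel for what "well-formed" means, here is a rough sketch of just one of the checks a real parser performs: every opening tag must be closed in the right order. The function name tagsAreBalanced is my own, it is not part of any XML library, and it deliberately ignores comments, CDATA sections, and attribute values that contain angle brackets. Treat it as a teaching aid, not as a parser.

```cpp
#include <cstddef>
#include <stack>
#include <string>

// Hypothetical helper (not from any XML library): returns true when every
// opening tag is closed by a matching closing tag in the right order.
// Assumption: no comments, CDATA sections, or '<' / '>' inside attributes.
bool tagsAreBalanced(const std::string& xml) {
    std::stack<std::string> open;  // names of elements that are still open
    std::size_t pos = 0;
    while ((pos = xml.find('<', pos)) != std::string::npos) {
        const std::size_t end = xml.find('>', pos);
        if (end == std::string::npos) return false;  // tag never terminated
        const std::string body = xml.substr(pos + 1, end - pos - 1);
        pos = end + 1;
        if (body.empty()) return false;                   // "<>" is not a tag
        if (body[0] == '?' || body[0] == '!') continue;   // declarations: skipped
        if (body[0] == '/') {                             // closing tag
            if (open.empty() || open.top() != body.substr(1)) return false;
            open.pop();
        } else if (body.back() == '/') {                  // self-closing tag
            continue;
        } else {                                          // opening tag
            // The element name ends at the first whitespace character.
            open.push(body.substr(0, body.find_first_of(" \t\r\n")));
        }
    }
    return open.empty();  // anything left open means the document is broken
}
```

For example, tagsAreBalanced("<a><b>x</b></a>") returns true, while tagsAreBalanced("<a><b></a></b>") returns false because the tags are closed in the wrong order.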

There are different types of XML parsers available for different programming languages for developers to choose from. Below are a few XML parsers in C++:

TinyXML
pugixml
libxml++
Xerces-C++

We will be working with the Xerces-C++ library (XercesDOMParser) in this blog post.

The following are a few well-known vulnerabilities found in implementations of XML that are not configured properly:

XXE (XML External Entities) attack
Billion Laughs Attack
Quadratic Blowup Attack

We will discuss what these vulnerabilities are (in short) and how to find them during a code review assessment as well as mitigations for these vulnerabilities.

Uncovering these vulnerabilities

We’ve covered the basics of XML, what XML parsers are, and their types. We also saw the different vulnerabilities that could exist in code where an XML parser is used. Now it’s time to know more about these vulnerabilities and how we can find them during a security code review assessment.

XXE (XML External Entities) Attack

An XML External Entity attack is a type of attack against an application that parses XML input. This attack occurs when XML input containing a reference to an external entity is processed by a weakly configured XML parser.

External entities are particularly interesting from a security perspective because they allow an entity to be defined based on the contents of a file path or URL.

This attack may lead to the disclosure of confidential data, denial of service, server-side request forgery, port scanning from the perspective of the machine where the parser is located, and other system impacts.
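To see why this is dangerous, it helps to picture what entity expansion does. The sketch below is my own simplification and not how any real parser is implemented: expandEntity and the Resolver type are hypothetical names, and the resolver is a callback you pass in, so nothing here touches the file system or the network. It only shows the core idea, which is that every reference to the entity is replaced with whatever the resolver returns for the system identifier.

```cpp
#include <cstddef>
#include <functional>
#include <string>

// Hypothetical type: maps a system identifier (a file path or URL) to the
// text found there. A real parser does this lookup itself; here it is a
// callback so that the example stays harmless.
using Resolver = std::function<std::string(const std::string& systemId)>;

// Replaces every "&name;" reference in `text` with the resolved content.
// This is a simplified model of external entity expansion.
std::string expandEntity(const std::string& text,
                         const std::string& name,
                         const std::string& systemId,
                         const Resolver& resolve) {
    const std::string reference = "&" + name + ";";
    const std::string replacement = resolve(systemId);
    std::string out;
    std::size_t pos = 0;
    std::size_t hit = 0;
    while ((hit = text.find(reference, pos)) != std::string::npos) {
        out.append(text, pos, hit - pos);  // copy text before the reference
        out += replacement;                // substitute the resolved content
        pos = hit + reference.size();
    }
    out.append(text, pos, std::string::npos);  // copy whatever is left
    return out;
}
```

If the resolver hands back the contents of a sensitive file for a file:// identifier, that content ends up inside the parsed document, and from there it can reach the attacker through the application's response.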

Example of an XXE attack that discloses the ‘/etc/passwd’ file (any file the application can read could be disclosed the same way):

<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE foo [
  <!ELEMENT foo ANY >
  <!ENTITY xxe SYSTEM "file:///etc/passwd" >]>
<foo>&xxe;</foo>
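When reviewing sample inputs or test fixtures, the thing to look for in a payload like the one above is an entity declared with SYSTEM or PUBLIC. Below is a small sketch that pulls those declarations out of a string with a regular expression. The names ExternalEntity and findExternalEntities are mine, and a regex is only an approximation of the XML grammar, so please treat this as a review aid and not as a filter to rely on in production.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical record for one external entity declaration.
struct ExternalEntity {
    std::string name;      // entity name, e.g. "xxe"
    std::string systemId;  // file path or URL the entity points to
};

// Finds declarations such as <!ENTITY name SYSTEM "uri"> or
// <!ENTITY % name PUBLIC "public-id" "uri">. Internal entities, for example
// <!ENTITY a "text">, are ignored on purpose.
std::vector<ExternalEntity> findExternalEntities(const std::string& xml) {
    // Group 1: entity name. Group 2 or 3: the quoted system identifier.
    static const std::regex declaration(
        R"rx(<!ENTITY\s+(?:%\s+)?(\S+)\s+(?:SYSTEM|PUBLIC\s+(?:"[^"]*"|'[^']*'))\s+(?:"([^"]*)"|'([^']*)'))rx");

    std::vector<ExternalEntity> found;
    const std::sregex_iterator last;
    for (std::sregex_iterator it(xml.begin(), xml.end(), declaration);
         it != last; ++it) {
        const std::smatch& m = *it;
        found.push_back({m[1].str(), m[2].matched ? m[2].str() : m[3].str()});
    }
    return found;
}
```

Run against the payload above, this reports one entity named xxe with the system identifier file:///etc/passwd.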

If you are not familiar with XXE, I highly recommend reading up on it first, since this post focuses on finding these issues in code. PortSwigger provides a good introduction to this vulnerability and to XML entities.

Assuming we know more about XXE vulnerabilities, we will now learn to find this vulnerability in our C++ code.

As mentioned above, we will be using Apache Xerces-C++’s XercesDOMParser in this post for demonstration. We will look at a code snippet showing an “Insecure” configuration of the XML parser (which gives rise to an XXE vulnerability) as well as a “Secure” configuration that avoids it.

Insecure Code:

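#include <xercesc/parsers/XercesDOMParser.hpp>
#include <xercesc/util/PlatformUtils.hpp>
#include <xercesc/util/XMLUni.hpp>

using namespace xercesc;

// Creates a parser with default entity handling, which leaves XXE possible.
XercesDOMParser* createInsecureParser() {
    XercesDOMParser* parser = new XercesDOMParser();
    parser->setValidationScheme(XercesDOMParser::Val_Auto);
    return parser;
}

int main() {
    // Xerces-C++ must be initialized before any parser object is created.
    XMLPlatformUtils::Initialize();

    XercesDOMParser* insecureParser = createInsecureParser();
    insecureParser->parse("insecure.xml");
    // Process the XML document
    delete insecureParser;

    // Release library resources only after every parser has been deleted.
    XMLPlatformUtils::Terminate();
    return 0;
}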

The above is an insecure configuration of XercesDOMParser that leaves the program open to XXE. The program simply reads and parses an XML file.

In this code, the parser is set up to process everything in the XML file, including the entities we discussed earlier.

Unfortunately, this openness is risky. If the XML file declares entities that refer to external sources, the parser will resolve them. This is like opening the door wide without checking who is knocking. An attacker can craft XML with malicious external entities to fetch confidential data or perform the other malicious actions discussed above.

Let’s look at how to avoid an XXE vulnerability with just a couple of flags (in Xerces-C++).

Secure Code:

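#include <xercesc/parsers/XercesDOMParser.hpp>
#include <xercesc/util/PlatformUtils.hpp>
#include <xercesc/util/XMLUni.hpp>

using namespace xercesc;

// Creates a parser that does not resolve external entities on its own.
XercesDOMParser* createSecureParser() {
    XercesDOMParser* parser = new XercesDOMParser();
    
    // Set secure flags to prevent XXE vulnerability
    parser->setCreateEntityReferenceNodes(true);
    parser->setDisableDefaultEntityResolution(true);
    
    parser->setValidationScheme(XercesDOMParser::Val_Auto);
    return parser;
}

int main() {
    // Xerces-C++ must be initialized before any parser object is created.
    XMLPlatformUtils::Initialize();

    XercesDOMParser* secureParser = createSecureParser();
    secureParser->parse("secure.xml");
    // Process the XML document
    delete secureParser;

    // Release library resources only after every parser has been deleted.
    XMLPlatformUtils::Terminate();
    return 0;
}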

Now, picture the same program as above, but this time, we’ve added some safety measures. We told the program to be more careful when reading the XML file by adding the below two flags:

    // Set secure flags to prevent XXE vulnerability
    parser->setCreateEntityReferenceNodes(true);
    parser->setDisableDefaultEntityResolution(true);

setCreateEntityReferenceNodes(true) – This method allows the user to specify whether the parser should create entity reference nodes in the DOM tree being produced. When the flag is true, the parser will create EntityReference nodes in the DOM tree. The EntityReference nodes and their child nodes will be read-only. When the flag is false, no EntityReference nodes will be created. This flag needs to be set to “true” to avoid XXE.

setDisableDefaultEntityResolution(true) – This method gives users the option to not perform default entity resolution. If the user’s resolveEntity method returns NULL the parser will try to resolve the entity on its own. When this option is set to true, the parser will not attempt to resolve the entity when the resolveEntity method returns NULL. Again, this flag needs to be set to “true” to avoid XXE.

Ideally, the safest way to prevent XXE is to disable DTD processing completely, which also rules out external entities. Depending on the parser or use case, this may or may not be possible.

Disabling DTDs also protects the parser against denial-of-service (DoS) attacks such as Billion Laughs. If it is not possible to disable DTDs completely, then external entities and external document type declarations must be disabled in a way that’s specific to each parser.
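If your application never needs DTDs, one extra layer you sometimes see in code is a check that refuses any input containing a DOCTYPE declaration before it even reaches the parser. Here is a minimal sketch of that idea. The function names are hypothetical, and I am assuming the input is ASCII or UTF-8 text: a document in another encoding such as UTF-16 would slip past a byte search like this. It can also reject harmless documents that merely mention the string inside a comment or CDATA section. So this complements the parser configuration above, it does not replace it.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical pre-parse guard. XML keywords are case-sensitive, so only
// the exact spelling "<!DOCTYPE" starts a document type declaration.
// Assumption: `xml` holds ASCII or UTF-8 encoded text.
bool containsDoctype(const std::string& xml) {
    return xml.find("<!DOCTYPE") != std::string::npos;
}

// Throws instead of returning a flag, so that callers cannot forget to
// check the result before handing the input to the parser.
void requireNoDoctype(const std::string& xml) {
    if (containsDoctype(xml)) {
        throw std::invalid_argument(
            "XML input with a DOCTYPE declaration is not accepted");
    }
}
```

A caller would invoke requireNoDoctype on the raw input first and only parse it if no exception is thrown.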

With the above checks, we’re telling the program to check the identity of whoever is knocking on the door before letting them in. This way, we reduce the risk of potential security problems that could arise from malicious instructions in the XML file.

Anticipate the Follow-Up: Part 2 Coming Shortly!

I’m sure you’re thinking by now that this has been a long post, and you’re right. But if you recap, you will see that we briefly covered what XML is, a little bit about XML entities and parsers, and the different parsers available. We then proceeded to cover a critical vulnerability: XXE (XML External Entities).

In the next part (2/2) we will be discussing the remaining two vulnerabilities, how to find them in code, and how we can mitigate them.

This calls for an end to this post. A sequel will be posted soon (make sure to subscribe to the mailing list).