Validation in Context
- 1.0 Introduction
- 1.1 What is context?
- 1.2 Why is context important?
- 2.0 The contexts of SQL Injection
- 3.0 The contexts of XSS
- 4.0 Conclusion
In this article I would like to explain the need for the need of validation (and encoding/escaping) in context. The need for validation is one that is often known only by a few implications, and usually performed either prematurely or inadequately. One of the great sources of inadequacy is encoding and validation out of context.
1.1 What is context?
Context is, in the most simple of terms, is the circumstances in which something occurs. For the sake of this article we will refer to context as the circumstances in which operations are performed on user input. An understanding of context is essential to good input validation, encoding/escaping etc (simply referred to as validation from here-on-in).
1.2 Why is context important?
Context is important because we are trying to somehow pass user input to extremely complex systems, be they relational database management systems (RDBMS), browsers or operating systems, without the complex system performing any unexpected actions.
In the simplest of terms, context is important because its (generally) pointless to use a method used in one situation on another situation, e.g. escaping quotes is pointless in stopping XSS, just as escaping < and > symbols is useless in stopping SQL Injection.
1.3 Understanding and finding contexts
For us to be able to understand and find contexts we must be intimately familiar with the system into which we are feeding user input. And furthermore we must be able to know how much capacity for error the system will allow. The easiest example of contexts which we can find would be in SQL injection, which I will cover in the next section.
2.0 The contexts of SQL Injection
To find contexts in SQL Injection we must first understand our queries. For example, if we are placing user input inside single quotes, it is pointless for us to only escape double quotes and vice versa. And so we have our first 2 states:
- An expression inside single quotes
- An expression inside double quotes
But an expression does not have to be inside quotes at all, we could have an integer which is outside quotation marks because it then does not force the RDBMS to evaluate the string and convert it to a number, and in this case our normal validation of escaping quotes (either one or both) would be of extremely limited use in stopping an attacker because the attacker does not need to use quotes to break out of the expression, and does not need quotes for almost all other actions they would like to perform. And so now we have our third context:
- An un-enclosed expression
But since we know that the value we are expecting is an integer; validating it is not all that difficult, but cases like these prevent solutions like magic_quotes_* (as in PHP's case) from working completely, and begin to illustrate why validation needs to be done in context rather than in a way which stops a few vectors.
In SQL the contexts are few, but do help to illustrate how context needs to be taken into account.
3.0 The contexts of XSS
In XSS attacks you are passing user input to a much more complex (in terms or input parsing) system than an RDBMS, you are passing user input to a browser. The most basic context people protect against is when user input is placed on the page inside a container (be it a tag or the page itself acting as a container), so we have our first context:
- User input inside a container (either a tag, or the page itself)
and the most common method (in PHP, at least) is to simply run everything through either strip_tags() (which as the name implies strips - without any additional parameters - all tags from user input), or the slightly more useful htmlentities() (which encodes all characters which have HTML character entity equivalents are translated into these entities, e.g. < is converted to < and " is converted to ", etc - encoding of quotes though,is optional -, htmlspecialchars() does much the same thing for our purposes, but it only does a few characters, those characters being <>&'" ).
Again, there are more contexts, but the same functions are often relied upon to neutralize threats in several circumstances. The next circumstance would be when user input is placed inside an attribute for a tag, e.g. :
print "<body bgcolor=\"".our_encoding_function($_GET['bg'])."\">";
In this case, using htmlentities() (or htmlspecialchars() ) as our_encoding_function() will still work, but strip_tags() will not, because what we need to filter out or encode here are double quotes, because it would be trivial for an attacker to pass this:
and be able to execute code without having any tags stripped because the attacker had no need to use tags. But not all attributes are enclosed in double quotes, lets say the developer favoured single quotes and did this:
print "<body bgcolor='".our_encoding_function($_GET['bg'])."'>";
then all our encoding (when supplied with no other parameter then the user input) becomes useless because by default htmlentities() and htmlspecialchars() do not encode single quotes. But even if it was set to encode single quotes it would not help us in the smallest if the developer had not encapsulated user input at all, like so:
print "<body bgcolor=".our_encoding_function($_GET['bg']).">";
- User input inside an attribute to a tag, enclosed with double quotes
- User input inside an attribute to a tag, enclosed with single quotes
- User input inside an attribute to a tag, un-enclosed (could also be described as being enclosed with whitespace)
print "<body onLoad="alert('.our_encoding_function($_GET['bg'])."');\">";
And so we have so many more possible contexts to keep in mind, luckily though it can be simplified down to the following rule:
- Encode single and/or double quotes depending on whether or not they are used as encapsulation.
We could of course include contexts like being inside CSS, etc, etc, but these are the main ones, and this article is designed to raise awareness of contexts rather than to be a definitive list of them.
print "<script>document.cookie = \"name=".htmlentities($_GET['name'], ENT_NOQUOTES)."\";</script>";
now, beside the fact that there is no really sane reason (that I can think of) you would be setting server-side cookies using this method, this is only an esoteric example to show you that you should not confine yourself to the common. The reason is that while we cannot escape any of the boundaries (either the string encapsulation or script tag) we can still cause unexpected behavior by setting other cookie. Lets say we set the name variable to:
<script>document.cookie = "name=Alex;\nid=0;"; </script>
which causes us to set a new cookie (we could overwrite an existing cookie as well) besides the name cookie, and we could use such injection to perform things like $_REQUEST Variable Fixation (http://kuza55.blogspot.com/2006/03/request-variable-fixation.html), or other attacks. But more than to show an actual attack, it is an example of how the context (in this case; setting a cookie) will allow you to perform attacks, where the validation needed to stop those attacks would be useless and restrictive in other cases. And it satisfies my need for an obscure attack vector, :p.
What I've outlined here is not really new information, it is merely something which is greatly neglected by many people writing web apps. If there was anything I would like you to take away from this article above all else, it would have to be; know your encapsulation and how it can be broken out of, and how unexpected behaviours can be created. There's really nothing more to it other than remembering that, and not forgetting to apply that knowledge, and not just hope that nobody will notice.