A Practical Lesson in XSS

XSS, or Cross-Site Scripting, vulnerabilities are some of the most common problems found in web applications these days. An XSS bug is present when an attacker is able to inject HTML or JavaScript into a website through some kind of input field. Once a bug like this is found, it is up to the attacker to be creative and write an exploit that takes advantage of the flaw. A common example is JavaScript that steals a user's cookies or defaces a webpage. It can also redirect users to a malicious link that downloads malware, compromising the user's computer. The bottom line is that XSS is very dangerous, and the sad truth is that it is very easily preventable.

There are two main types of XSS vulnerabilities out there: persistent XSS and non-persistent XSS. Persistent XSS is the case where the injected HTML is stored by the application, so it stays on the page even after a refresh or a navigation away from the web page. This is the more serious of the two, and it is what this post focuses on.

This post isn't necessarily going to be a tutorial on how to attack an application with XSS; instead, it is a simulation of a real-world scenario that shows where, how, and why this kind of vulnerability might present itself.

Before we begin, I'll give some overview. I have created a web application that is similar to Twitter. It allows users to register, log in, and post short 120-character messages. There is a global feed on the home page showing all the posts from everyone on the website, as well as a user feed that shows posts for a specific user (for example, /feed/frankie). There is also a feature that allows users to repost posts made by other users and keeps track of how many times a specific post has been reposted. There is, however, a flaw in the application, an XSS flaw, and we will get into that in a bit.

Now... let's create a fictional company. We'll call it Blue Bird. There are currently four employees working at Blue Bird, three of whom are senior developers. The fourth, Matt, is a junior developer who has been at the company for around a month. He is learning quickly, but he still makes mistakes from time to time that have to be fixed by one of the other devs. He also isn't really familiar with Python and Flask, which is what his next project is written in. Matt is instructed to create the posting functionality for the new project. The posting functionality is a simple form that takes some POST data and puts it into the database. During the planning for the project, one of the senior engineers decided that Jinja2's autoescaping will be turned off for this project. The argument was that it would make the application too slow, and that Flask already provides ways to escape information as needed. Matt knows that, since he is dealing with user input, this means he will need to find some way to sanitize the information that he receives.

Later that day Matt finishes the page. This is the code that he wrote:

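from flask import Blueprint, redirect, render_template, request, session
from utils import mongo
import uuid

# Blueprint takes (name, import_name), in that order.
bp = Blueprint("post", __name__)

def escape(text):
    # Matt's home-grown "sanitizer", kept exactly as he wrote it.
    return text.strip("<").strip(">")

@bp.before_request
def redir():
    # Send anyone who is not logged in back to the home page.
    if "login" not in session:
        return redirect("/")

@bp.route("/post/")
def returnTemplate():
    return render_template("post.html")

@bp.route("/post/", methods=['POST'])
def postData():
    # Truncate to 120 characters, then run Matt's escape() over the result.
    post_data = escape(request.form['post'][:120])
    username = session['login']
    mongo.db.posts.insert_one({"username": username, "text": post_data, "reposts": 0, "post_id": uuid.uuid4().hex})
    return redirect("/feed/" + username)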

Cool, looks good, but let's leave our fantasy world for a second. Matt remembered while he was coding that he had to sanitize the information coming in; however, since he was on a deadline and under pressure, he neglected to research the sanitization methods available in Flask. He reasoned that if he just stripped the greater-than and less-than signs, that would be enough. Unfortunately, text.strip() does not do exactly what he is hoping for: it only removes matching characters from the beginning and end of a string, not from the middle. Let's resume.

A few days go by and Matt has long since moved on from the posting functionality to more important parts of the site. So Blue Bird launches the application and it is a huge hit!

A hacker who goes by Greg finds out about the site and starts attempting to see if he can find any security holes. The first place he looks is the posting form, and so he attempts to inject some HTML by making a post that looks like this:

<b>Hope this works!</b>  

He gets the following result:

It doesn't work. It seems as though Matt's escape() function did its job, but Greg knows better than that. He notices a clue that gives him an idea... The post kept b> in its text.

Greg makes another post with the following content:

<><b>Hope this works!</b><>  

IT WORKS!

Greg has managed to circumvent Matt's escaping function.
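To see why the second post gets through, here is a minimal sketch that runs Matt's escape() on both of Greg's test posts. str.strip(chars) only removes matching characters from the two ends of a string, and it stops at the first character that is not in chars, so anything in the middle is left alone.

```python
def escape(text):
    # Same body as Matt's function: strip() only trims the two ends.
    return text.strip("<").strip(">")

first = escape("<b>Hope this works!</b>")
second = escape("<><b>Hope this works!</b><>")

print(first)   # b>Hope this works!</b
print(second)  # <b>Hope this works!</b><
```

The throwaway <> on each end absorbs the stripping, which leaves the opening <b> tag whole.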

Now it's time to be devious... Greg's plan is to write code that automatically reposts his post without the user's permission.

While looking through the HTML he notices a function called repost, and he also notices that each post's post_id is stored as the id attribute of the element with the post class.

After some analysis it seems as though this function is in charge of sending the repost request to the backend.

Greg then makes a final post with the following content:

Hacked by Greg ;) <><script class="xss">repost($('.xss').parents().eq(0).parent().attr("id"))</script>  

Then this happens:

Now each user that views the Global Feed is going to repost Greg's post, and each user that views the reposted post is going to repost it as well... see the problem here? The attacker has created a self-replicating XSS worm. Sure, this one is harmless, but just imagine if the attacker wanted to steal the cookies of every user, or redirect everyone who viewed the post. It's a very serious vulnerability.

So what exactly happened? Well, the attacker (Greg) injected some very basic jQuery. The ultimate goal was to get the post_id and pass it to the function repost. The attacker first writes the <> that is required to circumvent the escaping mechanism. Next the actual payload begins. He wants to inject JavaScript, so he creates a <script> element, and in order to locate where it sits in the HTML he gives the script the class xss. The script then calls the function repost, which requires a post_id argument. He can extract the post_id by locating the element with class xss and then working backwards through its parent elements, the first being text and the next being post. When he arrives at post he can read its id attribute, and that is the post_id to pass to repost. That's it. That's all it takes to do this.
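The payload itself is jQuery, but the traversal it performs can be sketched in Python to make each step visible. The markup below is hypothetical: it assumes a post element wrapping a text element, as the explanation above describes, and the id value is made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup for one rendered post; the class names follow the
# explanation above, and the id value is invented for this example.
FRAGMENT = (
    '<div class="post" id="3f2a9c">'
    '<div class="text">Hacked by Greg ;) '
    '<script class="xss"></script>'
    '</div>'
    '</div>'
)

root = ET.fromstring(FRAGMENT)

# ElementTree has no parent pointers, so build a child -> parent map.
parent_of = {child: parent for parent in root.iter() for child in parent}

# Rough equivalent of $('.xss'): find the injected script element.
script = next(el for el in root.iter("script") if el.get("class") == "xss")

text_div = parent_of[script]    # .parents().eq(0): the closest ancestor
post_div = parent_of[text_div]  # .parent(): one more level up
post_id = post_div.get("id")    # .attr("id")

print(text_div.get("class"), post_div.get("class"), post_id)
```

Two hops up from the script lands on the post element, and its id is exactly the value repost needs.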

Now, you may be thinking, "Frankie, this is crazy. There is no way someone is dumb enough to do this! Besides, most frameworks these days take care of the escaping anyway. What is this, 2003?" You are correct, sir! However, mistakes are still made all the time. When the pressure of a deadline is looming over us, we tend to do things that we might not do otherwise, such as writing our own poorly thought-out escape functions or disabling the autoescaping feature of our framework.

The example that I have given in this post is not one that I have completely come up with on my own. Back in 2014 Twitter's TweetDeck suffered from an XSS attack of this same variety. You can read about it here.

The bottom line is this: protecting against these problems is not hard, but it is often overlooked in the heat of the moment. The unfortunate part is that these easily avoidable problems end up costing companies a lot of money to identify and fix. By that time the damage is already done.
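As a rough illustration of how little the fix takes, here is a sketch that swaps Matt's escape() for html.escape from Python's standard library. This is a stand-in chosen to keep the example dependency-free; in a Flask app the more usual options are leaving Jinja2's autoescaping on or calling markupsafe.escape, which does the same kind of entity replacement. The function name safe_escape and the shortened payload are made up for this example.

```python
import html

def safe_escape(text):
    # Replace &, <, > and both quote characters with HTML entities so the
    # browser renders them as text instead of parsing them as markup.
    return html.escape(text, quote=True)

payload = '<><script class="xss">repost(1)</script>'
escaped = safe_escape(payload)

print(escaped)
# &lt;&gt;&lt;script class=&quot;xss&quot;&gt;repost(1)&lt;/script&gt;
```

Because every angle bracket is converted, no matter where it sits in the string, padding the post with <> no longer helps the attacker.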

Think before you code. You'll thank yourself for it later.

Also, if you want to check out the code I wrote for this post, you can find it here. See if you can make a self-reposting exploit of your own.

Good Luck.

Frankie