What are the common defenses against XSS? [closed]

In other words, what are the most-used techniques to sanitize input and/or output nowadays? What do people in industrial (or even just personal-use) websites use to combat the problem?

You should refer to the excellent OWASP website for a summary of attacks (including XSS) and defenses against them. Here's the simplest explanation I could come up with, which might actually be more readable than their web page (but probably nowhere nearly as complete).

  1. Specifying a charset. First of all, ensure that your web page specifies the UTF-8 charset in the headers or at the very beginning of the head element. This prevents a UTF-7 attack in Internet Explorer (and older versions of Firefox), which can otherwise succeed despite other efforts to prevent XSS, such as HTML-encoding all inputs.

  2. HTML escaping. Keep in mind that you need to HTML-escape all user input. This includes replacing < with &lt;, > with &gt;, & with &amp; and " with &quot;. If you will ever use single-quoted HTML attributes, you need to replace ' with &#39; as well. Typical server-side scripting languages such as PHP provide functions to do this, and I encourage you to expand on these by creating standard functions to insert HTML elements rather than inserting them in an ad-hoc manner.

  3. Other types of escaping. You still, however, need to be careful to never insert user input as an unquoted attribute or an attribute interpreted as JavaScript (e.g. onload or onmouseover). Obviously, this also applies to script elements unless the input is properly JavaScript-escaped, which is different from HTML escaping. Another special type of escaping is URL escaping for URL parameters (do it before the HTML escaping to properly include a parameter in a link).

  4. Validating URLs and CSS values. The same goes for URLs of links and images: because of the javascript: URL scheme, never insert them without validating them against a list of approved prefixes. It also applies to CSS stylesheet URLs and data within style attributes. (Internet Explorer allows inserting JavaScript expressions as CSS values, and Firefox is similarly problematic with its XBL support.) If you must include a CSS value from an untrusted source, you should strictly validate or CSS-escape it.

  5. Not allowing user-provided HTML. Do not allow user-provided HTML if you have the option. That is an easy way to end up with an XSS problem, and so is writing a "parser" for your own markup language based on simple regex substitutions. I would only allow formatted text if the HTML output were generated in an obviously safe manner by a real parser that escapes any text from the input using the standard escaping functions and individually builds the HTML elements. If you have no choice over the matter, use a validator/sanitizer such as AntiSamy.

  6. Preventing DOM-based XSS. Do not include user input in JavaScript-generated HTML code and insert it into the document. Instead, use the proper DOM methods to ensure that it is processed as text, not HTML.
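
Items 2 to 4 above can be sketched with Python's standard library. This is a minimal illustration rather than a complete defense. The helper names (escape_html, is_allowed_url, search_link), the ALLOWED_SCHEMES allow-list and the example URLs are assumptions made for this sketch, not part of any particular framework.

```python
from html import escape
from urllib.parse import quote, urlsplit

# Hypothetical allow-list of URL schemes accepted for user-supplied links.
ALLOWED_SCHEMES = {"http", "https"}


def escape_html(text: str) -> str:
    """HTML-escape text for element content or a quoted attribute value."""
    # quote=True also escapes " and ', so the result is safe inside quotes.
    return escape(text, quote=True)


def is_allowed_url(url: str) -> bool:
    """Accept only absolute URLs whose scheme is on the allow-list."""
    scheme = urlsplit(url.strip()).scheme.lower()
    return scheme in ALLOWED_SCHEMES


def search_link(base_url: str, query: str, label: str) -> str:
    """Build a link: URL-escape the parameter first, then HTML-escape."""
    href = base_url + "?q=" + quote(query, safe="")
    return '<a href="{}">{}</a>'.format(escape_html(href), escape_html(label))
```

Here is_allowed_url rejects javascript: URLs as well as relative ones, which is stricter than some sites need. The point is that the check works from an allow-list rather than a block-list.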

Obviously, I cannot cover every single case in which an attacker can insert JavaScript code. In general, HTTP-only cookies can make an XSS attack a bit harder (but by no means prevent one), and giving programmers security training is essential.
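
As a rough sketch of the HTTP-only cookie idea, the following uses Python's http.cookies module to build a Set-Cookie header value. The cookie name "session" and the function name session_cookie_header are assumptions for illustration only.

```python
from http.cookies import SimpleCookie


def session_cookie_header(session_id: str) -> str:
    """Return a Set-Cookie header value for an HttpOnly session cookie."""
    cookie = SimpleCookie()
    cookie["session"] = session_id
    cookie["session"]["httponly"] = True  # hide the cookie from document.cookie
    cookie["session"]["secure"] = True    # only send the cookie over HTTPS
    cookie["session"]["path"] = "/"
    return cookie["session"].OutputString()
```

A script injected through XSS can still act on the user's behalf while the page is open. The flag only stops it from reading the cookie value directly.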

There are two kinds of XSS attack. One is where your site allows HTML to be injected somehow. This is not that hard to defend against: either escape all user input data, or strip all <> tags and support something like UBB-code instead. Note: URLs may still open you up to rick-rolling type attacks.

The more insidious one is where some third-party site contains an IFRAME, SCRIPT or IMG tag or the like that hits a URL on your site, and this URL will use whatever authentication the user currently has towards your site. (Strictly speaking, this is cross-site request forgery, or CSRF, which is usually treated as a separate attack from XSS.) Thus, you should never, ever take any direct action in response to a GET request. If you get a GET request that attempts to do anything (update a profile, check out a shopping cart, etc.), then you should respond with a form that in turn requires a POST to be accepted. This form should also contain a cross-site request forgery token, so that nobody can put up a form on a third-party site that's set up to submit to your site using hidden fields (again, to avoid a masquerading attack).
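
The GET/POST rule and the request forgery token could look roughly like this. It is a sketch under assumptions: the token is an HMAC of the session ID, and SECRET_KEY, the function names, the "csrf_token" form field and the returned status strings are all hypothetical.

```python
import hashlib
import hmac
import secrets

# Hypothetical server-side secret; in practice it would come from configuration.
SECRET_KEY = secrets.token_bytes(32)


def make_csrf_token(session_id: str) -> str:
    """Derive a token bound to the user's session with HMAC-SHA256."""
    return hmac.new(SECRET_KEY, session_id.encode("utf-8"), hashlib.sha256).hexdigest()


def is_valid_csrf_token(session_id: str, submitted: str) -> bool:
    """Compare the submitted token with the expected one in constant time."""
    expected = make_csrf_token(session_id)
    return hmac.compare_digest(expected.encode("utf-8"), submitted.encode("utf-8"))


def handle_request(method: str, session_id: str, form: dict) -> str:
    """Only a POST carrying a valid token may change state."""
    if method != "POST":
        # A GET never performs the action; it only shows a confirmation form.
        return "show-confirmation-form"
    if not is_valid_csrf_token(session_id, form.get("csrf_token", "")):
        return "rejected"
    return "action-performed"
```

A third-party page can make the browser send the user's cookies, but it cannot read the token embedded in your form, so its forged POST is rejected.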

There are only two major areas in your code that need to be addressed properly to avoid XSS issues.

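  1. Before using any user input value in queries, pass the data through a database helper function such as mysql_escape_string (or, better, use parameterized queries) and then use it in the query. Note that this protects against SQL injection rather than XSS: it stops malicious input from altering the query, but the stored value must still be escaped when it is output.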

  2. Before displaying user input values back in form input fields, pass them through htmlspecialchars or htmlentities. This converts XSS-prone characters into entities that the browser can display without being compromised.
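
The two steps above act at different layers, which a short sketch can make concrete. It uses Python rather than PHP: sqlite3 placeholders stand in for the database escaping helper, and html.escape plays the role of htmlspecialchars. The table, field and function names are assumptions for illustration.

```python
import sqlite3
from html import escape


def store_comment(conn: sqlite3.Connection, text: str) -> None:
    """Store raw input; the ? placeholder keeps it out of the SQL syntax."""
    conn.execute("INSERT INTO comments (body) VALUES (?)", (text,))


def render_input_field(value: str) -> str:
    """Redisplay a stored value inside a quoted HTML attribute."""
    return '<input type="text" name="body" value="{}">'.format(
        escape(value, quote=True)
    )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE comments (body TEXT)")
payload = '"><script>alert(1)</script>'
store_comment(conn, payload)
stored = conn.execute("SELECT body FROM comments").fetchone()[0]
html_field = render_input_field(stored)
```

The payload reaches the database unchanged, so the query-side step alone does nothing for the browser. Only the escaping at output time neutralizes it in the page.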

Once you have done the above, you are protected against the most common XSS attacks. You can then go on to learn more advanced techniques from security websites and apply additional safeguards to your site.

Most frameworks discourage you from writing HTML form code or queries directly as strings. If you use the framework's helper functions instead, your code remains clean, and any serious problem can be addressed quickly by updating one or two lines of code in the framework. You can also write a small library of your own with common functions and reuse it in all your projects.

If you are developing in .NET one of the most effective ways to avoid XSS is to use the Microsoft AntiXSS Library. It's a very effective way to sanitize your input.

In JSTL/JSP, the best way to protect against XSS is to use the c:out tag and leave its escapeXml attribute at the default value of true (that is, never set it to false).

<c:out value="${somePossiblyDangerousVar}"/>