What are the common defenses against XSS? [closed]
In other words, what are the most-used techniques to sanitize input and/or output nowadays? What do people in industrial (or even just personal-use) websites use to combat the problem?
You should refer to the excellent OWASP website for a summary of attacks (including XSS) and defenses against them. Here's the simplest explanation I could come up with, which might actually be more readable than their web page (but probably nowhere near as complete).
Specifying a charset. First of all, ensure that your web page specifies the UTF-8 charset in the headers or at the very beginning of the head element, and HTML-encode all inputs. This prevents a UTF-7 attack in Internet Explorer (and older versions of Firefox), which can otherwise succeed despite other efforts to prevent XSS.
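To make the charset advice concrete, here is a minimal sketch in Python. The WSGI function name `app` and the page content are made up for illustration; the point is only that UTF-8 is declared both in the Content-Type header and as the first thing inside the head element.

```python
def app(environ, start_response):
    """Hypothetical minimal WSGI app that declares UTF-8 in both places."""
    body = (
        "<!DOCTYPE html>\n"
        "<html><head>"
        '<meta charset="utf-8">'  # first thing inside <head>
        "<title>Example</title></head>"
        "<body><p>Hello</p></body></html>"
    ).encode("utf-8")
    headers = [
        # The HTTP header states the charset explicitly, so the browser
        # never has to guess the encoding.
        ("Content-Type", "text/html; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ]
    start_response("200 OK", headers)
    return [body]
```

Most frameworks set this header for you, so in practice it is usually enough to check that the response really carries `charset=utf-8`.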
HTML escaping. Keep in mind that you need to HTML-escape all user input. This includes replacing < with &lt;, > with &gt;, & with &amp; and " with &quot;. If you will ever use single-quoted HTML attributes, you need to replace ' with &#39; as well. Typical server-side scripting languages such as PHP provide functions to do this, and I encourage you to expand on these by creating standard functions to insert HTML elements rather than inserting them in an ad-hoc manner.
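As a rough illustration of "standard functions to insert HTML elements", here is a small Python sketch built on the standard library's `html.escape`. The helper names `esc` and `element` are my own and not from any framework, and the sketch assumes that tag and attribute names come from your code, never from the user.

```python
from html import escape


def esc(text):
    """Escape &, <, >, double quotes and single quotes for HTML output."""
    return escape(str(text), quote=True)


def element(tag, content, **attrs):
    """Build one HTML element; attribute values and content are escaped.

    Assumption: `tag` and the attribute names are trusted constants.
    Attribute values are always emitted inside double quotes.
    """
    attr_text = "".join(
        f' {name}="{esc(value)}"' for name, value in attrs.items()
    )
    return f"<{tag}{attr_text}>{esc(content)}</{tag}>"


print(element("p", '<script>alert("x")</script>'))
# prints: <p>&lt;script&gt;alert(&quot;x&quot;)&lt;/script&gt;</p>
```

Routing every insertion through one helper like this means a missed case can be fixed in a single place instead of across every template.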
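You still need to be careful never to insert user input into an unquoted attribute or an attribute that is interpreted as JavaScript (e.g. onmouseover). Obviously, this also applies to script elements.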
Validating URLs and CSS values. The same goes for URLs of links and images: do not insert them without validating them against approved prefixes, because of the javascript: URL scheme.
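Validating against approved prefixes could look something like the following Python sketch. The function name `is_safe_url` and the choice of allowing only absolute http and https URLs are assumptions for the example; relative URLs are rejected here purely to keep the check simple.

```python
from urllib.parse import urlsplit

# Assumption: only absolute http(s) links are acceptable on this site.
ALLOWED_SCHEMES = {"http", "https"}


def is_safe_url(url):
    """Return True only for absolute URLs with an approved scheme."""
    try:
        parts = urlsplit(url.strip())
    except ValueError:
        # urlsplit raises ValueError on some malformed URLs.
        return False
    # Allowlist the scheme instead of trying to blocklist "javascript:".
    return parts.scheme.lower() in ALLOWED_SCHEMES and bool(parts.netloc)


print(is_safe_url("https://example.com/page"))  # prints: True
print(is_safe_url("javascript:alert(1)"))       # prints: False
```

A URL that passes the check still has to be HTML-escaped when it is written into an href or src attribute.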
Not allowing user-provided HTML. Do not allow user-provided HTML if you have the option. That is an easy way to end up with an XSS problem, and so is writing a "parser" for your own markup language based on simple regex substitutions. I would only allow formatted text if the HTML output were generated in an obviously safe manner by a real parser that escapes any text from the input using the standard escaping functions and individually builds the HTML elements. If you have no choice over the matter, use a validator/sanitizer such as AntiSamy.
There are two kinds of XSS attack. One is where your site allows HTML to be injected somehow. This is not that hard to defend against: either escape all user input data, or strip all <> tags and support something like UBB-code instead. Note: URLs may still open you up to rick-rolling type attacks.
The more insidious one (strictly speaking this is cross-site request forgery, CSRF, rather than XSS) is where some third-party site contains an IFRAME, SCRIPT or IMG tag or the like that hits a URL on your site, and this URL will use whatever authentication the user currently has towards your site. Thus, you should never, ever take any direct action in response to a GET request. If you get a GET request that attempts to do anything (update a profile, check out a shopping cart, etc.), then you should respond with a form that in turn requires a POST to be accepted. This form should also contain a cross-site request forgery token, so that nobody can put up a form on a third-party site that's set up to submit to your site using hidden fields (again, to avoid a masquerading attack).
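The GET-then-POST-with-token flow can be sketched in a few lines of Python. Everything here is illustrative: `session` is assumed to be a per-user dict-like store that your framework provides, and the function names and return strings are made up for the example.

```python
import hmac
import secrets


def issue_csrf_token(session):
    """Create a random token, remember it server-side, return it for the form."""
    token = secrets.token_urlsafe(32)
    session["csrf_token"] = token
    return token


def is_valid_csrf_token(session, submitted):
    """Compare the submitted token to the stored one in constant time."""
    expected = session.get("csrf_token")
    if not expected or not submitted:
        return False
    return hmac.compare_digest(
        expected.encode("utf-8"), submitted.encode("utf-8")
    )


def handle_request(method, session, form):
    """Never act on GET; act on POST only when the token matches."""
    if method != "POST":
        return "show confirmation form"
    if not is_valid_csrf_token(session, form.get("csrf_token")):
        return "rejected"
    return "action performed"
```

The confirmation form would embed the value returned by `issue_csrf_token` in a hidden field. A third-party page cannot read that value, so a forged POST from it comes back "rejected".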
There are only two major areas in your code which need to be addressed properly to avoid XSS issues.
Before using any user input value in queries, escape it with the database helper functions (mysql_escape_string at the time of writing; that function has since been removed from PHP, and prepared statements with bound parameters are the usual replacement). This guards against SQL injection rather than XSS as such, so it does not guarantee XSS safety on its own.
Before displaying user input values back on the page or in form input fields, pass them through htmlspecialchars or htmlentities. This converts all XSS-prone characters into entities that the browser can display without being compromised.
Once you have done the above, you are protected against the bulk of common XSS attacks. Then you can go on to learn advanced techniques from security websites and apply additional security on your site.
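For comparison, the same two steps might look like this in Python, using bound parameters on the way into the database and `html.escape` on the way out. The table layout and the function names are invented for the example, and an in-memory SQLite database stands in for a real one.

```python
import sqlite3
from html import escape

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE comments (author TEXT, body TEXT)")


def save_comment(author, body):
    """Store raw input; the ? placeholders keep it out of the SQL text."""
    conn.execute(
        "INSERT INTO comments (author, body) VALUES (?, ?)", (author, body)
    )
    conn.commit()


def render_comments():
    """Escape every stored value at output time, just before it hits HTML."""
    rows = conn.execute(
        "SELECT author, body FROM comments ORDER BY rowid"
    ).fetchall()
    return "\n".join(
        f"<p><b>{escape(author)}</b>: {escape(body)}</p>"
        for author, body in rows
    )


save_comment("Robert'); DROP TABLE comments;--", "<script>alert(1)</script>")
print(render_comments())
```

The data is stored exactly as the user typed it and is escaped only when rendered, so the same row can later be output safely in a different context (JSON, plain-text email) with that context's own escaping.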
Most frameworks discourage you from writing HTML form code directly or building queries as strings. Using the framework's helper functions keeps your code clean, and any serious problem can be addressed quickly by updating one or two lines of code in the framework. You can also write a small library of your own with common functions and reuse it in all your projects.
If you are developing in .NET one of the most effective ways to avoid XSS is to use the Microsoft AntiXSS Library. It's a very effective way to sanitize your input.
In JSTL/JSP the best way to protect against XSS is to use the c:out tag and leave its escapeXml attribute at the default (true) rather than setting it to false.