Mitigating XSS - Why Input Validation is Bogus

Ask any security guy/gal about how to best mitigate cross-site scripting (XSS) and what is the answer? It’s some variation on validating input. Look at my own writings about this topic and what will you find? Variations on the input validation theme. Input validation is a great solution for new applications, but it’s a horrible choice for existing applications.

Why this change of heart? Well, this is something that’s been coming for quite a while. I’ve become more and more disillusioned with input validation. Let’s start with some basics.

The first few reasons are well documented. Black-lists, whether syntactic or semantic, suffer the usual problem of black-lists: they can only look for known bad data. Not only that, but they often prevent good data from being entered.
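To make the black-list problem concrete, here is a minimal Python sketch. The patterns in `BLACKLIST` are hypothetical, chosen only for illustration and not taken from any real validation framework. The list catches the payload it knows about, misses one it doesn’t, and rejects a perfectly good surname.

```python
import re

# Hypothetical black-list of "known bad" patterns, for illustration only.
BLACKLIST = [
    re.compile(r"<\s*script", re.IGNORECASE),  # script tags
    re.compile(r"javascript:", re.IGNORECASE),  # javascript: URLs
    re.compile(r"'"),                           # single quotes
]


def passes_blacklist(value: str) -> bool:
    """Accept the value only if no known-bad pattern matches it."""
    return not any(p.search(value) for p in BLACKLIST)


if __name__ == "__main__":
    print(passes_blacklist("<script>alert(1)</script>"))       # False: known bad, caught
    print(passes_blacklist('<img src=x onerror="alert(1)">'))  # True: unknown bad, missed
    print(passes_blacklist("O'Brien"))                         # False: good data rejected
```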

White-lists are then offered as the answer. But I’ve had a nagging suspicion that syntactic white-lists are useless except for a small number of highly structured types. What’s worried me is that when writing several language compilers, I’ve always had cases where I had to write syntax rules that were more lenient than the language allowed and then sort out the problem in semantic analysis. That implementation restriction was due to the limitations of my parsers, and my parsers were a lot more sophisticated than the reg-exp parsers in many validation frameworks. So, syntactic white-lists have to let “bad data” through; they are not a foolproof solution.
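A rough illustration of the same point, assuming two hypothetical white-lists: one for a highly structured type (a US ZIP code) and one for a free-text comment field. The ZIP pattern can be exact, but the comment pattern has to admit ordinary punctuation such as `<`, `(` and `'`, and once it does, a script payload passes too.

```python
import re

# Hypothetical syntactic white-lists, for illustration only.
# A highly structured type can be pinned down exactly: a US ZIP or ZIP+4.
ZIP_CODE = re.compile(r"\d{5}(-\d{4})?")

# A free-text field must admit ordinary punctuation ("x < y", "(see note)",
# quotes, '=' in "a = b", '/' in dates), so the pattern has to be lenient.
COMMENT = re.compile(r"[\w\s.,;:!?'\"()<>=/-]{0,500}")


def matches(pattern: re.Pattern, value: str) -> bool:
    """True if the entire value conforms to the white-list pattern."""
    return pattern.fullmatch(value) is not None


if __name__ == "__main__":
    print(matches(ZIP_CODE, "20166"))                           # True
    print(matches(ZIP_CODE, "<script>"))                        # False
    print(matches(COMMENT, "If x < y then (a = b)"))            # True: legitimate prose
    print(matches(COMMENT, "<script>alert('xss')</script>"))    # True: bad data gets through
```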

Semantic white-lists? Well, these are great for enumerations, but is all input an enumeration?
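For contrast, a semantic white-list over a genuine enumeration is easy to get right. This sketch assumes a hypothetical `ShippingMethod` field with three legal values; anything else is rejected outright. The catch is that a name, address or comment field has no such finite set to check against.

```python
from enum import Enum


class ShippingMethod(Enum):
    """Hypothetical enumerated field: only these values are meaningful."""
    GROUND = "ground"
    AIR = "air"
    OVERNIGHT = "overnight"


def parse_shipping(value: str) -> ShippingMethod:
    """Semantic white-list: accept only a known member, reject everything else."""
    try:
        return ShippingMethod(value)
    except ValueError:
        raise ValueError(f"unknown shipping method: {value!r}") from None


if __name__ == "__main__":
    print(parse_shipping("air"))  # ShippingMethod.AIR
```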

Don’t even get me started about GET versus POST.

Where does this leave us? Well, it leaves us with writing guidance that looks like a patch-work quilt. You use semantic white-lists here, but not there. You use syntactic white-lists for some kinds of data, but they will let bad data through.

So, here’s the nail in the input validation coffin. Let’s say you have a sizeable application with lots of fields of different types. What are you going to do? Take a swing at figuring out the right type of input validation for each field? What if you get it wrong? Well, if you get input validation wrong, you’ve just broken some percentage of your existing installed base. How large a percentage? You don’t know.

Why not just go with an output encoding solution? Yes, it seems like you’re snipping off the leaves of the tree. Yes, you need to build some way to tag output so that your test team can test that you’ve snipped off all of the leaves.
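One way to read “tag output” is to give encoded data its own type, so that a test (or the renderer itself) can tell encoded values from raw ones. The sketch below assumes a hypothetical `SafeHtml` marker class and a hypothetical `render` helper built on `str.format`; only `html.escape` comes from the Python standard library. It covers HTML body and quoted-attribute contexts only; JavaScript, URL and CSS contexts each need their own encoder.

```python
import html


class SafeHtml(str):
    """Hypothetical tag type: marks a string as already HTML-encoded.

    Giving encoded output a distinct type is one way to let a test team
    check that every value reaching the page went through the encoder.
    """


def encode_html(value: str) -> SafeHtml:
    """Encode untrusted data for an HTML body or quoted-attribute context."""
    return SafeHtml(html.escape(value, quote=True))


def render(template: str, **fields: object) -> str:
    """Fill a str.format template, refusing any field that was not encoded."""
    for name, field in fields.items():
        if not isinstance(field, SafeHtml):
            raise TypeError(f"field {name!r} was not output-encoded")
    return template.format(**fields)


if __name__ == "__main__":
    page = render("<p>Hello, {name}</p>",
                  name=encode_html("<script>alert(1)</script>"))
    print(page)  # <p>Hello, &lt;script&gt;alert(1)&lt;/script&gt;</p>
```

The tag is only a convention: nothing stops code from writing `SafeHtml(raw)` directly, so a review or static check would still have to look for that.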

When I think about this problem as managing risk, am I more worried about the N% of the XSS bugs I haven’t fixed or the N% of the input fields I’ve now broken? I dunno about you, but I’d rather not break something that worked and have to “roll back” that fix, thus [re-]opening an XSS vulnerability.


6 Responses to “Mitigating XSS - Why Input Validation is Bogus”

  1. Andy Steingruebl Says:

    I was going to respond to Pravir’s earlier piece on this, but I’ll just do it here.

    From an architectural basis we often have 1 place where input can enter our system (at least direct user input) but multiple places where we consume that data downstream.

    Input = web application
    Consumer = web application, administration interface, messagebus, database, shell script, ETL job, batch report writer, etc.

    While I agree that output filtering is a lot cleaner way to address the issue from a proper semantic and output-type context, I worry that I’m relying on a strategy of fixing all of my endpoints rather than at least *trying* to fix the problem, or mitigate some part of it, in a single location.

    Trust boundaries and borders in internal applications can be quite complicated, and ensuring that proper output validation happens at all of those points is certainly the right design, but skipping some form of input checking and relying on all of the downstream consumers to get it right seems risky.

    It’s sort of like the old protocol statement: Be lenient in what you accept and stringent in what you emit. I think the same thing holds for applications.
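Andy’s point about one input feeding many consumers also shows why encoding has to happen per output context. In this minimal Python sketch (the value and table name are made up for illustration), the same stored string is HTML-encoded for a page, shell-quoted for a script, and bound as a parameter for the database; no single input filter could prepare it for all three.

```python
import html
import shlex
import sqlite3

# One stored value (illustrative) that reaches several downstream consumers.
value = "O'Brien <b>& co</b>; ls"

# Web page consumer: HTML-encode at the point of output.
html_out = html.escape(value)

# Shell-script consumer: quote the value as a single shell word.
shell_cmd = "echo " + shlex.quote(value)

# Database consumer: bind as a parameter instead of splicing into SQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (name TEXT)")
conn.execute("INSERT INTO customers (name) VALUES (?)", (value,))
stored = conn.execute("SELECT name FROM customers").fetchone()[0]
conn.close()

if __name__ == "__main__":
    print(html_out)   # O&#x27;Brien &lt;b&gt;&amp; co&lt;/b&gt;; ls
    print(shell_cmd)
    print(stored == value)  # True
```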

  2. scott Says:

    Input validation is a very seductive choice for the reason you mention: there’s a choke point that can be clearly identified within the design. I further agree that the notion of trying to fix the problem at all the points where one generates output is like trying to snip off the leaves of a tree. But at the end of the day, you have to have good output encoding because all of your input validation is going to have holes - it’s just the nature of the beast.

    In your example, you’re assuming that the Web app is the input source. My perspective is the opposite. I see a lot of legacy systems and B2B systems providing the bulk of the input, and the Web applications are just the dashboard for the data. That’s the opposite of your example: data’s coming from all over the enterprise, and the poor little Web app (little in comparison to the overall system) has to cope with data that it doesn’t have a chance to filter.

    You can certainly implement input validation as part of a defense-in-depth strategy. I’m totally on board with that. But, too often I see input validation paraded around as a silver bullet. I guess my title was a bit of a knee-jerk reaction to that.

  3. Andy Steingruebl Says:

    Fair enough. I’ve seen and/or heard of both types of attacks, so I agree it isn’t an either/or; it’s a “both” type of answer.

    Part of figuring out which to try and tackle first, assuming they both are non-ideal, is understanding the data flows and what you can solve with the least effort, or at least the order to fix them in.

    Architecturally one of the problems in the web space anyway is that we don’t have a clear separation between code and data, hence this problem in the first place. We wouldn’t have so much trouble if they were actually separate and/or we had a content-restriction policy. Witness the thread on the WASC mailing list today…

  4. scott Says:

    If you are designing a new system, I highly recommend designing in an input validation framework. But, many systems are already deployed and the pragmatic answer is that adding input validation is only going to create backward incompatibility and give the IT group and “the security guy” specifically a bad name.

    Err, the architectural fault for lack of separation between code and data is more deeply rooted than the web. Try von Neumann.

  5. Andy Steingruebl Says:

    Yeah, I just felt bad blaming him :)

  6. anonymous coward Says:

    Nitesh Dhanjani’s 2005 blog entry “Repeat After Me: Lack of _Output Encoding_ Causes XSS Vulnerabilities” talks about this.
