So it seems like almost everywhere you turn for advice about securing programs or resolving known security problems, you run into a ‘security guy’ telling you something along the lines of ‘well, you have to validate your inputs to prevent these kinds of issues’.
Perhaps I’ve heard it too many times or perhaps I’m just jaded, but I’m throwing the BS card. Of course, I’d never leave it at just that… I think I’ve got a pretty good case for why it’s BS.
Consider my favorite red-headed stepchild, cross-site scripting (XSS). The mechanics of this problem are simple: an application accepts some input data and then offers that data back to a user as output without checking the content of the data along the way (this is fundamentally the case for both reflected and stored XSS problems).
Now, consider a small flashback I’m about to have to a Computer Networking & Communications class from my undergrad days. I know serial communications are sooo last week, but does anyone remember putting together simple protocols to transfer data over a line? In a simple message-based protocol, you’d pick a few byte-values to represent a few control commands like ‘end of message’ or ‘close this channel down’. This seemed like a great plan until you tested it out and noticed that some messages were getting truncated in weird ways and occasionally the whole channel went down. If you listened to the professor instead of just chalking it up to bit-gnomes, what you learned was that since you had intermixed the CONTROL channel with the DATA channel, your data was inadvertently being interpreted as control commands whenever the corresponding byte-values were present in the data being transferred. Hopefully, you then learned that to make the protocol reliable, you needed a mechanism to escape data containing values that would otherwise be interpreted as control codes. How’d you implement the fix? Well, you certainly didn’t try to trace the origin and content of every byte that might enter a message. What you did was augment the send_message() function with logic to zip through the pending message and escape anything that was a control code, and then you’d do the normal stuff of writing it to the wire.
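For anyone who skipped that class, here is roughly what that fix looks like. This is only a minimal sketch of byte-stuffing in Python: the END and ESC byte values, the XOR trick, and the function names (escape_payload, frame_message, unframe_message) are all made up for illustration and are not taken from any particular protocol.

```python
# Hypothetical control bytes, chosen only for this illustration.
END = 0xC0  # 'end of message'
ESC = 0xDB  # 'the next byte is escaped data, not a control code'


def escape_payload(data: bytes) -> bytes:
    """Escape any data byte that collides with a control code."""
    out = bytearray()
    for b in data:
        if b in (END, ESC):
            # Emit ESC, then a transformed byte that is never END or ESC.
            out.append(ESC)
            out.append(b ^ 0x20)
        else:
            out.append(b)
    return bytes(out)


def frame_message(data: bytes) -> bytes:
    """What send_message() would write to the wire: escaped data, then END."""
    return escape_payload(data) + bytes([END])


def unframe_message(frame: bytes) -> bytes:
    """Receiver side: strip the END marker and undo the escaping."""
    if not frame or frame[-1] != END:
        raise ValueError("frame is not terminated by END")
    out = bytearray()
    body = iter(frame[:-1])
    for b in body:
        if b == ESC:
            out.append(next(body) ^ 0x20)  # restore the original data byte
        else:
            out.append(b)
    return bytes(out)


# The payload deliberately contains both control byte values.
payload = bytes([0x01, END, 0x02, ESC, 0x03])
frame = frame_message(payload)
```

Notice that the sender never asks where the bytes came from or whether they are ‘valid’. It just makes sure that, on the way out, data can’t be mistaken for control.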
This is not limited to XSS. SQL injection (or really any injection attack) is about taking input data and passing it with unchecked contents to a DB command (or any API with a control language of its own, e.g. LDAP queries). Again, input validation can help in simple cases, but you’ve gotta know a priori all the ways in which data might be used by an app and choose some kind of mutually safe set of characters to let through. Either that, or encode the potentially unsafe characters up front so they don’t cause trouble somewhere down the line. Neither approach is very extensible, usable, or maintainable in many circumstances, since both are overly restrictive and fragile in the face of change.
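To make the SQL case concrete, here is a small sketch using Python’s built-in sqlite3 module with ‘?’ placeholders, which is one common way to keep data out of the control channel at the point where the command is issued. The comments table and the add_comment/find_comments helpers are hypothetical names invented for this example.

```python
import sqlite3

# In-memory database with a hypothetical 'comments' table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE comments (id INTEGER PRIMARY KEY, body TEXT)")


def add_comment(conn: sqlite3.Connection, body: str) -> None:
    # The '?' placeholder passes body as a bound value, so it is never
    # parsed as part of the SQL command text.
    conn.execute("INSERT INTO comments (body) VALUES (?)", (body,))


def find_comments(conn: sqlite3.Connection, body: str) -> list:
    cur = conn.execute("SELECT body FROM comments WHERE body = ?", (body,))
    return [row[0] for row in cur.fetchall()]


# No character filtering on the way in: both strings are stored as-is.
hostile = "x'); DROP TABLE comments; --"
add_comment(conn, hostile)
add_comment(conn, "Boxes & packing must be <50lbs")

total = conn.execute("SELECT COUNT(*) FROM comments").fetchone()[0]
```

The hostile-looking string goes in and comes back out unchanged, and the table survives, without anyone having to decide ahead of time which characters are ‘safe’.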
Take, for instance, a ‘Comments’ field in a web app. Many real-world applications really do need to allow users to use characters like ‘$’, ‘%’, ‘<’, or ‘>’ to represent things like money, percentages, and value comparisons, so I simply reject the idea of banning those characters because it’s unnecessary and indicative of a misunderstanding of the real problem. Some folks (who shall remain nameless) have said, ‘well, if you need those characters, just HTML/URL encode them as part of the input validation process and you’re all set’. Well, now you’ve added another problem where you’ve gotta go decode and re-encode appropriately for any other output vectors. To supporters of this strategy I ask, how many loading dock foremen and warehouse employees do you know that would correctly interpret a printout containing ‘Boxes &amp; packing must be &lt;50lbs’? How about ‘Boxes%20%26%20packing%20must%20be%20%3C50lbs’?
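Here is a rough sketch of the alternative: keep the comment exactly as the user typed it, and encode it only at the moment of output, in whatever way that particular output needs. It uses html.escape and urllib.parse.quote from Python’s standard library; the render_* function names and the ‘note’ parameter are invented for the example.

```python
import html
from urllib.parse import quote

# Stored exactly as the user typed it.
comment = "Boxes & packing must be <50lbs"


def render_html(text: str) -> str:
    # HTML context: markup-significant characters become entities.
    return "<p>" + html.escape(text) + "</p>"


def render_url_param(text: str) -> str:
    # URL context: percent-encode everything outside the unreserved set.
    return "note=" + quote(text, safe="")


def render_printout(text: str) -> str:
    # Plain-text printout for the loading dock: no control language
    # to protect, so the text goes out untouched.
    return text


html_out = render_html(comment)
# html_out == "<p>Boxes &amp; packing must be &lt;50lbs</p>"
url_out = render_url_param(comment)
# url_out == "note=Boxes%20%26%20packing%20must%20be%20%3C50lbs"
print_out = render_printout(comment)
```

The same stored value produces three different outputs, each one correct for where it’s headed, and the warehouse printout stays readable.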
Now, I’m not saying you should not do input validation. It adds great usability features and might limit the impact of other programming mistakes. What I’m saying is that input validation alone isn’t enough. You’ve gotta have output encoding to truly solve it right. We need to ensure that architects & developers have a deeper understanding of what the problem really is in order for them to naturally build systems that are resistant to these types of attacks.