Sunday, October 30, 2011

The Priority of Constituencies

Lawrence Lessig wrote in Code is Law that the choices we make in writing code embody our values.  This observation is especially true when building a browser because the browser mediates interactions between many distinct entities.  Because the browser's security policy is at the heart of mediating those interactions, we should ask ourselves what values the browser's security policy embodies.

One key value is the priority of constituencies, which is enshrined in the HTML Design Principles:
"In case of conflict, consider users over authors over implementors over specifiers over theoretical purity."
To better understand this principle, let's consider a specific example: whether the browser's password manager should be enabled for a given web site.

The password manager is a source of conflict for these competing interests.  Implementors (myself included) believe that password managers improve security by reducing the costs of using a large number of more complex passwords.  Many banks, however, disagree.  They believe that password managers reduce security because passwords stored in password managers can be stolen by miscreants.

How do browser vendors resolve this conflict?  By default, we enable the password manager.  Because users have a higher priority than implementors (i.e., browser vendors), browsers let users turn the password manager off.  Because authors (i.e., site operators) also have a higher priority than browser vendors, browsers let authors disable the password manager on their own web sites by setting autocomplete=off.

The careful reader will have noticed that the scheme above violates the priority of constituencies in one case.  What if the user wants to use the password manager on a web site that sets autocomplete=off?  Because users have a higher priority than authors, the browser should resolve this conflict in favor of the user.  Typically, browsers handle this case via their extension system.  For example, the autocomplete=on extension lets users override authors who want to disable the password manager.
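The resolution order described above can be sketched as a small decision function.  This is only a toy model, not how any browser implements its password manager: the class name, the field names, and the function are all invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class PasswordManagerInputs:
    """Hypothetical inputs; none of these names come from a real browser."""
    user_disabled: bool = False            # user turned the password manager off
    author_autocomplete_off: bool = False  # site sets autocomplete=off
    user_override: bool = False            # user installed an override extension


def password_manager_enabled(inputs: PasswordManagerInputs) -> bool:
    # Users outrank everyone, so a user's "off" switch always wins.
    if inputs.user_disabled:
        return False
    # Authors outrank implementors, so autocomplete=off beats the default,
    # unless the user (who outranks the author) has explicitly overridden it.
    if inputs.author_autocomplete_off:
        return inputs.user_override
    # Otherwise the implementor's default applies: enabled.
    return True
```

Each branch corresponds to one constituency, checked from highest priority to lowest, which is why the user's choices appear both first and as the override on the author's choice.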

How, then, should we respond to web site operators who wish to block or override these sorts of extensions?  As long as we believe that these extensions faithfully enact the user's will, we're hard-pressed to let authors block these extensions because that would violate the priority of constituencies.  Instead, we ask authors to be humble and accept the user as sovereign.

Saturday, October 22, 2011

X-Script-Origin, we hardly knew ye

On Thursday, Robert Kieffer filed an interesting bug in both the WebKit and Mozilla bug trackers:
"WebKit and Mozilla browsers redact the information passed to window.onerror for exceptions that occur in scripts that originate from external domains. Unfortunately this means that for large institutions (like us here at Facebook) that use CDNs to host static script resources, we are unable to collect useful information about errors that occur in production."
Why do browsers redact this information in the first place?  The answer is actually a combination of two factors:
  1. Although browsers generally prevent one origin from reading information from another origin, the script element, like the image element, is a bit of a loophole: an origin is allowed to execute a script from any other origin.  (This exception has wide-ranging implications on both security and commerce on the web.)
  2. The script element ignores the MIME type of resources it loads.  That means if a web page tries to load an HTML document or an image with the script element, the browser will happily request the resource and attempt to execute it as a script.
At first blush, these two facts would seem to imply a serious security vulnerability.  Certainly executing a script leaks a great deal of information about the script and ignoring the MIME type means a malicious web site can cause the browser to execute any resource, regardless of the sensitivity of the resource (e.g., an attacker can execute the HTML that represents your email inbox as if it were JavaScript).

Fortunately, we're able to snatch security from the jaws of vulnerability because of a happy coincidence: resources that contain sensitive information happen to fail to parse as valid JavaScript (at least usually).  For example, your email inbox probably consists of HTML that quickly throws a SyntaxError exception when executed as JavaScript.  (The consequences of expanding JavaScript to include HTML-like syntax are left as an exercise for the reader.)

Returning to our original question, we now understand that (in an attack scenario) sensitive information actually flows through the JavaScript virtual machine, where it generates an exception.  That exception is then processed by window.onerror!  If browsers did not redact the information they give to window.onerror, they would potentially leak sensitive information to malicious web sites.

How, then, can we address Robert's use case?  Certainly we would like web sites like Facebook to be able to diagnose errors in their scripts.  Robert suggests an "X-Script-Origin" HTTP header attached to the script that would indicate which origins are authorized to see exceptions generated by the script.  Although that would work, that solution seems overly specific to the problem at hand.

A more general solution is for the server hosting the script to inform the browser which origins are authorized to learn sensitive information contained in the script.  (Typically servers would authorize every origin because scripts are usually the same for every user.)  We already have a general mechanism for servers to make such assertions: Cross-Origin Resource Sharing.  We can address Robert's use case by adding a crossorigin attribute to the script element that functions similarly to the crossorigin attribute on the image element.  Once the embedding origin is authorized to read the contents of the script, there's no longer any need to redact the exceptions delivered to window.onerror.
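To make the redaction rule concrete, here is a minimal sketch of the decision a browser would make before handing an error to window.onerror.  It is a model, not browser code: the ErrorReport type, the function name, and the cors_authorized flag are assumptions made for illustration, and the redacted placeholder values are simply chosen to carry no information about the script.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ErrorReport:
    message: str
    url: str
    line: int


# Placeholder that reveals nothing about the script (illustrative values).
REDACTED = ErrorReport(message="Script error.", url="", line=0)


def report_for_onerror(page_origin: str, script_origin: str,
                       cors_authorized: bool, error: ErrorReport) -> ErrorReport:
    """Return what the page's window.onerror handler is allowed to see.

    cors_authorized is a hypothetical flag meaning the script was fetched
    with CORS and the server authorized the embedding page's origin.
    """
    if script_origin == page_origin or cors_authorized:
        # The page may already read the script, so nothing needs hiding.
        return error
    # Otherwise the details might describe a sensitive cross-origin resource.
    return REDACTED
```

The point of the sketch is that the CORS check replaces a special-purpose header: once the server has authorized the embedding origin to read the script, the same authorization covers the exception details.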

Saturday, October 15, 2011

Local URIs are more equal than others (Part 1)

On Wednesday, Cedric Sodhi asked the WebKit development mailing list why WebKit restricts access to local URIs.  This post describes one of the reasons why local URIs are more equal than other URIs.  In a future post, we'll revisit this issue when we discuss how local URIs (e.g., file:///Users/abarth/tax2010.pdf) don't really fit cleanly into the web security model.

Although the web platform largely isolates different origins from each other, there are a number of "leaks" whereby one origin can extract information from another origin.  For example, browsers let one origin embed images from another origin, leaking information such as the height and width of the images across origins.  These leaks are often at the core of security vulnerabilities in the platform.

The same leak exists, of course, between local origins (e.g., those with file URIs) and non-local origins (e.g., those with http or https URIs).  What kind of information could a web site extract from your local system using this leak?

On my laptop, I have Skype installed, which means that, on my laptop, the URI below resolves to a PNG image with a particular height and width:
If I visit a web site and the browser doesn't address this leak, the web site could determine whether I have Skype installed by attempting to load that URI as an image.  On my laptop, the image element would have a certain well-known height and width, but on a laptop without Skype installed, the browser would fire the error event.

Returning to Cedric's question, why do browser vendors restrict access to local URIs but not to non-local URIs if both have the same information leak?  I would prefer to close this leak in both cases, but many web sites embed cross-origin images, e.g. from content delivery networks.  If we were adding the <img> tag today, we would probably require servers to opt in to cross-origin embedding using the Cross-Origin Resource Sharing protocol.

Fortunately, very few web sites include images (or other resources) from local URIs (especially after we removed the full path from <input type="file">, but that's a story for another time).  That means browsers can block all loads of local resources by non-local origins without making users sad, preventing web sites from snooping on your local file system.
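The blocking rule can be sketched in a few lines.  This is a simplified model under stated assumptions, not WebKit's actual check: it treats only the file scheme as local, the function names are invented, and it does not address what local documents may load from each other.

```python
from urllib.parse import urlsplit

# Assumption for this sketch: "local" means the file scheme only.
LOCAL_SCHEMES = {"file"}


def is_local(uri: str) -> bool:
    return urlsplit(uri).scheme.lower() in LOCAL_SCHEMES


def may_load(requesting_uri: str, resource_uri: str) -> bool:
    """Return False when a non-local document tries to load a local resource."""
    if is_local(resource_uri) and not is_local(requesting_uri):
        return False
    # Every other combination is left alone in this simplified model.
    return True
```

For example, may_load("http://attacker.example/", "file:///Users/abarth/tax2010.pdf") is False, while a cross-origin load from one https site to another is still permitted, matching the asymmetry described above.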

Sunday, October 9, 2011

Integrity for sessionStorage

There are many different ways to think about security.  I prefer the following approach:
  1. Define a set of threat models that describe the attacker's capabilities.  For example, the "man-in-the-middle" is a classic threat model in network security that represents an attacker who has complete control over the network but who has no control over network endpoints.
  2. Identify a set of security properties that we wish our system to achieve.  Defining good security properties is a tricky business, and we're mostly going to wave our hands in this blog.  If you'd like an example, you should imagine something like "the attacker doesn't learn the contents of the user's email."
  3. Determine whether an attacker with the capabilities described in the threat model could possibly defeat any of the security properties of our system.  We usually assume that the attacker knows exactly how our system works (e.g., because attackers can read W3C specifications).
This approach tends to be somewhat conservative in the sense that we underestimate whether our system is secure.  That's helpful when thinking defensively because being conservative pushes us to design systems that are secure robustly rather than systems that are secure by some happy accident.

So far, this post has been very abstract, but let's get concrete.  Recently, I've been corresponding with a number of Firefox developers about Firefox Bug 495337.  There are a number of technical details, but the issue boils down to the three factors above:
  1. Threat model.  We're concerned with an active network attacker.  (I need to write a "foundations" post introducing the important threat models in web security, but I didn't want to write too many foundations posts in a row.)  Essentially, an active network attacker has full control over the network (e.g., they can intercept and spoof HTTP requests and responses), but has very little power over secure network connections (e.g., they can't mess with TLS connections).
  2. Security property.  Here's where things get interesting.  What are appropriate security properties for sessionStorage (an API for semi-persistently storing data in the browser)?  I claim that the data an origin stores in sessionStorage should have confidentiality and integrity (i.e., other origins should not be able to learn or to alter data stored in sessionStorage).
  3. Could possibly defeat.  That leaves us with the question of whether an active network attacker could possibly defeat the confidentiality or integrity of data in sessionStorage.  I claim that such a thing is possible in Firefox (via a somewhat elaborate sequence of steps) because Firefox's behavior deviates slightly from the specification.  Specifically, in some circumstances that an attacker can provoke, Firefox considers only the host portion of the origin, ignoring the scheme and the port.  By ignoring the scheme, Firefox lets a network attacker leverage his or her ability to control HTTP to disrupt the integrity of HTTPS data in sessionStorage.
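The effect of ignoring the scheme can be shown with a toy storage model.  This does not reproduce the steps in the Firefox bug; it only contrasts keying storage on the full origin with keying it on the host alone.  The class, the key functions, and the bank.example host are all hypothetical.

```python
class SessionStore:
    """Toy stand-in for sessionStorage, partitioned by key_fn(origin)."""

    def __init__(self, key_fn):
        self._key_fn = key_fn
        self._data = {}

    def set_item(self, origin, name, value):
        self._data.setdefault(self._key_fn(origin), {})[name] = value

    def get_item(self, origin, name):
        return self._data.get(self._key_fn(origin), {}).get(name)


def full_origin_key(origin):
    # Partition on (scheme, host, port), as the specification requires.
    return origin


def host_only_key(origin):
    # Partition on the host alone, ignoring scheme and port.
    return origin[1]


HTTPS_SITE = ("https", "bank.example", 443)
HTTP_SITE = ("http", "bank.example", 80)   # content the network attacker controls


def attack(store):
    store.set_item(HTTPS_SITE, "token", "secret")          # written over TLS
    store.set_item(HTTP_SITE, "token", "attacker-chosen")  # injected over HTTP
    return store.get_item(HTTPS_SITE, "token")             # what the HTTPS page reads
```

With full_origin_key the HTTPS page reads back "secret"; with host_only_key it reads back "attacker-chosen", which is exactly the loss of integrity described in the third item.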
Does this represent a "real" security problem?  Well, that's a hard question to answer.  Certainly this issue makes it harder to understand the security of systems that use sessionStorage.  Instead of being able to use clean abstractions like confidentiality, integrity, and origin, we need to understand more details of how exactly an attacker can subtly manipulate sessionStorage.

Ultimately, complexity is the enemy of security.  Applied judiciously, threat models and security properties can help you understand the security of your system in simpler terms.

Saturday, October 1, 2011

Foundations: Origin

Every discussion of the security architecture of the web platform should begin with the notion of an origin.  An origin is the basic unit of isolation in the web platform.  Every object in the browser is associated with an origin, which defines its security context.  When a script running in one origin tries to access an object, the browser checks whether the script's origin has access to the object's origin.

So what is an origin?  Simply put, an origin is the scheme, host, and port of the URL associated with the object.  (Hence the name of this blog.)  For example, if you're viewing an article on New York Times in your browser, that article (and all of its associated objects) are in the origin.  This blog exists in the origin, which means there is a security boundary between this blog and the New York Times.  Of course, there are many subtleties to that security boundary, which we'll get to in due course.
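As a rough illustration, the scheme, host, and port triple can be computed from a URL with Python's standard library.  This is a sketch under simplifying assumptions, not the algorithm from any specification: it knows default ports only for http and https, and it ignores internationalized host names and URLs that have no meaningful host.  The example.com URLs are placeholders.

```python
from urllib.parse import urlsplit

# Default ports assumed by this sketch; other schemes get a port of None.
DEFAULT_PORTS = {"http": 80, "https": 443}


def origin_of(url: str):
    """Return the (scheme, host, port) triple for a URL."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()
    # Fill in the default port so that http://example.com and
    # http://example.com:80 compare equal.
    port = parts.port if parts.port is not None else DEFAULT_PORTS.get(scheme)
    return (scheme, host, port)


def same_origin(url_a: str, url_b: str) -> bool:
    return origin_of(url_a) == origin_of(url_b)
```

Under this model, http://example.com/ and https://example.com/ are different origins because the scheme differs, even though the host is identical.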

Many folks have written about the browser's origin-based security model, which is often referred to as the same-origin policy because, in the usual case, the browser allows one object to access another if the two objects are in "the same" origin.

If you'd like to learn more about the same-origin policy, one popular reference is Jesse Ruderman's wiki page, but, despite origin's central role in web security, there isn't a specification explaining how the same-origin policy works!  To fix that, I've been working with the IETF's websec working group to write a specification of the web origin concept.  There are still a handful of issues to address, but hopefully we'll finish working through the IETF process soon.

Welcome, dear reader

I've decided to start blogging again.  This blog is about the security architecture of the web platform, where we are today, how we got here, and where we're going tomorrow.  My goal is to write one in-depth, technical post a week.

I'm going to focus more on defense than offense, which means I won't be posting about the newest clever attack techniques (at least not that often).  Instead, I'll be taking you behind the scenes and showing you how we make the tough calls in securing the web platform.

Please feel encouraged to give me feedback, both about what works and what doesn't.  I hope you enjoy reading!