Archive for the ‘Uncategorized’ Category

XML 1.1 EBNF

Tuesday, March 9th, 2010

I’ve been searching for a complete EBNF for XML 1.1 without much success. I found one for XML 1.0, but I was hoping to avoid manually patching it for the XML 1.1 changes.

In the end, I decided that it would be easiest to just parse the EBNF directly out of the specification. Here it is, for reference:

[1] document ::= ( prolog element Misc* ) - ( Char* RestrictedChar Char* )
[2] Char ::= [#x1-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF] /* any Unicode character, excluding the surrogate blocks, FFFE, and FFFF. */
[2a] RestrictedChar ::= [#x1-#x8] | [#xB-#xC] | [#xE-#x1F] | [#x7F-#x84] | [#x86-#x9F]
[3] S ::= (#x20 | #x9 | #xD | #xA)+
[4] NameStartChar ::= ":" | [A-Z] | "_" | [a-z] | [#xC0-#xD6] | [#xD8-#xF6] | [#xF8-#x2FF] | [#x370-#x37D] | [#x37F-#x1FFF] | [#x200C-#x200D] | [#x2070-#x218F] | [#x2C00-#x2FEF] | [#x3001-#xD7FF] | [#xF900-#xFDCF] | [#xFDF0-#xFFFD] | [#x10000-#xEFFFF]
[4a] NameChar ::= NameStartChar | "-" | "." | [0-9] | #xB7 | [#x0300-#x036F] | [#x203F-#x2040]
[5] Name ::= NameStartChar (NameChar)*
[6] Names ::= Name (#x20 Name)*
[7] Nmtoken ::= (NameChar)+
[8] Nmtokens ::= Nmtoken (#x20 Nmtoken)*
[9] EntityValue ::= '"' ([^%&"] | PEReference | Reference)* '"'
|  "'" ([^%&'] | PEReference | Reference)* "'"
[10] AttValue ::= '"' ([^<&"] | Reference)* '"'
|  "'" ([^<&'] | Reference)* "'"
[11] SystemLiteral ::= ('"' [^"]* '"') | ("'" [^']* "'")
[12] PubidLiteral ::= '"' PubidChar* '"' | "'" (PubidChar - "'")* "'"
[13] PubidChar ::= #x20 | #xD | #xA | [a-zA-Z0-9] | [-'()+,./:=?;!*#@$_%]
[14] CharData ::= [^<&]* - ([^<&]* ']]>' [^<&]*)
[15] Comment ::= '<!--' ((Char - '-') | ('-' (Char - '-')))* '-->'
[16] PI ::= '<?' PITarget (S (Char* - (Char* '?>' Char*)))? '?>'
[17] PITarget ::= Name - (('X' | 'x') ('M' | 'm') ('L' | 'l'))
[18] CDSect ::= CDStart CData CDEnd
[19] CDStart ::= '<![CDATA['
[20] CData ::= (Char* - (Char* ']]>' Char*))
[21] CDEnd ::= ']]>'
[22] prolog ::= XMLDecl Misc* (doctypedecl Misc*)?
[23] XMLDecl ::= '<?xml' VersionInfo EncodingDecl? SDDecl? S? '?>'
[24] VersionInfo ::= S 'version' Eq ("'" VersionNum "'" | '"' VersionNum '"')
[25] Eq ::= S? '=' S?
[26] VersionNum ::= '1.1'
[27] Misc ::= Comment | PI | S
[28] doctypedecl ::= '<!DOCTYPE' S Name (S ExternalID)? S? ('[' intSubset ']' S?)? '>' [VC: Root Element Type]
[WFC: External Subset]
[28a] DeclSep ::= PEReference | S [WFC: PE Between Declarations]
[28b] intSubset ::= (markupdecl | DeclSep)*
[29] markupdecl ::= elementdecl | AttlistDecl | EntityDecl | NotationDecl | PI | Comment [VC: Proper Declaration/PE Nesting]
[WFC: PEs in Internal Subset]
[30] extSubset ::= TextDecl? extSubsetDecl
[31] extSubsetDecl ::= ( markupdecl | conditionalSect | DeclSep)*
[32] SDDecl ::= S 'standalone' Eq (("'" ('yes' | 'no') "'") | ('"' ('yes' | 'no') '"')) [VC: Standalone Document Declaration]
[39] element ::= EmptyElemTag
| STag content ETag [WFC: Element Type Match]
[VC: Element Valid]
[40] STag ::= '<' Name (S Attribute)* S? '>' [WFC: Unique Att Spec]
[41] Attribute ::= Name Eq AttValue [VC: Attribute Value Type]
[WFC: No External Entity References]
[WFC: No < in Attribute Values]
[42] ETag ::= '</' Name S? '>'
[43] content ::= CharData? ((element | Reference | CDSect | PI | Comment) CharData?)*
[44] EmptyElemTag ::= '<' Name (S Attribute)* S? '/>' [WFC: Unique Att Spec]
[45] elementdecl ::= '<!ELEMENT' S Name S contentspec S? '>' [VC: Unique Element Type Declaration]
[46] contentspec ::= 'EMPTY' | 'ANY' | Mixed | children
[47] children ::= (choice | seq) ('?' | '*' | '+')?
[48] cp ::= (Name | choice | seq) ('?' | '*' | '+')?
[49] choice ::= '(' S? cp ( S? '|' S? cp )+ S? ')' [VC: Proper Group/PE Nesting]
[50] seq ::= '(' S? cp ( S? ',' S? cp )* S? ')' [VC: Proper Group/PE Nesting]
[51] Mixed ::= '(' S? '#PCDATA' (S? '|' S? Name)* S? ')*'
| '(' S? '#PCDATA' S? ')' [VC: Proper Group/PE Nesting]
[VC: No Duplicate Types]
[52] AttlistDecl ::= '<!ATTLIST' S Name AttDef* S? '>'
[53] AttDef ::= S Name S AttType S DefaultDecl
[54] AttType ::= StringType | TokenizedType | EnumeratedType
[55] StringType ::= 'CDATA'
[56] TokenizedType ::= 'ID' [VC: ID]
[VC: One ID per Element Type]
[VC: ID Attribute Default]
| 'IDREF' [VC: IDREF]
| 'IDREFS' [VC: IDREF]
| 'ENTITY' [VC: Entity Name]
| 'ENTITIES' [VC: Entity Name]
| 'NMTOKEN' [VC: Name Token]
| 'NMTOKENS' [VC: Name Token]
[57] EnumeratedType ::= NotationType | Enumeration
[58] NotationType ::= 'NOTATION' S '(' S? Name (S? '|' S? Name)* S? ')' [VC: Notation Attributes]
[VC: One Notation Per Element Type]
[VC: No Notation on Empty Element]
[VC: No Duplicate Tokens]
[59] Enumeration ::= '(' S? Nmtoken (S? '|' S? Nmtoken)* S? ')' [VC: Enumeration]
[VC: No Duplicate Tokens]
[60] DefaultDecl ::= '#REQUIRED' | '#IMPLIED'
| (('#FIXED' S)? AttValue) [VC: Required Attribute]
[VC: Attribute Default Value Syntactically Correct]
[WFC: No < in Attribute Values]
[VC: Fixed Attribute Default]
[WFC: No External Entity References]
[61] conditionalSect ::= includeSect | ignoreSect
[62] includeSect ::= '<![' S? 'INCLUDE' S? '[' extSubsetDecl ']]>' [VC: Proper Conditional Section/PE Nesting]
[63] ignoreSect ::= '<![' S? 'IGNORE' S? '[' ignoreSectContents* ']]>' [VC: Proper Conditional Section/PE Nesting]
[64] ignoreSectContents ::= Ignore ('<![' ignoreSectContents ']]>' Ignore)*
[65] Ignore ::= Char* - (Char* ('<![' | ']]>') Char*)
[66] CharRef ::= '&#' [0-9]+ ';'
| '&#x' [0-9a-fA-F]+ ';' [WFC: Legal Character]
[67] Reference ::= EntityRef | CharRef
[68] EntityRef ::= '&' Name ';' [WFC: Entity Declared]
[VC: Entity Declared]
[WFC: Parsed Entity]
[WFC: No Recursion]
[69] PEReference ::= '%' Name ';' [VC: Entity Declared]
[WFC: No Recursion]
[WFC: In DTD]
[70] EntityDecl ::= GEDecl | PEDecl
[71] GEDecl ::= '<!ENTITY' S Name S EntityDef S? '>'
[72] PEDecl ::= '<!ENTITY' S '%' S Name S PEDef S? '>'
[73] EntityDef ::= EntityValue | (ExternalID NDataDecl?)
[74] PEDef ::= EntityValue | ExternalID
[75] ExternalID ::= 'SYSTEM' S SystemLiteral
| 'PUBLIC' S PubidLiteral S SystemLiteral
[76] NDataDecl ::= S 'NDATA' S Name [VC: Notation Declared]
[77] TextDecl ::= '<?xml' VersionInfo? EncodingDecl S? '?>'
[78] extParsedEnt ::= ( TextDecl? content ) - ( Char* RestrictedChar Char* )
[80] EncodingDecl ::= S 'encoding' Eq ('"' EncName '"' | "'" EncName "'" )
[81] EncName ::= [A-Za-z] ([A-Za-z0-9._] | '-')* /* Encoding name contains only Latin characters */
[82] NotationDecl ::= '<!NOTATION' S Name S (ExternalID | PublicID) S? '>' [VC: Unique Notation Name]
[83] PublicID ::= 'PUBLIC' S PubidLiteral
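
As a quick sanity check on the character-class productions, here is a minimal Python sketch of productions [4], [4a] and [5]. The function and constant names are my own, and the ranges are copied by hand from the listing above, so treat it as an illustration rather than a validated parser.

```python
# Code point ranges copied from productions [4] and [4a] above.
NAME_START_RANGES = [
    (0x3A, 0x3A),  # ":"
    (0x41, 0x5A),  # A-Z
    (0x5F, 0x5F),  # "_"
    (0x61, 0x7A),  # a-z
    (0xC0, 0xD6), (0xD8, 0xF6), (0xF8, 0x2FF),
    (0x370, 0x37D), (0x37F, 0x1FFF), (0x200C, 0x200D),
    (0x2070, 0x218F), (0x2C00, 0x2FEF), (0x3001, 0xD7FF),
    (0xF900, 0xFDCF), (0xFDF0, 0xFFFD), (0x10000, 0xEFFFF),
]
NAME_EXTRA_RANGES = [
    (0x2D, 0x2D),  # "-"
    (0x2E, 0x2E),  # "."
    (0x30, 0x39),  # 0-9
    (0xB7, 0xB7),
    (0x0300, 0x036F), (0x203F, 0x2040),
]


def _in_ranges(ch, ranges):
    """True if the character's code point falls inside any (low, high) range."""
    code_point = ord(ch)
    return any(low <= code_point <= high for low, high in ranges)


def is_name_start_char(ch):
    """Production [4]: NameStartChar."""
    return _in_ranges(ch, NAME_START_RANGES)


def is_name_char(ch):
    """Production [4a]: NameChar."""
    return is_name_start_char(ch) or _in_ranges(ch, NAME_EXTRA_RANGES)


def is_name(text):
    """Production [5]: Name ::= NameStartChar (NameChar)*"""
    return (len(text) > 0
            and is_name_start_char(text[0])
            and all(is_name_char(ch) for ch in text[1:]))
```

For example, is_name("xml:lang") is true, while is_name("1abc") is false because a digit is a NameChar but not a NameStartChar.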

Google News is Testing a New Layout

Sunday, February 14th, 2010

I was surprised a few days ago with a new layout for Google News on one of the machines I use. I haven’t seen it reported anywhere yet. I came across it tonight while testing and grabbed a few screen captures in case it disappears.

The biggest change is that each story now has a bordered box around it and the stars introduced recently have been moved to the right-hand side of the story. Each story has a drop-down menu associated with it, but the only option in there is “Email”. This drop-down has the class “share-icons” – possibly a new location for a “share in reader/buzz” option.

Looking through the DOM, it appears that this is called the new “blended” story style.

Here’s a couple of screenshots:


Comparing Theora 1.1.1 with x264

Sunday, January 24th, 2010

There’s a comparison that I was pointed at today showing the difference between a number of different modern encoders. The Theora 1.1 demo on that page looked particularly bad, so I decided to investigate.

I downloaded and compiled the latest versions of libogg, libvorbis and libtheora from xiph.org. I ran the encoder example (as the other test did), but discovered that it does support two-pass encoding, contrary to what was stated on the other test’s page. I ran a few different two-pass encodes (including one with soft rates), but didn’t get a lot of variability between results. Overall, however, my newly re-encoded version is significantly better than the one on the other page.

It appears that the Theora 1.1 version is at least as good as the XviD encoding, but at a smaller bitrate. It’s not as crisp as x264, but I’m not sure I’d notice the difference.

Source (via saintdevelopment):

Theora 1.1 (2939kbps):

./encoder_example -o touhou.theora.ogg --two-pass -V 2931 stream.y4m

XviD (3127kbps, via saintdevelopment):

x264 (2951kbps, via saintdevelopment):

Just so it’s clear: all of the results on this page, with the exception of the Theora 1.1.1 screenshot, are taken from Saint Development’s codec comparison. The images in the comparison are from frame 700 in the final video (extracted in PNG format using mplayer as described by the test).

3rd-Party Cookies, DOM Storage and Privacy

Wednesday, January 6th, 2010

We’re in the process of launching our new publisher integration feature at DotSpots and needed to gain an understanding of how browsers deal with “third-party cookies”, or cookies that are set on domains that differ from the top-level domain.

Each of the major browsers (Firefox, Chrome, Safari, Opera and Internet Explorer) has its own quirks about how cookies are accepted, and these vary wildly depending on whether the cookies were set by a top-level page or by a page in a third-party iframe.

Executive summary: Based on this study of browsers, the ideal method of storing information in iframes is a combination of localStorage for modern browsers and persistent cookies with a privacy policy for downlevel IE and Firefox versions. The default privacy settings are permissive enough on most of the old browsers to make this approach feasible. Earlier Safari versions that don’t support localStorage are out-of-luck, but the market share is too small to worry about.

These are the browsers I tested:

  • Firefox Default (checked ‘accept third-party cookies’)
  • Firefox (unchecked ‘accept third-party cookies’)
  • Chrome Default (allow all cookies)
  • Chrome (accept cookies only from sites I visit)
  • Safari Default (only from sites I visit)
  • Safari (accept cookies always)
  • Opera Default (accept cookies)
  • Opera (only from the site I visit)
  • IE6 Default
  • IE7 Default
  • IE8 Default

All browsers that support localStorage support setting and retrieving storage values from any frame. This includes Firefox, Chrome, Safari and IE8. HTML5 localStorage is by far the most reliable way to store information at this time for browsers that support it. There is one small difference: in Firefox, blocking all cookies will also block localStorage. In Chrome and Safari, blocking cookies does not block localStorage.

Chrome and Safari are based on the same WebKit engine and, as expected, share the same cookie policies for the same modes. Chrome defaults to the more permissive ‘Allow all cookies’ setting, while Safari defaults to ‘Allow Cookies from Sites I Visit’. When third-party cookies are disabled, frames can read cookies set by top-level pages but not write them.

Firefox defaults to a permissive mode in which cookies can be set and retrieved from all locations. Unlike WebKit browsers, disallowing third-party cookies means that a third-party iframe cannot read or write cookies at all.

Opera also defaults to permissive mode. In “accept cookies only from the site I visit” mode, it behaves the same way as Firefox does when third-party cookies are disallowed.

Internet Explorer is a bit finicky about privacy. If you add a basic privacy policy header to your responses, cookies will be accepted from iframes. Without the policy, most cookies can’t be set or retrieved in iframes at all (the exception is that iframes can read session cookies set at the top level). The following P3P header is sufficient to fix cookies in iframes.

P3P: CP="CAO PSA OUR"

To configure the header in Apache, you can use a simple mod_headers line:

Header append P3P "CP=\"CAO PSA OUR\""
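
If your pages are served by application code rather than Apache, the same header can be added there. Here is a minimal sketch as a Python WSGI wrapper; the names add_p3p and hello_app are hypothetical, and only the header value comes from the text above.

```python
# The policy header from above, as a WSGI (name, value) pair.
P3P_HEADER = ("P3P", 'CP="CAO PSA OUR"')


def add_p3p(app):
    """Wrap a WSGI app so that every response carries the P3P header."""
    def wrapped(environ, start_response):
        def start_with_p3p(status, headers, exc_info=None):
            # Append the policy header to whatever the app already set.
            return start_response(status, list(headers) + [P3P_HEADER], exc_info)
        return app(environ, start_with_p3p)
    return wrapped


def hello_app(environ, start_response):
    # Hypothetical stand-in for the page served inside the iframe.
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"hello"]


application = add_p3p(hello_app)
```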

For those interested, here is a breakdown of cookie handling by browser and mode. Unlike cookies, HTML5 localStorage has no known limitations, so it has been omitted from the following charts:

WebKit, Allow All Cookies (Chrome default):

Set by \ Readable from    Top level    Iframe
Top level                 X            X
Iframe                    X            X

WebKit, Only From Sites I Visit (Safari default):

Set by \ Readable from    Top level    Iframe
Top level                 X            X
Iframe                    -            -

Firefox, default:

Set by \ Readable from    Top level    Iframe
Top level                 X            X
Iframe                    X            X

Firefox, unchecked ‘accept third-party cookies’:

Set by \ Readable from    Top level    Iframe
Top level                 X            -
Iframe                    -            -

Opera, default:

Set by \ Readable from    Top level    Iframe
Top level                 X            X
Iframe                    X            X

Opera, Only From Sites I Visit:

Set by \ Readable from    Top level    Iframe
Top level                 X            -
Iframe                    -            -

IE6/IE7/IE8 default mode, without Privacy policy:

Set by \ Readable from    Top level    Iframe
Top level                 X            *
Iframe                    -            -

* session cookies set by the top-level page may be read by iframes, but persistent cookies may not

IE6/IE7/IE8 default mode, with Privacy policy:

Set by \ Readable from    Top level    Iframe
Top level                 X            X
Iframe                    X            X

Chrome for Mac has Extensions Again!

Wednesday, January 6th, 2010

It looks like the Chrome team has finally re-enabled extensions on Mac (hat-tip to MG Siegler) in the latest dev channel build. You can’t update from the “About Google Chrome” screen yet, but you can grab the latest build and install it directly.

Even better, they are now allowing Mac users to install extensions directly from extension pages (warning: shameless self-promotion link).

Extensions are working well. They’ve turned on the browser action menus, so the various extensions that I’ve installed all work properly.

Bookmark sync is enabled as well, but I haven’t seen where the bookmarks are supposed to show up in Google Docs.

I’ve really missed having extensions for this last month; getting them back made my day!

Firefox Extensions written in GWT, revisited

Wednesday, December 30th, 2009

I briefly blogged earlier this year about our internal project that allowed us to write Firefox extensions using the Google Web Toolkit framework. I’m happy to say that I’ve just pushed out the first version of the code for developers to start playing with.

Building a Firefox extension isn’t much different than writing a standard GWT web application. There are some caveats: there isn’t a global window ($wnd) or document ($doc) and the GWT widget system doesn’t work without some tweaks. You can, however, take advantage of GWT’s extensive DOM bindings to manipulate pages that the user loads and interact with the Chrome DOM to add toolbar buttons and menu items. I’m slowly working on extracting the code to work with these browser elements from our proprietary codebase, cleaning them up and pushing them into the open-source project.

For now, the current version of gwt-firefox-extension should be sufficient to write an application with the same functionality as a greasemonkey script without dipping into more advanced concepts. We’ve also generated bindings for the whole set of XPCOM IDL, so you’ll have access to most every service and component in the browser if you need to do something more complicated.

Try it out and join our open-source mailing list if you’ve got any feedback or suggestions.

Re-enable ‘Install’ button for Mac Chrome Extensions

Tuesday, December 8th, 2009

UPDATE (Jan 6, 2010): The Chrome team has turned on extensions on Mac in the latest dev channel build. Even better, you can install directly from extension pages without this bookmarklet. Hooray!

We just officially launched our extension on the Chrome extension gallery today and Mac users are having trouble installing the extension. The ‘Install’ button for Chrome extensions is disabled if you are running on a Mac.

You can manually download extensions from the Google download site using a URL like so:

https://clients2.google.com/service/update2/crx?response=redirect&x=id%3Danfibeojfgdfejcfflalkebdfgojfbbm%26uc.

The bolded part of the URL above (anfibeojfgdfejcfflalkebdfgojfbbm) is the extension’s public ID (a hash of the public key).
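
For illustration, here is a small Python sketch that rebuilds this download URL from an extension ID. The function name crx_download_url is my own; the endpoint and parameter layout are taken from the example URL above and may not hold for other Chrome versions.

```python
from urllib.parse import quote

UPDATE_URL = "https://clients2.google.com/service/update2/crx"


def crx_download_url(extension_id):
    """Build the download URL for an extension's public ID."""
    # The x parameter is itself a URL-encoded query string: "id=<ID>&uc".
    x_value = quote("id=" + extension_id + "&uc", safe="")
    return UPDATE_URL + "?response=redirect&x=" + x_value


print(crx_download_url("anfibeojfgdfejcfflalkebdfgojfbbm"))
```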

Here’s a bookmarklet that’ll re-enable the button and allow you to download the extension. It recreates the download URL from the current page you are viewing:

Enable Extension Install

Try it out on the DotSpots extension page here. Click the bookmarklet and the Install button will be activated.

UPDATE: I removed the lang=en-US part of the URL. Some extensions fail to load with that attribute in place.

Note that you’ll need an extension-supported build. The next developer channel build should support it. If you are too anxious to wait, you can install the latest Chromium build from here: http://build.chromium.org/buildbot/snapshots/chromium-rel-mac/34059/.

Packing Chrome extensions in Python

Monday, November 9th, 2009

We’re just about to release our DotSpots extension for Chrome and I’ve been working on integrating the CRX packaging into our build process. CRX files are basically ZIP files with an RSA signature and public key tacked on to the front. Generating these files normally requires you to run Chrome with the --pack-extension argument (which in turn requires you to deploy the 100MB+ Chrome binaries to your build machine).

The existing code to pack a Chrome extension in Python is pretty dated: it will only generate the insecure CRX version 1 format that doesn’t use a cryptographic signature. There’s some Ruby code to pack a version 2 extension, but it requires a lot of dependencies that aren’t installed by default on OSX or in Fedora.

I’ve written some code in Python that uses openssl under the hood to do the grunt work. It cuts some corners by requiring you to pre-zip your files, but you’ll get better results from 7zip -9 than Python’s internal zip code anyways. Pass it three arguments: The input ZIP file, the PEM key (generated when you manually pack the extension in Chrome for the first time) and the output file.

#!/usr/bin/env python3
# Cribbed from http://github.com/Constellation/crxmake/blob/master/lib/crxmake.rb
# and http://src.chromium.org/viewvc/chrome/trunk/src/chrome/tools/extensions/chromium_extension.py?revision=14872&content-type=text/plain&pathrev=14872

import struct
import subprocess
import sys

# Usage: <script> input.zip key.pem output.crx
_, zip_path, key_path, output_path = sys.argv

# Sign the zip file with the private key in PEM format
signature = subprocess.check_output(
    ["openssl", "sha1", "-sign", key_path, zip_path])

# Convert the PEM key to DER (and extract the public form) for inclusion in the CRX header
derkey = subprocess.check_output(
    ["openssl", "rsa", "-pubout", "-inform", "PEM", "-outform", "DER", "-in", key_path])

with open(zip_path, "rb") as zip_file, open(output_path, "wb") as out:
    out.write(b"Cr24")  # Extension file magic number
    # Header: version 2, public key length, signature length, each written as
    # an unsigned 32-bit little-endian integer. struct is used instead of
    # array("l") because "l" is 8 bytes wide on 64-bit Linux.
    out.write(struct.pack("<III", 2, len(derkey), len(signature)))
    out.write(derkey)
    out.write(signature)
    out.write(zip_file.read())  # The zip is binary, so it is read in "rb" mode

print("Done.")
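
To check a packed file, the header can be read back. This is a minimal sketch, assuming the three header fields are 32-bit little-endian integers; read_crx2_header is a hypothetical helper name, and it only splits the file without verifying the signature.

```python
import struct


def read_crx2_header(data):
    """Split CRX version 2 bytes into (public_key, signature, zip_bytes)."""
    if data[:4] != b"Cr24":
        raise ValueError("not a CRX file")
    # Three unsigned 32-bit little-endian fields follow the magic number.
    version, key_len, sig_len = struct.unpack("<III", data[4:16])
    if version != 2:
        raise ValueError("unsupported CRX version: %d" % version)
    key_end = 16 + key_len
    sig_end = key_end + sig_len
    return data[16:key_end], data[key_end:sig_end], data[sig_end:]
```

Calling it on the bytes of the output file should give back the DER public key, the signature and the original ZIP contents.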

Real-time Search and Frnégtttrdre

Monday, September 14th, 2009

Robert Scoble’s accidental tweet (“Frnégtttrdre”) earlier tonight caused a minor ripple: people wondering if he was announcing a secret project, under the influence of alcohol or wandering around with an unlocked iPhone in his back pocket.

It also makes for an interesting test for real-time search.

An hour after his tweet:

Anyone else I’ve missed?

Conclusions: If you tweet a random word, there’s no guarantee that you’ll get indexed right away. Additionally, not every service tests Unicode query string parameters.

Serious rssCloud Protocol DDoS Vulnerability

Thursday, September 10th, 2009

UPDATE: There’s a new domain parameter in rssCloud that makes this DDoS far, far worse.  Since there’s no verification (yet) on rssCloud endpoints, you can now subscribe any server to any rssCloud hub’s notifications.

While researching some of the issues of rssCloud running in a shared hosting environment, I came across a serious vulnerability in the protocol. The vulnerability allows someone to cripple a shared web host. Because of the sensitive nature of this vulnerability, I’m not going to share example code or which shared host(s) are vulnerable.  The fix is easy: follow these security recommendations to close the hole.

The inspiration for this vulnerability came from Nick Lothian’s post on FriendFeed. It turns out that many shared hosting providers route incoming and outgoing HTTP requests through different IP addresses. The routing is usually done transparently by networking gear outside of the web servers themselves.

rssCloud’s specification infers the endpoint from the REMOTE_ADDR CGI variable at the time of the subscription. It would be very difficult to get an rssCloud subscriber working in a shared hosting environment because every subscription request you make goes out on IP address A, but all of your incoming requests come in via port 80 on IP address B. For some shared web providers, the machines that make outgoing requests are also web servers, serving banner messages or redirects to sales sites. Because they are web servers, they are considered valid rssCloud REST endpoints (returning 200 OK for POST requests on some URLs).

When you put these pieces together, it becomes readily apparent that you can now subscribe your shared host’s outgoing HTTP request IP address to any number of feeds. Considering that WordPress has 7.5 million blogs that speak rssCloud, there’s a significant number of blogs that could end up pinging the machine.

There are probably a number of other interesting vulnerabilities in this area, such as traffic that travels through a proxy, or an anonymizing service such as Tor. It may be possible to knock one of these offline by subscribing it to a large number of feeds.

The problem with rssCloud is that its subscription request only proves that you can make requests via the given IP address, not that the given IP address is willing to receive them. By adding the challenge parameter I suggested in the previous post, you can now guarantee that the endpoint is willing to receive these requests, making it much harder to subscribe an unwilling participant in the protocol.
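
A rough Python sketch of what such a check could look like on the hub side follows. The previous post isn’t reproduced here, so the details are assumptions: the hub sends a random challenge value to the endpoint and only accepts the subscription if the endpoint echoes it back in the response body. verify_endpoint and the injected fetch callable are hypothetical names, not part of rssCloud.

```python
import secrets


def verify_endpoint(endpoint_url, fetch):
    """Return True only if the endpoint echoes back a fresh random challenge.

    `fetch` is a hypothetical callable (url, params) -> response body string,
    injected so the sketch does not depend on a particular HTTP library.
    """
    challenge = secrets.token_hex(16)
    try:
        body = fetch(endpoint_url, {"challenge": challenge})
    except Exception:
        # Unreachable or failing endpoints are never subscribed.
        return False
    # A banner page or redirect target that returns 200 OK will not
    # contain the random value, so it fails this comparison.
    return body.strip() == challenge
```

A server that merely answers 200 OK to any POST would fail this check, which is the property the challenge is meant to provide.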