grack.com

I had a lot of great feedback on my AppEngine post the other day. We put our trust in the AppEngine team to keep things running smoothly while we work on our app and live the rest of our lives. Today is pretty quiet around the Gripe virtual offices (aka Skype channels): it’s Thanksgiving in the US and I’m too hard-hit by a cold to do much today besides write a short post.

We had a great surprise this morning: the episode of The View we were featured in was re-run, and we got a huge traffic spike. Nobody noticed until about 30 minutes in, since everything was scaling automatically and the site was still serving up as quickly as ever:

This is a whole new way to build a startup: no surprise infrastructure worries, no pager duty, no getting crushed by surprise traffic. It’s peace-of-mind scalability.

Now back to fighting my cold, without the worry of our site’s load hanging over my head. For those of you in the US, enjoy your Thanksgiving and watch out for turkey drops!


There have been a handful of articles critical of Google’s AppEngine bubbling up to the top of Hacker News lately. I’d like to throw our story into the ring as well, but as the story of a happy customer rather than a switcher.

We’ve been building out our product, Gri.pe, since last May, after pivoting from our previous project, DotSpots. In the early days, I was the only developer available to work on Gripe. I’d been playing with AppEngine on and off since its initial Python release, creating a few small applications to get a feel for AppEngine’s strengths and weaknesses. We were also invited to the early Java pre-release at DotSpots, but it would have been too much effort to make the switch from our Spring/MySQL platform to AppEngine.

My early experiments on AppEngine near its first release showed that it was promising, but was still an early release product. Earlier this year, I started work on a small personal-use aggregator that I’ve always wanted to write. I targeted AppEngine again and I was pleasantly surprised at how far the platform had matured. It was ready for us to test further if we wanted to tackle projects with it.

Shortly after that last experiment, one of our interns at DotSpots came to us with an interesting idea: a social, mobile complaint application that we eventually named Gri.pe. We picked AppEngine as our target platform for the new product, given the platform’s new maturity. It also helped that, as the sole developer on the first part of the project, I wanted to focus on building the application rather than on the EC2 infrastructure and ops work that goes along with productizing a startup idea. I prototyped it on the side for a few months with our designer, and once we determined that it was a viable product, we decided to focus more of the company’s effort on it.

There were a number of great bonuses to choosing AppEngine as well. We’d been wanting to get off the release-cycle treadmill that was killing us at DotSpots and move to a continuous deployment environment, and AppEngine’s one-liner deployment and automated versioning made this a snap (I hope to detail our Hudson integration in another blog post). The new task queue functionality let us do work asynchronously, something we had always wanted at DotSpots but found awkward to automate with existing tools like Quartz. The AppEngine Blobstore does the grunt work of dealing with our image attachments without us having to worry about signing S3 requests (in fairness, we’re using S3 signed requests for our new video upload feature, but the Blobstore let us launch image attachments with a single day’s work).
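As an aside, the S3 signed requests mentioned above are easy to generate by hand. Here’s a minimal sketch of building a time-limited signed PUT URL using S3’s signature scheme of the era (signature version 2); the bucket, key, and credentials are hypothetical, and a real integration would pull them from configuration:

```python
import base64
import hmac
import urllib.parse
from hashlib import sha1

def presign_put(bucket, key, access_key, secret_key, expires, content_type="video/mp4"):
    """Build a time-limited signed PUT URL for S3 (signature version 2)."""
    # String to sign: method, Content-MD5 (empty), Content-Type, Expires, resource.
    string_to_sign = "PUT\n\n%s\n%d\n/%s/%s" % (content_type, expires, bucket, key)
    digest = hmac.new(secret_key.encode(), string_to_sign.encode(), sha1).digest()
    signature = base64.b64encode(digest).decode()
    query = urllib.parse.urlencode({
        "AWSAccessKeyId": access_key,
        "Expires": str(expires),
        "Signature": signature,
    })
    return "https://%s.s3.amazonaws.com/%s?%s" % (bucket, key, query)

# Hypothetical credentials; the browser can PUT the video straight to this URL.
url = presign_put("example-bucket", "uploads/clip.mp4", "AKIDEXAMPLE", "secret", 1290000000)
```

The nice part of this approach is that the upload itself never touches our app servers; we only hand the client a URL that expires.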

When it came time for us to launch at TechCrunch 50 this year, I was a bit concerned about how AppEngine would deal with the onslaught of traffic. The folks on the AppEngine team assured me that as long as we weren’t doing anything to cause a lot of write contention on single entity groups, we’d scale just fine. And scale we did:
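The entity-group warning deserves a quick illustration. The standard AppEngine remedy for a hot counter is to shard it across N entities and sum the shards at read time, so no single entity group absorbs every write. Here’s a datastore-free sketch of the idea; the dict stands in for datastore entities and the shard count is arbitrary:

```python
import random

NUM_SHARDS = 20  # more shards = more write throughput, slower reads

# Stand-in for datastore entities: one row per (counter name, shard index).
shards = {}

def increment(name):
    """Write to a random shard so no single 'entity group' takes every write."""
    key = (name, random.randrange(NUM_SHARDS))
    shards[key] = shards.get(key, 0) + 1

def get_count(name):
    """Read-time aggregation: sum every shard belonging to this counter."""
    return sum(v for (n, _), v in shards.items() if n == name)

# Simulate a burst of concurrent writes.
for _ in range(1000):
    increment("page_views")
```

Trading a slightly more expensive read for contention-free writes is exactly the shape of compromise the datastore pushes you toward.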

In the days after our launch, AppEngine hit a severe bit of datastore turbulence. There was the occasional latency spike on AppEngine while we were developing, but late September/early October was much rougher. Simple queries that normally took 50ms would skyrocket up to 5s. Occasionally they would even time out. Our application was still available, but we were seeing significant error rates all over. We considered our options at that point, and decided to stick it out.

Shortly after the rough period started, the AppEngine team fixed the issues. And shortly after that, a bit of datastore maintenance chopped the already good latencies down even further. It’s been smooth sailing since then and the AppEngine team has been committed to improving the datastore situation even more as time goes on (from here):

We didn’t jump ship on AppEngine over one rough week because we knew that their team was committed to fixing things. We’ve had our rough weeks with the other cloud providers too. In 2009, Amazon lost one of our EBS drives while we were prototyping DotSpots infrastructure on it. Not just a crash with data loss, but actually lost: the whole EBS volume was no longer available at all. We’ve also had weeks where EC2 instances had random routing issues between each other, or locked up and got wedged with no indication of problems on Amazon’s side. Slicehost had problems with our virtual machines losing connectivity to various parts of the globe.

“Every cloud provider has problems” isn’t an excuse for any provider to get sloppy. It’s an understanding I have as the CTO of a company that bets its future on a platform: no matter which provider we choose, we are putting some faith in a third party to solve problems as they come up. As a small startup, it makes more sense for us to outsource management of IT issues than to spend half of an engineer’s time dealing with them. We’ve effectively hired these providers to manage the hard parts of our scalability infrastructure, and they are working for a fraction of what it would cost to do this ourselves.

Putting more control in the hands of a third party means that you have to give up the feeling of being in control of every aspect of your startup. If your self-managed colo machine dies, you might be down for hours while you get your hands dirty fixing, reinstalling or repairing it. When you hand this off to Google (or Amazon, or Slicehost, or Heroku), you give up the ability to work through a problem yourself. It took some time for me to get used to this feeling, but the AppEngine team has done amazing work in gaining the trust of our organization.

Since that rough week in September, we’ve had fantastic service from AppEngine. Our application has been available 100% of the time and our page rendering times are way down. We’re committing 100% of our development resources to banging out new features for Gri.pe, without having to worry about infrastructure.

On top of the great steady-state service, we were mentioned on ABC’s The View and had a massive surge in traffic which was handled flawlessly by AppEngine. It transparently scaled us up to 20+ instances that handled all of the traffic without breaking a sweat. In the end, the surge cost us less than $1.00:

There are a bunch of great features in the AppEngine pipeline: some you can see in the 1.4 prerelease SDK, and others that aren’t publicly available yet but address many of the platform’s issues and shortcomings.

If you haven’t given AppEngine a shot, now is a great time.

Post-script: We do still use EC2 as part of our gri.pe infrastructure. We have a small nginx instance set up to redirect our naked domain from gri.pe to www.gri.pe and deal with some other minor items. We also have EC2 boxes that run Solr to deal with some search infrastructure. We talk to Solr from GAE using standard HTTP. As mentioned in the article above, we also use S3 for video uploads and transcoding.
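For the curious, the naked-domain redirect described above needs only a few lines of nginx configuration. The server names are real; the rest is a sketch of how such a redirect is typically written, not our exact config:

```nginx
# Redirect the naked domain to www, which is served by AppEngine.
server {
    listen 80;
    server_name gri.pe;
    rewrite ^ http://www.gri.pe$request_uri? permanent;
}
```

A tiny instance is plenty for this: it does nothing but issue 301s.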



Neal Stephenson’s latest interactive serial novel project, The Mongoliad, launched last night at midnight. Readers subscribe to the project to receive access to the new material as it becomes available and can interact with fellow readers in the forums. They can also contribute to the book world’s ‘pedia, a wiki focused on events and characters within the story.

I had a chance to ask Neal some questions about the project:

MM: How has the format and the collaboration between yourself and the other authors of the Mongoliad affected the creative process you use for writing?

NS: The most obvious difference, of course, is that this one is collaborative, and so I have the opportunity to throw ideas around with other writers and enjoy the creative and frequently funny discussions that arise during those meetings. The whole medieval-knights-in-armor thing has, of course, been gone over pretty thoroughly by other writers dating at least as far back as Cervantes and Shakespeare, and so you might think we would be discouraged by the existing body of literature on that topic. On the contrary, though, we have been harvesting a lot of energy from our perception that we are able to produce a new twist on the theme.

As far as format is concerned, one of the big differences is that we are able to offload a certain amount of exposition and background material to ‘pedia entries and other material on the site. Normally when writing a novel that is set in an alternate world (fantasy, sf, or historical) a certain amount of energy is drained away worrying about how to incorporate that sort of explanatory content into the prose without bringing the story to a dead stop. In this project we can just keep telling the story and then supply the background in whatever way seems best; the reader can then delve into it as much or as little as they please.

MM: Will the serial story of the Mongoliad end up as a book?

NS: Oh yes, we are all serious book lovers and so it won’t be a real finished project in our minds until we have seen it between the covers of a traditional printed book.

MM: Since the works of the Mongoliad aren’t permanently fixed in type, would you or other authors of the project ever consider reworking a chapter based on feedback from the community?

NS: You have hit on one of the great advantages of this format. The traditional drawback of serialized fiction is that there is no way to go back to chapter 1 and change something that might make chapter 35 much better. With this system we can simply update the chapter and push the new version out to the subscribers’ devices. We have already taken advantage of that to produce several successive rewrites of the chapter in which Haakon fights Zug in the arena.

MM: The Mongoliad is going to be a social experience. When people interact with the Mongoliad, do you expect they will be interacting as themselves with their real names, as themselves behind an alias or as a “role-play” character?

NS: At the moment it’s a discussion forum in which people log in under persistent identities that we assume match up, more or less, to their true identities. The possibility of role playing suggests a more gamelike milieu, and we are definitely thinking about games, but we don’t have anything up and running yet.

MM: At Dorkbot-SF, Jeremy mentioned that the book may evolve according to feedback from the community, or potentially adopt certain bits of well-written fan fiction to varying degrees. How will the project deal with copyright and attribution in cases where material bubbles up from the community?

NS: I’m going to leave that one to Jeremy but the general answer is that if we notice someone who has a knack for fan fiction we’ll probably strike a deal with them that will cover the intellectual property transaction in a way that is clean and mutually understood.

MM: The Mongoliad is experienced and produced differently than existing books today and gives us a hint of what the future of reading may be. What would your ideal world of publishing be 20 years from now: the writing process, the publishing model and the technology used by readers?

NS: We’re trying to build toward that ideal world with this project, and our vision of it changes a little every day as we learn new things. Any future in which people do a lot of reading and writers are able to support themselves by writing seems like a good future to me!

MM: A few days ago, the Oxford English Dictionary’s owner stated that it would likely no longer be printed again. What do you think will happen to the printed word over the next few decades?

NS: With devices such as the Kindle and the iPad doing so well, it seems that the printed word is adapting itself quite well to the electronic world, and so I am actually much more optimistic about this than I was a few years ago before such devices were invented. There is clearly a market for long-form, fully immersive experiences of a sort that only literature can provide, and the fact that such material can be generated by a single person working alone with virtually no equipment means that experimentation is easy and that a colossal range of different themes and experiences can always be made available to readers. The only cause for concern, in my mind, is that writers will go broke as the result of people violating their copyrights, but we hope that we can find a business model in PULP that will make it a no-brainer for readers to subscribe, and show their appreciation for their favorite writers by sending a little bit of coin their way.


As you might have seen on Wired or Engadget, we were poking around on the pre-release EVO from Google I/O and managed to get root access to it before it had been shipped. You might remember my blurrycam video of the event:

We didn’t mention how we did it at the time, only that we exploited a serious vulnerability and recommended other users root their phones. Now that Sprint’s patch has been out in the wild for a while and everyone has updated, we’re releasing more details on what the security vulnerability is.

The first step of rooting any phone is taking stock of what’s on the device and doing a cursory check of whether any of it can be used to elevate permissions. This means running a shell on the device and poring over ls -l in every directory.

On the EVO I received from I/O, there was a file named “skyagent” in the /system/bin directory of the device. This file was also present in the latest, shipped firmware in Sprint HTC Hero phones. When we started poking at it, we discovered that not only would it let us get root, but it was effectively a backdoor into the device that allowed external users to peek and poke input, dump the contents of the screen and run arbitrary programs. Not only that, but the program listened on every interface, meaning external users could spy remotely on the device. We weren’t able to determine if the program could be launched remotely, but once it was launched, it was a very effective back door.

We disclosed this to Sprint quickly after finding it. They were very responsive and rolled it into a patch that they released alongside the EVO’s launch.

We’re still not sure what this program was doing on the device at launch. One theory is that it’s a test program, designed to provide input and output for automated testing on real devices. Another theory is that it’s a law-enforcement or three-letter-agency wiretap program for capturing communication. Yet another is that it was placed there by a rogue employee as a plain, malicious backdoor. There’s not enough evidence to determine which (if any) of the theories is correct and Sprint hasn’t disclosed anything.

Here’s an excerpt from our forthcoming vulnerability disclosure (thanks to rpearl for turning our internal disclosure into something more readable):

The binary is executable by any user; no authentication or privileges are necessary. Further, during the program’s initialization, there are numerous instances in which a buffer overflow can overwrite stack or bss memory; similarly, the program passes user-controlled arguments unsanitized as a format string to a sprintf, also leading to memory being overwritten. We believe that these can only be exploited to the point of a denial of service, not to the end of arbitrary code execution. This appears to be by chance, not by design.

However, the security vulnerabilities present in skyagent are of less cause for concern than the purpose of the program. It appears that the binary was designed as a backdoor into the phone, allowing remote control of the device without the user’s knowledge or permission. When the program is invoked, it listens for connections over TCP (by default on port 12345, on all interfaces, including the 3G network!) and accepts a fixed set of commands. These commands appear to be authenticated only by a fixed “magic number”; they are encrypted neither on the way to the device nor on the way back. The commands that we have knowledge of at this time include:

  • sending and monitoring user tap and drag input (“PentapHook”),
  • sending key events (“InputCapture”),
  • dumping the framebuffer (“captureScreen”),
  • listing processes (“GetProc”),
  • rebooting the device immediately,
  • and executing arbitrary shell commands as root (“LaunchChild”)
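To make the weakness concrete, here is a hypothetical sketch of what a command parser authenticated only by a fixed magic number looks like: anyone who learns the constant (trivially, via disassembly or a packet capture) can issue commands. The names, constant, and wire layout below are invented for illustration and are not taken from skyagent:

```python
import struct

MAGIC = 0x1F2E3D4C  # constant baked into the binary: recoverable, not a secret

def parse_command(packet: bytes):
    """Parse [magic:u32][cmd:u16][len:u16][payload] frames.

    The magic check is the *entire* authentication step, which is the problem.
    """
    if len(packet) < 8:
        return None
    magic, cmd, length = struct.unpack(">IHH", packet[:8])
    if magic != MAGIC:
        return None
    return cmd, packet[8:8 + length]

# Any client that knows the constant passes the check and can, say, reboot the device.
forged = struct.pack(">IHH", MAGIC, 0x0006, 4) + b"boot"
```

Compare this with even a minimal challenge-response scheme: a fixed constant shared by every device in the field offers no per-device or per-session protection at all.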

Here’s the paper that Joshua Wise typed up from the analysis we did, describing the backdoor in more detail: Skyagent Protocol Description


Apparently he denied this remark in messages to Michael Geist (and the video posted on balancedcopyrightforcanada.ca conveniently omits it):
