Computers

Good-bye App Engine

I’d been running LaffQ.com, my web site for comedians, on Google App Engine for nearly two years. Google App Engine seemed pretty neat when it first came out: it was the only free hosting service I knew of where you could deploy dynamic Python apps (using Django no less, a framework I was already familiar with) with the promise of Google managing the backups, scaling, and provisioning.

But as time went on, the sunny optimism began to fade. Although Google supported Django, it used a proprietary BigTable-backed database that was not compatible with Django’s object-relational mapper (ORM). The unbelievably high free limits of the beta period were reduced drastically, in some cases absurdly, when the product came out of beta. Visiting my internal operations page just twice could blow out nearly the entire day’s quota because it produced counts of many tables, and each item counted towards a daily quota of 50000 “small operations”.

Developing for Google App Engine was always a pain. It was non-standard enough that a lot of libraries and tools wouldn’t work or needed annoying changes. There was a single point of documentation, written in a sort of corporate Google-ese: not horrible, but not the nicest documentation I ever read. There were a lot of layers of abstraction. It was proprietary. It was slow to fetch data. The pdb debugger didn’t work very well. Getting data out of production was an ordeal. It was even hard to launch an instance on the command line, which meant on those rare occasions when I decided to do development on my Linux netbook, I spent most of my time getting the newest version of Google App Engine to work again.

I don’t exactly remember how or why, but late one Sunday evening in November I suddenly came up with the bright idea to port the Google App Engine infrastructure to “sqlite” (as I referred to it in my head), i.e., to use the standard Django database back-end with the idea of deploying it to some unknown host using sqlite3. I started almost immediately, figuring at the very worst, it would just be an abandoned experiment.

It turned out to be… well, if not “surprisingly easy”, remarkably painless. By Monday afternoon, I had gotten most of the core public functionality working just fine (insert show, edit show, delete show, list shows). The data model ported straight across, with one exception: a GeoPt latitude/longitude structure, which I simply reformatted into a string containing a comma-separated pair (which is probably how it was stored in Google anyway).
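As a rough illustration of that GeoPt change, here is a minimal sketch of packing a latitude/longitude pair into a comma-separated string and reading it back. The function names are made up for this example and the coordinates are arbitrary; none of this is taken from the LaffQ code.

```python
def geopt_to_string(lat, lon):
    """Pack a latitude/longitude pair into a 'lat,lon' string."""
    return "%r,%r" % (float(lat), float(lon))


def string_to_geopt(text):
    """Unpack a 'lat,lon' string back into a pair of floats."""
    lat_text, lon_text = text.split(",")
    return float(lat_text), float(lon_text)


packed = geopt_to_string(37.7749, -122.4194)
print(packed)
print(string_to_geopt(packed))
```

A plain string like this fits in an ordinary Django text field, at the cost of having to parse it whenever the numbers are needed.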

By 9 pm on Monday, my entire test suite was passing. The internal pages hadn’t been ported yet, nor the “post on Facebook” functionality.

I decided to work on the non-critical internal pages first, or else I wouldn’t have any excuse to not deploy (and that’s scary!). I ported these components which gave the code base a chance to “set” (like Jello™) and time for that unease associated with massive changes to dissipate somewhat.

Some changes that occurred more than once:

- adding “objects” everywhere, so Model.get becomes Model.objects.get
- changing Property to Field (ndb.StringProperty => models.CharField for example)
- ndb.KeyProperty => models.ForeignKey
- put => save
- key.get. => key.
- .query() => .objects
- required=False => null=True
- obj.key.delete() => obj.delete()
- query.order => query.order_by
- query.filter works very differently (no more fancy Google App Engine custom types based on operator overloading and deferred evaluation… fancy, but pretty opaque)
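To make the mechanical part of that list concrete, here is a small sketch that applies a few of the renames with regular expressions. Nothing in the post says the port was scripted this way; the `RENAMES` table and `apply_renames` helper are hypothetical, and the rewritten code would still need hand edits (Django's `CharField` wants a `max_length`, `ForeignKey` wants a target model, and `query.filter` calls have to be rewritten by hand).

```python
import re

# (pattern, replacement) pairs for some of the purely mechanical renames.
RENAMES = [
    (r"\bndb\.StringProperty\b", "models.CharField"),
    (r"\bndb\.KeyProperty\b", "models.ForeignKey"),
    (r"\.key\.delete\(\)", ".delete()"),
    (r"\.put\(\)", ".save()"),
    (r"\.query\(\)", ".objects"),
    (r"\brequired=False\b", "null=True"),
]


def apply_renames(source):
    """Apply each rename in order and return the rewritten source text."""
    for pattern, replacement in RENAMES:
        source = re.sub(pattern, replacement, source)
    return source


before = "name = ndb.StringProperty(required=False)\nshow.put()\nshow.key.delete()\n"
print(apply_renames(before))
```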

On Tuesday, I started working on the automated posts to Facebook. Not much needed to change here, but it was a bit nasty as there were only limited automated tests for this, so I had to be very cautious. Somewhere along the line, I decided I would deploy on my unlimited Dreamhost account (promo code: “RICHARD_SHARES”), which costs about $9/month and already hosts a bunch of domains. There were a few gotchas in getting the wsgi configuration working with Django (and it was tremendously difficult to debug until I hit on this idea of marshaling the requests to a file, then invoking the server by hand using the marshaled request), but this was reasonably straightforward, and I had built a local installation of Python 2.7 in the past, so I reused that.
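The debugging trick of saving a request to a file and replaying it by hand can be sketched roughly as follows. This is an illustration only: it uses `pickle` and a stand-in `demo_app`, both assumptions, since the post does not show the actual code or say which serialization was used.

```python
import io
import os
import pickle
import tempfile
from wsgiref.util import setup_testing_defaults


def save_request(environ, path):
    """Save the plain-data parts of a WSGI environ, plus the request body."""
    body = environ["wsgi.input"].read()
    # Put the body back so the application can still read it afterwards.
    environ["wsgi.input"] = io.BytesIO(body)
    plain = {k: v for k, v in environ.items() if isinstance(v, (str, int, tuple))}
    with open(path, "wb") as f:
        pickle.dump({"environ": plain, "body": body}, f)


def replay_request(app, path):
    """Load a saved request and invoke the WSGI app with it by hand."""
    with open(path, "rb") as f:
        saved = pickle.load(f)
    environ = dict(saved["environ"])
    environ["wsgi.input"] = io.BytesIO(saved["body"])
    environ["wsgi.errors"] = io.StringIO()
    result = {}

    def start_response(status, headers, exc_info=None):
        result["status"] = status
        result["headers"] = headers

    result["body"] = b"".join(app(environ, start_response))
    return result


def demo_app(environ, start_response):
    """A stand-in WSGI app; a real Django handler would go here."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [("you asked for " + environ["PATH_INFO"]).encode("utf-8")]


environ = {"PATH_INFO": "/shows/"}
setup_testing_defaults(environ)  # fills in the remaining required WSGI keys
path = os.path.join(tempfile.mkdtemp(), "request.pickle")
save_request(environ, path)
print(replay_request(demo_app, path))
```

Once a failing request is on disk, it can be replayed under a debugger from the command line, without going through the web server at all.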

I worked on the code to import the data. Google Data Export is another thing that’s way too complex and slow, but I had done a trial run of the export on Sunday, so I used that as a testbed. I found some sample code to read the sqlite database (which is very simple) into Google Entity objects, so it was fairly simple to read properties out of the Google objects and put them into Django objects. I ended up doing it in two passes; the first pass included root objects that other objects have foreign keys to, which ensures that the second pass can refer to those already created objects without dangling foreign keys.
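The two-pass idea can be sketched with plain dictionaries standing in for the exported entities and the Django objects. The record layout, the `Venue` kind, and the field names below are hypothetical, chosen only to show the ordering; the real conversion script is the one linked further down.

```python
# Hypothetical exported records: shows refer to venues by key.
exported = [
    {"kind": "Show", "key": "s1", "title": "Open Mic", "venue_key": "v1"},
    {"kind": "Venue", "key": "v1", "name": "Example Club"},
    {"kind": "Show", "key": "s2", "title": "Late Show", "venue_key": "v1"},
]

# Kinds that other records point to; these must exist before anything else.
ROOT_KINDS = {"Venue"}


def import_two_pass(records):
    """Create root objects first, then the objects that refer to them."""
    roots = {}
    # Pass 1: create every root object and remember it by its old key.
    for record in records:
        if record["kind"] in ROOT_KINDS:
            roots[record["key"]] = {"name": record["name"]}
    # Pass 2: create dependents, looking up the already-created roots.
    dependents = []
    for record in records:
        if record["kind"] not in ROOT_KINDS:
            dependents.append(
                {"title": record["title"], "venue": roots[record["venue_key"]]}
            )
    return roots, dependents


venues, shows = import_two_pass(exported)
print(len(venues), len(shows))
```

Because every root exists before the second pass starts, the order of records in the export does not matter.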

I waited until midnight, so the daily stats would be generated on Google, with Google data, then immediately put the site into read-only mode on Google and began the dump of data from Google. It was infuriatingly slow. It finally finished around 12:42 am, so about 40 minutes total, to produce a 21 MB sqlite database file. Finally, one little file with ALL my LaffQ data in it!

Here’s the script that converted the exported Google App Engine data to native Django objects in sqlite3: https://gist.github.com/richardkiss/4576523

I had already brought up the new site on Dreamhost, using two day old data, so it was just a matter of running the conversion tool, which read the Google sqlite database (which was essentially a BigTable dump, with one entity) and wrote out the Django sqlite database (which much more closely resembled the actual structure of data in my database). The new sqlite file was 8.7 MB. Compressed with bzip2, it was under 2 MB. That was the entirety of my web site data that took Google 40 minutes to export!!!

I copied it to Dreamhost in about five seconds, deployed it, restarted the Django app, and poked around a bit. Everything seemed to be in order and up to date, so I updated DNS to point to the Dreamhost version of the web site, and waited for the change to propagate around the world (it’s funny to think about how LaffQ came up at different times for different people).

One thing I forgot is the Facebook integration required SSL. I didn’t have an SSL certificate (or a unique IP!) so I was flustered for a bit. I thought about how this used to work: it went to the laffq.appspot.com domain, the Google domain that supports SSL for free (signed with a Google appspot.com SSL certificate). Then I realized, I could just write a tiny Google App Engine app that proxied requests to https://laffq.appspot.com/ by fetching content from laffq.com. These pages are very low traffic (since they don’t really do anything), and it worked! (I had problems serving CSS to Chrome, which were resolved by making sure the content-type header was set correctly).

I’m kind of glad I didn’t remember this until it was deployed because I didn’t see the answer right away, and it might have made me not go through with it due to my tendency to know every move in advance.

All in all, the port went unbelievably well. The sqlite version is MUCH faster than Google even though memcache is no longer in the mix (I do use built-in Django caching in the same memory space as the Django app, although I’m pretty sure I could turn it off and it would still be very fast). It uses very little CPU on Dreamhost, and there are no annoyingly arbitrary quotas.

Since then, my work on LaffQ has accelerated beyond my wildest dreams. I didn’t realize just how much the Google App Engine restrictions were holding me back. Now I can back up my entire production database in just a few seconds. I can run a copy of production database locally, which gives me a much better feel for how changes will perform with production data, so I use production data in development all the time. This gives me a much better feeling about new features. Debugging is much easier. Deploying is as simple as a git push/git pull. I can look at logs. I can make tweaks in production. I can ssh to the production machine. I can diagnose problems in production. Pretty much everything about it is better. I’ve refactored, tweaked, optimized, added tests. It’s code I’m almost proud of now. (Almost.)

I saw on Hacker News the other day about how Google App Engine was down. I was happy it didn’t affect me.

Comedy
Computers


Some Whining About Video Software

I’ve been working on editing video again lately. I’m struck by the same feelings I had last time I tried to do this way-too-complex and thankless task.

I appeared on the local cable access show “Paint with Lynn”, with my friend, comedian Lynn Ruth Miller. They gave me a copy of the shows on DVD. I wanted to edit each 30-minute episode down to something less than ten minutes so I could upload it to YouTube and so viewers wouldn’t have the urge to kill themselves out of boredom.

First, I thought I would use Final Cut Express. Of course, they don’t let you import the .VOB file format that DVDs use (which is just a renamed mpeg-2 file) unless you spend $20 for the Apple MPEG-2 Playback Component. Why Quicktime won’t play back one of the most important file formats out of the box is beyond me. Okay, it’s not really beyond me: it probably has to do with the cost of patents. It’s still frustrating though.

So I used Handbrake to convert to MPEG-4. This took an hour or so. Well, guess what… Final Cut Express would NOT import the MPEG-4 movies produced by Handbrake. Whaaat???

So after some tests, I used Handbrake to convert to AVI. Another hour. Why Apple’s Final Cut Express supports Microsoft’s container and not Apple’s is beyond the hell out of me.

After spending hours figuring out and getting used to Final Cut, I discovered that Handbrake transcoded the audio, and I suppose due to some weird bug, shifted the levels, causing a bunch of pops in the audio. I noticed this only AFTER I had spent hours assembling all the clips.

So I transcoded AGAIN, this time requesting Handbrake to not transcode the audio, but just pass it through. That’s generally better anyway: it takes less time, and it doesn’t involve conversion, which generally causes a loss in quality. Lesson #1: Avoid transcoding whenever possible.

Finally, I had something that Final Cut could read (an AVI file) and did not have audio popping. I had created the clips by setting marks on the large clip. Luckily, I was able to just switch the underlying AVI files, and since all the time offsets were the same in the new file, I didn’t have to recreate the clips. This is the rough equivalent of changing the tablecloth without unsetting and resetting the table. But it worked. Finally something was going right.

I decided that I hate Final Cut. It’s way overkill for someone who just wants to select sub-clips from a large clip. I thought I would try iMovie for the second episode to see if it was any better.

I was ecstatic to find that iMovie claims support of importing MPEG-2 without buying any additional Quicktime components. However, you can’t just import an MPEG-2 file off your disk: it will only import MPEG-2 from cameras. What?? So you have to trick it into thinking the MPEG-2 file is in a camera hierarchy. After poking around the internet a bit, I discovered an easy way is to image the DVD with Disk Utility, then when you mount the image, iMovie figures it’s a safe unprotected DVD created by a camera (never mind that the DVD I got was unprotected in the first place). It was a huge waste of time and temporary disk space, but the trick worked, and was faster and less lossy than transcoding.

But alas, there was still one problem: there is something up with the first chapter of the episode I was trying to import, and iMovie would not import it. Of course, Apple’s DVD Player and my hardware-based DVD player both play it just fine, so it’s mostly correct. But iMovie is more picky than it needs to be, and it throws an error. Why doesn’t iMovie use the same MPEG-2 code as DVD Player.app? Who knows? I need some way to figure out what’s wrong with the MPEG-2 file and repair it. Offhand, I know of no such utility. Maybe VLC.app will do it. It does a fabulous job reading otherwise sketchy media files. But that will take at least 15 minutes, so I’ll skip that for now.

Then there is the little matter of rendering. Final Cut Express works very slowly with clips in AVI format (using the H.264 video codec), and for some reason just will not play “unrendered” clips. I believe that “rendering” the clips converts them into an internal DV format which, although incredibly huge, is very quick to decompress so it can scroll around in them. This rendering process takes unbelievably long: it seems to be 8x real time (so one minute of video takes eight minutes to render). And this is on a machine that has absolutely no trouble playing real-time. Why would it take so much longer to render than to play live?! And why does it have to be rendered at all?

I believe the most direct process would be to convert from MPEG-2 to DV format, which I assume would not have to be rendered except during transitions. I haven’t been able to find software that will do this though besides MPEG Streamclip, which requires the $20 MPEG-2 component.

Why doesn’t Apple do more code sharing between their various video codec projects (Final Cut, which won’t read MPEG-2 or .mov files; Quicktime, which requires an upgrade to read MPEG-2 files; iMovie, which can’t read some MPEG-2 files that DVD Player has no trouble with)? I suspect that legal concerns with the DMCA enter into at least one of these redundant code disasters.

Video is a living hell. Yes, even on the Mac.

Computers


Django on Dreamhost

I went to the San Francisco Django Meetup last week and met some smart, nice people.

There was some talk about how to deploy Django. There were many good things said about Slicehost. Of course, to my mind, the cheapest and easiest way to deploy is Google App Engine. I say let the good employees at Google deal with the hard work of keeping these thousands of machines online and responsive. If your GAE site goes down at 4 AM, there’s no point in waking up to take a look at what’s going on, because you can’t even log in to those Google machines, much less have root access, so you may as well… keep sleeping. Which is what I prefer to do at 4 AM.

Anyway, I’ve used Dreamhost for years to host both my blog and my personal web site, which is actually a very simple Django app. It’s met my needs perfectly; it’s a very low traffic web site, and at less than $10 a month (including ssh access!) incredibly cheap.

You don’t get root like you do on slicehost.com or other VPS, but I consider that a feature, not a bug. Do I really want to be the one responsible for keeping system software safe from the hacker attack du jour?

If you feel like sending some kickbacks my way, enter my email address as your referrer (him at richardkiss.com) or sign up here.

I expect to release my Django Dreamhost configuration as a github project Real Soon Now™.

Computers


On the Air (2)

Here’s the radio show I did on Sunday. It’s an hour long.

It starts with a three minute audio clip of a funny guy who isn’t me, so don’t be confused by that part.

"What's So Funny" radio show, Dec. 14, 2008

Comedy
Computers
Life


Free Stuff! Come ‘n Git It! (Part 2)

I’ve created a couple more git depots with some rather old code that has been written and open sourced for quite some time… but just never shared.

One is a simple C-based command-line utility to quickly fix line endings for text files. It can read and write text files that have DOS, UNIX or Mac line endings. It’s very simple and quite peppy, and does a simple check for binary files before proceeding, so you can use it with confidence. It’s called “fixle”… very quick to type, fast to use. It replaces files in place. Developed on Mac OS X, it should work on any UNIX.

http://github.com/richardkiss/fixle/tree/
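For readers curious what such a tool has to do, here is a minimal Python sketch of the same idea. It is not the fixle source (which is C); the function names and the NUL-byte test for binary files are assumptions made for this example.

```python
def looks_binary(data):
    """Crude binary check: treat any NUL byte as a sign of a non-text file."""
    return b"\x00" in data


def fix_line_endings(data, newline=b"\n"):
    """Normalize DOS (CRLF), old Mac (CR), and UNIX (LF) endings to `newline`."""
    if looks_binary(data):
        raise ValueError("refusing to convert what looks like a binary file")
    # Collapse CRLF first so its CR is not counted a second time.
    unified = data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")
    return unified.replace(b"\n", newline)


mixed = b"dos line\r\nmac line\runix line\n"
print(fix_line_endings(mixed))
print(fix_line_endings(mixed, newline=b"\r\n"))
```

Working on bytes rather than decoded text keeps the conversion independent of the file's character encoding.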

Another is a pair of Core Audio utilities for Mac OS X that provide a sort of “device” for audio: speakerpipe (which lets you dump data to the speaker) and mikepipe (which dumps data from the mike).

http://github.com/richardkiss/speakerpipe-osx/tree

Ideally, the functionality in speakerpipe should be integrated into the Mac OS X build of the very useful command-line utility sox so it can play sounds on the Mac. (Hmm, some browsing of the project seems to indicate that this functionality is coming.)

Both of these were written years ago and just never released into the wild. I release them, with BSD-style licenses, with the hope that they will be useful. No warranties though suckah!

Computers


Google Code (Peanut Butter and) Jam

Last Friday, I participated in the Google Code Jam, a programming puzzle contest sponsored by Google. It was the first qualification round, and I was very happy with how I did, coming in 74th in my heat (there were three heats, and mine had almost 3000 participants).

There were three problems, each with two data sets. The last problem was really more of a math problem, which I figured would have given me more of an edge because most of the time when I cheat on my second love, computers, it’s with my first love, math. However, I couldn’t complete it in the time allotted. But I also couldn’t stop thinking about it. I eventually came up with a solution which I’ll share here.

It’s problem C in round 1A (start here), but essentially, you have to find the last three digits before the decimal point for the number $(3 + \sqrt{5})^n$ where $n$ is a whole number that can be very large (up to two billion).

This means you cannot calculate the whole thing; for the largest $n$ it would contain well over a billion digits, which would use most of RAM just to hold the representation. So you have to figure out a trick.

Here’s what I came up with (too late to submit). Let $A_n = (3 + \sqrt{5})^n$ and $B_n = (3 - \sqrt{5})^n$. Observe that $A_n = X_n + Y_n\sqrt{5}$ for some series $X_0, X_1, \ldots$ and $Y_0, Y_1, \ldots$ where each of $X_n$ and $Y_n$ are whole numbers. You can show this easily by induction. You can show a similar thing for $B_n$; in fact, $B_n = X_n - Y_n\sqrt{5}$.

This means that $A_n + B_n = 2X_n$.

Notice that $(3 - \sqrt{5}) < 1$, so $B_n < 1$ for $n > 1$, and goes towards 0 very quickly. Since $A_n = 2X_n - B_n$, we can calculate $A_n$ by calculating $2X_n$ and subtracting a “small” number… that is, the last three digits of $A_n$ are the same as the last three digits of $2X_n - 1$.

So all we need to do is figure out $X_n$!

We know that $X_0 = 1$, $Y_0 = 0$. $A_{n+1} = A_n(3 + \sqrt{5})$, so

$X_{n+1} + Y_{n+1}\sqrt{5} = A_{n+1} = A_n(3 + \sqrt{5}) = (X_n + Y_n\sqrt{5})(3 + \sqrt{5}) = (3X_n + 5Y_n) + \sqrt{5}(X_n + 3Y_n)$

so separating rational and irrational parts yields the pretty recurrence relation $X_{n+1} = 3X_n + 5Y_n$ and $Y_{n+1} = X_n + 3Y_n$.

That means $X_{n+1}$ and $Y_{n+1}$ depend only upon $X_n$ and $Y_n$. Since we only care about the last three digits, that means that there are only $1000 \times 1000$ = a million different combinations of $X_n$, $Y_n$, and thus, the series must repeat in fewer than a million iterations. It shouldn’t be too hard to find a repeat.
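As a sanity check on the argument so far, the following sketch compares the exact integer part of (3 + √5)^n with 2X_n − 1 for small n. It is written for Python 3.8 or later (it relies on `math.isqrt`), and the helper names `x_y` and `floor_a` are invented for this check; it is separate from the contest solution.

```python
import math


def x_y(n):
    """Return the exact integers (X_n, Y_n) with (3 + sqrt(5))**n = X_n + Y_n*sqrt(5)."""
    x, y = 1, 0
    for _ in range(n):
        x, y = 3 * x + 5 * y, x + 3 * y
    return x, y


def floor_a(n):
    """Exact integer part of (3 + sqrt(5))**n, using integer arithmetic only."""
    x, y = x_y(n)
    # floor(y * sqrt(5)) equals isqrt(5 * y * y) for integers y >= 0.
    return x + math.isqrt(5 * y * y)


for n in range(1, 6):
    x, _ = x_y(n)
    print(n, floor_a(n), 2 * x - 1)
```

The two printed columns should agree for every n ≥ 1, which is exactly the claim that the integer part of A_n is 2X_n − 1.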

Here’s the Python code to find the cycle length:

def _a_b_n(n):
    # Returns (X_{n+1}, Y_{n+1}) mod 1000; index 0 holds (X_1, Y_1) = (3, 1).
    if n == 0:
        return 3, 1
    a, b = a_b_n(n-1)
    return ((3*a + 5*b) % 1000, (a + 3*b) % 1000)

CACHE = {}
def a_b_n(n):
    # Memoized wrapper, so each term is computed only once.
    if n not in CACHE:
        CACHE[n] = _a_b_n(n)
    return CACHE[n]

F = {}
for i in range(int(1e6)):
    k = (Xn, Yn) = a_b_n(i)
    if k in F:
        # This pair was first seen at index F[k], so the series repeats from here.
        break
    F[k] = i

cycle_length = i - F[k]
print("cycle length is %d" % cycle_length)

When you run this, it takes less than a tenth of a second to figure out that the cycle length is 500, and that the pair (X503, Y503) equals (X3, Y3) modulo 1000. So now it’s easy! Calculating the first 503 terms is enough.

Here is the rest of the code:

import sys

def do_trial(f):
    n = int(f.readline())
    # Fold a large n back into the range already covered by the cycle.
    if n > cycle_length:
        n %= cycle_length
        n += cycle_length
    t = a_b_n(n-1)  # (X_n, Y_n) mod 1000
    return (2*t[0] - 1) % 1000

f = sys.stdin
count = int(f.readline())
for i in range(count):
    v = do_trial(f)
    print("Case #%d: %03d" % (i+1, v))

It runs in pretty much no time at all, thanks to the excessive caching.

Computers
Life


Big Park Dot Com

Our corporate web site finally went live this week. I have been at this company for over six months, having joined shortly after its inception.

So what took so long? Well, for one thing, we didn’t have a name until very recently. We were going by the temporary name “Funny Fox Games”, but didn’t want to waste any time branding us by the temporary name. We struggled over a final name for quite a while, and just a few weeks ago settled on BigPark, Inc. I actually think the name is quite good (kind of a relief… I thought some of the ideas were not that great).

So check out our quite pretty (and vague) web site at http://bigpark.com/ and see what you think. We have a few job openings, so if you like computers and/or games, take a look!

With a little luck, we will release our first game Real Soon Now™.

Computers
Life


“Rock Band” Makes You Feel Cool

An invariant in my personality is an obsession with precision, pedantry, and pedagogy. This perfect storm is responsible for my unusual interest and aptitude for math and computers. But this was not a conscious choice, and with the good fortune of being able to breeze through math courses came the curse of nerd-dom ostracization. My tendencies in this direction are too strong to ever hope to not be a nerd. I’ve always refused to “celebrate” the nerd way. Some of it is overcompensation, some of it surprising. For example, I’ve never seen an episode of Star Trek. As I like to say, I don’t trust anyone too much into anything. I’m a nerd, but I’m not a geek. (OK, I have a blog. Sue me.)

This repudiation of the geek lifestyle has caused me to drift away from video games that interested me when I was younger. My current job has brought me back into that world though, because… well, it’s that industry. So the office has XBOX 360, Wii, a giant TV. We got the game “Rock Band” pretty much as soon as it was available.

“Rock Band” comes with a guitar controller (with buttons instead of frets), a microphone, and a drum kit. You play and sing along with real songs that you’ve probably heard on the radio, like Weezer’s “Say It Ain’t So”, and it scores you based on how accurately you follow along to the score that scrolls by. If a tone is assigned to you and you botch it, that note doesn’t play in the song, and the discordance socks a body blow to your musical memory (and makes the virtual crowd more likely to start booing).

It doesn’t take long before you actually start to get the hang of it, and surprisingly, feel like you are contributing to the songs. You really do learn certain musical skills, especially timing.

If the difficulty level is appropriate (not so easy that you get bored, but not so hard that you get frustrated; see also this Wikipedia article), you really start to get into it, and become one with the music. You begin to understand what it might be like to be a real rock star. It’s a coolness simulator.

These effects are real. After I’ve played the game for a while, I really do feel more at ease. I feel more confident. I am more chatty with strangers.

Of course, the irony is that playing video games is kind of dorky. One might claim that it’s just a fantasy world, where people pretend to be something they’re not. But there’s no denying that it does have an immersive coolness effect on its participants’ mood. So how to reconcile this apparent contradiction?

Who cares? Cool people don’t care what other people think.

Computers
Life


Math Jesus

Today I got multiple copies of a “pump-and-dump” spam. You know, the kind where they tell you about a stock that is “going to go through the roof” which then does because enough people believe it, so it becomes a self-fulfilling prophecy. The best part is, no one even has to be convinced about the fundamentals of the stock: they just have to believe that enough other people are going to buy. Thus, the pump. Alas, the dump is a tough one. Good luck with the timing on that. But I digress.

Somehow, many copies of this spam made it through Gmail’s normally excellent spam filter. Each was from a different, made-up sender, with first and last names probably independently chosen from some list. What was cool was the name on one of the spams: “Math Jesus”.

How wonderful. I immediately claim ownership of this moniker, since the spammer is probably not aware that this pairing was made. And how fitting. I am very good at math — not the best mathematician the world has ever seen, but definitely way up there. Not a math god. But, yes, dare I say, a Math Jesus.

Thus my new screen name/nickname/band name. Math Jesus.

Gödel showed that we need an infinite number of axioms for a system that can embed the natural numbers to be complete (that is, for every statement to be provably true or false). Something along those lines.

So let me say right now: ten commandments ain’t gonna cut it. That’s just gonna be the tip of the infinite iceberg.

Computers
Life


Lazy Bones

I’ve always been proud about how lazy I am. I often say that I will do an incredible amount of work just so I can be lazy. Something along those lines happened today, and I realized that what I hate is routine busywork; so much so that I would rather do a certain quantity of creative work to avoid an equal quantity of busywork.

At my company, we have used Apple’s deprecated WebObjects 4.5, which uses Objective-C and WebScript. Years ago, WO4.5 was the most fun web development environment, but since it turned 5.0 and substituted Java for Objective-C, it’s been much less fun and it’s not used very widely outside Apple, and it’s not really admired as much as it used to be. It’s gotten worse (Java over WebScript? Get real) and other environments have gotten better (I like Django a lot right now).

Anyway, we have a specific application that has been bothering us for a long time, and it’s time to port it.

I’ve been looking at “SQLAlchemy”, a Python-based object-relational modeling tool that works with Oracle (which I hate, but that’s a story for another time). It seems using this tool will make the port from WO4.5 to Python a fair bit easier.

The first step seemed to be to create new model files that describe the object-relational mapping. Of course, the syntax for these files (pure Python, it turns out) is quite different from the syntax used in WO’s EOF object-relational mapping layer. We have a lot of objects, so rather than do it by hand, I thought I would write a tool to do it.

EOF uses a plist file format. There are Objective-C routines to read these files, but I couldn’t find any libraries to do so in Python. So I installed pyobjc so I could use one particular Objective-C call from Python. This made parsing very easy, and creating the Python model file was a piece of cake.
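To give a feel for the second half of that job, here is a sketch that turns an already-parsed entity description into SQLAlchemy table source text. The sample `entity` dictionary, its key names, and the `TYPE_MAP` are assumptions made for illustration; the post does not show the real model files or the real tool. The sketch only builds a string, so it runs without SQLAlchemy installed.

```python
# A hypothetical, already-parsed entity description (a plain dict).
entity = {
    "name": "Person",
    "externalName": "PERSON",
    "primaryKeyAttributes": ["personID"],
    "attributes": [
        {"name": "personID", "columnName": "PERSON_ID", "valueClassName": "NSNumber"},
        {"name": "lastName", "columnName": "LAST_NAME", "valueClassName": "NSString"},
    ],
}

# Assumed mapping from Objective-C value classes to SQLAlchemy column types.
TYPE_MAP = {"NSNumber": "Integer", "NSString": "String", "NSCalendarDate": "DateTime"}


def entity_to_table_source(entity):
    """Return Python source text for a SQLAlchemy Table describing the entity."""
    primary_keys = set(entity.get("primaryKeyAttributes", []))
    lines = [
        "%s_table = Table(%r, metadata," % (entity["name"].lower(), entity["externalName"])
    ]
    for attr in entity["attributes"]:
        column_type = TYPE_MAP.get(attr["valueClassName"], "String")
        extra = ", primary_key=True" if attr["name"] in primary_keys else ""
        lines.append("    Column(%r, %s%s)," % (attr["columnName"], column_type, extra))
    lines.append(")")
    return "\n".join(lines)


print(entity_to_table_source(entity))
```

Generating source text rather than live objects means the output can be reviewed and hand-tuned before it becomes part of the ported application.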

It’s not 100% yet, but I was quite amazed not only at how quickly I could produce the translation tool, but at what great lengths I would go to in order to be lazy.

Computers
