Category Archives: Computers

Get Your BCash in Under Ten Minutes with pycoin

BCash, or Bitcoin Cash is a hard fork of Bitcoin as of August 1, 2017, so it’s pretty new. Recently, I’ve been working on a big refactor of pycoin to make it more flexible in handling altcoins. Creating BCash transactions seems to work now.

If you had bitcoin on August 1, you now have bcash. Here’s how you can create transactions with BCash with nothing but the private keys and a few calls to web services.

First, switch to a throwaway environment. I like to use a RAM disk.

On Mac OS X:

$ diskutil erasevolume HFS+ RAMDISK `hdiutil attach -nomount ram://81920`
$ cd /Volumes/RAMDISK/

Set up your service providers and a place to cache the source transactions. The service providers will be used to get spendables for the addresses you’re querying, as well as the source transactions.

$ export PYCOIN_BCH_PROVIDERS=insight:http://blockdozer.com/insight-api
$ export PYCOIN_CACHE_DIR=./tx-cache

This example will query blockdozer.com to get spendables and transactions.

Copy all your relevant WIFs to a file here. It’s okay for the file to have WIFs that aren’t used, but you will obviously need all the WIFs you wish to spend from.

$ cat > wifs.txt
(paste to terminal, one WIF per line)

Alternatively, you can use a GPG-encrypted file.

$ gpg -e > wifs.gpg
(specify recipients, then paste to terminal, one WIF per line)

We’ll be using python 3. Install pycoin into a virtual environment.

$ python3 -m venv env
$ . env/bin/activate
$ pip install -e [email protected]:richardkiss/pycoin.git@57b508d5208f98a304b9bd3a491c3f0049aed96b#egg=pycoin

Verify that an address has coins to spend from.

$ tx -n BCH -i 16NgXiMwRimMcSXLTf4KFwoW968btQ1GmZ
62263e20a70ec98cd52452ffca6f5c54fa02a45cc0a70aca14b99f2445407678/0/76a9143af25c9925ad4d20e1df648a82268d44445430ec88ac/1990000/0/0/0
b0011baf07830ec04d5adb66b1503ad6c9498dfe6c5590f05c8226c9f83fa0df/0/76a9143af25c9925ad4d20e1df648a82268d44445430ec88ac/1185613/0/0/0

Create an unsigned transaction that contains just the inputs.

$ tx -n BCH -a -i 16NgXiMwRimMcSXLTf4KFwoW968btQ1GmZ -o inputs.bin
warning: transaction has no outputs

Replace the address above with the address you want to send coins from.

Add an output. I’m sending all coins to 1HeJ94rPmgSWbPTQqAyB2Vp7ZDqyPvyXsz. You should use your own address.

$ tx -n BCH -a inputs.bin 1HeJ94rPmgSWbPTQqAyB2Vp7ZDqyPvyXsz -o utx.bin
all incoming transaction values validated

Sign the transaction, using BCH rules.

$ tx -n BCH utx.bin -f wifs.gpg -o tx.hex
signing...
all incoming transaction values validated

You could do all this in one line (fetch the inputs, add the output, sign), as follows:

$ tx -n BCH `tx -n BCH -i 16NgXiMwRimMcSXLTf4KFwoW968btQ1GmZ | xargs` 1HeJ94rPmgSWbPTQqAyB2Vp7ZDqyPvyXsz -f wifs.gpg

but it’s long and a bit confusing, and causes word wrap in the blog.

View the transaction. Note that the transaction does NOT validate on the bitcoin (BTC) network. This is due to the replay protection in BCH (the hash type of the signature is 0x41, which is invalid in BTC).

$ tx -a -n BTC tx.hex
(lots of output)

It does validate on the BCH network.

$ tx -a -n BCH tx.hex
(lots of output)

Add -d to disassemble it (just for fun).

$ tx -ad -n BCH tx.hex
(lots of output)

View the hex.

$ cat tx.hex
0100000002dfa03ff8c926825cf090556cfe8d49c9d63a50b166db5a4dc00e8307af1b01b0000000008b483045022100879c1b9cb199865bb77892b3ccdddc1f52102aa3da3024556f864e34d380108e02205b3f17c412df05ebb7984f472dcbe574fb0472bee3fa9ac3032e90ab05e33691414104712ace3646c43b8f03662cd00d9b0cc24c17b530484bd4c377ea61dd044eca8227fb4415762fc4af5c3bc979a0d64e8d7348e6bab376efab527df65feebc634dffffffff78764045249fb914ca0aa7c05ca402fa545c6fcaff5224d58cc90ea7203e2662000000008b483045022100fb13941b3d32aa33139a1d73da7affdcb805a5e435ec307daaf3a9497c8b549e02205df9d8c729af25429b8c09940eab858324c0e91f7992961d8d1586f5b3694a48414104712ace3646c43b8f03662cd00d9b0cc24c17b530484bd4c377ea61dd044eca8227fb4415762fc4af5c3bc979a0d64e8d7348e6bab376efab527df65feebc634dffffffff01ad4d3000000000001976a914b68fb5c98ece06e3f462fbaa7fc8cd559860f2ec88ac00000000

Copy the hex transaction to the clipboard. Head to http://bcc.blockdozer.com/insight/tx/send to paste it, and send it off!

Ditch the RAM Disk.

$ cd # can't unmount while we're in the directory
$ hdiutil unmount /Volumes/RAMDISK/

Calculating Fibonacci Numbers, Quickly and Exactly

The well-known Fibonacci series \(F_n\) can be defined as follows:

\(F_n =
\begin{cases}
0 & n = 0 \\
1 & n = 1 \\
F_{n-2} + F_{n-1} & n \ge 2\\
\end{cases}\)

Let’s use a few facts about matrices to find a quick way to calculate terms in this famous series.

Lemma

Let \(A = \begin{bmatrix}
0 & 1 \\
1 & 1
\end{bmatrix}\). Then \(A^n = \begin{bmatrix} F_{n-1} & F_n \\ F_n & F_{n+1} \end{bmatrix}\).

Proof

The \(n = 1\) case follows immediately from the definitions of \(A \text{ and } F_n\).

Suppose the statement is true for n. Then
$$\begin{align}
A^{n+1} & = A^n A \\
& = \begin{bmatrix} F_{n-1} & F_n \\ F_n & F_{n+1} \end{bmatrix} \begin{bmatrix} 0 & 1 \\ 1 & 1 \end{bmatrix} \\
& = \begin{bmatrix} F_n & F_{n-1} + F_n \\ F_{n+1} & F_n + F_{n+1} \end{bmatrix} \\
& = \begin{bmatrix} F_n & F_{n+1} \\ F_{n+1} & F_{n+2} \end{bmatrix}
\end{align}$$

And our result follows by induction. QED

Now, notice that

\(\begin{align}
\begin{bmatrix} F_{2n-1} & F_{2n} \\ F_{2n} & F_{2n+1} \end{bmatrix} & = A^{2n} \\
& = A^n A^n \\
& = \begin{bmatrix} F_{n-1} & F_n \\ F_n & F_{n+1} \end{bmatrix} \begin{bmatrix} F_{n-1} & F_n \\ F_n & F_{n+1} \end{bmatrix} \\
& = \begin{bmatrix} F_{n-1}^2 + F_n ^ 2 & F_{n-1} F_n + F_n F_{n+1} \\ F_{n-1} F_n + F_n F_{n+1} & F_n^2 + F_{n+1} ^ 2 \end{bmatrix}
\end{align}\)

Using this identity, we can write \(F_{2n}\) and \(F_{2n+1}\) in terms of \(F_n\) and \(F_{n+1}\).

\(\begin{align}
F_{2n} & = F_{n-1} F_n + F_n F_{n+1} \\
& = (F_{n+1} – F_n) F_n + F_n F_{n+1} \\
& = F_n F_{n+1} – F_n^2 + F_n F_{n+1} \\
& = 2 F_{n+1} F_n – F_n ^ 2
\end{align} \)

\(F_{2n+1} = F_{n}^2 + F_{n+1}^2\)

We now have a way of calculating \(F_{2n}\) and \(F_{2n+1}\) by calculating only a few of the smaller terms in the sequence. This yields some wildly efficient Python code:

def fib2(N):
    if N == 0: return (0, 1)
    half_N, is_N_odd = divmod(N, 2)
    f_n, f_n_plus_1 = fib2(half_N)
    f_n_squared = f_n * f_n
    f_n_plus_1_squared = f_n_plus_1 * f_n_plus_1
    f_2n = 2 * f_n * f_n_plus_1 - f_n_squared
    f_2n_plus_1 = f_n_squared + f_n_plus_1_squared
    if is_N_odd:
        return (f_2n_plus_1, f_2n + f_2n_plus_1)
    return (f_2n, f_2n_plus_1)

def fib(N):
    return fib2(N)[0]

And it’s fast! It’s \(O(\log N)\) in fact:

$ pypy -m timeit -s 'import expfib' 'expfib.fib(500000)'
10 loops, best of 3: 22.4 msec per loop

Compare to the naïve, iterative version, which is \(O(N)\):

def fib2_linear(N):
    f_n, f_n_plus_1 = (0, 1)
    while N > 0:
        N -= 1
        f_n, f_n_plus_1 = f_n_plus_1, f_n + f_n_plus_1
    return (f_n, f_n_plus_1)

def fib_linear(N):
    return fib2_linear(N)[0]

$ pypy -m timeit -s 'import myfib' 'myfib.fib_linear(500000)'
10 loops, best of 3: 2.76 sec per loop

This post was inspired by a post by Lee Phillips and as an excuse to play around with MathJax. The best guide I found for getting started is on Stack Exchange.

Messing with Bitcoin Keys and Addresses

The “bu” tool is obsolete, which makes this post not-so-useful. Look at this file instead.

The command line utility “bu” (for “Bitcoin utilities”) is included with my Python-based pycoin library. This utility makes it easy to deal with Bitcoin private keys and addresses in their native and various intermediate formats. Let’s go through some examples.

The most basic form of a Bitcoin private key is simply an integer between 1 and 115792089237316195423570985008687907852837564279074904382605163141518161494336 ≅ 1.15e77 (inclusive). That’s it! This integer is a “secret exponent”, because generating the public key involves exponentiation, and there is no known way to go from the public key to the secret exponent.

Let’s take a look at the very first private key, also known as “1”.


$ bu 1
secret exponent: 1
  hex:           1
WIF:             KwDiBf89QgGbjEhKnhXJuH7LrciVrZi3qYjgd9M7rFU73sVHnoWn
  uncompressed:  5HpHagT65TZzG1PH3CSu63k8DbpvD8s5ip4nEB3kEsreAnchuDf
public pair x:   55066263022277343669578718895168534326250603453777594175500187360389116729240
public pair y:   32670510020758816978083085130507043184471273380659243275938904335757337482424
  x as hex:      79be667ef9dcbbac55a06295ce870b07029bfcdb2dce28d959f2815b16f81798
  y as hex:      483ada7726a3c4655da4fbfc0e1108a8fd17b448a68554199c47d08ffb10d4b8
y parity:        even
key pair as sec: 0279be667ef9dcbbac55a06295ce870b07029bfcdb2dce28d959f2815b16f81798
  uncompressed:  0479be667ef9dcbbac55a06295ce870b07029bfcdb2dce28d959f2815b16f81798\
                   483ada7726a3c4655da4fbfc0e1108a8fd17b448a68554199c47d08ffb10d4b8
hash160:         751e76e8199196d454941c45d1b3a323f1433bd6
  uncompressed:  91b24bf9f5288532960ac687abb035127b1d28a5
Bitcoin address: 1BgGZ9tcN4rm9KBzDn7KprQz87SZ26SAMH
  uncompressed:  1EHNa6Q4Jz2uvNExL497mE43ikXhwF6kZm

You can see from blockchain.info that the addresses corresponding to this private key (1BgGZ9tcN4rm9KBzDn7KprQz87SZ26SAMH and 1EHNa6Q4Jz2uvNExL497mE43ikXhwF6kZm) are used a lot in tests. Of course, neither has any funds in it (well, at least not at this time), since draining the funds is as simple as entering one of the WIF values above into a Bitcoin client.

There is a bunch of information here. The secret exponent is displayed in decimal and in hex.

The corresponding WIF ("wallet import format") key is displayed, both in compressed and uncompressed format; with this information, you can import the corresponding bitcoin address into your client. Note that the WIF simply contains the exponent encoded using "hashed base 58".

The "hashed base 58" encoding is used to represent an integer with a checksum for validity. A 32-bit checksum is appended to the binary form of the integer, forming another integer. This integer is then represented in base 58 using the alphabet of all digits and all letters of the upper and lower case English alphabet except 0, o, O and l (presumably left out because of potential confusion).

So encoding the WIF in this format really provides no additional (non-redundant) information beyond the secret exponent.

The public pair x and y correspond to the ECDSA (elliptical curve digital signature algorithm) public key that is used to verify digital signatures. Bitcoin clients use public keys to validate that transactions are signed by an entity that has knowledge of the corresponding secret exponent. The x, y value is on the elliptical curve used by bitcoin. In other words

y*y = x*x*x + 7 (mod P)

where P = 0xfffffffffffffffffffffffffffffffffffffffffffffffffffffffefffffc2f

You can check this easily in Python:


$ python
Python 3.3.0 (default, Mar 21 2013, 20:48:16) 
[GCC 4.2.1 Compatible Apple Clang 4.0 ((tags/Apple/clang-421.0.60))] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> p = 0xfffffffffffffffffffffffffffffffffffffffffffffffffffffffefffffc2f
>>> y = 32670510020758816978083085130507043184471273380659243275938904335757337482424
>>> x = 55066263022277343669578718895168534326250603453777594175500187360389116729240
>>> (x*x*x+7) % p
32748224938747404814623910738487752935528512903530129802856995983256684603122
>>> (y * y) % p
32748224938747404814623910738487752935528512903530129802856995983256684603122
>>> print "ta da!"
ta da!

For a given x value, you can rewrite as y = sqrt(x^3+7) (mod P). Since numbers have two square roots even in a finite field, there are two values y0 and y1 that satisfy this equation, where y1 = P - y0. Since P is odd, exactly one of y0 and y1 is even, and the other is odd. In other words, with x and knowledge of whether y is even or odd, we can figure out the value for y. (This is how compressed keys work... they include the value for x along with a boolean indicating even or odd rather than the full value for y.)

The SEC ("Standards for Efficient Cryptography") format provides an alternate way of encoding the public key. This is the internal format that Bitcoin uses in transaction signatures to encode public keys. There is an uncompressed format, which has a prefix of a single 04 byte, followed by the x and y coordinates, and a compressed format, which has a prefix of 02 or 03 depending upon whether the y coordinate is even or odd, followed by the x coordinate.

The hash160 value is the ripemd160 hash of the sha256 hash of the bytestream of the sec version of the key.


$ python
Python 2.7.2 (default, Oct 11 2012, 20:14:37) 
[GCC 4.2.1 Compatible Apple Clang 4.0 (tags/Apple/clang-418.0.60)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import binascii, hashlib
>>> sec = "0279be667ef9dcbbac55a06295ce870b07029bfcdb2dce28d959f2815b16f81798"
>>> binascii.hexlify(hashlib.new("ripemd", hashlib.sha256(binascii.unhexlify(sec)).digest()).digest()) 
'751e76e8199196d454941c45d1b3a323f1433bd6'

The Bitcoin address is the hashed base 58 representation of the hash160 value.

The bu utility will accept input in nearly any format, automatically determining the input type, and display output of all values that can calculated. (Obviously if you enter a Bitcoin address, you won't get the corresponding WIF!)


$ bu 55066263022277343669578718895168534326250603453777594175500187360389116729240,32670510020758816978083085130507043184471273380659243275938904335757337482424
public pair x:   55066263022277343669578718895168534326250603453777594175500187360389116729240
public pair y:   32670510020758816978083085130507043184471273380659243275938904335757337482424
  x as hex:      79be667ef9dcbbac55a06295ce870b07029bfcdb2dce28d959f2815b16f81798
  y as hex:      483ada7726a3c4655da4fbfc0e1108a8fd17b448a68554199c47d08ffb10d4b8
even
key pair as sec: 0279be667ef9dcbbac55a06295ce870b07029bfcdb2dce28d959f2815b16f81798
  uncompressed:  0479be667ef9dcbbac55a06295ce870b07029bfcdb2dce28d959f2815b16f81798\
                   483ada7726a3c4655da4fbfc0e1108a8fd17b448a68554199c47d08ffb10d4b8
hash160:         751e76e8199196d454941c45d1b3a323f1433bd6
  uncompressed:  91b24bf9f5288532960ac687abb035127b1d28a5
Bitcoin address: 1BgGZ9tcN4rm9KBzDn7KprQz87SZ26SAMH
  uncompressed:  1EHNa6Q4Jz2uvNExL497mE43ikXhwF6kZm

The input is determined to be x & y coordinates.


$ bu 1BgGZ9tcN4rm9KBzDn7KprQz87SZ26SAMH
hash160:         751e76e8199196d454941c45d1b3a323f1433bd6
Bitcoin address: 1BgGZ9tcN4rm9KBzDn7KprQz87SZ26SAMH

The input here is determined to be a Bitcoin address. The only thing that it can be converted to is a hash160.

Let's try one more example using the WIF from https://en.bitcoin.it/wiki/WIF :


$ bu 0C28FCA386C7A227600B2FE50B7CAE11EC86D3BF1FBE471BE89827E19D72AA1D
secret exponent: 5500171714335001507730457227127633683517613019341760098818554179534751705629
  hex:           c28fca386c7a227600b2fe50b7cae11ec86d3bf1fbe471be89827e19d72aa1d
WIF:             KwdMAjGmerYanjeui5SHS7JkmpZvVipYvB2LJGU1ZxJwYvP98617
  uncompressed:  5HueCGU8rMjxEXxiPuD5BDku4MkFqeZyd4dZ1jvhTVqvbTLvyTJ
public pair x:   94473386280621915394287615869907363252910868562986308188178980306950346138716
public pair y:   97844737324952875321726721826250953720280326724356638601167669885622888745738
  x as hex:      d0de0aaeaefad02b8bdc8a01a1b8b11c696bd3d66a2c5f10780d95b7df42645c
  y as hex:      d85228a6fb29940e858e7e55842ae2bd115d1ed7cc0e82d934e929c97648cb0a
y parity:        even
key pair as sec: 02d0de0aaeaefad02b8bdc8a01a1b8b11c696bd3d66a2c5f10780d95b7df42645c
  uncompressed:  04d0de0aaeaefad02b8bdc8a01a1b8b11c696bd3d66a2c5f10780d95b7df42645c\
                   d85228a6fb29940e858e7e55842ae2bd115d1ed7cc0e82d934e929c97648cb0a
hash160:         d9351dcbad5b8f3b8bfa2f2cdc85c28118ca9326
  uncompressed:  a65d1a239d4ec666643d350c7bb8fc44d2881128
Bitcoin address: 1LoVGDgRs9hTfTNJNuXKSpywcbdvwRXpmK
  uncompressed:  1GAehh7TsJAHuUAeKZcXf5CnwuGuGgyX2S

Pretty snazzy, eh?

Command-line Bitcoin Transactions

The “spend” tool is obsolete, which makes this post not-so-useful. Look at the “tx” tool mentioned in this post instead.

Creating a signed transaction is the holy grail of Bitcoin. Without this functionality, a Bitcoin client can’t really be called a client. Being able to sign a transaction though… wow! The whole Bitcoin world is at your fingertips!

OK, that may be a bit of an overstatement… but it’s still pretty neat stuff.

My pycoin project features a Python command-line script “spend” that will let you generate a standard transaction that reassigns coins from one set of addresses to another set. It’s obviously not nearly as easy as using a GUI app to spend your bitcoin. But it is very simple-to-follow sample code that you can use as a template should you want to create your own Bitcoin transactions programmatically.

To create a transaction, you need the coin sources. These are the TxOut portions of the transactions that assigned bitcoin to an address. The “spend” script takes bitcoin addresses as input, and queries the blockchain.info web site to get the list of unclosed spendable transactions. This script “spends” all your bitcoin on the addresses you give. No problem though: you can send coins you want to keep (the “change”) to an address you control.

Here’s an example blockchain.info query:

http://blockchain.info/unspent?active=12c6DSiU4Rq3P4ZxziKxzrL5LmMBrzjrJX

(If Satoshi ever gets around to spending all these coins, this URL will throw a 500.)

Of course, it’s not enough to have the sources for the coins; you need the private key. Although pycoin deals with this private key in its native “secret exponent” numerical form, the spend script expects WIF format. So you need to create a file with the WIF. How you get the WIF depends on how you generated the address. In Electrum, right-click on the address and choose “Private Key”, and the WIF will be displayed.

For maximal security, you might think about creating this WIF file on a RAM disk. On Mac OS X,

diskutil erasevolume HFS+ RAMDISK `hdiutil attach -nomount ram://2048`

and it will mount in /Volumes/RAMDISK.

Create a “wifs” text file, and add the WIF elements, one per line. You can add extraneous WIF with no worries; just make sure you have a WIF entry for every source address so the “spend” script has the information it needs to sign the outgoing transactions.

cat >> /Volumes/RAMDISK/wifs
(paste WIF)

Since it’s on a RAM disk, you don’t have to worry about someone scraping your unused drive sectors later. Unless that RAM gets paged to disk. Oh no, a security rabbit hole!! HELP

You also need to specify where you want the coins to go. That’s a simple list of (bitcoin address, coin value) pairs.

Those three elements — coin source, private keys, coin destination — are all you need to create a transaction.

$ spend -s 1BHeaS4vn9NsA6Fi4KFYREGzYmdhdZuJhH -f /Volumes/RAMDISK/wifs -d 15jpiqB2AHb2iwi83KJNxzohouxd1PEDyu/0.04476
transaction fee: 0 BTC
warning: transaction fee lower than (casually calculated) expected value of 0.00010000 BTC, transaction might not propogate
copy the following hex to http://blockchain.info/pushtx to put the transaction on the network:

0100000001068ded23b986c6fc01106596ccbabfc157c850569359ea082ab2e813f277142201\
0000008b483045022100ef5b8504a227f967a7a7c2af55e79f9886e4f9182027745fcdb7fcc4\
b8afdf830220421704dd397cee26e1634e458b1678d80c37cafdf13a0f41a6dd98bff350d9f9\
014104430ba52cd1a88b7049295ceee2e6576af7211e8a9b01b70e572ff70f4718699a64b4c8\
ea0dd3ddb3c7dab0ee53dcc4e70c6913ee0cf904853da94422b5c1b289ffffffff01604c4400\
000000001976a91433f9cf2774a90c53a3e55e5e8bbb5cb3d1090a9988ac00000000

The spend script also displays how much bitoin is unaccounted for in outputs. This amount becomes a transaction fee for the block.

The script generates the transaction and displays it as hex on the screen. It does not send it to the network. You can paste the hex onto blockchain.info to get the transaction on the network and into a block.

Note that if you generate what seems like should be the exact same transaction again, you will (hopefully) get a different hex output. That’s because part of the process of signing a transaction uses a randomly generated value K. If the value for K is known, observers can work backwards to figure out the secret key from the signature generated. In fact, if you use the same value for K to sign two different transactions, the secret key can be recovered. So it’s important that this K value be generated securely, using a cryptographically-safe random number generator.

On BIP0032 and Bitcoin Deterministic Wallets

BIP0032 defines a way to create a hierarchical deterministic wallet (that is, a way to create an entire tree of Bitcoin addresses and private keys) through a tree of wallet key nodes.

Each node has a public and private key associated with it, which can be displayed as a Bitcoin address and a WIF string. But each node also has additional entropy information, called the “chain code”, which gets fed back into the algorithm that generates the children, so revealing even the WIF doesn’t give enough information to reveal the children.

A node can be stripped of private key information, yielding a public key node. These nodes can only generate the Bitcoin address, and not the WIF. But they can still generate half of the child nodes. But only the public key node versions. So once you strip out the private-ness from a node, it’s gone forever.

Each node element can be represented by a 111-digit base58 number that looks like this:

xprv9s21ZrQH143K3QTDL4LXw2F7HEK3wJUD2nW2nRk4stbPy6cq3jPPqjiChkVvvNKmPGJxWUtg6LnF5kejMRNNU3TGtRBeJgk33yuGBxrMPHi
xpub661MyMwAqRbcFtXgS5sYJABqqG9YLmC4Q1Rdap9gSE8NqtwybGhePY2gZ29ESFjqJoCu1Rupje8YtGqsefD265TMg7usUDFdp6W1EGMcet8

Keys for the main network start with “xprv” (private) or “xpub” (public).

A node has 2^32 children, enumerated 0, 1, 2 .. 4294967295. Children with index at least 2^31=2147483648=0x80000000 are derived using “prime” or “private key” derivation, and can only be generated by the private wallet key. We use the shortcut 2p or 2’ to indicate child number 2+0x80000000.

A “key path” is a route down the tree. It’s a “/”-separated list of numbers, where each number can optionally have a trailing p or ’ character to indicate “prime”. (Typing “p” is much easier than “’” which needs to be escaped or quoted in the shell.)

Example key paths: “1”, “0/2p”, “0p/1000/5”, “0/0/0/0/0/37”.

Reading the BIP0032 is a hard slog. Maybe some examples will make this clearer. I’ve created a script for now called genwallet (I know, I know, I need a better name) that lives in my pycoin project. Create a virtualenv and install it. This has been tested with Python 3.3 but should work in Python 2.7 too. There may be minor discrepencies in what you see here and what you see on your terminal if you follow along, as this project has been undergoing heavy changes lately. Forgive me.

$ virtualenv pycoin.env
$ source pycoin.env/bin/activate
$ pip install pycoin
Downloading/unpacking pycoin
  Downloading pycoin-0.15.tar.gz
  Running setup.py egg_info for package pycoin

Installing collected packages: pycoin
  Running setup.py install for pycoin

    Installing genwallet script to (...)/pycoin.env/bin
Successfully installed pycoin
Cleaning up...

You create the top node of a tree by feeding it entropy.

$ head -c200 /dev/random | genwallet -
xprv9s21ZrQH143K3wqna1KnwoSH3reKVVp68cAuJ4izQMMe7VvzBcFdnxYvtMifigMDynsTzCSiLvCt2ksKYrSY4m5z2AQc7g7vZL6uvdaSiBn

Your results may vary. Hopefully.

Let’s use a known key so we can check our results. BIP0032 has some test vectors. Let’s try the first one.

$ python -c 'import binascii; open("master-private-key-entropy", "w+b").write(binascii.unhexlify("000102030405060708090a0b0c0d0e0f"))'
$ hd master-private-key-entropy
00000000  00 01 02 03 04 05 06 07  08 09 0a 0b 0c 0d 0e 0f  |................|
00000010

This is the initial entropy for test vector #1. Don’t use any of these key address for real storage of Bitcoin, since the private keys are all over the internet. All right? Good.

$ genwallet master-private-key-entropy > master-private-key
$ cat master-private-key 
xprv9s21ZrQH143K3QTDL4LXw2F7HEK3wJUD2nW2nRk4stbPy6cq3jPPqjiChkVvvNKmPGJxWUtg6LnF5kejMRNNU3TGtRBeJgk33yuGBxrMPHi

It matches the test vector! How about the public key? Use -s to pass in the key path. We use a trick here that the “.pub” suffix means “strip down to public key only by stripping out secret exponent information”. And that an empty key path stays on the current node. (Ugh. Need to add a -P flag.)

$ genwallet -f master-private-key -s .pub > master-public-key
$ cat master-public-key
xpub661MyMwAqRbcFtXgS5sYJABqqG9YLmC4Q1Rdap9gSE8NqtwybGhePY2gZ29ESFjqJoCu1Rupje8YtGqsefD265TMg7usUDFdp6W1EGMcet8

That matches the test vector too! Let’s generate the rest of them.

$ genwallet -f master-private-key -s 0p.pub
xpub68Gmy5EdvgibQVfPdqkBBCHxA5htiqg55crXYuXoQRKfDBFA1WEjWgP6LHhwBZeNK1VTsfTFUHCdrfp1bgwQ9xv5ski8PX9rL2dZXvgGDnw
$ genwallet -f master-private-key -s 0p
xprv9uHRZZhk6KAJC1avXpDAp4MDc3sQKNxDiPvvkX8Br5ngLNv1TxvUxt4cV1rGL5hj6KCesnDYUhd7oWgT11eZG7XnxHrnYeSvkzY7d2bhkJ7
$ genwallet -f master-private-key -s 0p/1.pub
xpub6ASuArnXKPbfEwhqN6e3mwBcDTgzisQN1wXN9BJcM47sSikHjJf3UFHKkNAWbWMiGj7Wf5uMash7SyYq527Hqck2AxYysAA7xmALppuCkwQ
$ genwallet -f master-private-key -s 0p/1
xprv9wTYmMFdV23N2TdNG573QoEsfRrWKQgWeibmLntzniatZvR9BmLnvSxqu53Kw1UmYPxLgboyZQaXwTCg8MSY3H2EU4pWcQDnRnrVA1xe8fs
$ genwallet -f master-private-key -s 0p/1/2p.pub
xpub6D4BDPcP2GT577Vvch3R8wDkScZWzQzMMUm3PWbmWvVJrZwQY4VUNgqFJPMM3No2dFDFGTsxxpG5uJh7n7epu4trkrX7x7DogT5Uv6fcLW5
$ genwallet -f master-private-key -s 0p/1/2p
xprv9z4pot5VBttmtdRTWfWQmoH1taj2axGVzFqSb8C9xaxKymcFzXBDptWmT7FwuEzG3ryjH4ktypQSAewRiNMjANTtpgP4mLTj34bhnZX7UiM
$ genwallet -f master-private-key -s 0p/1/2p/2.pub
xpub6FHa3pjLCk84BayeJxFW2SP4XRrFd1JYnxeLeU8EqN3vDfZmbqBqaGJAyiLjTAwm6ZLRQUMv1ZACTj37sR62cfN7fe5JnJ7dh8zL4fiyLHV
$ genwallet -f master-private-key -s 0p/1/2p/2
xprvA2JDeKCSNNZky6uBCviVfJSKyQ1mDYahRjijr5idH2WwLsEd4Hsb2Tyh8RfQMuPh7f7RtyzTtdrbdqqsunu5Mm3wDvUAKRHSC34sJ7in334
$ genwallet -f master-private-key -s 0p/1/2p/2/1000000000.pub
xpub6H1LXWLaKsWFhvm6RVpEL9P4KfRZSW7abD2ttkWP3SSQvnyA8FSVqNTEcYFgJS2UaFcxupHiYkro49S8yGasTvXEYBVPamhGW6cFJodrTHy
$ genwallet -f master-private-key -s 0p/1/2p/2/1000000000
xprvA41z7zogVVwxVSgdKUHDy1SKmdb533PjDz7J6N6mV6uS3ze1ai8FHa8kmHScGpWmj4WggLyQjgPie1rFSruoUihUZREPSL39UNdE3BBDu76

The -i flag dumps out a bunch of extra info.

$ genwallet -f master-private-key -s 0p/1/2p/2 -i
xprvA2JDeKCSNNZky6uBCviVfJSKyQ1mDYahRjijr5idH2WwLsEd4Hsb2Tyh8RfQMuPh7f7RtyzTtdrbdqqsunu5Mm3wDvUAKRHSC34sJ7in334
main network
private key
secret exponent: 6911148411995144978760870745829922996940679624545579157337939690887778291444
public pair x:   105057282133830096634347636156404839657295714659718969275583812905733562047785
public pair y:   17712072790142815456499743834226465136387452166901773802686902612896636999328
tree depth:      4
fingerprint:     d880d7d8
parent f'print:  ee7ab90c
child index:     2
chain code:      cfb71883f01676f587d023cc53a35bc7f88f724b1f8c2892ac1275ac822a3edd
WIF:             KwjQsVuMjbCP2Zmr3VaFaStav7NvevwjvvkqrWd5Qmh1XVnCteBR
  uncompressed:  5Hw1ss3oPLXfyYSZrxQr4xFrpq7nEaX5HkSnxdAXuWcM4JEio8S
Bitcoin address: 1LjmJcdPnDHhNTUgrWyhLGnRDKxQjoxAgt
  uncompressed:  1FzKW1LPEjEeRamxYR8oxVPLFJt525Nffm

$ genwallet -f master-private-key -s 0p/1/2p/2/1000000000 -i
xprvA41z7zogVVwxVSgdKUHDy1SKmdb533PjDz7J6N6mV6uS3ze1ai8FHa8kmHScGpWmj4WggLyQjgPie1rFSruoUihUZREPSL39UNdE3BBDu76
main network
private key
secret exponent: 32162737660659799401901343156672072893797470137297259782459076395168682141640
public pair x:   19122724810578381401279259492091176497647579703487086604820598127878910996497
public pair y:   93716738155005567718020901196556981584525395439024483644561058920479008416610
tree depth:      5
fingerprint:     d69aa102
parent f'print:  d880d7d8
child index:     1000000000
chain code:      c783e67b921d2beb8f6b389cc646d7263b4145701dadd2161548a8b078e65e9e
WIF:             Kybw8izYevo5xMh1TK7aUr7jHFCxXS1zv8p3oqFz3o2zFbhRXHYs
  uncompressed:  5JMbvQZXHJAzJyoDnqWasGCwtiHJZivF2ckjn3n5mazYYtGNvJf
Bitcoin address: 1LZiqrop2HGR4qrH1ULZPyBpU6AUP49Uam
  uncompressed:  1N7NsvfJJqhjjFp5R2X9FmBc8MLU7gxbsL

Note how the second example suggests that the first example is its parent by identifying its fingerprint, and having a depth that’s one deeper.

We can feed the key on the command-line too using -k (although it’s a bad idea for real keys, since it exposes it in ps and your shell’s history). Every bit of this data is encoded in the 111-character wallet key.

$ genwallet -i -k xprvA41z7zogVVwxVSgdKUHDy1SKmdb533PjDz7J6N6mV6uS3ze1ai8FHa8kmHScGpWmj4WggLyQjgPie1rFSruoUihUZREPSL39UNdE3BBDu76
xprvA41z7zogVVwxVSgdKUHDy1SKmdb533PjDz7J6N6mV6uS3ze1ai8FHa8kmHScGpWmj4WggLyQjgPie1rFSruoUihUZREPSL39UNdE3BBDu76
main network
private key
secret exponent: 32162737660659799401901343156672072893797470137297259782459076395168682141640
public pair x:   19122724810578381401279259492091176497647579703487086604820598127878910996497
public pair y:   93716738155005567718020901196556981584525395439024483644561058920479008416610
tree depth:      5
fingerprint:     d69aa102
parent f'print:  d880d7d8
child index:     1000000000
chain code:      c783e67b921d2beb8f6b389cc646d7263b4145701dadd2161548a8b078e65e9e
WIF:             Kybw8izYevo5xMh1TK7aUr7jHFCxXS1zv8p3oqFz3o2zFbhRXHYs
  uncompressed:  5JMbvQZXHJAzJyoDnqWasGCwtiHJZivF2ckjn3n5mazYYtGNvJf
Bitcoin address: 1LZiqrop2HGR4qrH1ULZPyBpU6AUP49Uam
  uncompressed:  1N7NsvfJJqhjjFp5R2X9FmBc8MLU7gxbsL

You can traverse the tree partially, and still get descendents from the child node. Here we see the path from the master M through 0/1/0/1 yields the same results as we get from going from M to 0/1/0, stopping there for a moment, then going to 1.

$ genwallet -f master-private-key -s 0/1/0 > m0,1,0
$ cat m0,1,0 
xprv9xrdP7iD2L1YW43ygAskFNznwRQAFkb67X6vK5mJF1tdDQWXJdQnBjQqwYaHKPQ5wseEEDWmgBFpXmtxfGvAERhfiUZfoCyRfgMfGhPqx94
$ genwallet -f master-private-key -s 0/1/0/1
xprvA1fSfYjT9jfrP4WfUgMEcvTRYyQ6qoqRT2t7Z9qZ41TEG7egUv28pL5dHodAkrET8k7UqjzKRKCFUR1V36E9egU9sHoKXuiAFNbVsZdrnhj
$ genwallet -f m0,1,0 -s 1
xprvA1fSfYjT9jfrP4WfUgMEcvTRYyQ6qoqRT2t7Z9qZ41TEG7egUv28pL5dHodAkrET8k7UqjzKRKCFUR1V36E9egU9sHoKXuiAFNbVsZdrnhj

We’ve descended from master to 0/1/0/1 two ways. Same output.

You can strip out private key information, but still get the hierarchy of public keys.

$ genwallet -f master-private-key -s .pub > master-public-key
$ cat master-public-key
xpub661MyMwAqRbcFtXgS5sYJABqqG9YLmC4Q1Rdap9gSE8NqtwybGhePY2gZ29ESFjqJoCu1Rupje8YtGqsefD265TMg7usUDFdp6W1EGMcet8
$ genwallet -f master-public-key -s 0/2
xpub6AvUGrnEpfvJFYHymqh5qJ3V7qFyEFdpQom2tRQdV4Eo25kxagwHwVCMX1opKqAXxacHPAJafQW1uvH3bYQi1zbE5DMgXGAGNkHajLEuoa2
$ genwallet -f master-private-key -s 0/2.pub
xpub6AvUGrnEpfvJFYHymqh5qJ3V7qFyEFdpQom2tRQdV4Eo25kxagwHwVCMX1opKqAXxacHPAJafQW1uvH3bYQi1zbE5DMgXGAGNkHajLEuoa2

So public wallet keys can generate Bitcoin addresses. But getting WIF information requires the secret exponent, which has been stripped out.

$ genwallet -f master-private-key -s 0/2 -a
1J4LVanjHMu3JkXbVrahNuQCTGCRRgfWWx
$ genwallet -f master-public-key -s 0/2 -a
1J4LVanjHMu3JkXbVrahNuQCTGCRRgfWWx
$ genwallet -f master-private-key -s 0/2 -w
Kxtby4wzfHeCaRXma16dBNLgUE7Ct3Xkb6sRs3aZ56Bmtf1rcNWs
$ genwallet -f master-public-key -s 0/2 -w
can't generate WIF for public key

That means we can put a public wallet key on a web server, and even if a hacker steals it, all he (or she?) can do is generate the list of public keys. He can’t steal the Bitcoin since he has no access to the private keys. But keep those private wallet keys offline!!

We can generate uncompressed versions of Bitcoin addresses too, if you’re interested in that sort of anachronism.

$ genwallet -f master-private-key -s 0/2 -a -n
1HSEorKrq3DjqxtgETbLvnka62Wc4NZj3M
$ genwallet -f master-public-key -s 0/2 -a -n
1HSEorKrq3DjqxtgETbLvnka62Wc4NZj3M
$ genwallet -f master-private-key -s 0/2 -w -n
5JCEp99P8KGgKuEKFjARxpRrGgg1szDQUDCswqchLxYtrqZNgJh

Doesn’t it seem strange that the compressed WIF is longer than the uncompressed WIF? It’s true.

Private wallet keys have one additional power over public keys: only private wallet keys can generate children that use the “prime” directive. This derivation requires information about the secret exponent, which is stripped out of public keys. You can use this to generate change addresses, for example, which you probably want to keep slightly more private.

$ genwallet -f master-private-key -s 0p
xprv9uHRZZhk6KAJC1avXpDAp4MDc3sQKNxDiPvvkX8Br5ngLNv1TxvUxt4cV1rGL5hj6KCesnDYUhd7oWgT11eZG7XnxHrnYeSvkzY7d2bhkJ7
$ genwallet -f master-public-key -s 0p
can't derive a private key from a public key

You can strip it out later though, and you’re fine.

$ genwallet -f master-private-key -s 0p/5 > m0p,5
$ genwallet -f master-private-key -s 0p/5.pub > m0p,5.pub
$ genwallet -f m0p,5 -s 1 -a
1CxkGdM4oVdWdovcBHqUCeiUWUtN5EtR1a
$ genwallet -f m0p,5.pub -s 1 -a
1CxkGdM4oVdWdovcBHqUCeiUWUtN5EtR1a

In conclusion, this is pretty neat stuff.

How to be Minimally Redundant (or “A Splitting Headache”)

I’ve recently been investigating the command-line utility zfec, which is a lot like the UNIX “split” utility, except, using a technique called erasure coding, you can choose to split your file into M pieces, any K of which are enough to recreate the file using the complimentary [sic] zunfec command.

Concrete example: a 100K file can be split into 10 (or more) pieces, each just over 25K long, and zunfec can recreate the original from any 4 of them.

Any 4. You might expect this to incur a lot of overhead in each piece, but it turns out it’s just a header of four bytes or less. Pretty much the same as cutting the file into pieces.

This amazed me. How could this possible work? How can you split data into M pieces so that any K of them is enough to reconstruct them? Linear algebra to the rescue!

Suppose we want to encode a twelve character text string into a bunch of arrays, each with four values, any three of which are sufficient to reconstruct the original. Let’s use the string “Lovefromdawn”. Here’s what you do:

Figure 1

The matrix furthest to the left is a Vandermonde matrix, which is a matrix where each row forms a geometric series starting with 1. A square Vandermonde matrix has the special property that as long as the second column has no repeated elements, the matrix is invertible (and in fact, there is a special algorithm to invert it quickly).

The split pieces correspond to rows of the matrix on the right. Choose any three of them; let’s say row 1, 2 and 4. Once we know what row numbers they are (and this explains the tiny header), we get the following system of equations:

Figure 2

We know the leftmost matrix is invertible, so multiply both sides by the inverse, and we solve for X, recovering “Lovefromdawn”. Wowza!

But wait! One problem remains: if we use standard integer arithmetic, the matrix on the right ends up with a lot of values larger than 255, and we can’t store it in a byte array. That ain’t no good.

Luckily, it turns out that most theorems dealing with matrices and invertibility only assume we are operating over an arbitrary field, a mathematical structure that has addition, a 0, multiplication, a 1, and a reciprocal for every non-0 element.

Abstract algebra to the rescue! (Math seems to rescue computer science a lot.) It turns out that there is a field with 256 elements known as GF(2**8). In this field, 0 is the 0, 1 is the 1 (surprise!), addition is bitwise exclusive-or (so every number is its own additive inverse!), and multiplication is a weird, and seemingly unpredictable beast, that is best calculated by pre-populating a table of logarithms and exponents, then creating a 256×256 array containing the multiplication table (since each number fits in a byte, the whole table only takes 64K).

[There are actually lots of ways to define multiplication that still meets the requirements of a field, but we have to pick one by defining what’s called the irreducible polynomial for the field.]

Once we operate over this field, all matrix values are guaranteed to be in the range [0, 255], and our method for bytewise encoding and decoding becomes easy.

One more aside: zfec actually piles the 3×3 (well, KxK) identity matrix on top of the Vandermonde matrix, so the first K pieces are exactly what you’d get by cutting the file into K pieces in the obvious way and putting the tiny header in front. Even though our matrix is no longer a Vandermonde matrix, submatrices are still guaranteed to be invertible and it still works.

Good-bye App Engine

I’d been running my web site for comedians LaffQ.com in Google App Engine for nearly two years. Google App Engine seemed pretty neat when it first came out: it was the only free hosting service I knew of where you could deploy dynamic Python apps (using Django no less, a framework I was already familiar with) with the promise of Google managing the backups, scaling, and provisioning.

But as time went on, the sunny optimism began to fade. Although Google supported Django, it uses a proprietary BigTable-backed database which was not compatible with Django’s object-relational wrapper (ORM). The originally unbelievably high free limits during the beta period which reduced drastically, in some cases absurdly, when the product came out of beta. Visiting my internal operations page just twice could blow out nearly the entire day’s quota because it produced counts of many tables — so each item counted towards a daily quota of 50000 “small operations”.

Developing for Google App Engine was always a pain. It was non-standard enough that a lot of libraries and tools wouldn’t work or need annoying changes. There was a single point of documentation, written in a sort of corporate Google-ese — not horrible, but not the nicest documentation I ever read. There were a lot of layers of abstraction. It was proprietary. It was slow to fetch data. The pdb debugger didn’t work very well. Getting data out of production was an ordeal. It was even hard to launch a instance on the command line, which meant on those rare occasions when I decided to do development on my Linux netbook, I spent most of my time getting the newest version of Google App Engine to work again.

I don’t exactly remember how or why, but late one Sunday evening in November I suddenly came up with the bright idea to port the Google App Engine infrastructure to “sqlite” (as I referred to it in my head), ie. to use the standard Django database back-end with the idea to deploy it to some unknown host using sqlite3. I started almost immediately, figuring at the very worst, it would just be an abandoned experiment.

It turned out to be… well, if not “surprisingly easy”, remarkably painless. By Monday afternoon, I had gotten most of the core public functionality working just fine (insert show, edit show, delete show, list shows). The data model ported straight across with, one exception being a GeoPt latitude/longitude structure, which I simply reformatted into a string containing a comma-separated pair (which is was probably stored as in Google anyway).

By 9 pm on Monday, my entire test suite was passing. The internal pages hadn’t been ported yet, nor the “post on Facebook” functionality.

I decided to work on the non-critical internal pages first, or else I wouldn’t have any excuse to not deploy (and that’s scary!). I ported these components which gave the code base a chance to “set” (like Jelloâ„¢) and time for that unease associated with massive changes to dissipate somewhat.

Some changes that occurred more than once:

– adding “objects” everywhere, so Model.get becomes Model.objects.get
– changing Property to Field (ndb.StringProperty => models.CharField for example)
ndb.KeyProperty => models.ForeignKey
put => save
key.get. => key.
.query() => .objects
required=False => null=True
obj.key.delete() => obj.delete()
query.order => query.order_by
query.filter works very differently (no more fancy Google App Engine custom types based on operator overloading and deferred evaluation… fancy, but pretty opaque)

On Tuesday, I started working on the automated posts to Facebook. Not much needed to change here, but it was a bit nasty as there were only limited automated tests for this, so I had to be very cautious. Somewhere along the line, I decided I would deploy on my unlimited Dreamhost account (promo code: “RICHARD_SHARES”), which costs about $9/month and already hosts a bunch of domains. There were a few gotchas in getting the wsgi configuration working with Django (and it was tremendously difficult to debug until I hit on this idea of marshaling the requests to a file, then invoking the server by hand using the marshaled request), but this was reasonably straightforward, and I had built a local installation of Python 2.7 in the past, so I reused that.

I worked on the code to import the data. Google Data Export is another things that’s way too complex and slow, but I had done a trial run of the export on Sunday, so I used that as a testbed. I found some sample code to read the sqlite database (which is very simple) into Google Entity objects, so it was fairly simple to read properties out of the Google objects and put them into Django objects. I ended up doing it in two passes; the first pass included root objects that other objects have foreign keys to, which ensures that the second pass can refer to those already created objects without dangling foreign keys.

I waited until midnight, so the daily stats would be generated on Google, with Google data, then immediately put the site into read-only mode on Google and began the dump of data from Google. It was infuriatingly slow. It finally finished around 12:42 am, so about 40 minutes total, to produce a 21 MB sqlite database file. Finally, one little file with ALL my LaffQ data in it!

Here’s the script that converted the exported Google App Engine data to native Django objects in sqlite3: https://gist.github.com/richardkiss/4576523

I had already brought up the new site on Dreamhost, using two day old data, so it was just a matter of running the conversion tool, which read the Google sqlite database (which was essentially a BigTable dump, with one entity) and wrote out the Django sqlite database (which much more closely resembled the actual structure of data in my database). The new sqlite file was 8.7 MB. Compressed with bzip2, it was under 2 MB. That was the entirety of my web site data that took Google 40 minutes to export!!!

I copied it to Dreamhost in about five seconds, deployed it, restarted the Django app, and poked around a bit. Everything seemed to be in order and up to date, so I update DNS to point to the Dreamhost version of the web site, and waited for the change to propagate around the world (it’s funny to think about how LaffQ came up at different times for different people).

One thing I forgot is the Facebook integration required SSL. I didn’t have an SSL certificate (or a unique IP!) so I was flustered for a bit. I thought about how this used to work: it went to the laffq.appspot.com domain, the Google domain that supports SSL for free (signed with a Google appspot.com SSL certificate). Then I realized, I could just write a tiny Google App Engine app that proxied requests to https://laffq.appspot.com/ by fetching content from laffq.com. These pages are very low traffic (since they don’t really do anything), and it worked! (I had problems serving CSS to Chrome, which were resolved by making sure the content-type header was set correctly).

I’m kind of glad I didn’t remember this until it was deployed because I didn’t see the answer right away, and it might have made me not go through with it due to my tendency to know every move in advance.

All in all, the port went unbelievably well. The sqlite version is MUCH faster than Google even though memcache is no longer in the mix (I do use built-in Django caching in the same memory space as the Django app, although I’m pretty sure I could turn it off and it would still be very fast). It uses very little CPU on Dreamhost, and there are no annoyingly arbitrary quotas.

Since then, my work on LaffQ has accelerated beyond my wildest dreams. I didn’t realize just how much the Google App Engine restrictions were holding me back. Now I can back up my entire production database in just a few seconds. I can run a copy of production database locally, which gives me a much better feel for how changes will perform with production data, so I use production data in development all the time. This gives me a much better feeling about new features. Debugging is much easier. Deploying is as simple as a git push/git pull. I can look at logs. I can make tweaks in production. I can ssh to the production machine. I can diagnose problems in production. Pretty much everything about it is better. I’ve refactored, tweaked, optimized, added tests. It’s code I’m almost proud of now. (Almost.)

I saw on Hacker News the other day about how Google App Engine was down. I was happy it didn’t affect me.

Some Whining About Video Software

I’ve been working on editing video again lately. I’m struck by the same feelings I had last time I tried to do this way-too-complex and thankless task.

I appeared on the local cable access show “Paint with Lynn”, with my friend, comedian Lynn Ruth Miller. They gave me a copy of the shows on DVD. I want to edit each 30 minute episode down to something less than ten minutes so I could upload it to YouTube and so viewers wouldn’t have the urge to kill themselves out of boredom.

First, I thought I would use Final Cut Express. Of course, they don’t let you import the .VOB file format that DVDs use (which is just a renamed mpeg-2 file) unless you spend $20 for the Apple MPEG-2 Playback Component. Why Quicktime won’t play back one of the most important file formats out of the box is beyond me. Okay, it’s not really beyond me: it probably has to do with the cost of patents. It’s still frustrating though.

So I used Handbrake to convert to MPEG-4. This take an hour or so. Well, guess what… Final Cut Express would NOT import the MPEG-4 movies produced by Handbrake. Whaaat???

So after some tests, I used Handbrake to convert to AVI. Another hour. Why Apple’s Final Cut Express supports Microsoft’s container and not Apple’s is beyond the hell out of me.

After spending hours figuring out and getting used to Final Cut, I discovered that Handbrake transcoded the audio, and I suppose due to some weird bug, shifted the levels, causing a bunch of pops in the audio. I noticed this only AFTER I had spent hours assembling all the clips.

So I transcoded AGAIN, this time requesting Handbrake to not transcode the audio, but just it pass through. That’s generally better anyway: it takes less time, and it doesn’t involve conversion, which generally causes a loss in quality. Lesson #1: Avoid transcoding whenever possible.

Finally, I had something that Final Cut could read (an AVI file) and did not have audio popping. I had created the clips by settings marks on the large clip. Luckily, I was able to just switch the underlying AVI files, and since all the time offsets were the same in the new file, I didn’t have to recreate the clips. This is the rough equivalent of changing the tablecloth without unsetting and resetting the table. But it worked. Finally something was going right.

I decided that I hate Final Cut. It’s way overkill for someone who just wants to select sub-clips from a large clip. I thought I would try iMovie for the second episode to see if it was any better.

I was ecstatic to find that iMovie claims support of importing MPEG-2 without buying any additional Quicktime components. However, you can’t just import an MPEG-2 file off your disk: it will only import MPEG-2 from cameras. What?? So you have to trick it into thinking the MPEG-2 file is in a camera hierarchy. After poking around the internet a bit, I discovered an easy way is to image the DVD with Disk Utility, then when you mount the image, iMovie figures it’s a safe unprotected DVD created by a camera (never mind that the DVD I got was unprotected in the first place). It was a huge waste of time and temporary disk space, but the trick worked, and was faster and less lossy than transcoding.

But alas, there was still for one problem: there is something up with the first chapter of the episode I was trying to import, and iMovie would not import it. Of course, Apple’s DVD Player and my hardware-based DVD player both play it just fine, so it’s mostly correct. But iMovie is more picky than it needs to be, and it throws an error. Why doesn’t iMovie use the same MPEG-2 code as DVD Player.app? Who knows? I need some way to figure out what’s wrong with the MPEG-2 file and repair it. Offhand, I know of no such utility. Maybe VLC.app will do it. It does a fabulous job reading otherwise sketchy media files. But that will take at least 15 minutes, so I’ll skip that for now.

Then there is the little matter of rendering. Final Cut Express works very slowly with clips in AVI format (using the H.264 video codec), and for some reason just will not play “unrendered” clips. I believe that “rendering” the clips converts them into an internal DV format which, although incredibly huge, is very quick to decompress so it can scroll around in them. This rendering process takes unbelievably long: it seems to be 8x real time (so one minute of video takes eight minutes to render). And this is on a machine that has absolutely no trouble playing real-time. Why would it take so much longer to render than to play live?! And why does it have to be rendered at all?

I believe the most direct process would be to convert from MPEG-2 to DV format, which I assume would not have to be rendered except during transitions. I haven’t been able to find software that will do this though besides MPEG Streamclip, which requires the $20 MPEG-2 component.

Why doesn’t Apple do more code sharing between their various video codec projects (Final Cut, which won’t read MPEG-2 or .mov files; Quicktime, which requires an upgrade to read MPEG-2 files; iMovie, which can’t read some MPEG-2 files that DVD Player has no trouble with). I suspect that legal concerns with the DMCA enter into at least one of these redundant code disasters.

Video is a living hell. Yes, even on the Mac.

Django on Dreamhost

I went to the San Francisco Django Meetup last week and met some smart, nice people.

There was some talk about host to deploying Django. There were many good things said about Slicehost. Of course, to my mind, the cheapest and easiest way to deploy is Google App Engine. I say let the good employees at Google deal with the hard work of keeping these thousands of machine online and responsive. If your GAE site goes down at 4 AM, there’s no point in waking up to take a look at what’s going on, because you can’t even log in to those Google machines, much less have root access, so you may as well… keep sleeping. Which is what I prefer to do at 4 AM.

Anyway, I’ve used Dreamhost for years to host both my blog and my personal web site, which is actually a very simple Django app. It’s met my needs perfectly; it’s a very low traffic web site, and at less than $10 a month (including ssh access!) incredibly cheap.

You don’t get root like you do on slicehost.com or other VPS, but I consider that a feature, not a bug. Do I really want to be the one responsible for keeping system software safe from the hacker attack du jour?

If you feel like sending some kickbacks my way, enter my email address as your referrer (him at richardkiss.com) or sign up here.

I expect to release my Django Dreamhost configuration as a github project Real Soon Nowâ„¢.