I wanted to read the web preview of the Django Book’s second edition on my Kindle. Besides the fact that all image links are broken on that website and have apparently been so for some time, I prefer to have these things in the DRM-free MobiPocket / Kindle format. Of course I couldn’t find this anywhere, so I rolled my own based on the book’s SVN repository.

On this page you can download the MobiPocket version of the book and the HTML source files I generated to make it. You can also read on for the skinny on how you can do this yourself.

This procedure works best on a Unix-like machine, as we’re going to use grep and sed along with some Python.

1. We start by doing a checkout of the reStructuredText sources of the book, moving the linked graphics into the same directory as the reStructuredText txt files and then creating a grepindex.txt file that will serve as the basis for our table of contents index.txt:
svn co http://djangobook.com/svn/branches/2.0/ 20svn
cd 20svn
find graphics/ -name '*.png' -exec mv {} . \;
grep -h "^Chapter [0-9]*:\|^Appendix [A-Z]:" *.txt > grepindex.txt
1. The grepindex.txt will now be converted to something more reStructuredText-like using this script, called grepindex2index.py:
# first do:
# grep -h "^Chapter [0-9]*:\|^Appendix [A-Z]:" *.txt > grepindex.txt
# then:
# python grepindex2index.py grepindex.txt > index.txt

import re
import sys

# Print the reStructuredText title block that heads index.txt.
print("""
===================
The Django Book 2.0
===================

This ebook version was prepared by Charl Botha <http://charlbotha.com/> from
the SVN at http://djangobook.com/svn/branches/2.0/ on 2011-04-25, and is
hosted by <http://vxlabs.com/>.

""")

def main():
    # Group 2 is the chapter number, group 3 the appendix letter, group 4 the title.
    pat = re.compile(r'(^Chapter\s*([0-9]+)|^Appendix\s*([A-Z])):\s*(.*)$')
    chapters = []
    appendices = []

    with open(sys.argv[1]) as f1:
        for l in f1:
            # Chapter 10: Advanced Models ->
            # * `Chapter 10: Advanced Models <chapter10.html>`_
            mo = pat.match(l)
            if mo is None:
                continue  # skip lines that are not chapter or appendix headings
            number, letter, title = mo.group(2), mo.group(3), mo.group(4).strip()
            if number is not None:
                chapters.append("* `Chapter %s: %s <chapter%02d.html>`_" % (number, title, int(number)))
            else:
                appendices.append("* `Appendix %s: %s <appendix%s.html>`_" % (letter, title, letter))

    print("\n".join(chapters))
    print("\n".join(appendices))

if __name__ == "__main__":
    main()

Save this to a script called grepindex2index.py, then invoke it with:
python grepindex2index.py grepindex.txt > index.txt
1. We’ll then proceed to fix all chapter references with the following bit of sed:
sed -i "s/\.\.\/\(chapter[0-9]*\)\//\1.html/g" chapter*txt
(this will change all “../chapter??/” links to just “chapter??.html”)
1. Everything is now ready to be converted to HTML:
for i in *.txt; do rst2html "$i" "$(echo "$i" | cut -f 1 -d .).html"; done
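If you would rather check the link rewriting and the output file naming before running them over the whole checkout, the two shell one-liners can be approximated in Python. This is only a minimal sketch: the function names fix_chapter_links and html_name are made up for illustration, and the sample sentence is a hypothetical line, not text taken from the book.

```python
import re

# Same idea as the sed expression: "../chapterNN/" becomes "chapterNN.html".
CHAPTER_LINK = re.compile(r"\.\./(chapter[0-9]*)/")


def fix_chapter_links(text):
    """Rewrite relative chapter links so they point at the flat HTML files."""
    return CHAPTER_LINK.sub(r"\1.html", text)


def html_name(txt_name):
    """Mirror `cut -f 1 -d .`: keep everything before the first dot, add .html."""
    return txt_name.split(".", 1)[0] + ".html"


if __name__ == "__main__":
    # Hypothetical sample line, only here to show the substitution.
    sample = "See `Chapter 5 <../chapter05/>`_ for details on models."
    print(fix_chapter_links(sample))
    print(html_name("chapter05.txt"))
```

Running it on a line or two from one of the chapter files is a quick way to confirm the pattern matches the links you expect before letting sed edit the files in place.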