I wanted to read the web preview of the Django Book’s second edition on my Kindle. Besides the fact that all image links are broken on that website and have apparently been so for some time, I prefer to have these things in the DRM-free MobiPocket / Kindle format. Of course I couldn’t find this anywhere, so I rolled my own based on the book’s SVN repository.

On this page you can download the MobiPocket version of the book and the HTML source files I generated to make it. You can also read on for the skinny on how you can do this yourself.

This procedure works best on a unix-like machine, as we’re going to use grep and sed along with some Python.

  1. We start by doing a checkout of the reStructuredText sources of the book, moving the linked graphics into the same directory as the reStructuredText txt files and then creating a grepindex.txt file that will serve as the basis for our table of contents index.txt:
svn co http://djangobook.com/svn/branches/2.0/ 20svn
cd 20svn
find graphics/ -name *.png -exec mv {} . \;
grep -h "^Chapter [0-9]*:" *.txt > grepindex.txt
  1. The grepindex.txt will now be converted to something more reStructuredText-like using this script, called grepindex2index.py:
# first do:
# grep -h "^Chapter [0-9]*:\|^Appendix [A-Z]:" *.txt > grepindex.txt
# then:
# python grepindex2index.py grepindex.txt > index.txt

import re
import sys

rst_header = """
===================
The Django Book 2.0
===================

Copyright 2006, 2007, 2008, 2009 Adrian Holovaty and Jacob Kaplan-Moss.
This work is licensed under the GNU Free Document License.

This ebook version was prepared by Charl Botha <http://charlbotha.com/> from
the SVN at http://djangobook.com/svn/branches/2.0/ on 2011-04-25, and is
hosted by <http://vxlabs.com/>.

"""

def main():
    f1 = open(sys.argv[1])
    print rst_header

    # this will match on "Chapter 10: the title" or "Appendix B: another title"
    # groups 0: chapter 10 or appendix A; 1: 10 (or None), 2: A (or None), 3: title
    pat = re.compile('(^Chapter\s*([0-9]*)|^Appendix\s*([A-Z])):\s*(.*)$')
    chapters = []
    appendices = []
    for l in f1:
        # Chapter 10: Advanced Models -> `Chapter 10: Advanced Models <chapter10.html>`_
        mo = pat.match(l)

        if mo.groups()[1] is not None:
            chapters.append("* `Chapter %s: %s <chapter%02d.html>`_" % (mo.groups()[1],mo.groups()[3],int(mo.groups()[1])))

        else:
            appendices.append("* `Appendix %s: %s <appendix%s.html>`_" % (mo.groups()[2],mo.groups()[3],mo.groups()[2]))

    print "\n".join(chapters)
    print "\n".join(appendices)

if __name__ == "__main__":
    main()

Save this to a script called grepindex2index.py, then invoke it with:

python grepindex2index.py grepindex.txt > index.txt
  1. We’ll then proceed to fix all chapter references with the following bit of sed:
sed -i "s/\.\.\/\(chapter[0-9]*\)\//\1.html/g" chapter*txt

(this will change all “../chapter??/” links to just “chapter??.html”)

  1. Everything is now ready to be converted to HTML:
for i in *.txt; do rst2html $i `echo $i | cut -f 1 -d .`.html; done
  1. After downloading this django desktop background as cover image, I dragged and dropped the top-level index.html file onto the free Calibre software to import it, and then used the edit metadata function to set the cover image. After this, I converted the imported files to MobiPocket remembering to add the word “appendix” to the chapter detection xpath expression in the “structured detection” section of the conversion dialogue.

That’s all there’s to it! Have a good read.