halting problem : building a website with Pelican

creating complex, structured websites is a pain; that's why somebody invented a language to help with that. except that the language is PHP and it's full of bees and snakes, and if you're extremely lucky, it just helps some kid in Russia take over your server to mine bitcoin or scam you out of your credit card details.

once upon a time, I wrote a static website generator.

it was my first sizeable Perl application, and my second one using the GTK+ 2.x Perl bindings. yes, my website generator had a GUI with an embedded text editor and a list of all articles. I was 20-something, naïve, and had access to CPAN — a combination that can only lead to the utterance of the sentence: “this should have just not happened”, quickly followed by: “I hope it didn’t suffer too much, for too long”, usually spoken in the general direction of a server somewhere.

at the time, I only had some space on a web server provided by my ISP, which meant no fancy CGI stuff, or database, or dynamic web pages. WordPress was still known as “b2”, and it had pretty much the same feature-to-security ratio as it has today. I had the option of using one of the budding blogging platforms of the time, but they were pretty much consistently terrible, and in the end I did not need anything special; so, I did what any self-respecting procrastinator with a penchant for software engineering does when faced with a goal and no deadlines: I wrote my own script, then I wrote a module to abstract the script I wrote, then I wrote a framework that allowed me to write modules to abstract scripts. I was almost at the “re-engineer my DNA to allow me to create better copies of myself” step of the process when I decided I actually needed a comment form (see above, re: naïve 20-something), which meant getting some actual web space somewhere, and maybe running a CMS platform. clearly, there was something wrong with me.

fast forward more than 10 years to February 2013: I woke up in Sydney to an incident report email from the hosting provider for my private server, saying that the website I was hosting on their infrastructure was actually serving credit card scam pages instead of my blog, which I assume they noticed because of the increase in traffic. soon after cleaning everything up, I got hacked by a Russian script kiddie who figured he could use my server, hosted on a severely restricted VM, to mine bitcoin. not the smartest kid on the Internet, I grant you that.

now: this could be easily attributed to the fact that I barely have time to do a proper sysadmin job, and you wouldn’t be that far off; fact is, I do have a day job, and you may be shocked to know that it’s not administering a web server. to be precise, my daily duties do not include managing two WordPress installations, a MySQL instance, and an Apache web server, as well as the operating system that runs them; they also do not include keeping those installations secure and keeping up with a ton of CVEs. on the other hand, this whole mess can also be attributed to the fact that a platform for content on the web should probably not allow third parties to take control of your server unless you keep updating it every month to counterbalance zero-day exploits.

since nobody is paying me to actually handle this stuff, and since I’m doing this in the copious (haha, right) amounts of time I have left in my life, I can either blame my tools, while still using them; or do the right thing, and change tools, possibly for ones that do not require full-time maintainership. third option: I could both blame the old tools and change them. if you know me, and you’re reading this, then you also know which one I did end up picking.

my first instinct was to just write a couple of scripts, generate my pages from them, and back everything up with a Git repository; I actually spent a bunch of time looking at existing stuff to build those scripts, and came pretty close to committing to that plan. when I realized I was actually starting to write a markdown parser library in C, I backed the fuck away from my keyboard, opened a beer, and watched about four episodes of Nichijō back to back (this one, in particular, fits fairly well) in order to realize what a spectacularly bad idea I had. I went back to the drawing board, and drafted up the requirements for this adventure.

I was pretty much dead set on a static website, so I fired up the browser, looked for “list of open source static web site generators” and clicked on the first link I got. I didn’t want to install Ruby on my local machine, and at the first mention of “hand-crafted” I had to reach for my nerf gun. while I don’t have a philosophical objection to node.js, I also wanted to avoid downloading the entire GitHub mirror of node.js modules. I think I laughed hard enough to pass out, because I don’t have any memory of looking at the Haskell website generators. I don’t particularly like Python, but since there were only two options in Perl, in the end I decided to dust off my parseltongue and start counting the whitespace.

another requirement I set for myself was being able to write posts in markdown, because as much as markdown is an underdefined, piss poor idea of a representational format, it’s pretty much how I have been writing articles since the late ’90s, when I joined Usenet. it also allows me to just fire up Vim, write down some stuff, and not think about style until it’s much too late for anybody to do something about it.

I thus settled for Pelican.

installation

first of all, I ignored the fact that my distribution packages Pelican, and I cordoned off the whole thing into its own prefix using virtualenv instead; if I had to nuke the site from orbit, I just wanted to be sure not to blow up the rest of my system as well.

$ virtualenv ~/Source/pelican
New python executable in /home/ebassi/Source/pelican/bin/python
Installing Setuptools...done.
Installing Pip...done.
$ source ~/Source/pelican/bin/activate
(pelican) $ # ← this is inside the virtualenv

I installed Pelican using pip:

(pelican) $ pip install pelican

then I used the pelican-quickstart script, which asked me a bunch of questions and built a skeleton of a website, including a helpful Makefile with a bunch of default targets; a configuration file for local testing; and a configuration file for publishing remotely.

since Pelican only comes with reStructuredText and HTML support by default, I had to install the markdown module in the same environment:

(pelican) $ pip install markdown

and I (mostly) was ready to go.
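
for reference, a markdown article for Pelican is just a text file with a few metadata lines on top, followed by a blank line and the body. the sketch below writes one out; the title, date, tags, file name, and the write_article helper are all made up for illustration, and it assumes the content lives in a directory called content.

```python
from pathlib import Path
from textwrap import dedent

# hypothetical article: title, date, and tags are placeholders
ARTICLE = dedent("""\
    Title: hello, world
    Date: 2013-02-14 10:00
    Tags: pelican, web

    this is the body of the post, written in *markdown*.
    """)


def write_article(content_dir, name="hello-world.md"):
    """Write the sample article under content_dir and return its path."""
    path = Path(content_dir) / name
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(ARTICLE, encoding="utf-8")
    return path
```

calling write_article("content") should leave a content/hello-world.md file that Pelican can pick up on the next run.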

sure, I had to fix a couple of things in the configuration, and I had to tweak the theme a bit to turn it into something that was not a throwback to the simpler times of Geocities, circa 1997; nevertheless, everything I needed was pretty much all there out of the box, and it allowed me to hit the ground running at a moderate speed.

configuration

the default Pelican configuration is fairly sensible, so I only had to tweak things like the default URL for articles, and the pagination settings. I had to set a couple of defaults, to avoid adding too much metadata in the article documents themselves:

pelicanconf.py, lines 5-42:

# name of the website
SITENAME = u'halting problem'

# base URL
SITEURL = 'http://www.bassi.io'

# all times are relative to London
TIMEZONE = 'Europe/London'

# me
AUTHOR = 'ebassi'

# where the content is
PATH = 'content'

# default language
DEFAULT_LANG = u'en'
# use the file's mtime as the default date
DEFAULT_DATE = 'fs'
# a bland category name; content found under a sub-directory of PATH
# will use the directory name as the category, by default
DEFAULT_CATEGORY = 'misc'
# Month day, Year
DEFAULT_DATE_FORMAT = '%B %d, %Y'
# show 4 posts per page
DEFAULT_PAGINATION = 4

# my own little theme
THEME = 'theme/hlt'

# files under static paths are going to be copied as they are
STATIC_PATHS = [
    'images',
    'code',
]

# keep this True for local testing
RELATIVE_URLS = True
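
two of those settings are easier to picture with numbers in hand. this is a small sketch, not Pelican code: the publication date is invented, the index_pages helper is mine, and the page count ignores any orphan handling Pelican may apply.

```python
import math
from datetime import datetime

DEFAULT_DATE_FORMAT = '%B %d, %Y'
DEFAULT_PAGINATION = 4

# hypothetical publication date, just to see the format in action
published = datetime(2013, 2, 14)
formatted = published.strftime(DEFAULT_DATE_FORMAT)
print(formatted)  # February 14, 2013 (with the default C locale)


def index_pages(article_count, per_page=DEFAULT_PAGINATION):
    """Number of index pages needed to list article_count posts."""
    return math.ceil(article_count / per_page)


print(index_pages(10))  # 3
```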

as I said, pelican-quickstart creates two separate configuration files:

the main one, pelicanconf.py, is used for local testing
the secondary one, publishconf.py, includes the first one and is used for publishing only

this means that pelicanconf.py can set up most of the state you need, and publishconf.py can perform the expensive operations needed only when pushing the generated pages to the remote server. for instance, I disable the generation of the per-author, per-category, per-tag, and translation feeds, both Atom and RSS, when testing the site locally:

pelicanconf.py, lines 54-62:

# disable all feeds
AUTHOR_FEED_ATOM = None
AUTHOR_FEED_RSS = None
CATEGORY_FEED_ATOM = None
CATEGORY_FEED_RSS = None
TRANSLATION_FEED_ATOM = None
TRANSLATION_FEED_RSS = None
TAG_FEED_ATOM = None
TAG_FEED_RSS = None

and enable the desired feeds when publishing:

publishconf.py, lines 18-20:

# enable full and per-category feed
FEED_ALL_ATOM = 'feeds/all.atom.xml'
CATEGORY_FEED_ATOM = 'feeds/%s.atom.xml'

this also means that pelicanconf.py can contain the RELATIVE_URLS variable and set it to True, whereas the same variable can be set to False in publishconf.py.
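
the layering works because both files are plain Python: as far as I can tell, the generated publishconf.py pulls in everything from pelicanconf.py with a star import, and then reassigns whatever needs to differ. the sketch below imitates that with two dictionaries so it can run on its own; the dictionaries and the publish_settings helper are stand-ins of mine, not something Pelican provides.

```python
# stand-in for pelicanconf.py: what the local build uses
LOCAL = {
    'SITEURL': 'http://www.bassi.io',
    'RELATIVE_URLS': True,
    'CATEGORY_FEED_ATOM': None,
}

# stand-in for the assignments made in publishconf.py
PUBLISH_OVERRIDES = {
    'RELATIVE_URLS': False,
    'FEED_ALL_ATOM': 'feeds/all.atom.xml',
    'CATEGORY_FEED_ATOM': 'feeds/%s.atom.xml',
}


def publish_settings(local, overrides):
    """Return a copy of the local settings with the overrides applied."""
    merged = dict(local)
    merged.update(overrides)
    return merged


settings = publish_settings(LOCAL, PUBLISH_OVERRIDES)
print(settings['RELATIVE_URLS'])  # False
print(LOCAL['RELATIVE_URLS'])     # True, the local settings are untouched
```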

finally, I installed the typogrify module, to get a set of filters to clean up the typography of the text before publishing it:

(pelican) $ pip install typogrify
...

and enabled it in the configuration:

TYPOGRIFY = True

I did a couple more tweaks for the on-disk layout of the website, which gets reflected in the URLs:

pelicanconf.py, lines 76-101:

# location of the per-section indices
AUTHORS_SAVE_AS = 'authors/index.html'
CATEGORIES_SAVE_AS = 'categories/index.html'
TAGS_SAVE_AS = 'tags/index.html'

# articles have the date in their URL
ARTICLE_URL = 'articles/{date:%Y}/{date:%m}/{date:%d}/{slug}/'
ARTICLE_SAVE_AS = 'articles/{date:%Y}/{date:%m}/{date:%d}/{slug}/index.html'

# whereas pages do not
PAGE_URL = 'pages/{slug}/'
PAGE_SAVE_AS = 'pages/{slug}/index.html'

# archives are supersets of the articles
YEAR_ARCHIVE_SAVE_AS = 'articles/{date:%Y}/index.html'
MONTH_ARCHIVE_SAVE_AS = 'articles/{date:%Y}/{date:%m}/index.html'

# sub-pages for each section
AUTHOR_URL = 'author/{slug}/'
AUTHOR_SAVE_AS = 'author/{slug}/index.html'

CATEGORY_URL = 'category/{slug}'
CATEGORY_SAVE_AS = 'category/{slug}/index.html'

TAG_URL = 'tag/{slug}'
TAG_SAVE_AS = 'tag/{slug}/index.html'
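
the {date:%Y} bits in those patterns are ordinary Python format fields: a datetime accepts strftime codes after the colon. I am assuming Pelican fills the patterns in with something equivalent to str.format; the date and slug below are invented for the example.

```python
from datetime import datetime

ARTICLE_URL = 'articles/{date:%Y}/{date:%m}/{date:%d}/{slug}/'
ARTICLE_SAVE_AS = 'articles/{date:%Y}/{date:%m}/{date:%d}/{slug}/index.html'

# hypothetical article metadata
metadata = {'date': datetime(2013, 2, 14), 'slug': 'hello-world'}

url = ARTICLE_URL.format(**metadata)
save_as = ARTICLE_SAVE_AS.format(**metadata)

print(url)      # articles/2013/02/14/hello-world/
print(save_as)  # articles/2013/02/14/hello-world/index.html
```

saving each article as an index.html inside its own directory is what lets the public URL end in a slash instead of a file name.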

that’s pretty much it.

theming

the generated website uses the default simple theme, but it was really too simple for me, so I decided to take one of the many Pelican themes off of GitHub, and fork it to suit my own lack of taste. the template format is Jinja, and it’s not really hard to build a decent set of pages out of that.

in the process, I ended up getting up to speed with modern HTML and CSS — the last time I wrote some web page, HTML 4 was all the rage, and we had this new thing called XHTML 1.0, so you can guess I had some catching up to do. I guess having spent almost a year learning how the CSS machinery in a modern web browser actually works helped me a bit, in this regard.

Pelican requires templates for specific pages, but what goes inside them is pretty much left to you. the main entry point is the base.html template, which gets extended by the index.html template (i.e. your landing page) and by the article.html and page.html templates. template pages go under the templates directory in your theme’s main directory; the other directory in the theme, static, is meant for assets, scripts, and CSS.
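
to make the layout concrete, here is a rough sketch that writes a bare-bones theme skeleton to disk. the markup is a placeholder of mine and not taken from any real theme, the create_theme helper is hypothetical, and a usable theme needs more templates than these four.

```python
from pathlib import Path

# placeholder templates: just enough Jinja to show how the pieces relate
TEMPLATES = {
    'base.html': (
        '<!DOCTYPE html>\n'
        '<html>\n'
        '<head><title>{{ SITENAME }}</title></head>\n'
        '<body>{% block content %}{% endblock %}</body>\n'
        '</html>\n'
    ),
    'index.html': (
        '{% extends "base.html" %}\n'
        '{% block content %}\n'
        '{% for article in articles %}\n'
        '<h2><a href="{{ SITEURL }}/{{ article.url }}">'
        '{{ article.title }}</a></h2>\n'
        '{% endfor %}\n'
        '{% endblock %}\n'
    ),
    'article.html': (
        '{% extends "base.html" %}\n'
        '{% block content %}\n'
        '<h1>{{ article.title }}</h1>\n'
        '{{ article.content }}\n'
        '{% endblock %}\n'
    ),
    'page.html': (
        '{% extends "base.html" %}\n'
        '{% block content %}\n'
        '<h1>{{ page.title }}</h1>\n'
        '{{ page.content }}\n'
        '{% endblock %}\n'
    ),
}


def create_theme(root):
    """Create templates/ and static/css/ under root, with stub templates."""
    root = Path(root)
    (root / 'static' / 'css').mkdir(parents=True, exist_ok=True)
    templates = root / 'templates'
    templates.mkdir(parents=True, exist_ok=True)
    for name, body in TEMPLATES.items():
        (templates / name).write_text(body, encoding='utf-8')
    return root
```

running create_theme('theme/hlt') would produce the directory that the THEME setting points at.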

you could start from scratch, but my suggestion is to pick a theme from the ones available, and just fork it.

testing and deploying

once you set up some content and a theme, you may want to test it. just using:

(pelican) $ pelican -r content/ -s pelicanconf.py -o output/

will take all the content inside the content directory, apply the configuration settings in pelicanconf.py, and output the generated site under the output directory; the -r flag keeps Pelican running, and regenerates the site whenever something changes. you may point a web browser there and be done with it, but the Makefile generated by pelican-quickstart has a helpful devserver target:

(pelican) $ make devserver

which will generate the output, and start a local web server reachable at http://localhost:8000. the web server will regenerate the pages if you modify the content, the theme, or the configuration.
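
if you only want the serving half of that, the Python standard library can do it. this is a sketch and not what the devserver target actually runs: it serves whatever is already in the output directory and does not regenerate anything; make_server is a helper name I made up.

```python
from functools import partial
from http.server import HTTPServer, SimpleHTTPRequestHandler


def make_server(directory='output', port=8000):
    """Build (but do not start) an HTTP server for the generated site."""
    handler = partial(SimpleHTTPRequestHandler, directory=directory)
    return HTTPServer(('localhost', port), handler)


# to actually serve the site, uncomment the next line and stop it with ctrl+c:
# make_server().serve_forever()
```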

once you’re happy with the results of a change, you can stop the server using:

(pelican) $ ./develop_server.sh stop

if you decide to publish your site, you can use one of the publishing targets inside the Makefile, and fill out the settings variables at the top; there are targets for FTP, SFTP, rsync, Rackspace, Dropbox, and GitHub Pages.

conclusion

I guess I could recommend Pelican to anybody who wants to set up a static website or a blog, and does not need all the bells and whistles of a dynamic content management system. the documentation is fairly well done, even if you need to hunt down the template syntax on the Jinja website. I don’t expect the latter to be a problem if you stick to one of the standard themes; if you want to modify one of those to suit your needs, tho, you’ll become fairly familiar with the Jinja syntax pretty soon. Pelican also comes with an assorted variety of plugins. the only one I use is the related_posts one, but there are various plugins that make interacting with other services, like Flickr or YouTube, a bit easier.
