The Ten Thousand Foot View

Foxspace is made possible by spacedock, my Obsidian–vault–to–Zola–site compiler. Since neither Obsidian nor Zola alone fulfills all my needs, I instead built a system that processes Obsidian notes into a format Zola can work with and enriches them with the data necessary for the nicer features of this site (like tooltips or the link graph) to work.

The pipeline is:

  1. I write my notes with Obsidian. They’re versioned and stored in a Git repository.
  2. spacedock, a Rust executable, consumes the Obsidian vault and spits out a Zola project. It also does some basic linting, like making sure the front matter parses correctly and warning me about unreachable notes or dead links.
  3. Zola builds an HTML website from the spacedock output.
  4. The website gets optimized, minified, and deployed by a GitHub Actions workflow.

Starting Out

Foxspace has existed, in various iterations, as a couple different Jekyll sites, then later a Zola blog. It has always been either a completely static personal website with just an “about” section and a list of projects, or a classic, chronological blog with tags and whatnot. Eventually, I realized that format just doesn’t work for me—a classic blog encourages publishing content only once it reaches a sufficient length and gets proofread and edited, and then never touching it again once it’s live. That’s not how I work at all.

My notes are living notes: I go back and forth editing and extending them as my thoughts about them develop. Until the current iteration of Foxspace, they’d been scattered between Foxspace, Notion, my self-hosted Trilium instance, and, at work, various Google Docs and Confluence pages. I wanted my notes to be:

  • Collected in one place; this means that whatever I use to store and edit them has to support all the features I use in the notes scattered across many services, like LaTeX or code snippets with syntax highlighting for obscure or nonexistent languages.
  • Plain text files, so that their availability and usefulness is not dependent on the availability of a third-party app or website.
  • Easy to edit and publish from anywhere; ideally requiring nothing more than a git command. This also means Windows, unfortunately—for mobile editing, I can connect to my homeserver and view the notes hosted there, but on a desktop platform like Windows I’d like to be able to edit them locally.

This last point ruled out a lot of implementation languages and stacks; I was initially looking at Python because of how easy it makes it to extend and modify third-party code, but Python Packaging is still a headache, upgrading interpreter versions necessitates recreating a virtualenv, and all of pip, virtualenv, setuptools, and poetry run into issues and bugs from time to time. I wanted to leverage an existing static site generator to avoid duplicating work someone else has already done, but Jekyll was out of the question (since I’m not at all familiar with Ruby) and Hugo is written in Golang, which I feel should be a crime to unironically consider for any purpose at all. That left Rust and Zola.

Independently, I found Obsidian—the first note-taking app that I didn’t really have any problems with. It even has Vim keybindings! Despite being closed–source (ew!), Obsidian uses “normal” Markdown and straightforward, plain directories to manage notes, so I figured it’s not too much of a risk to use it.

The path was, therefore, clear: I would use Obsidian to author my notes, and Zola to compile them into an accessible website I could make available on the Internet.

Dealing with Obsidian

My first source of headaches was Obsidian’s slightly–scuffed version of “Markdown”. Zola uses pulldown_cmark for processing Markdown, which implements the CommonMark spec. Obsidian… doesn’t.

Obsidian–Flavored Markdown

Obsidian, as far as I can tell, extends GitHub–Flavored Markdown with:

  • Comments, which are delimited by %% and get removed from the source when rendering.
  • Inline LaTeX, delimited by $ and not including any line breaks, which gets rendered with KaTeX inline.
  • Display LaTeX, delimited by $$ and potentially spanning multiple lines, which gets rendered with KaTeX in display mode.
  • [[wikilinks]], which resolve funkily: if /a/foo.md, /b/bar.md, and /foo.md all exist, [[foo]] will resolve to /foo.md, and [[bar]] will resolve to /b/bar.md.
  • Callouts, which are indicated by opening a blockquote with a construct like [!note].
  • Mermaid graphs, rendered automatically from fenced code blocks using mermaid as the language.
  • Highlights, delimited by ==.

Unfortunately, there’s no official spec or parser for “Obsidian–Flavored Markdown” (OFM), so I had to implement my own.

It took me a full week of iteration and rewriting my OFM parser to get it to a usable state. Rust makes it extremely difficult to extend other people’s code, so rather than hack Zola or pulldown_cmark, I had to implement my own OFM–to–CommonMark transpiler. As of 2023-07-05, it still lacks support for callouts, but for now I can equally well just wrap those in a <div> with a special class.

I started working on the parser on 2023-05-18, but it wasn’t until 2023-06-02—over two weeks later!—that I finally ironed out all the bugs. In retrospect, building an OFM parser from scratch would’ve been easier than trying to preprocess OFM into CommonMark. Since the various Obsidian extensions don’t get processed when they’re inside code blocks, and interact in often–unpredictable ways with line breaks and indentation, processing them is much harder than just running a regex replace.

After many rewrites, I ended up with a system that marks ranges of the OFM source as Keep (for, mainly, inline code and code blocks) or Process (for all the text elements where it’s legal for Obsidian extensions to occur) and runs sequential regex searches that fail if they match inside a Keep range, but otherwise subdivide whatever ranges they encompass into further extension-specific operations.
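
To make the Keep/Process idea concrete, here is a minimal sketch. It is not spacedock's actual code: the names (Op, Span, mark_code, mark_highlights) are made up for illustration, plain string searches stand in for the regexes, and only single-backtick inline code and == highlights are handled. Because the second pass only ever searches inside Process ranges, a match can never reach into a Keep range.

```rust
/// What to do with a span of the source (a small subset of the real operations).
#[derive(Debug, Clone, PartialEq)]
enum Op {
    Keep,      // copied through untouched (inline code in this sketch)
    Process,   // ordinary text where Obsidian extensions may occur
    Highlight, // text wrapped in `==`, delimiters included
}

/// A half-open byte range `start..end` of the source, tagged with an operation.
#[derive(Debug, Clone, PartialEq)]
struct Span {
    start: usize,
    end: usize,
    op: Op,
}

/// First pass: a pair of backticks and everything between them becomes Keep,
/// all other text becomes Process.
fn mark_code(src: &str) -> Vec<Span> {
    let mut spans = Vec::new();
    let mut pos = 0;
    while let Some(open) = src[pos..].find('`').map(|i| pos + i) {
        let close = match src[open + 1..].find('`') {
            Some(i) => open + 1 + i,
            None => break, // unmatched backtick: the rest is plain text
        };
        if open > pos {
            spans.push(Span { start: pos, end: open, op: Op::Process });
        }
        spans.push(Span { start: open, end: close + 1, op: Op::Keep });
        pos = close + 1;
    }
    if pos < src.len() {
        spans.push(Span { start: pos, end: src.len(), op: Op::Process });
    }
    spans
}

/// Second pass: subdivide every Process span around `==...==` pairs.
/// Keep spans are passed through without being searched.
fn mark_highlights(src: &str, spans: Vec<Span>) -> Vec<Span> {
    let mut out = Vec::new();
    for span in spans {
        if span.op != Op::Process {
            out.push(span);
            continue;
        }
        let mut pos = span.start;
        loop {
            let open = match src[pos..span.end].find("==") {
                Some(i) => pos + i,
                None => break,
            };
            let close = match src[open + 2..span.end].find("==") {
                Some(i) => open + 2 + i,
                None => break, // no closing delimiter inside this span
            };
            if open > pos {
                out.push(Span { start: pos, end: open, op: Op::Process });
            }
            out.push(Span { start: open, end: close + 2, op: Op::Highlight });
            pos = close + 2;
        }
        if pos < span.end {
            out.push(Span { start: pos, end: span.end, op: Op::Process });
        }
    }
    out
}
```

Calling mark_highlights(src, mark_code(src)) yields the annotated span list; further passes (comments, LaTeX, wikilinks) would follow the same subdivide-only-Process pattern.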

At the end of the process, I get a list of source spans annotated with operations (Keep, Process, CompileWikilink, Highlight, etc. etc.)—since they’re independent, I can distribute processing each span between threads to speed up the compilation (Rayon is incredible!), and finally just concatenate the results.
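
The render-then-concatenate step can be sketched with nothing but the standard library. This is an illustration under assumptions, not the real implementation: Op, render, and compile are hypothetical names, only two operations are modeled, and it spawns one scoped thread per span, whereas spacedock uses Rayon, which schedules the work on a thread pool instead.

```rust
use std::thread;

/// A source span already tagged with its operation (hypothetical, simplified).
enum Op<'a> {
    Keep(&'a str),      // emit verbatim
    Highlight(&'a str), // inner text of an `==...==` highlight
}

/// Render a single span. It depends on nothing but its own input,
/// which is what makes the spans safe to process in parallel.
fn render(op: &Op<'_>) -> String {
    match op {
        Op::Keep(text) => text.to_string(),
        Op::Highlight(text) => format!("<mark>{text}</mark>"),
    }
}

/// Render every span on its own scoped thread, then join the handles in
/// source order so the concatenated output keeps the original ordering.
fn compile(ops: &[Op<'_>]) -> String {
    thread::scope(|scope| {
        // Spawn all threads first so they run concurrently.
        let handles: Vec<_> = ops
            .iter()
            .map(|op| scope.spawn(move || render(op)))
            .collect();
        handles
            .into_iter()
            .map(|handle| handle.join().expect("render thread panicked"))
            .collect::<String>()
    })
}
```

Joining in spawn order is the important part: the threads may finish in any order, but the output is assembled in source order.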

Wikilinks get resolved by looking for the shortest matching path among all notes and rendered as a Zola shortcode; LaTeX gets compiled serverside using katex-rs; highlights just turn into HTML <mark> tags. Mermaid graphs are… a bit of a different story.
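
A sketch of the shortest-matching-path rule for wikilinks follows. The function name and the exact matching rule are assumptions on my part: here a note matches if its path, minus the .md extension, ends with the link target on a path-component boundary, and "shortest" means the fewest bytes in the path.

```rust
/// Resolve a wikilink target such as `foo` or `b/bar` against the vault's note paths.
/// Returns the shortest path whose stem ends with the target on a `/` boundary.
fn resolve_wikilink<'a>(target: &str, notes: &[&'a str]) -> Option<&'a str> {
    let target = target.trim_matches('/');
    notes
        .iter()
        .copied()
        .filter(|&path| {
            let stem = path.strip_suffix(".md").unwrap_or(path);
            match stem.strip_suffix(target) {
                // The target has to start right after a `/` (or be the whole path),
                // so `oo` does not match `/foo.md`.
                Some(prefix) => prefix.is_empty() || prefix.ends_with('/'),
                None => false,
            }
        })
        .min_by_key(|path| path.len())
}
```

With the notes /a/foo.md, /b/bar.md, and /foo.md, this resolves foo to /foo.md and bar to /b/bar.md, matching the behaviour described in the list of OFM extensions. A dead link comes back as None, which is where a linter warning would hook in.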

Mermaid

Mermaid… sucks. Its layout engine is unpredictable and difficult to work with, the SVGs it renders turn into a mess in Firefox’s reader mode, the library weighs 3MB even minified, and, most importantly, its implementation is the ugliest pile of web garbage I’ve seen in my entire life. mermaid-cli works by literally spinning up a headless Chrome instance. It’s disgusting.

Unfortunately, Mermaid is also quite a bit easier to edit than Graphviz and comes built into Obsidian; since I want to minimize the friction of notetaking, going with what’s already available is the preferred choice, even if the software is a disaster.

Compiling Mermaid is quite a bit different to the other OFM features; the entire rest of this site builds in around 60ms, but adding just one Mermaid diagram adds over a second to the runtime, and a requirement to run Google software to the editing process. It’s disgusting.

I tried to rip out the dependency on Chromium and replace it with Firefox, but without success. I guess I’ll have to fork Mermaid and make it run with Node, eventually.

For now, I just spawn a detached mermaid-cli process for every Mermaid diagram that processes its diagram in the background while spacedock goes on to handle the rest of the markup. This triggers a lot of unnecessary Zola reloads as the diagrams get saved, but I don’t see a way around this for now.
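
Spawning the renderer without waiting for it looks roughly like this. It is a sketch: mmdc is the executable that mermaid-cli installs and -i/-o are its input and output flags, but the helper names are hypothetical and I am not claiming these are the only arguments spacedock passes.

```rust
use std::path::Path;
use std::process::{Child, Command, Stdio};

/// Build the `mmdc` invocation for one diagram without running it.
fn mermaid_command(input: &Path, output: &Path) -> Command {
    let mut cmd = Command::new("mmdc");
    cmd.arg("-i")
        .arg(input)
        .arg("-o")
        .arg(output)
        // Disconnect the child from our stdio so it cannot block on it.
        .stdin(Stdio::null())
        .stdout(Stdio::null())
        .stderr(Stdio::null());
    cmd
}

/// Start rendering and return immediately. Dropping the returned `Child`
/// neither waits for nor kills the process, so it keeps running in the
/// background while the caller moves on to the rest of the markup.
fn spawn_mermaid(input: &Path, output: &Path) -> std::io::Result<Child> {
    mermaid_command(input, output).spawn()
}
```

The cost of not waiting is exactly the one described above: the output file appears whenever the child finishes, so anything watching the build directory sees a late change.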

This gets me from OFM to Zola–Flavored Markdown; i.e. CommonMark–plus–Zola–shortcodes. The next step is translating Obsidian’s YAML front matter into TOML for Zola.

Front Matter

Front matter is easy; a lot of keys are the same between Obsidian and Zola (title, date, etc). spacedock parses the YAML front matter in notes, removes all the keys it doesn’t know about (which are assumed to be Obsidian–specific), adds its own keys (authors, a preview, and some extra metadata to be used by templates), serializes it as TOML, and prepends it to the compiled CommonMark body. The note is now ready.
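
The filter-and-reserialize step can be illustrated with a deliberately tiny toy. Everything here is an assumption made for the example: it only understands flat key: value lines, it emits every value as a quoted TOML string (a real converter would use proper YAML and TOML libraries and keep typed values such as dates and lists), and it skips the step where spacedock adds its own keys. The +++ lines are the delimiters Zola uses for TOML front matter.

```rust
/// Convert flat `key: value` YAML front matter into a TOML block,
/// dropping every key that is not in `known_keys`.
fn convert_front_matter(yaml: &str, known_keys: &[&str]) -> String {
    let mut toml = String::from("+++\n");
    for line in yaml.lines() {
        // Split at the first colon only, so values may contain colons.
        if let Some((key, value)) = line.split_once(':') {
            let key = key.trim();
            if !known_keys.contains(&key) {
                continue; // unknown keys are assumed to be Obsidian-specific
            }
            // Strip optional YAML quotes, then escape for a TOML basic string.
            let value = value.trim().trim_matches('"');
            let escaped = value.replace('\\', "\\\\").replace('"', "\\\"");
            toml.push_str(&format!("{key} = \"{escaped}\"\n"));
        }
    }
    toml.push_str("+++\n");
    toml
}
```

The result is prepended to the compiled Markdown body to form the finished note.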

Dealing with Zola

This is still far from ready for Zola to build the site, though.

Zola is great if you use it as a “vanilla”, pretty barebones static site generator. Otherwise, it becomes pretty difficult to work with, since it lacks any sort of plugin system. This means anything nonstandard we could want to achieve, like Markdown extensions or even adding classes to rendered HTML elements, has to be achieved through a combination of:

  • Preprocessing the site sources with a different tool (in my use case, this is spacedock itself)
  • Postprocessing the built site with a different tool (e.g. to minify JavaScript)
  • Shortcodes
  • Template macros

In particular, Zola notes are all rendered at the same time, and so they can’t really access the data attached to other notes. This means that if I want to fetch the metadata of another post—e.g. to display recent posts, backlinks, previews, etc—that metadata needs to be compiled before Zola runs and loaded in the template code.

spacedock scans the content of its processed notes and builds three indices:

  • an index of backlinks, mapping each note path to a list of paths for notes that link to it. This gets saved as a .json file in the site workdir.
  • an index of page previews, mapping each note path to an object with keys for note previews, among other things. This gets saved as a .json file in the site workdir.
  • the note index for use by the JavaScript. Every note is given a numeric ID mapped to its title and a list of outgoing links and backlinks, also represented by numeric IDs. This gets inlined into the source code for the link graph at the bottom of each page and used to build the graph itself.
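
The backlink index is just the outgoing-link map turned inside out. A minimal sketch, assuming the resolved outgoing links are already available as a map from note path to target paths (the function name and the map shape are mine, not spacedock's):

```rust
use std::collections::BTreeMap;

/// Invert "note -> notes it links to" into "note -> notes that link to it".
fn build_backlinks<'a>(
    outgoing: &BTreeMap<&'a str, Vec<&'a str>>,
) -> BTreeMap<&'a str, Vec<&'a str>> {
    let mut backlinks: BTreeMap<&'a str, Vec<&'a str>> = BTreeMap::new();
    for (&source, targets) in outgoing {
        // Give every note an entry, even if nothing links to it,
        // so template lookups never hit a missing key.
        backlinks.entry(source).or_default();
        for &target in targets {
            let sources = backlinks.entry(target).or_default();
            // A note that links to the same target twice is listed once.
            if !sources.contains(&source) {
                sources.push(source);
            }
        }
    }
    backlinks
}
```

A BTreeMap keeps the keys sorted, so the serialized .json file stays stable between builds; a note with an empty list is an unreachable-note candidate for the linter.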

With the backlink and page preview indices, I can access the necessary metadata about notes in Zola templates before they actually get rendered. I use this to fill out the “pages that link here” section in the same step in which wikilinks get resolved, rather than having to post-generate it and probably fill it out with JavaScript.

A lot of the logic is built by abusing macros. In particular, this idiom:

{% set list = [] %}
{% for elem in iterable %}
  {% if condition(elem) %}
    {% set_global list = list | concat(with=f(elem)) %}
  {% endif %}
{% endfor %}

effectively implements iterable.filter(condition).map(f); combined with the ability to have recursive macros and the markdown filter, I have everything I need to do almost arbitrary computation in my templates. If all you have is a hammer, every problem can be solved by applying blunt force.

Because templates can load arbitrary data from JSON and other files at compile–time, this also means I can, luckily, avoid having to generate templates with spacedock.

The Zola Sources

I have a site_src directory with a Zola project sans content that gets copied over to the build directory almost unchanged. After interpolating the graph data into the JS source and exporting the preprocessed notes, I end up with a complete Zola site ready to be built and deployed. spacedock serve wraps zola serve and watches for changes in the source directories (both site_src and my Obsidian vault), so that if I make an edit, it automatically kills zola, re-exports the site, and restarts zola serve to let me see the site live in my browser.

Rust is really fast, by the way, especially with Rayon. Setting the thread count too high slows it down a bit (on my 5900X, leaving it at the default of 12 or 24, I think, roughly doubles the runtime compared to the sweet spot of 4 threads), but with the right setting spacedock preprocesses all of my notes in under 50 milliseconds, and I haven’t spent much time optimizing it. Zola is much more heavily tuned, and takes roughly four times as long to render the site, although I suspect most of that time is spent in templates.

Deployment

Now that I have a complete Zola site, all that’s left is to deploy it. For that, I use a simple GitHub Actions workflow, roughly equivalent to:

on:
  push:
    branches: ["main"]

name: Build and deploy GH Pages
jobs:
  build:
    name: Deploy
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Cache Rust
        uses: Swatinem/rust-cache@v2
      - name: Cache zola
        id: cache-zola
        uses: actions/cache@v3
        env:
          cache-name: cache-zola
        with:
          path: /usr/local/bin/zola
          ...
      - if: ${{ steps.cache-zola.outputs.cache-hit != 'true' }}
        name: Install zola
        run: ...
      - name: Build site with spacedock
        run: cargo run --verbose --bin spacedock build
      - name: Minify JS, CSS, SVG
        run: ...
      - name: Deploy to gh-pages branch
        run: ...

With the caches in place, the updated site is live within a minute of me running a git push!

Next Steps

spacedock isn’t quite feature-complete yet. In particular, I’d still like to:

  • Parse all OFM extensions:
    • Strip comments
    • Compile display LaTeX
    • Compile inline LaTeX
    • Resolve wikilinks
    • Render highlights
    • Render callouts
  • Use an actual logging framework rather than println! and eprintln!
  • Move most hardcoded constants to the config file
  • Render Mermaid graphs serverside
    • Try moving graph rendering to a background service
  • Rework path handling—at the moment, I use a bunch of newtypes to distinguish between paths in the vault directory, paths in the output directory, output URLs, and Obsidian wikilink “partial paths”. This usage isn’t consistent, and my implementation is terribly unergonomic.
  • Add search—Zola already generates a search index, I just haven’t gotten around to actually implementing it on the website.
  • Optimize PNGs and other assets
  • Find a way to render Mermaid diagrams without relying on mermaid.js