A transparent, extensible static site generator
Z. D. Smith, Brooklyn, NY, 2020.
Bagatto operates in two phases: the data phase and the site phase.
In the data phase, Bagatto will read a data specification and use it to create a table containing all the input data. Data specifications that consist of literal attributes will result in site data containing those same attributes. Specifications of a single file path will result in site data pertaining to the one file. Specifications of a file wildcard will result in an array of data objects, one for each file that matches the wildcard.
In the site phase, Bagatto will read the site specification and use it to generate a sequence of files. Each site specification will ultimately specify the path and contents of one or more files. Bagatto will then ensure the presence of each file path and ensure that its contents are as specified.
We can enter a REPL environment that allows us to explore the index module by using the --repl flag to bag. This enters a Janet REPL with three helper functions injected: eval-data, eval-site, and write-site. These represent the three main steps of executing Bagatto: generating site data, generating site “write specifications”, and writing the specified files.
Here’s a short example using the basic-site demo:
code-src/bagatto/demo/basic-site [master !] ⊕ bag --repl index.janet
repl:1:> (eval-data data)
Reading config data spec...
Reading pages data spec...
Reading css data spec...
Beginning 3 jobs...
Loaded config
Loading pages...
[pages] Loading 2 files
Loading css (styles.css)...
Finished jobs.
@{:config {:author "Z. D. Smith"
:description "A minimal website"
:title "Bagatto Demo"}
:pages @[@{:basename "about"
:path "pages/about.md"
:contents @"..."}
@{:basename "bagatto"
:path "pages/bagatto.md"
:contents @"..."}]
:css @{:path "styles.css"
:contents @"..."}}
The bag command accepts a single filename as an argument. This is known as the index module, and it should be syntactically correct Janet. One of the principles of Bagatto is to go as far as is practicable to make the operation of the Janet language inside the index module as similar as possible to any other use of the Janet interpreter or compiler.
Thus, there’s only one real difference between programming inside an index module and writing a normal Janet module: Bagatto inserts a couple useful libraries into the namespace so that we, as the site authors, don’t need to manage these libraries in order to use them inside the module.
They are:

- the bagatto library itself, a collection of useful functions designed to reduce boilerplate in Bagatto modules, whose API is listed below;
- the path library, which exposes functions for manipulating file paths;
- the janet-sh library, which exposes a useful DSL for shelling out to the command line.

One of Bagatto’s principles is to expose as much of its API as possible in the form of ordinary functions used to produce the data structures you define in your index module. There are a couple of places where that isn’t possible, and where we have to expose a “global” API instead.
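For instance, the injected path library can be used anywhere in the module without an import. Here is a small sketch of a destination-path helper; the post-dest name and the "posts" directory are illustrative, not part of Bagatto's API:

```janet
# No (import path) is needed: Bagatto injects the library into
# the index module's environment. This builds an output path from
# an item's :basename attribute.
(defn post-dest
  [data item]
  (path/join "posts" (string (item :basename) ".html")))
```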
In addition to the helper functions exposed in the bagatto/ namespace, there are a few features that can be accessed directly inside of index modules:
Bagatto exposes the bagatto/set-defaults! function, which can be called at any point inside an index module. It takes a single argument: a struct or dictionary specifying the default value for any of the specification attributes: :src, :attrs, :dest, :out.
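As a sketch, if most of our data specs use bagatto/parse-base, we could set it as the default :attrs parser once, at the top of the index module:

```janet
# With this default in place, any data spec that omits :attrs
# falls back to bagatto/parse-base.
(bagatto/set-defaults! {:attrs bagatto/parse-base})

(def data {:static {:src (bagatto/* "static/*")}})
```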
By calling bagatto/set-output-dir!, you can specify the directory that Bagatto should write its generated file tree into. In principle this is exactly the same as appending that directory name to every path that you generate; however, if you use this feature you can re-use paths in your business logic (for instance, you can define a path value and use it both when generating a file and when rendering a link in your site), as the additional file hierarchy will be dealt with transparently.
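A minimal sketch; the "site" directory name and the template path are illustrative:

```janet
# All generated files will land under site/, but our specs and
# links keep using the short paths.
(bagatto/set-output-dir! "site")

(def site {:index {:dest "index.html"
                   :out (bagatto/renderer "/templates/index")}})
```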
Your Bagatto module should expose a data specification like this:
(def data ... )
This value will be used as the starting point by the Bagatto application. Its job is to specify all the inputs that should go into the system.
The data value should be a struct where the keys are data specification names and the values are the specifications themselves. The data specification names are meaningful, as they are referred to by the site specifications, as we’ll see.
The simplest form of a data specification is a literal struct or table, like this:
(def data {:config {:attrs {:title "A Demo Bagatto Config"}}})
When Bagatto creates the site data for this specification, it will consist of a single key-value pair:
repl:13:> (eval-data data)
@{:config {:title "A Demo Bagatto Config"}}
The next type of data specification is a reference to a single file in the project. These consist of two attributes: :src, which specifies the location of the file with respect to the current working directory, and :attrs, which contains a function that will be called with the file contents, like this:
(def data {:config-json {:src "config.json"
                         :attrs bagatto/parse-json}})
(Theoretically, you could pass in a data literal as above in this case too, but in that case the file would be ignored and there wouldn’t be much point.)
In this case, Bagatto will look for a file called config.json in the current directory, load its contents, and then call bagatto/parse-json on them. The resulting attributes will then be the content of the site data associated with :config-json.
repl:17:> (eval-data data)
@{:config-json @{"subtitle" "A Very Good Blog."
:path "config.json"
:contents @"{\"subtitle\":\"A Very Good Blog.\"}\n"}}
We see that the resulting site data has a single entry, :config-json. The table associated with this entry has the two attributes we get for free, :path and :contents (the file path and contents, respectively), but the call to parse-json has also caused the key/value pairs inside the JSON file to be parsed and added to the site data.
The last way to specify data inputs is with wildcard references to multiple (potential) files. Under the hood, this relies on the glob function of janet-sh. There are two wildcard methods: bagatto/* and bagatto/slurp-*.
Use bagatto/* to provide all the filenames that match the wildcard. bagatto/* returns a new function which will run the file wildcard when evaluated, so we can simply call the resulting value in the REPL to see it at work:
repl:24:> ((bagatto/* "demo/static/*"))
{:each @["demo/static/hello.png"]}
The output is an :each struct, which lets Bagatto know that when it is used as a data source, it should iterate over the contents and create a new output for each one.
We can use it as a data specification:
(def data {:static {:src (bagatto/* "demo/static/*")
:attrs bagatto/parse-base}})
repl:27:> (eval-data data)
@{:static @[@{:path "demo/static/hello.png"}]}
Since we specified the parse-base parser, and used the basic form bagatto/* (which only lists files), we get an array of tables with only the :path attribute.
This is the minimal case for listing files, but for files like the above, that only need to be copied into place, it’s all we need.
bagatto/slurp-* has the same wildcard functionality, but it also includes the contents of the matching files. We can use this to process files in more interesting ways.
repl:28:> ((bagatto/slurp-* "demo/posts/*.md"))
{:each @[("demo/posts/post.md" @"{:title \"Post 1\"}\n%%%\n## A Post That You Might Be Interested In...") ...]}
Each output is a two-tuple of the file’s path and contents. In this example, the post markdown files are formatted with Mago, an extremely simple way to add frontmatter metadata to any text file.
We can define a data specification based on this loader. In this case we’ll specify the Mago parser as the :attrs callback, which will be able to extract the Janet frontmatter as additional metadata.
(def data {:posts {:src (bagatto/slurp-* "demo/posts/*.md")
:attrs parse-mago}})
repl:33:> (eval-data data)
@{:posts @[@{:path "demo/posts/post.md"
:title "Post 1"
:body @"..."
:contents @"..."}
@{:path "..." ...}
...]}
Having evaluated the data specification, we can see that :posts is an array with one element for each file that matched the wildcard. Unlike in the single-file example above, parse-mago was called once for each post.
Since the wildcard loaders offer the ability to load multiple files, and the :attrs callback operates on each file individually, Bagatto exposes one more element of the data specification: the :transform callback. A transform, if specified, is called on the whole set of elements after each one has been parsed. This allows us, for instance, to sort a list of blog posts after they’ve been loaded and parsed.
(def data {:notes {:src (bagatto/slurp-* "notes/*.md")
:attrs parse-note
:transform (bagatto/attr-sorter "topic")}})
bagatto/attr-sorter is exposed as part of the Bagatto library; it allows us to specify a key present in all the items and sort the collection by it.
The second and last value that your index module should define is site:
(def site ...)
This is the site specification, which defines all the outputs of the system. Every site specification entry specifies either a single file or a sequence of files to be generated. To specify a file, we must provide the path at which it should be created and the contents of the created file.
The structure of the site specification is quite similar to the data specification: it’s an association between names and specification values. However, in this case the names don’t have any effect on the generated site; they’re just useful for the site author to organize their code.
The relationship between data
and site
is simple but important to understand. The site specification is evaluated in the context of the site data, which is the output of the data specification (it’s what we see when we run eval-data
above).
A site specification isn’t actually a mapping from data entries to pages; in most websites of any size, any given page will require data from more than one input (for instance, to display a recent-posts sidebar on every page), and may well create more than one page out of the same input. Thus it’s useful to understand the overall flow of data in the system: Bagatto uses the data specification to create the site data, and then iterates through the site specification using the site data as the context, or evaluation environment, as it evaluates each entry in the specification. Each entry results in a sequence of one or more files to be written.
As noted above, every site specification specifies the path and contents for one or more files to be created. Therefore, perhaps the simplest possible site is one consisting of a static path and contents:
repl:2:> (def site {:_ {:dest "out.txt" :out "Welcome to my website"}})
We can use eval-site
to get an output of the path and contents of each file to be created.
Site specifications are evaluated in the context of site data, but in this case our only specification is completely static. Therefore we can pass in an empty struct as the site data.
repl:4:> (eval-site site {})
@[(:write "out.txt" "Welcome to my website")]
We can see that it plans to write a single file, with the specified path and contents.
More useful is to pass a function as the specification contents, rather than a static value. This allows us to dynamically act on the input data in useful ways.
A renderer function is simply any function which takes in the site data and outputs some file contents. We can write our own extremely simple one, which looks for a secret in the site data and outputs it in JDN format to a file. Then we can define a simple site specification with a static path that passes that function in directly as the :out attribute.
repl:6:> (defn renderer [data] (string/format "%j" (data :secret)))
<function renderer>
repl:7:> (def site {:_ {:dest "out.txt" :out renderer}})
{:_ {:dest "out.txt" :out <function renderer>}}
Now, of course, we need to ensure that :secret is present in the site data. While in practice we’d have a data entry that defined :secret, it’s useful to note that for the purposes of inspecting our functions, we don’t need to use the output of the eval-data command. We can construct a struct directly.
repl:8:> (eval-site site {:secret "p@ssw0rd"})
# ...Some output...
@[(:write "out.txt" "\"p@ssw0rd\"")]
The fact that the site data is a simple key-value structure, and the renderer output is just a string, makes it very simple to understand how data flows through the application and to extend it.
Perhaps a slightly more realistic example would be one that combines data from more than one source.
repl:12:> (defn renderer
[data]
(string/format "%s:%j:%f"
(get-in data [:config :prefix])
(get-in data [:personal :password])
(math/random)))
<function renderer>
repl:14:> (def site {:_ {:dest "out.txt" :out renderer}})
{:_ {:dest "out.txt" :out <function renderer>}}
repl:15:> (eval-site site {:personal {:password "p@ssw0rd"}
:config {:prefix "md5"}})
@[(:write "out.txt" "md5:\"p@ssw0rd\":0.487181")]
Here we see the very common case of combining data from multiple sources into a single file’s contents.
We saw that, using data specifications, we can select source files to be included in our site data with wildcards. A very common example of this would be building a blog; in addition to all the static content and any config files, we’d want to load up all the blog posts in a directory, and to be able to add a new post simply by adding a file to the directory, without changing the config.
Thus, it will be very common that in addition to rendering pages based on static config data or files, we’ll want to iterate through all the files that match a wildcard and render one output file for each (e.g., rendering a post.html for each source file).
For that we can use site selectors.
:each
Given some site data with a series of named data entries, we can use the :each attribute to refer to one of those entries. Bagatto will then call the :dest and :out functions on each file in the entry.
Because there’s now an additional piece of data in addition to the site data, the renderer and path generator functions in an :each specification take two arguments: the site data data, and then the individual element item.
We can see an example. First let’s define a data specification with a wildcard, so we have something to iterate over:
repl:55:> (def data {:users {:src (bagatto/slurp-* "users/*")
:attrs bagatto/parse-json}
:config {:attrs {:prefix "pw::"}}})
The users directory will have two JSON files in it. Therefore, since we specify bagatto/parse-json as the parser for :users, we can expect the :users site data to contain an array of two tables that have been decoded from the JSON.
Next we’ll define a renderer function. Like above, it will draw from multiple sources; but this time it will take two arguments, because we intend it to be called on each element in :users.
repl:34:> (defn renderer
[data item]
(string/format "%s%s"
(get-in data [:config :prefix])
(item "password")))
As before we expect data, but now we expect item as well. For each call, data will be the same site data, and item will be a different element.
Finally, we will define a site specification that uses :each to refer to the :users site data.
repl:35:> (def site {:_ {:each :users
:dest (bagatto/path-copier "passwords/")
:out renderer}})
:each :users will cause Bagatto to call the renderer once for each item in :users. In addition, we now need to specify an actual function for :dest. If we left it as a static value, the contents would be repeatedly written to the same file, which is obviously not what we want. Here we use the bagatto/path-copier helper, which gives us a function that will accept any file and return a new path with the base we specify.
We can evaluate the data spec, and use that to evaluate the site spec:
repl:56:> (eval-site site (eval-data data))
@[(:write "passwords/alice.json" "pw::1234")
(:write "passwords/bob.json" "pw::snoopy")]
It’s produced two write plans, one for each user file, whose contents are interpolated from the contents of their respective source files.
A very common operation when generating a website is to copy a source file without touching it. If Bagatto receives a site specification with a site selector and a :dest entry, but no :out entry, it will interpret that as a copy operation. It will read the :path of whatever item or items it receives (this attribute is always present), and copy it to the :dest attribute of the site specification.
Here’s a super simple data spec:
repl:62:> (def data {:users {:src (bagatto/* "users/*") :attrs bagatto/parse-base}})
We use bagatto/* instead of bagatto/slurp-*, which just lists the files but doesn’t read them. We also use bagatto/parse-base as our parser, which just returns the base :path attribute.
We can now define a site that simply refers to :users and specifies a path without specifying contents.
repl:58:> (def site {:_ {:each :users :dest (bagatto/path-copier "passwords/")}})
Evaluating the site produces two copy instructions to the new paths:
repl:63:> (eval-site site (eval-data data))
@[(:copy "users/alice.json" "passwords/alice.json")
(:copy "users/bob.json" "passwords/bob.json")]
Of course, most websites are not made by string/format-ing HTML together; they use HTML templates. The template system used by Bagatto is Temple. Temple is a wonderfully powerful and simple templating system that should be very enjoyable to use.
Here are the contents of a simple blog post template:
{:}
{% (bagatto/include "/templates/base_top") %}
<h1>{{ (get-in args [:config :title]) }}</h1>
<h2>{{ (get-in args [:_item :title]) }}</h2>
<p class="post-info">
{{ (get-in args [:_item :date]) }}
</p>
{- (bagatto/markdown->html (get-in args [:_item :contents])) -}
{% (bagatto/include "/templates/base_bottom") %}
The appeal of Temple is in its simplicity. It consists of four types of expression:

- {$ ... $}: Evaluate the expression between the $s at compile time;
- {% ... %}: Evaluate the expression between the %s at runtime, escape and interpolate the output;
- {- ... -}: Evaluate the expression between the -s at runtime, interpolate the output without escaping it;
- {{ ... }}: Evaluate and interpolate the expression inside the curly braces.

While many other templating languages differentiate between capturing and non-capturing by differentiating between their escape brace types (which means having to change brace types from line to line, even within the same syntactic expression), Temple is non-capturing by default, and we interpolate into the surrounding template by printing to stdout. In other words, to interpolate something into a Temple template, simply use print:
Welcome to my web page. Here's a pretty-printed example
of one of my favorite data structures:
{% (print (string/format "%q" {:name "Bowler Cat"
:species "Felis Domesticus"})) %}
Ain't she a beaut?
We can think of {{ foo }} as syntactic sugar for {% (print foo) %}.
Temple templates accept a single dictionary of arguments, which is bound inside the template to args.
Bagatto adds a very thin layer of functionality and convenience on top of Temple. The first thing it does is extend the Temple environment with the same libraries that are listed at the beginning of this manual. Thus we can call bagatto/ helper functions from within a template.
The only other change it makes is to ensure the presence, if applicable, of the item passed in as the second argument to site spec functions, which contains the attributes of the individual element of an :each selection. Those attributes are made available at (args :_item). For instance, in the example above, we expect the attributes of the specific blog post being rendered to be present in the :_item value, and so we refer to it to get the title, date, and contents of the post.
The basic call to render a template is bagatto/render. This allows us to directly invoke a template by name, with site data and an optional item, and it returns the fully rendered template. For instance, if we have a simple template at templates/simple.temple:
I am known for my {{ (args "topic") }} skills.
Then we can render out page contents like so:
repl:5:> (bagatto/render "/templates/simple" {"topic" "Web Design"})
@"I am known for my Web Design skills.\n"
In a proper web page, of course, our template file would contain HTML with placeholders for the values to be interpolated.
Because bagatto/render is such a common operation, Bagatto offers a convenience function that will generate a renderer that makes the above call. For instance, if I wanted to specify the above template in a site specification, I’d probably write this:
repl:6:> (def site {:_ {:dest "out.txt"
:out (bagatto/renderer "/templates/simple")}})
Thus I avoid having to write a new renderer function for each :out entry if I’m just going to pass the data on to a specific template. Evaluating the site, we get the same thing:
repl:7:> (eval-site site {"topic" "Web Design"})
@[(:write "out.txt" @"I am known for my Web Design skills.\n")]
A site spec with an :each can include a :filter attribute, too. This can be any predicate function which takes the site data and an individual item from the spec’s site selector, and returns true or false. If the return value is false, the site spec will skip that element.
This can be very useful when handling an input of mixed files. For instance, with a static/ directory that contains both CSS and supplementary HTML files, we might want to have different render steps for each. We could then write two site specs that both take that data entry in their :each, but have different :filter attributes (we could also have written two different wildcards in two different data specs, but hopefully you get my point).
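As a sketch, assuming a :static data entry loaded with bagatto/* and a hypothetical has-extension? predicate builder (both names are illustrative, not part of Bagatto's API):

```janet
# Hypothetical predicate builder: returns a :filter function that
# is true when the item's :path ends with the given extension.
(defn has-extension?
  [ext]
  (fn [data item]
    (string/has-suffix? ext (item :path))))

(def site {:css {:each :static
                 :filter (has-extension? ".css")
                 :dest (bagatto/path-copier "css/")}
           :html {:each :static
                  :filter (has-extension? ".html")
                  :dest (bagatto/path-copier "")}})
```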
Bagatto bills itself as a “transparent” static site generator. By this we mean: we should favor first-class functions over configuration, and native terms and data structures over indirect control flow whenever possible.
Here’s a simple example: Bagatto creates files by combining a file path with some file contents. The values that can go in the :out section of a site specification can be either strings or functions which produce strings.
We might be tempted as application authors to introduce a layer of abstraction in front of the render process and ask the user to specify the name of a render function built into Bagatto. This would provide a simple, convenient DSL. Unfortunately, it has the side effect of effectively walling off that function from a site author. If (or rather when) the author needs to understand what specifically is being passed into the render function, or needs to tweak its output slightly, they’re out of luck. The logic that reads this name, translates it into a render function, calls the function with some inputs, and uses the output is all stuck within the belly of Bagatto, and the author might need to recompile the whole application to get into it.
Similarly, if they want to introduce a new renderer—a new template language for instance—they can only do so by introducing the function directly into Bagatto, giving it a name, and then passing the name in a site specification.
Therefore we keep the operation of the renderer open to the author’s inspection. By specifying a literal function, we can easily wrap other functions and debug or change their output. We do attempt to offer an author the same level of convenience as the above DSL; but instead of offering them the ability to name a function that we control, we offer them the ability to call a function that outputs the renderer function itself, so that they still have access to its inputs and outputs.
Thus, we have a pretty straightforward way to write our own loaders, attributes, path-generator and renderer functions.
Each of the below entries will have a typespec describing the signature of the functions that can be implemented. This isn’t meaningful Janet, but hopefully gives a succinct picture of the types that will be meaningful.
(let [element (or 'source-path '(source-path file-contents))]
  (defn loader
    []
    (or '{:each (element ...)}
        '{:some element})))
The :src attribute in a data spec can take a 0-arity function which, when called, returns one of two types of values:

{:each values}
values is any indexable data structure, the elements of which are either two-tuples or single values. Two-tuples will be treated by the base attribute parser as [source-path file-contents]. Single values will be treated as source-path only.

{:some value}
value is a single instance of the above value type: either a two-tuple or a single path value.
We could, for instance, write a custom loader function that accepted a URL, made a web request, and returned a (file-url file-contents) tuple.
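A sketch of the shape such a loader could take; fetch-body stands in for whatever HTTP mechanism you choose (for instance, shelling out to curl with janet-sh), and both names are illustrative:

```janet
# A custom loader is any 0-arity function returning {:some ...}
# or {:each ...}. Here {:some [url body]} marks a single
# (source-path file-contents) element.
(defn url-loader
  [url fetch-body]
  (fn []
    {:some [url (fetch-body url)]}))
```

We could then use the result of (url-loader some-url my-fetch) as the :src of a data spec, where my-fetch is a function we supply ourselves.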
(defn parser
[contents attrs]
attrs)
:attrs can take any parser function. The purpose of a parser is to transform the individual outputs of a data loader into an attributes table. There are two attributes that are guaranteed to be present when the parser is called: :path and :contents. A parser function shouldn’t remove either of these attributes, but it can use them to generate new ones. For instance, if contents is unparsed Markdown with YAML frontmatter, then a parser function could extract metadata from the frontmatter and return an updated attributes table with that arbitrary metadata.
contents and the :contents attribute can be expected to be identical; the former is provided as a convenience.
An example of a custom parser would be one that shelled out to Asciidoctor to extract attributes from an asciidoc document.
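For a simpler illustration than shelling out, here is a sketch of a custom parser that derives a :title attribute from the first Markdown ATX heading in the file (parse-title is our own name, not part of Bagatto's API):

```janet
# A parser receives the raw contents and the attrs table (which
# already holds :path and :contents) and returns an attrs table.
(defn parse-title
  [contents attrs]
  (def title
    (some (fn [line]
            (if (string/has-prefix? "# " line)
              (string/slice line 2)))
          (string/split "\n" contents)))
  (if title
    (merge attrs {:title title})
    attrs))
```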
(defn each-path
  [data item]
  path)
(defn some-path
  [data]
  path)
In the site specification, :dest can take any function which returns a file path string. If the spec has an :each, the generator function should take the site data and the individual item as arguments, and return the destination path for the individual item.
Otherwise, it should take the site data as a single argument and return the destination path for its entry output.
(defn each-renderer
[data item]
file-contents)
(defn some-renderer
[data]
file-contents)
:out takes any renderer function; these work along exactly the same lines as path generators. If the site spec has an :each, the function should take two arguments; otherwise, it should take one. The return value of the function will be written directly to the file path in its site spec.
Following from the parser example above, an example custom renderer could take an asciidoc document and shell out to Asciidoctor to render it into HTML.
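Sticking to functions already seen in this manual, a minimal custom renderer for an :each spec might instead wrap each item's rendered Markdown in an HTML element (the article wrapper and the post-renderer name are just an illustration):

```janet
# Renderer for an :each spec: takes the site data and the
# individual item, and returns the file contents as a string.
(defn post-renderer
  [data item]
  (string "<article>\n"
          (bagatto/markdown->html (item :contents))
          "\n</article>"))
```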