
Pagefind: Searching in Static Sites

The original post is here: eklausmeier.goip.de/blog/2023/10-23-pagefind-searching-in-static-sites.


Pagefind is a JavaScript library that you add to your static site to get complete search functionality. Pagefind has the following advantages over other JavaScript search libraries:

  1. Easy to install, no JavaScript dependency hell.
  2. Easy to integrate: just add the CSS and two <script> tags.
  3. Creating the index is easy and reasonably quick.

Pagefind was mainly written by Liam Bigelow from New Zealand and is promoted by CloudCannon. It is open source and written in Rust and JavaScript:

Language     kLOC  #files
Rust           36      63
JavaScript      2      20

1. One-time installation. Installing Pagefind just means downloading a single binary from GitHub; select the proper binary for macOS, Linux, or Windows. In my case I used pagefind-v1.0.3-x86_64-unknown-linux-musl.tar.gz for Arch Linux. Unpack it with

tar zxf pagefind-v1.0.3-x86_64-unknown-linux-musl.tar.gz

Unpacking the 10 MB archive creates a 22 MB executable, which is statically linked and therefore has no dependencies. That's it.

2. Add CSS and JavaScript to the template. Add the following CSS and JavaScript references to your template file, outside of <body>:

<link href="/pagefind/pagefind-ui.css" rel="stylesheet">
<script src="/pagefind/pagefind-ui.js"></script>
<script>
    window.addEventListener('DOMContentLoaded', (event) => {
        new PagefindUI({ element: "#search", showSubResults: true });
    });
</script>

Then add the element that holds the actual search dialog to your template inside <body>, in my case in top-layout.php. Its id must match the element: "#search" selector passed to PagefindUI:

<div id="search"></div>

3. Creating index files. This step must be repeated whenever you have new content or rename files. It does not need to be repeated every time you regenerate your static HTML files, although if you want to play it safe, you can do just that. Index creation uses the pagefind executable mentioned above. Running this command shows all the options:

$ pagefind -h
Implement search on any static website.

Usage: pagefind [OPTIONS]

Options:
  -s, --site <SITE>
          The location of your built static website
      --output-subdir <OUTPUT_SUBDIR>
          Where to output the search bundle, relative to the processed site
      --output-path <OUTPUT_PATH>
          Where to output the search bundle, relative to the working directory of the command
      --root-selector <ROOT_SELECTOR>
          The element Pagefind should treat as the root of the document. Usually you will want to use the data-pagefind-body attribute instead.
      --exclude-selectors <EXCLUDE_SELECTORS>
          Custom selectors that Pagefind should ignore when indexing. Usually you will want to use the data-pagefind-ignore attribute instead.
      --glob <GLOB>
          The file glob Pagefind uses to find HTML files. Defaults to "**/*.{html}"
      --force-language <FORCE_LANGUAGE>
          Ignore any detected languages and index the whole site as a single language. Expects an ISO 639-1 code.
      --serve
          Serve the source directory after creating the search index
  -v, --verbose
          Print verbose logging while indexing the site. Does not impact the web-facing search.
  -l, --logfile <LOGFILE>
          Path to a logfile to write to. Will replace the file on each run
  -k, --keep-index-url
          Keep "index.html" at the end of search result paths. Defaults to false, stripping "index.html".
  -h, --help
          Print help
  -V, --version
          Print version

This blog uses Simplified Saaze, with which I generate the static files like this:

php saaze -mortb /tmp/build

This builds all static files in /tmp/build, which happens to be on a RAM disk on Arch Linux. Then change to this directory and run:

$ time pagefind -s . --exclude-selectors aside --exclude-selectors footer --force-language=en

Running Pagefind v1.0.3
Running from: "/tmp/build"
Source:       ""
Output:       "pagefind"

[Walking source directory]
Found 555 files matching **/*.{html}

[Parsing files]
Did not find a data-pagefind-body element on the site.
↳ Indexing all <body> elements on the site.

[Reading languages]
Discovered 1 language: en

[Building search indexes]
Total:
  Indexed 1 language
  Indexed 555 pages
  Indexed 33129 words
  Indexed 0 filters
  Indexed 0 sorts

Finished in 1.618 seconds
        real 1.65s
        user 1.49s
        sys 0
        swapped 0
        total space 0

The command

pagefind -s . --force-language=en

would have been enough in many cases. In my case, however, I want to exclude the content between <aside> and </aside>, and likewise between <footer> and </footer>; this is what the two --exclude-selectors options do.
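To make the effect of excluding aside and footer concrete, here is a toy JavaScript sketch that drops <aside> and <footer> elements before collecting the words that would be indexed. It is an illustration under simplifying assumptions only: it uses regular expressions and assumes these elements are not nested in each other, whereas Pagefind itself parses the HTML and accepts arbitrary CSS selectors. The function name indexableWords is hypothetical.

```javascript
// Toy illustration of what excluding <aside> and <footer> means for indexing.
// Assumption: the excluded elements are not nested inside each other; a real
// indexer (like Pagefind) uses an HTML parser and CSS selectors instead.
function indexableWords(html, excludedTags = ["aside", "footer"]) {
  let text = html;
  for (const tag of excludedTags) {
    // Remove <tag ...> ... </tag> (non-greedy, across line breaks).
    const re = new RegExp(`<${tag}\\b[^>]*>[\\s\\S]*?</${tag}>`, "gi");
    text = text.replace(re, " ");
  }
  // Strip the remaining tags and split into lowercase words.
  return text
    .replace(/<[^>]+>/g, " ")
    .toLowerCase()
    .split(/[^\p{L}\p{N}]+/u)
    .filter((w) => w.length > 0);
}

const page = `
<body>
  <article>Pagefind indexes static sites.</article>
  <aside>Related posts: Saaze</aside>
  <footer>Copyright notice</footer>
</body>`;

console.log(indexableWords(page));
// → [ 'pagefind', 'indexes', 'static', 'sites' ]
```

Passing an empty list, as in indexableWords(page, []), keeps the sidebar and footer words, which corresponds to running pagefind without --exclude-selectors.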

The option --force-language=en is required in my case, as I have both English and German posts. Without this option Pagefind would create two distinct indexes, one per language, and a search would then cover only one language but not the other. By forcing the language, Pagefind puts everything into a single index. See Multilingual search.
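Why forcing the language matters can be shown with a toy model: if pages are grouped into one index per language, a query only runs against the index of one language; with a forced language, all pages land in one index. The sketch below is a simplified JavaScript model under that assumption, not Pagefind's real index format; buildIndexes, search, and the sample pages are hypothetical.

```javascript
// Toy model: one inverted index (word -> set of page URLs) per language.
// With forceLanguage set, every page goes into a single index.
function buildIndexes(pages, forceLanguage = null) {
  const indexes = new Map(); // language -> Map(word -> Set(url))
  for (const { url, lang, text } of pages) {
    const key = forceLanguage ?? lang;
    if (!indexes.has(key)) indexes.set(key, new Map());
    const index = indexes.get(key);
    for (const word of text.toLowerCase().split(/\s+/).filter(Boolean)) {
      if (!index.has(word)) index.set(word, new Set());
      index.get(word).add(url);
    }
  }
  return indexes;
}

// A search only consults the index of a single language.
function search(indexes, lang, word) {
  const hits = indexes.get(lang)?.get(word.toLowerCase());
  return hits ? [...hits].sort() : [];
}

const pages = [
  { url: "/en/pagefind", lang: "en", text: "Pagefind search for static sites" },
  { url: "/de/pagefind", lang: "de", text: "Pagefind Suche für statische Seiten" },
];

const perLanguage = buildIndexes(pages);
console.log(search(perLanguage, "en", "pagefind")); // → [ '/en/pagefind' ]

const single = buildIndexes(pages, "en");
console.log(search(single, "en", "pagefind")); // → [ '/de/pagefind', '/en/pagefind' ]
```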

Indexing creates a directory called pagefind. Just copy this directory to your web server during deployment. It looks something like this:

pagefind
├── fragment
│   ├── en_0933ef4.pf_fragment
│   ├── en_100be25.pf_fragment
│   ├── en_10b07a1.pf_fragment
│   ├── . . .
│   └── en_fef8cdb.pf_fragment
├── index
│   ├── en_22c87b9.pf_index
│   ├── en_26afa46.pf_index
│   ├── en_2a80efb.pf_index
│   ├── . . .
│   └── en_fde0a3b.pf_index
├── pagefind.en_d6828bd6ef.pf_meta
├── pagefind-entry.json
├── pagefind.js
├── pagefind-modular-ui.css
├── pagefind-modular-ui.js
├── pagefind-ui.css
├── pagefind-ui.js
├── wasm.en.pagefind
└── wasm.unknown.pagefind

3 directories, 596 files

The files in index are usually around 40 KB each, those in fragment around 1-10 KB each. The JavaScript totals 100 KB, the CSS less than 20 KB.
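Since the index only has to be rebuilt when content is added or files are renamed, the indexing step can be automated. The following Node.js sketch fingerprints all HTML files below the build directory (relative paths plus contents) and runs pagefind with the options used above only when that fingerprint changed. It assumes Node.js is installed and pagefind is on the PATH; the function names, the PAGEFIND_ARGS constant, and the state file are hypothetical choices for this sketch.

```javascript
// Sketch: run Pagefind only when the HTML files (names or content) changed.
// Assumptions: Node.js, `pagefind` on the PATH, static site built into buildDir.
const fs = require("fs");
const path = require("path");
const crypto = require("crypto");
const { spawnSync } = require("child_process");

// Recursively collect all *.html files, sorted so the fingerprint is stable.
function listHtmlFiles(dir) {
  const files = [];
  for (const entry of fs.readdirSync(dir, { withFileTypes: true })) {
    const full = path.join(dir, entry.name);
    if (entry.isDirectory()) files.push(...listHtmlFiles(full));
    else if (entry.name.endsWith(".html")) files.push(full);
  }
  return files.sort();
}

// SHA-256 over relative paths and contents: new pages, renames and edits all change it.
function fingerprint(buildDir) {
  const hash = crypto.createHash("sha256");
  for (const file of listHtmlFiles(buildDir)) {
    hash.update(path.relative(buildDir, file) + "\0");
    hash.update(fs.readFileSync(file));
    hash.update("\0");
  }
  return hash.digest("hex");
}

// The same options as the pagefind command line shown earlier.
const PAGEFIND_ARGS = [
  "-s", ".",
  "--exclude-selectors", "aside",
  "--exclude-selectors", "footer",
  "--force-language=en",
];

// Returns true if Pagefind was run, false if the index was already up to date.
function indexIfChanged(buildDir, stateFile) {
  const current = fingerprint(buildDir);
  const previous = fs.existsSync(stateFile) ? fs.readFileSync(stateFile, "utf8") : "";
  if (current === previous) return false;
  const result = spawnSync("pagefind", PAGEFIND_ARGS, { cwd: buildDir, stdio: "inherit" });
  if (result.error) throw result.error; // e.g. pagefind not found on the PATH
  if (result.status !== 0) throw new Error(`pagefind exited with status ${result.status}`);
  fs.writeFileSync(stateFile, current);
  return true;
}
```

For example, indexIfChanged("/tmp/build", "/tmp/pagefind.fingerprint") could be called right after php saaze -mortb /tmp/build. Hashing the contents is deliberately conservative: a template change also triggers a re-index, which matches the play-it-safe approach.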

4. Network traffic. Pagefind was specifically designed to load only small amounts of data over the network, as can be seen in the diagram below.

This makes Pagefind particularly attractive performance-wise.
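The idea behind the small network footprint can be sketched with a toy model: the word index is split into many small chunks, and a query only fetches the chunk that can contain its term. The JavaScript below simulates this with an in-memory map standing in for files on the server. Chunking by first letter and all names here are assumptions for illustration, not Pagefind's actual chunking scheme or file format.

```javascript
// Toy model of chunked index loading: only the chunk for the query term is fetched.
// Assumption: chunks are keyed by the first letter of a word (Pagefind's real scheme differs).
function buildChunks(pages) {
  const chunks = new Map(); // chunk key -> Map(word -> [page ids])
  pages.forEach((text, id) => {
    for (const word of new Set(text.toLowerCase().match(/\p{L}+/gu) ?? [])) {
      const key = word[0];
      if (!chunks.has(key)) chunks.set(key, new Map());
      const chunk = chunks.get(key);
      if (!chunk.has(word)) chunk.set(word, []);
      chunk.get(word).push(id);
    }
  });
  return chunks;
}

// Simulated client: remembers which chunks it has "downloaded".
function makeClient(chunks) {
  const loaded = new Set();
  return {
    search(word) {
      const w = word.toLowerCase();
      const key = w[0];
      loaded.add(key); // one small request per distinct chunk
      return chunks.get(key)?.get(w) ?? [];
    },
    requests: () => loaded.size,
    totalChunks: () => chunks.size,
  };
}

const pages = ["Pagefind searches static sites", "Saaze builds static sites", "Rust and JavaScript"];
const client = makeClient(buildChunks(pages));
console.log(client.search("static")); // → [ 0, 1 ]
console.log(`${client.requests()} of ${client.totalChunks()} chunks loaded`); // → 1 of 6 chunks loaded
```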

5. Using Pagefind as a user. Using Pagefind as a user is intuitive and needs no further explanation. This blog now has Pagefind integrated into every page. Just type the word you want to search for, and results pop up almost instantly. This instant reaction is no surprise, as the actual searching is done in the browser.

There is one slight limitation of Pagefind: currently you cannot search for phrases (word groups). For example, consider this line from Shakespeare's Hamlet:

To be, or not to be, that is the question

Searching for "to" or "be" would likely return lots of results, but probably not the ones you are looking for. This is clearly not a problem for this blog, as I do not have lyrics here.
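Why phrase queries are harder than single-word queries can be seen in a toy inverted index: if the index only records which pages contain a word, "to be" can only be answered as "pages containing both to and be", regardless of order or distance. Answering a phrase query also needs word positions. The JavaScript sketch below contrasts both; it is a simplified illustration, not a description of Pagefind's internals, and the function names are hypothetical.

```javascript
// Positional inverted index: word -> Map(pageId -> [positions]).
function buildIndex(pages) {
  const index = new Map();
  pages.forEach((text, id) => {
    const words = text.toLowerCase().match(/\p{L}+/gu) ?? [];
    words.forEach((word, pos) => {
      if (!index.has(word)) index.set(word, new Map());
      const postings = index.get(word);
      if (!postings.has(id)) postings.set(id, []);
      postings.get(id).push(pos);
    });
  });
  return index;
}

function tokens(query) {
  return query.toLowerCase().match(/\p{L}+/gu) ?? [];
}

// Word search: pages containing all query words, in any order and at any distance.
function searchWords(index, query) {
  const sets = tokens(query).map((w) => new Set(index.get(w)?.keys() ?? []));
  if (sets.length === 0) return [];
  return [...sets[0]].filter((id) => sets.every((s) => s.has(id)));
}

// Phrase search: the words must additionally appear at consecutive positions.
function searchPhrase(index, query) {
  const words = tokens(query);
  return searchWords(index, query).filter((id) =>
    index.get(words[0]).get(id).some((start) =>
      words.every((w, i) => index.get(w).get(id).includes(start + i))
    )
  );
}

const pages = [
  "To be, or not to be, that is the question",
  "Be sure to read the docs",
];
const index = buildIndex(pages);
console.log(searchWords(index, "to be"));  // → [ 0, 1 ]
console.log(searchPhrase(index, "to be")); // → [ 0 ]
```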