Martin Isaksson

Writing a LaTeX document often involves many manual steps, resulting in a patchwork document that is neither exercisable nor complete. This makes it impossible to reproduce the document from code and data. In this post we will create a pipeline for compiling a LaTeX document that works both locally and in GitLab CI. This is part of a series on creating the perfect open science git repository.

## Introduction

When writing a document in LaTeX, I like to use git for version control even if I am working alone on a project. This allows me to track my progress, have a backup, and make sure the document is completely reproducible from raw data. The principle of Reproducible Research (Buckheit & Donoho, 1995; Claerbout & Karrenbach, 1992; Association for Computing Machinery (ACM), n.d.) is to make data and computer code available for others to analyze and criticize.

A good open source repository is exercisable and complete (Monperrus, 2018; Association for Computing Machinery (ACM), n.d.). This means that it must be possible to fully reproduce the document, down to the last pixel, by running a single script in the repository.

In this post we will look at the practicalities of writing a reproducible document in LaTeX, using a GitLab CI pipeline to ensure that we meet these requirements.

Our GitLab CI pipeline consists of two build stages and one test stage. Each stage runs in a separate Docker container.

This post is part of a series and follows Publication ready figures. For more on the requirements for open source repositories, see Reproducibility aspects of the Swedish COVID–19 estimate report.

### Our contributions

• We define three phases of document compilation: compiling the figures, compiling the main document, and testing the compiled document against a set of known requirements.
• We construct a local compilation pipeline based on latexmk, make and Docker (Merkel, 2014).
• We construct a GitLab CI pipeline that automatically compiles the document when we push new code to the remote repository.

## What we will need

In this post we will use git and target the GitLab CI pipeline framework, so you will need a repository on GitLab.

I recommend adding a .gitignore based on the GitLab TeX .gitignore template.

## The local build system

We will use latexmk to build our LaTeX document. Other build systems for LaTeX, such as rubber and latexrun, can also be used, but latexmk has the advantage that it is robust and already installed in the Docker image we are using.

We will use GNU make to trigger the latexmk build locally, or in the GitLab runner. The entry points will be slightly different in these cases.

We assume that running make figures is very time consuming, so we would like to avoid running it on every build.

The command that we run from the command line to compile the LaTeX document is make. This first starts a Docker container, mounts the working directory into it, and runs make pdf inside the container. Since the working directory is mounted, the PDF file will remain after the Docker container has been shut down and removed.
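The mechanics of this default target can be sketched in Python. This is a minimal illustration under stated assumptions, not the Makefile from the repository: the helper name `docker_make_command` and the mount point `/workdir` are hypothetical placeholders, while the image name `martisak/texlive2020` and the inner `make pdf` call are the ones mentioned in this post.

```python
import os
import shlex


def docker_make_command(target="pdf",
                        image="martisak/texlive2020",
                        workdir=None,
                        mount_point="/workdir"):
    """Build the argv for running `make <target>` inside a container.

    `mount_point` is a hypothetical placeholder path inside the container.
    """
    host_dir = os.path.abspath(workdir or os.getcwd())
    return [
        "docker", "run",
        "--rm",                             # remove the container when it exits
        "-v", f"{host_dir}:{mount_point}",  # bind-mount the working directory
        "-w", mount_point,                  # run the command from the mount
        image,
        "make", target,
    ]


if __name__ == "__main__":
    # Print the command instead of running it, so the sketch has no side effects.
    print(" ".join(shlex.quote(arg) for arg in docker_make_command()))
```

Because the working directory is bind-mounted, anything `make pdf` writes there is still on the host after `--rm` has removed the container.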

The complete Makefile can be seen in the GitLab repository.

### Generating figures

The figures reside in the subdirectory figures, which contains a Makefile. We can compile the figures locally with make -C figures, or run the same command inside a Docker container.

Each figure is generated from raw data and plotted using a Python script. Each script generates a figure in TikZ format with the same base name as the script, but with the extension “.tex”.
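As a rough illustration of this naming convention, the sketch below derives the output path from the script path and writes a tiny TikZ picture using only the standard library. The function names and the example data are made up for this sketch, and the actual scripts in the repository most likely use a plotting library rather than writing TikZ by hand.

```python
from pathlib import Path


def tikz_output_path(script_path):
    """Return the .tex path that shares its base name with the script."""
    return Path(script_path).with_suffix(".tex")


def write_tikz_line_plot(points, output_path):
    """Write a minimal TikZ picture that connects the given (x, y) points."""
    coordinates = " ".join(f"({x},{y})" for x, y in points)
    lines = [
        r"\begin{tikzpicture}",
        rf"  \draw plot coordinates {{{coordinates}}};",
        r"\end{tikzpicture}",
    ]
    output_path = Path(output_path)
    output_path.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return output_path
```

A script called `figures/loss.py` (a hypothetical name) would then call `write_tikz_line_plot(data, tikz_output_path(__file__))`, and the main document could pull in the result with `\input{figures/loss.tex}`, assuming the `tikz` package is loaded.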

### Compiling the document

Our document can be compiled using latexmk inside a Docker container with make. This starts the container and runs make pdf inside it, which in turn calls latexmk to compile the document.

The container we are using is based on a Docker image that has TeX Live 2020 installed on top of an Ubuntu base image.

### Running unit tests

The test cases, written in Ruby, can be run either locally with make check or in a Docker container with make check_docker. For a more in-depth guide to LaTeX document unit testing, see How to beat publisher PDF checks with LaTeX document unit testing.

### LaTeX development environment

When writing a paper we would of course like to see the results of our changes in near real time, and not have to commit our changes to git in order to compile the document.

We can tweak the render make target a bit so that latexmk is run with the -pvc flag (Wienke, 2018). This puts latexmk into preview-continuously mode, in which it recompiles the document whenever a source file changes.

This means we can run this command once and just edit our document in our favorite text editor.
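The difference between a one-off build and the continuous mode comes down to a single flag. The sketch below builds both command lines. The helper name is hypothetical, and `main.tex` is an assumption based on the document being called main.pdf later in this post. The exact flags used in the repository's Makefile may differ.

```python
def latexmk_command(main_file="main.tex", continuous=False):
    """Build the latexmk argv; `continuous` adds -pvc for live recompilation."""
    cmd = ["latexmk", "-pdf"]   # -pdf: produce a PDF with pdflatex
    if continuous:
        cmd.append("-pvc")      # -pvc: keep running, rebuild when sources change
    cmd.append(main_file)
    return cmd


if __name__ == "__main__":
    print(" ".join(latexmk_command()))                 # one-off build
    print(" ".join(latexmk_command(continuous=True)))  # development mode
```

Passing the resulting list to `subprocess.run` would start the build. With `-pvc` the call blocks until latexmk is interrupted, which is the intended behavior for a development session.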

Development environment using Sublime Text 3.

## The GitLab CI pipeline

GitLab CI/CD lets us run a pipeline for each commit. For this project we have defined three stages: the first stage, figures, creates the plots in Python; the second, build, compiles the LaTeX document; and the third, test, runs unit tests on the compiled PDF document.

Our GitLab CI pipeline consists of a figures stage, a build stage and a test stage.

The complete .gitlab-ci.yml can be found in the GitLab repository.

### Compiling figures

Our first pipeline stage will compile figures according to Publication ready figures. For this we use the official python:3.8 Docker image. Any job artifacts created in this step will be carried over to the next stage.

The figures are placed in the figures subdirectory and are built using a Makefile.

The reason for making this a separate stage is that we assume generating figures can take a very long time, for example if a machine learning model is trained in this step. This also keeps it separate when running locally, so that we do not have to regenerate the figures every time we want to compile the LaTeX document.

#### Speeding up the build with caching

The figures stage can take a very long time since we need to download and install packages every time the stage runs. To avoid this, we can cache the installed packages in the figures stage, following the example from Cache dependencies in GitLab CI/CD.

We are using a virtualenv (Gabor, 2020) to be able to cache the installed packages as well.

Care has to be taken with this: the cache can become too big for GitLab to handle.

### Compiling the LaTeX document

The second stage in the pipeline will compile the actual LaTeX document. Here, we need to use a Docker image that has LaTeX and all needed packages installed. The Docker image we use is martisak/texlive2020, which is based on TeX Live 2020.

The job artifact of interest is of course the compiled PDF document, but we include all untracked files so that log files and other generated files are included as well.

### Running unit tests

The final stage of the pipeline will run unit tests on the created PDF file. This is useful, for example, to make sure that the number of pages is as expected, that the fonts are embedded properly, and that the metadata is set correctly. We will cover these tests in detail in a later post; for now it is enough to say that these tests are written in Ruby, so we will use an appropriate Docker image.
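To give a feel for what such checks look like, here is a small sketch of a page-count and title check. It is written in Python only to keep the examples in this post in one language, whereas the tests in the repository are Ruby. It assumes the "Key: value" text output of Poppler's `pdfinfo` tool, and the function names, page limit and title are made up for illustration. Font embedding is not covered here.

```python
def parse_pdfinfo(output):
    """Parse `pdfinfo`-style "Key: value" lines into a dictionary."""
    info = {}
    for line in output.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            info[key.strip()] = value.strip()
    return info


def check_document(info, max_pages, expected_title):
    """Return a list of failure messages; an empty list means all checks pass."""
    failures = []
    pages = int(info.get("Pages", 0))
    if pages > max_pages:
        failures.append(f"too many pages: {pages} > {max_pages}")
    if info.get("Title") != expected_title:
        failures.append(f"unexpected title: {info.get('Title')!r}")
    return failures


if __name__ == "__main__":
    # Made-up sample in the shape of `pdfinfo main.pdf` output.
    sample = "Title:          A reproducible paper\nPages:          8\n"
    print(check_document(parse_pdfinfo(sample), 8, "A reproducible paper"))
```

In a real test stage the sample string would be replaced by the captured output of `pdfinfo main.pdf`, and a non-empty failure list would make the job exit with a non-zero status.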

Now that we have gone through all of this, we would like to share our final document with others. I like using a GitLab badge for this.

Since we named our document main.pdf and the compilation job is named compile, we can find our document at https://gitlab.com/martisak/latex-pipeline/-/jobs/artifacts/master/raw/main.pdf?job=compile.
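The structure of this link is regular enough to build programmatically. The sketch below reproduces the URL above from its parts. The function name and its parameters are hypothetical, and the default host assumes gitlab.com rather than a self-hosted instance.

```python
from urllib.parse import quote, urlencode


def artifact_url(project, ref, path, job, host="https://gitlab.com"):
    """Build the URL of an artifact file from the latest job named `job` on `ref`."""
    return (
        f"{host}/{project}/-/jobs/artifacts/{quote(ref)}"
        f"/raw/{quote(path)}?{urlencode({'job': job})}"
    )


if __name__ == "__main__":
    print(artifact_url("martisak/latex-pipeline", "master", "main.pdf", "compile"))
```

The `job` query parameter takes the name of the job that produced the artifact, so renaming the job in .gitlab-ci.yml also changes the link.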

Of course, we need a fancy image to go with it, and we can generate one using shields.io.

You can add this badge either to your README.md or in your GitLab settings under General and Badges.
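For the README.md variant, the badge is simply a Markdown image wrapped in a link. The sketch below builds such a snippet using the static badge URL scheme of shields.io (label, message and color separated by dashes). The function names and the label, message and color values are placeholders chosen for this example, not the ones used in the repository.

```python
def shields_escape(text):
    """Escape text for one segment of a shields.io static badge URL."""
    # Dashes and underscores are doubled; spaces are written as underscores.
    return text.replace("-", "--").replace("_", "__").replace(" ", "_")


def badge_markdown(label, message, color, link_url):
    """Return Markdown for a static badge image that links to `link_url`."""
    image_url = (
        "https://img.shields.io/badge/"
        f"{shields_escape(label)}-{shields_escape(message)}-{color}"
    )
    return f"[![{label}]({image_url})]({link_url})"


if __name__ == "__main__":
    pdf_url = ("https://gitlab.com/martisak/latex-pipeline/-/jobs/artifacts/"
               "master/raw/main.pdf?job=compile")
    print(badge_markdown("PDF", "download", "blue", pdf_url))
```

Pasting the printed line into README.md should render a clickable badge that points at the latest compiled document.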

A common way of writing LaTeX documents together with others is to use Overleaf. All authors can edit in real time and compilation is very fast. However, the online version does not allow us to run arbitrary code or run test cases on our document. Furthermore, the version control is hidden from us. Overleaf has a few ways of letting us share the work. In my work, some of the content is proprietary and can be sensitive until the document is reviewed, which means I am not able to use cloud solutions to write my documents. However, Overleaf provides a Docker image that can be deployed locally.

Many authors have looked into using GitLab CI for building LaTeX documents, for example (Manik, 2019; Lühr, 2018; Khan, 2018; Ergus, 2016). Ajayakumar (2020) wrote a very nice and complete guide, and used GitLab Pages to deploy the compiled document.

In this post we extend this work and build a complete pipeline that can also be run locally. Our pipeline consists of three stages (figures, build and test), each responsible for a separate part of the build process.

## Conclusions

We have constructed a simple pipeline for compiling LaTeX documents in a Docker container. This fulfills the requirements that our repository shall be complete and exercisable (Monperrus, 2018; Association for Computing Machinery (ACM), n.d.).

To quickly get started, you can fork my repository on GitLab or use the cookiecutter template provided here.

In upcoming posts we will further look into defining test cases for documents, complicating the build with Pandoc and other tricks to annoy your co-authors.

## References

1. Buckheit, J. B., & Donoho, D. L. (1995). Wavelab and reproducible research. In Wavelets and statistics (pp. 55–81). Springer.
2. Claerbout, J. F., & Karrenbach, M. (1992). Electronic documents give reproducible research a new meaning. In SEG Technical Program Expanded Abstracts 1992 (pp. 601–604). Society of Exploration Geophysicists. https://doi.org/10.1190/1.1822162
4. Monperrus, M. (2018). How to make a good open-science repository? https://researchdata.springernature.com/users/336958-martin-monperrus/posts/57389-how-to-make-a-good-open-science-repository
5. Merkel, D. (2014). Docker: Lightweight Linux Containers for Consistent Development and Deployment. Linux J., 2014(239). http://dl.acm.org/citation.cfm?id=2600239.2600241
6. Wienke, J. (2018). LaTeX Best Practices: Lessons Learned from Writing a PhD Thesis. https://www.semipol.de/2018/06/12/latex-best-practices.html
7. Gabor, B. (2020). virtualenv. https://virtualenv.pypa.io/
8. Manik, D. (2019). GitLab pipelines for every need: testing, documentation, and writing a paper. In deRSE 2019 - Konferenz für ForschungssoftwareentwicklerInnen in Deutschland. https://doi.org/10.5446/42490
9. Lühr, L. (2018). Automate Awesome CV with XeLaTeX and GitLab CI. https://ayeks.de/post/2018-01-25-awesome-cv-cicd/
10. Khan, S. (2018). Setting up GitLab to automatically generate PDFs from committed LaTeX files. https://sayantangkhan.github.io/latex-gitlab-ci.html
11. Ergus, A. (2016). Using GitLab CI for Building LaTeX. https://github.com/aufenthaltsraum/stuff/wiki/Using-GitLab-CI-for-Building-LaTeX
12. Ajayakumar, V. (2020). Continuous Integration of LaTeX projects with GitLab Pages. https://www.vipinajayakumar.com/continuous-integration-of-latex-projects-with-gitlab-pages.html

## Suggested citation

If you would like to cite this work, here is a suggested citation in BibTeX format.