Create publication ready tables with Pandas

Tables in scientific papers often look less than professional, and sometimes this can even get in the way of understanding the message. In this blog post we will use pandas to automate making publication ready LaTeX tables that look great.

Table of Contents

  1. Introduction
  2. Using pandas to make a table
    1. Automation
  3. Related Work
  4. Conclusion

Introduction

Tufte argues that we should strive for a high data-to-ink ratio (Tufte, 1986), which means removing redundant graphical elements that do not contribute to conveying our message. This applies to tables as well.

For typesetting tables in my scientific papers I use LaTeX with the booktabs (Fear, 2020) package. Using booktabs goes a long way towards making beautiful tables with a high data-to-ink ratio, but it is a manual process.

Example figure produced with this method.

In this blog post we will explore using pandas (pandas development team, 2020; McKinney, 2010) and booktabs to remove some unwanted ink from our tables, and build a pipeline for generating the tables and including them in our LaTeX papers.

Using pandas to make a table

The first thing we need to do is to make a table from a dataset. We’ll look at the Iris dataset (Fisher, 1936) from the seaborn (Waskom, 2021) Python library.

We could simply use the pandas function to_latex() to save a file containing the table in LaTeX format. The LaTeX code that pandas generates requires the booktabs package, and we can make this table even better with some simple tweaks.

Example table using the Iris dataset from the `seaborn` library.

First, we want to specify the table column format and round the numbers to two decimals. Second, we want to highlight the maximum value in each column by setting it in bold. Finally, we want to make each column header bold.

Specifying the table format is easy using the siunitx package (Wright, 2009). We set each of the number columns to S[table-format = 2.2].
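As a minimal sketch, the column specification can be assembled from its parts in plain Python. The helper name column_format and its parameters are assumptions made for illustration; the 12pt gap mirrors the value used in the script further down.

```python
def column_format(n_numeric, table_format="2.2", gap="12pt"):
    """Build a tabular column spec: one text column, a gap, then S columns."""
    return (
        "l"
        # Raw string so that the backslash in \hskip is kept literally.
        + r"@{\hskip " + gap + "}"
        + n_numeric * f"S[table-format = {table_format}]"
    )


# Four numeric columns, as in the Iris example.
print(column_format(4))
```

The resulting string is what would be passed as the column_format argument of to_latex().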

Making the maximum value in each column bold requires a bit more work. Kalinke (2020) wrote an inspiring post that we make use of here. Since we are using siunitx to specify the column format, we use \bfseries to make numbers bold, and we let siunitx detect this by loading the package with \usepackage[round-mode=places,detect-weight=true,detect-inline-weight=math]{siunitx}.
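The per-column logic can be sketched without pandas on a plain list of numbers. The function name bold_max and the fmt parameter are hypothetical; the sample values are the per-species mean sepal lengths of the Iris dataset, rounded by hand purely for illustration.

```python
def bold_max(values, fmt="{}"):
    """Return the values as strings, prefixing the maximum with \\bfseries."""
    largest = max(values)
    return [
        ("\\bfseries " if value == largest else "") + fmt.format(value)
        for value in values
    ]


# Illustrative input: rounded mean sepal lengths for the three species.
for cell in bold_max([5.01, 5.94, 6.59]):
    print(cell)
```

Only the cell holding the largest value gets the \bfseries prefix; siunitx then picks up the weight through the detect-weight option.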

Column header titles should be bold and in title case, so we directly modify df.columns to achieve this.
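A small sketch of the header transformation on a single column name follows. The helper name latex_header is an assumption for illustration. Note the doubled braces: str.format() needs {{ and }} to emit the literal braces of \textbf{...}.

```python
def latex_header(name):
    """Turn a snake_case column name into a bold, title-case LaTeX header."""
    # "{{" and "}}" produce literal braces; the inner "{}" is the placeholder.
    return "\\textbf{{{}}}".format(name.replace("_", " ").title())


print(latex_header("sepal_length"))
```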

Since we added LaTeX tags to our table we must set escape to False in the to_latex call.

import seaborn as sns
import os


def bold_extreme_values(data, data_max=-1):
    """Prefix data with \\bfseries if it equals data_max, else return it as-is."""

    if data == data_max:
        return "\\bfseries %s" % data

    return data


if __name__ == "__main__":

    # Load data and
    # calculate mean of each column
    df = (sns.load_dataset('iris')
          .groupby("species")
          .mean()
          .reset_index()
          )

    # Specify in which columns to make the maximum bold
    col_show_max = ["sepal_length", "sepal_width",
                    "petal_length", "petal_width"]

    # Iterate through columns
    for k in col_show_max:
        df[k] = df[k].apply(
            lambda data: bold_extreme_values(data, data_max=df[k].max()))

    # Set column header to bold title case
    # ("{{" and "}}" are literal braces, "{}" is the placeholder)
    df.columns = (df.columns.to_series()
                  .apply(lambda r: "\\textbf{{{}}}".format(
                      r.replace("_", " ").title())))

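    # Write to <script name>.tbl, e.g. table.py -> table.tbl
    with open(
        os.path.splitext(
            os.path.basename(__file__))[0] + ".tbl", "w") as f:

        # One text column, a 12pt gap, then four siunitx S columns.
        # The raw string keeps the backslash in \hskip intact.
        column_format = "l" + \
            r"@{\hskip 12pt}" + \
            4*"S[table-format = 2.2]"

        f.write(df.head()
                .to_latex(index=False,
                          escape=False,
                          column_format=column_format)
                )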

At the end we are using the pandas function to_latex() to generate the LaTeX code and write the result to a file containing the tabular environment. For this example, we have used seaborn==0.11.1 and pandas==1.2.3.

Now we are ready to include the generated file into a LaTeX document.

\documentclass[tikz,crop,convert={density=400,outext=.png}]{standalone}

\usepackage{booktabs}
\usepackage{etoolbox}
\usepackage[round-mode=places,detect-weight=true,detect-inline-weight=math]{siunitx}
\renewcommand\arraystretch{1.2}

\listfiles

\begin{document}
\begin{table}
\robustify\bfseries
\caption{A generated table}
\input{table.tbl}
\end{table}
\end{document}

We can compile this as a standalone figure into PNG and PDF by saving it as table.tex and running pdflatex -shell-escape table.tex.

Automation

To automate this build, we can use the following Makefile. The input files are table.py and table.tex, both of which are listed above.

SOURCES=$(wildcard *.py)
PNG_OBJECTS=$(SOURCES:.py=.png)
PNG_I_OBJECTS=$(SOURCES:.py=-0.png)
PYTHON=pipenv run python3
LATEX=pdflatex

all: $(PNG_OBJECTS)

%.tbl: %.py
	$(PYTHON) $<

%.png: %.tex %.tbl
	$(LATEX) -shell-escape $<
	convert $(basename $@)-0.png -flatten -trim +repage $(basename $@).png

clean:
	-rm $(PNG_I_OBJECTS) $(PNG_OBJECTS)
	-latexmk -C $(basename $(SOURCES))

.INTERMEDIATE: $(PNG_I_OBJECTS)

We want to create the file table.png. To do this we start with running Python to generate the .tbl file that we then include in table.tex. Compiling table.tex renders the table and saves it as a .png. We get a .pdf for free when using pdflatex.

Related Work

Kalinke (2020) was the inspiration for this post. The method described there for making numbers bold also included code for formatting the numbers. Here we use siunitx to do the formatting instead.

In R we can use the packages xtable or kableExtra to achieve similar results. In particular, kableExtra is very capable and the documentation (Zhu, 2020) has many interesting examples.

The entire library of work by Edward Tufte is hugely inspirational to us. Tufte (1986) tells us not to put too much ink on the paper.

Conclusion

We have looked at how to make tables generated by pandas look more professional by using siunitx and some tweaks. The Makefile we created should go into the tables directory of your manuscript so that you can use make -C tables all as a dependency of your normal make report target.

Easily digested tables make it easier to understand the message we are trying to convey. In fact, there is some evidence (Huang, 2018) that the visual appearance of a paper is important and that improving the paper gestalt reduces the risk of getting a paper rejected.

References

  1. Tufte, E. R. (1986). The Visual Display of Quantitative Information. Graphics Press. https://www.edwardtufte.com/tufte/books_vdqi
  2. Fear, S. (2020). Publication quality tables in LaTeX. 1–18. https://ctan.org/pkg/booktabs
  3. The pandas development team. (2020). pandas-dev/pandas: Pandas (latest). Zenodo. https://doi.org/10.5281/zenodo.3509134
  4. McKinney, W. (2010). Data Structures for Statistical Computing in Python. In S. van der Walt & J. Millman (Eds.), Proceedings of the 9th Python in Science Conference (pp. 56–61). https://doi.org/10.25080/Majora-92bf1922-00a
  5. Fisher, R. A. (1936). The use of multiple measurements in taxonomic problems. Annals of Eugenics, 7(2), 179–188.
  6. Waskom, M. L. (2021). seaborn: statistical data visualization. Journal of Open Source Software, 6(60), 3021. https://doi.org/10.21105/joss.03021
  7. Wright, J. (2009). siunitx: A comprehensive (SI) units package. System, 1–60. https://www.ctan.org/pkg/siunitx
  8. Kalinke, F. (2020). Highlighting Pandas .to_latex() Output in Bold Face for Extreme Values. https://flopska.com/highlighting-pandas-to_latex-output-in-bold-face-for-extreme-values.html
  9. Zhu, H. (2020). Create Awesome LaTeX Table with knitr::kable and kableExtra. https://haozhu233.github.io/kableExtra/
  10. Huang, J.-B. (2018). Deep Paper Gestalt. CoRR, abs/1812.0. http://arxiv.org/abs/1812.08775

Suggested citation

If you would like to cite this work, here is a suggested citation in BibTeX format.

@misc{isaksson_2021,
  author="Isaksson, Martin",
  title="Martin's blog --- Create publication ready tables with Pandas",
  year=2021,
  url="https://blog.martisak.se/2021/04/10/publication_ready_tables/",
  note = "[Online; accessed 2021-04-12]"
}
