Tables in scientific papers often look less than professional, and sometimes this even gets in the way of understanding the message. In this blog post we will use pandas to automate making publication-ready LaTeX tables that look great.
Introduction
Tufte argues that we should strive for a high data-ink ratio (Tufte, 1986), which means removing redundant graphical elements that do not contribute to conveying our message. This applies to tables as well.
For typesetting tables in my scientific papers I use LaTeX with the booktabs (Fear, 2020) package. Using booktabs goes a long way towards making beautiful tables with a high data-ink ratio, but it’s a manual process.
In this blog post we will explore using pandas (pandas development team, 2020; McKinney, 2010) and booktabs to remove some unwanted ink from our tables and to build a pipeline for generating the tables and including them in our LaTeX papers.
Using pandas to make a table
The first thing we need to do is to make a table from a dataset. We’ll look at the Iris dataset (Fisher, 1936) from the seaborn (Waskom, 2021) Python library.
We could simply use the pandas function to_latex() to save a file containing the table in LaTeX format. The generated table uses booktabs rules, so the document that includes it must load booktabs, but we can make this table even better with some simple tweaks.
First, we want to specify the table column format and round the numbers to two decimals. Second, we want to highlight the maximum number in each column by making it bold. Finally, we want to make each column header bold.
Specifying the table format is easy using the siunitx package (Wright, 2009). We set each of the number columns to S[table-format = 2.2].
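The column specification can be assembled as an ordinary Python string. The helper below is a sketch; its name make_column_format and its parameters are made up for illustration. In siunitx, table-format = 2.2 reserves room for two integer digits and two decimal digits.

```python
def make_column_format(n_numeric, table_format="2.2", gap="12pt"):
    """Build a tabular column spec: one text column, then n_numeric S columns."""
    label_col = "l"
    # Raw string so that \hskip is not read as a Python escape sequence.
    spacing = r"@{\hskip " + gap + "}"
    numeric_cols = n_numeric * f"S[table-format = {table_format}]"
    return label_col + spacing + numeric_cols


print(make_column_format(4))
```

The result can be passed to to_latex() through its column_format argument.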
Making the maximum value in each column bold requires a bit more work. Kalinke (2020) wrote an inspiring post that we make use of here. Since we are using siunitx to specify the column format, we use \bfseries to make numbers bold and allow siunitx to detect this by loading the package with \usepackage[round-mode=places,detect-weight=true,detect-inline-weight=math]{siunitx}. The round-mode=places option also takes care of the rounding, since siunitx rounds to two decimal places by default.
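One way to mark the maximum without a per-cell function call is a vectorized variant, sketched below. It differs from the approach in this post in that it rounds in Python and returns strings; the helper name bold_max and the sample values are illustrative assumptions.

```python
import pandas as pd


def bold_max(column, decimals=2):
    """Format a numeric column as strings and prefix its maximum with \\bfseries."""
    text = column.map(lambda value: f"{value:.{decimals}f}")
    is_max = column == column.max()
    # Series.where keeps entries where the condition holds and replaces the rest.
    return text.where(~is_max, "\\bfseries " + text)


# Illustrative values only; they are not taken from the Iris dataset.
widths = pd.Series([3.43, 2.77, 2.97])
print(bold_max(widths).tolist())
```

If several rows share the maximum, all of them are marked, which matches the behaviour of an equality test against the column maximum.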
Column header titles should be bold and in title case, so we directly modify df.columns to achieve this.
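The header transformation on its own can be sketched as a small function (the name format_header is hypothetical). The part that is easy to get wrong is the brace escaping in str.format: doubled braces produce literal braces, and the inner pair is the replacement field.

```python
def format_header(name):
    """Turn a snake_case column name into a bold, title-cased LaTeX header."""
    title = name.replace("_", " ").title()
    # "{{" and "}}" are literal braces in str.format; the inner "{}" is the field.
    return "\\textbf{{{}}}".format(title)


print(format_header("sepal_length"))  # \textbf{Sepal Length}
```

Applying it to every column is then a one-liner such as df.columns = [format_header(c) for c in df.columns].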
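Since we added LaTeX commands to our table, we must set escape to False in the to_latex call; otherwise pandas would escape the backslashes and braces and the commands would be printed literally.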
import os

import seaborn as sns


def bold_extreme_values(data, data_max=-1):
    """Prefix the value with \\bfseries if it equals the column maximum."""
    if data == data_max:
        return "\\bfseries %s" % data
    return data


if __name__ == "__main__":
    # Load data and calculate the mean of each column per species
    df = (sns.load_dataset("iris")
          .groupby("species")
          .mean()
          .reset_index())

    # Specify in which columns to make the maximum bold
    col_show_max = ["sepal_length", "sepal_width",
                    "petal_length", "petal_width"]

    # Iterate through columns; take the maximum before the column is modified
    for k in col_show_max:
        data_max = df[k].max()
        df[k] = df[k].apply(
            lambda data: bold_extreme_values(data, data_max=data_max))

    # Set column header to bold title case
    df.columns = (df.columns.to_series()
                  .apply(lambda r: "\\textbf{{{}}}".format(
                      r.replace("_", " ").title())))

    # Write to a .tbl file named after this script
    out_name = os.path.splitext(os.path.basename(__file__))[0] + ".tbl"
    # Raw string so that \hskip is not treated as an escape sequence
    column_format = "l" + r"@{\hskip 12pt}" + 4 * "S[table-format = 2.2]"
    with open(out_name, "w") as f:
        f.write(df.head().to_latex(index=False,
                                   escape=False,
                                   column_format=column_format))
At the end we use the pandas function to_latex() to generate the LaTeX code and write the result to a file containing the tabular environment. For this example, we have used seaborn==0.11.1 and pandas==1.2.3.
Now we are ready to include the generated file into a LaTeX document.
\documentclass[tikz,crop,convert={density=400,outext=.png}]{standalone}
\usepackage{booktabs}
\usepackage{etoolbox}
\usepackage[round-mode=places,detect-weight=true,detect-inline-weight=math]{siunitx}
\renewcommand\arraystretch{1.2}
\listfiles
\begin{document}
\begin{table}
\robustify\bfseries
\caption{A generated table}
\input{table.tbl}
\end{table}
\end{document}
We can compile this as a standalone figure into PNG and PDF by running pdflatex -shell-escape table.tex.
Automation
To automate this build, we can use the following Makefile. The input files are table.py and table.tex, both of which are listed above.
SOURCES=$(wildcard *.py)
PNG_OBJECTS=$(SOURCES:.py=.png)
PNG_I_OBJECTS=$(SOURCES:.py=-0.png)

PYTHON=pipenv run python3
LATEX=pdflatex

all: $(PNG_OBJECTS)

%.tbl: %.py
	$(PYTHON) $<

%.png: %.tex %.tbl
	$(LATEX) -shell-escape $<
	convert $(basename $@)-0.png -flatten -trim +repage $(basename $@).png

clean:
	-rm $(PNG_I_OBJECTS) $(PNG_OBJECTS)
	-latexmk -C $(basename $(SOURCES))

.INTERMEDIATE: $(PNG_I_OBJECTS)
We want to create the file table.png. To do this we start by running Python to generate the .tbl file that we then include in table.tex. Compiling table.tex renders the table and saves it as a .png. We get a .pdf for free when using pdflatex.
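To make the order of the build steps explicit, here is a small dry-run sketch that only lists the commands for one table and does not run them. The function name build_commands and the default interpreter name are assumptions; the Makefile itself runs Python through pipenv.

```python
def build_commands(stem, python="python3", latex="pdflatex"):
    """List the commands that take <stem>.py to <stem>.png, in order."""
    return [
        # Step 1: the Python script writes <stem>.tbl
        [python, f"{stem}.py"],
        # Step 2: LaTeX renders the table; the Makefile expects <stem>-0.png
        [latex, "-shell-escape", f"{stem}.tex"],
        # Step 3: ImageMagick flattens and trims the intermediate PNG
        ["convert", f"{stem}-0.png", "-flatten", "-trim", "+repage", f"{stem}.png"],
    ]


for command in build_commands("table"):
    print(" ".join(command))
```

Each inner list could be handed to subprocess.run(command, check=True) if a Python driver were preferred over make, although make has the advantage of rebuilding only what changed.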
Related Work
Kalinke (2020) was the inspiration for this post. The method described there for making numbers bold also included code for formatting the numbers. In this work we use siunitx to do the formatting instead.
In R we can use the packages xtable or kableExtra to achieve similar results. In particular, kableExtra is very capable and the documentation (Zhu, 2020) has many interesting examples.
The entire library of work by Edward Tufte is hugely inspirational to us. Tufte (1986) tells us not to put too much ink on the paper.
Conclusion
We have looked at how to make tables generated by pandas look more professional by using siunitx and some tweaks. The Makefile we created should go into the tables directory of your manuscript so that you can use make -C tables all as a dependency of your normal make report target.
Easily digested tables make it easier to understand the message we are trying to convey. In fact, there is some evidence (Huang, 2018) that the visual appearance of a paper is important and that improving the paper gestalt reduces the risk of getting a paper rejected.
References
- Tufte, E. R. (1986). The Visual Display of Quantitative Information. Graphics Press. https://www.edwardtufte.com/tufte/books_vdqi
- Fear, S. (2020). Publication quality tables in LaTeX. 1–18. https://ctan.org/pkg/booktabs
- The pandas development team. (2020). pandas-dev/pandas: Pandas (latest). Zenodo. https://doi.org/10.5281/zenodo.3509134
- McKinney, W. (2010). Data Structures for Statistical Computing in Python. In S. van der Walt & J. Millman (Eds.), Proceedings of the 9th Python in Science Conference (pp. 56–61). https://doi.org/10.25080/Majora-92bf1922-00a
- Fisher, R. A. (1936). The use of multiple measurements in taxonomic problems. Annals of Eugenics, 7(2), 179–188.
- Waskom, M. L. (2021). seaborn: statistical data visualization. Journal of Open Source Software, 6(60), 3021. https://doi.org/10.21105/joss.03021
- Wright, J. (2009). siunitx: A comprehensive (SI) units package. System, 1–60. https://www.ctan.org/pkg/siunitx
- Kalinke, F. (2020). Highlighting Pandas .to_latex() Output in Bold Face for Extreme Values. https://flopska.com/highlighting-pandas-to_latex-output-in-bold-face-for-extreme-values.html
- Zhu, H. (2020). Create Awesome LaTeX Table with knitr::kable and kableExtra. https://haozhu233.github.io/kableExtra/
- Huang, J.-B. (2018). Deep Paper Gestalt. CoRR, abs/1812.08775. http://arxiv.org/abs/1812.08775
Suggested citation
If you would like to cite this work, here is a suggested citation in BibTeX format.