Generating PDFs
Puppeteer and Playwright can be used to create PDFs from webpages. This opens up interesting automation scenarios for tasks such as archiving, generating invoices, writing manuals, books and more.
This article introduces this functionality and shows how we can customise the PDF to fit our needs.
# Generating a PDF file
After loading a page, we use the `page.pdf()` command to convert it to a PDF. Note that we need to pass the `path` option to have the PDF file actually saved to disk.
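A minimal sketch of this step follows. It assumes a page has already been opened by the caller. With Puppeteer that would be `const browser = await puppeteer.launch(); const page = await browser.newPage();`, and with Playwright `const browser = await chromium.launch(); const page = await browser.newPage();`. The helper name `savePageAsPdf` and the `A4` format are illustrative choices, not part of either library.

```javascript
// Load `url` in an already-open page and save it as a PDF at `path`.
// `page` can be a Puppeteer or a Playwright Page: both expose goto() and
// pdf(), and both accept the `path` and `format` options used here.
async function savePageAsPdf(page, url, path) {
  await page.goto(url);
  // Without `path`, page.pdf() still resolves to the PDF bytes,
  // but nothing is written to disk.
  return page.pdf({ path, format: 'A4' });
}
```

Passing the page in, rather than launching a browser inside the helper, keeps the same function usable with either library and lets the caller decide when to close the browser.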
WARNING
This feature is currently only supported in Chromium headless in both Puppeteer and Playwright.
# Tweaking the result
It is worth taking a quick look at the official documentation for `page.pdf()` (Puppeteer or Playwright), as we will almost certainly want to tweak the appearance of our page in the resulting PDF.
In certain cases, our webpage might look significantly different in the PDF than in the browser. Depending on the case, it can pay off to experiment with the following:
- Set the `printBackground` option to `true` if graphical components appear to be missing in the generated PDF.
- By default, `page.pdf()` generates a PDF with colors adjusted for printing. Setting the CSS property `-webkit-print-color-adjust: exact` forces rendering of the original colors.
- Change the CSS media type of the page to `screen`: in Puppeteer with `page.emulateMediaType('screen')`, in Playwright with `page.emulateMedia({ media: 'screen' })`.
- Setting either `width` and `height` or `format` to an appropriate value might be needed for the page to be displayed optimally.
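The adjustments above can be combined into one helper, sketched below under a few assumptions: the function name `printAsOnScreen` is hypothetical, the CSS is injected with `page.addStyleTag()` (available in both libraries) rather than edited into the site's stylesheets, and `A4` is just an example format. Because Puppeteer and Playwright name their media-emulation method differently, the sketch checks which one the page object provides.

```javascript
// Make the PDF look closer to what the browser shows, then print it.
// `page` may be a Puppeteer or a Playwright Page.
async function printAsOnScreen(page, path) {
  if (typeof page.emulateMediaType === 'function') {
    await page.emulateMediaType('screen'); // Puppeteer
  } else {
    await page.emulateMedia({ media: 'screen' }); // Playwright
  }
  // Ask Chromium to keep the original colors instead of print-adjusted ones.
  await page.addStyleTag({ content: '* { -webkit-print-color-adjust: exact; }' });
  return page.pdf({
    path,
    format: 'A4',          // takes priority over width/height if both are given
    printBackground: true, // include CSS backgrounds, which are omitted by default
  });
}
```

Whether each adjustment actually helps depends on the page's own print styles, so it is worth toggling them one at a time while comparing the output.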
# Customising header and footer
We can also add custom headers and footers to our pages, displaying values such as the title, page number and more. Let's see how this looks on your favourite website:
We are including the following template files for our header...
```html
<html>
  <head>
    <style type="text/css">
      #header {
        padding: 0;
      }
      .content {
        width: 100%;
        background-color: #777;
        color: white;
        padding: 5px;
        -webkit-print-color-adjust: exact;
        vertical-align: middle;
        font-size: 15px;
        margin-top: 0;
        display: inline-block;
      }
      .title {
        font-weight: bold;
      }
      .date {
        text-align: right;
      }
    </style>
  </head>
  <body>
    <div class="content">
      <span class="title"></span> -
      <span class="date"></span>
      <span class="url"></span>
    </div>
  </body>
</html>
```
...and footer:
```html
<html>
  <head>
    <style type="text/css">
      #footer {
        padding: 0;
      }
      .content-footer {
        width: 100%;
        background-color: #777;
        color: white;
        padding: 5px;
        -webkit-print-color-adjust: exact;
        vertical-align: middle;
        font-size: 15px;
        margin-top: 0;
        display: inline-block;
        text-align: center;
      }
    </style>
  </head>
  <body>
    <div class="content-footer">
      Page <span class="pageNumber"></span> of <span class="totalPages"></span>
    </div>
  </body>
</html>
```
The first page of the generated PDF looks as follows:
TIP
Chromium sets a default padding for the header and footer. You will need to override it in your CSS, as the `#header` and `#footer` rules in the templates above do.
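To wire the templates into the PDF, they are passed to `page.pdf()` through the `headerTemplate` and `footerTemplate` options together with `displayHeaderFooter: true`. The sketch below assumes the templates were saved as `header.html` and `footer.html` (hypothetical file names) and picks example margin values; both are assumptions to adjust. The spans with the classes `title`, `date`, `url`, `pageNumber` and `totalPages` are filled in by Chromium when printing.

```javascript
const fs = require('fs');

// Build page.pdf() options that add the header and footer templates.
// The default file names are hypothetical: use wherever the templates live.
function headerFooterPdfOptions(path, headerFile = 'header.html', footerFile = 'footer.html') {
  return {
    path,
    format: 'A4',
    printBackground: true,
    // The templates are ignored unless this flag is set.
    displayHeaderFooter: true,
    headerTemplate: fs.readFileSync(headerFile, 'utf8'),
    footerTemplate: fs.readFileSync(footerFile, 'utf8'),
    // Header and footer are drawn inside the top and bottom margins,
    // so these must be large enough to hold them (example values).
    margin: { top: '60px', bottom: '60px' },
  };
}
```

With an open Puppeteer or Playwright page, the options would then be used as `await page.pdf(headerFooterPdfOptions('checkly.pdf'));`, where the output name is again only an example.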
# Further considerations
We can easily transform existing web pages into PDF format, just as we have shown in our example. An even more interesting use case is generating a brand-new document: we can use our existing HTML and CSS skills to produce high-quality PDFs, often eliminating the need for LaTeX or similar tools.
See points 2 and 3 of the following section for practical examples of this approach.
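For a brand-new document there is no URL to visit: the HTML can be built in code and loaded with `page.setContent()`, which both Puppeteer and Playwright provide. The sketch below uses a hypothetical invoice shape (`number`, `customer`, `items` with `name` and `price`) and hypothetical helper names; it only illustrates the approach, not a complete invoicing tool.

```javascript
// Escape text before interpolating it into HTML.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

// Hypothetical invoice shape: { number, customer, items: [{ name, price }] }
function invoiceHtml(invoice) {
  const rows = invoice.items
    .map((item) => `<tr><td>${escapeHtml(item.name)}</td><td>${item.price.toFixed(2)}</td></tr>`)
    .join('');
  const total = invoice.items.reduce((sum, item) => sum + item.price, 0);
  return `<html><body>
<h1>Invoice ${escapeHtml(invoice.number)}</h1>
<p>Billed to: ${escapeHtml(invoice.customer)}</p>
<table>${rows}<tr><th>Total</th><th>${total.toFixed(2)}</th></tr></table>
</body></html>`;
}

// Render the generated HTML directly instead of navigating to a URL.
async function renderInvoicePdf(page, invoice, path) {
  await page.setContent(invoiceHtml(invoice));
  return page.pdf({ path, format: 'A4', printBackground: true });
}
```

Keeping the HTML generation separate from the browser call makes the template easy to test on its own; a real setup would likely use a proper template engine and a stylesheet instead of string concatenation.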
# Further reading
- Pocket Admin's article on generating PDF from HTML.
- Florian Mößle's guide to generating invoices with Puppeteer.
- A great example of Puppeteer's PDF generation feature: Li Haoyi's Hands On Scala book. See the build pipeline behind it.