Converting HTML to PDF using wkhtmltopdf

I blogged a while back about delivering pages as PDF using PHP, and at the time DOMPDF seemed to be the best-of-breed package for converting HTML into PDF versions of web content.

However, I noted at the time that DOMPDF's last release was in July 2007, and it still doesn't look likely to be updated any time soon. The fundamental problem with packages like DOMPDF is that they tend to implement their own rendering engine. The thing is, HTML and CSS are both pretty huge now: writing a rendering engine that copes with all the different combinations is a huge task, so projects like DOMPDF end up missing important bits of functionality.

A better approach would be to take an existing rendering engine from a browser and build a binary around it that takes a website as input and produces a PDF as output. That way you get results consistent with how browsers print a page, and if you pick the right engine you won't have to keep up with changes to the HTML standards yourself: the engine developers will do that for you.

This is essentially the approach wkhtmltopdf takes: it builds on the open-source WebKit rendering engine used inside browsers like Safari and Chrome and wraps it in a Linux command-line application, which produces some pretty impressive results.

I thought I'd jump right in and start by compiling it on my Debian webserver. The wkhtmltopdf site has some instructions for building it on Ubuntu, which I thought were worth a try. The basic procedure was as follows:

#apt-get update
#apt-get install libqt4-dev qt4-dev-tools build-essential cmake

#svn checkout http://wkhtmltopdf.googlecode.com/svn/trunk/ wkhtmltopdf
#cd wkhtmltopdf
#cmake -D CMAKE_INSTALL_PREFIX=/usr .
#make
#sudo make install

In my case this installed a terrifying number of new packages on my server, but everything went very smoothly. I was left with a binary in /usr/bin and ploughed right in!

#wkhtmltopdf http://ciaranmcnulty.com /tmp/ciaranmcnulty.pdf
wkhtmltopdf: cannot connect to X server

Argh. The rendering engine depends on there being a GUI running on the machine so it can do cool things like generate graphics, render fonts and so forth. A typical webserver won't be running X, but luckily there are ways around it.

One such way is xvfb, the X Virtual Framebuffer. This is a handy bit of software that runs an X server entirely in memory, without a physical display or most of the usual overheads. You can create a temporary X server and run a command in it using the xvfb-run script, the benefit of which is that the X instance gets thrown away afterwards. I installed xvfb and then invoked it as follows:

#apt-get install xvfb
#xvfb-run -a -s "-screen 0 640x480x16" wkhtmltopdf --dpi 200 \
  --page-size A4 http://ciaranmcnulty.com /tmp/ciaranmcnulty.pdf

The options should be fairly self-explanatory. The key things to note are that -a makes xvfb-run pick an unused display number (to avoid collisions), and -s passes options through to the X server: here "-screen 0 640x480x16" starts the virtual framebuffer with a 640x480 screen at 16-bit colour depth.
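The command above is easy to wrap in a small shell function, so that scripts (a PHP page calling out via exec, say) only have to pass a URL and an output path. This is a minimal sketch: the function name html_to_pdf is hypothetical, while the xvfb-run and wkhtmltopdf options are exactly the ones used above.

```shell
#!/bin/sh
# html_to_pdf URL OUTPUT  (hypothetical helper name)
# Wraps the xvfb-run + wkhtmltopdf invocation from the post.
html_to_pdf() {
    if [ "$#" -ne 2 ]; then
        echo "usage: html_to_pdf URL OUTPUT" >&2
        return 2
    fi
    # -a: let xvfb-run pick an unused display number
    # -s: arguments passed through to the Xvfb server
    xvfb-run -a -s "-screen 0 640x480x16" \
        wkhtmltopdf --dpi 200 --page-size A4 "$1" "$2"
}
```

Usage would then be:

#html_to_pdf http://ciaranmcnulty.com /tmp/ciaranmcnulty.pdf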

The results are fairly good, certainly better than DOMPDF would generate given the same input. My site layout uses a fair bit of floating and absolute positioning, and the PDF came out exactly as I'd expect:

Website PDF

It's important to note that this isn't a bitmap: the text in the PDF is still real text.

A quick dig around showed that to print the backgrounds I'd need Qt 4.5 installed, which I wasn't really prepared to risk my server for. Instead, I quickly tried what I should have done in the first place: the wkhtmltopdf project provides a Linux binary that's statically compiled against Qt.

I downloaded this binary and gave it a whirl. The results were much better:

Website PDF with backgrounds

Frankly, I think this is a great rendition of the page, and certainly good enough for an autogenerated PDF on a website. A bit of further investigation and experimentation has left me pretty impressed with the breadth of CSS print functionality WebKit supports.

The next step for me is going to be to try replacing some of the DOMPDF installations on my smaller sites and see how it performs under load. The time taken to generate a PDF is pretty high, and I've not really checked how xvfb copes with concurrency, so I'd hesitate to throw it onto a production site straight away. Still, it'll be my first port of call next time I want to do something with a PDF.


Comments

1.

Creating a PDF from a web page is not that common an occurrence for me. However this looks great and I am racking my brains to think of somewhere cool to implement this!

Russell
9th April 2009, 21:45

2.

I think that creating a PDF from an existing Web page is only one use case for this. You could use it in *any* situation where you want to generate a PDF in some dynamic, automated way such as via PHP. Rather than try to figure out the PDF format, or the workings of some arcane library, you can generate a page of HTML (which you already know, and is well documented) and then convert that.

It'll be interesting to see how well this plays with concurrency and load etc. I'll look forward to a follow-up post somewhere down the line!

Simon Harris
9th April 2009, 22:14

3.

Yeah as Simon says, there are really two use cases - one is PDF versions of existing pages, but the other is generating PDFs for different uses using HTML as your templating system.

The idea that your PDF templates can be in HTML and sit in your application alongside page templates, and basically be edited by the same people, is pretty attractive.

Ciaran McNulty
9th April 2009, 23:05
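A minimal shell sketch of the "HTML as the PDF template language" idea from the two comments above: build an HTML document from some data (escaping it first), write it to a temporary file, then convert that file. The helper names are hypothetical. report_to_pdf calls wkhtmltopdf directly, which assumes a build that can run without X (such as the static binary); with the Qt-dependent build you would prefix the call with xvfb-run -a.

```shell
#!/bin/sh
# Escape the characters that matter inside HTML text content (& first).
html_escape() {
    sed -e 's/&/\&amp;/g' -e 's/</\&lt;/g' -e 's/>/\&gt;/g'
}

# render_report_html TITLE BODY_TEXT: print a complete HTML document
# (hypothetical template; a real one would live alongside your page templates)
render_report_html() {
    title=$(printf '%s\n' "$1" | html_escape)
    body=$(printf '%s\n' "$2" | html_escape)
    cat <<EOF
<!DOCTYPE html>
<html>
<head><meta charset="utf-8"><title>$title</title></head>
<body>
<h1>$title</h1>
<p>$body</p>
</body>
</html>
EOF
}

# report_to_pdf TITLE BODY_TEXT OUTPUT: render to a temp file, then convert
report_to_pdf() {
    tmpdir=$(mktemp -d) || return 1
    render_report_html "$1" "$2" > "$tmpdir/report.html"
    status=0
    wkhtmltopdf "$tmpdir/report.html" "$3" || status=$?
    rm -rf "$tmpdir"
    return $status
}
```

Keeping rendering and conversion in separate functions means the template can be checked on its own, without wkhtmltopdf installed.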

4.

Thanks for the help.

Kowalikus
17th February 2010, 12:56

5.

Any chance you could elaborate on installing the static binary? What commands did you use to get this to work?

Thanks for the article! Very helpful, I just gotta get it to work now lol.

John
22nd March 2010, 20:48

6.

http://wkhtmltopdf.googlecode.com/files/wkhtmltopdf-0.9.5-static-i386.tar.bz2
That is a statically compiled binary for i386. Just download it, untar it and run it.

Catalin

nc3b
23rd March 2010, 06:05

7.

@John - It's a case of unpacking the archive Catalin has linked to and running the 'wkhtmltopdf' binary inside it, adding it to your PATH if necessary.

@Catalin - Thanks for the link!

Ciaran McNulty
24th March 2010, 20:48

8.

Thank you very much.

eva
17th May 2010, 12:37

9.

A few notes about the installation:
The X package is called xvfb, not vfb, so the correct command is:
#apt-get install xvfb

Also, you will need libxrender1 package for the static build:
#apt-get install libxrender1
Since you don't need a separate directory (the static version is a single executable), I suggest you put it in /usr/local/bin or somewhere else already on your PATH.

Stormy
14th June 2010, 11:35
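Pulling together the install notes in the comments above: unpack the static archive, install the binary somewhere on the PATH such as /usr/local/bin, and make sure libxrender1 is present. This is a sketch of a hypothetical helper, not something shipped by the wkhtmltopdf project. The exact file name inside the archive isn't given here and may vary between releases, so it only assumes the binary's name starts with "wkhtmltopdf".

```shell
#!/bin/sh
# install_static_wkhtmltopdf ARCHIVE [DESTDIR]  (hypothetical helper)
# Usage, as root, after downloading the archive linked in comment 6:
#   apt-get install libxrender1
#   install_static_wkhtmltopdf wkhtmltopdf-0.9.5-static-i386.tar.bz2
install_static_wkhtmltopdf() {
    archive=$1
    dest=${2:-/usr/local/bin}
    [ -f "$archive" ] || { echo "no such archive: $archive" >&2; return 1; }

    tmp=$(mktemp -d) || return 1
    # GNU and BSD tar detect bzip2/gzip compression automatically when extracting
    if ! tar -xf "$archive" -C "$tmp"; then
        rm -rf "$tmp"
        return 1
    fi

    # Assumption: the executable's name starts with "wkhtmltopdf"
    bin=$(find "$tmp" -type f -name 'wkhtmltopdf*' | head -n 1)
    if [ -z "$bin" ]; then
        echo "no wkhtmltopdf binary found in $archive" >&2
        rm -rf "$tmp"
        return 1
    fi

    status=0
    { mkdir -p "$dest" && install -m 755 "$bin" "$dest/wkhtmltopdf"; } || status=$?
    rm -rf "$tmp"
    return $status
}
```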

10.

More notes about the options:
First of all, you don't need -s "-screen 0 640x480x16", since it's the default mode for xvfb, and screen resolution/mode doesn't seem to affect wkhtmltopdf output.

What the author forgot to mention is that by default xvfb does NOT come with any standard (web) fonts, so your output PDF will use a single font.
To rectify this, you need to install:
#apt-get install msttcorefonts
Then add the font path to the xvfb-run server options, e.g.:
#xvfb-run -a -s "-fp /usr/share/fonts/truetype/msttcorefonts" wkhtmltopdf http://www.angelfire.com/fl5/html-tutorial/fontlist.htm test.pdf

Another issue is the default page size for wkhtmltopdf. In my case it was somewhere around 30", which is not exactly a standard A4.
If you run into this, set the page size (in mm) manually, e.g. for A4 portrait:
--page-width 210 --page-height 297
I suggest you do the same in the HTML code, just in case.

On a side note: although wkhtmltopdf does give you the option to switch to the print media style, the behaviour is not 100% compatible. For example, if a huge HTML table spans several pages, it will not repeat the THEAD and TFOOT on each page as described in the HTML 4 standard.

Thankfully, it has powerful support for headers and footers written in HTML/JS, which should satisfy all your needs (if you're able to code them).

Stormy
14th June 2010, 17:42
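The suggestions in this comment can be combined into one helper: a font path for Xvfb, an explicit A4 size in millimetres, the print media type, and a simple footer. This is a sketch under assumptions: the name a4_pdf is hypothetical, the font directory is the one quoted above and may differ on your system, and the footer uses wkhtmltopdf's plain-text footer option with [page]/[topage] placeholders rather than the HTML/JS footers mentioned above, which is worth confirming against your version's --help output.

```shell
#!/bin/sh
# Assumption: msttcorefonts lives in the directory named in the comment above.
FONT_DIR=/usr/share/fonts/truetype/msttcorefonts

# a4_pdf INPUT OUTPUT  (hypothetical helper)
# A4 portrait given explicitly in mm, print stylesheet, "page / total" footer.
# Assumption: this wkhtmltopdf build supports --footer-center with the
# [page] and [topage] placeholders.
a4_pdf() {
    if [ "$#" -ne 2 ]; then
        echo "usage: a4_pdf INPUT OUTPUT" >&2
        return 2
    fi
    xvfb-run -a -s "-fp $FONT_DIR" \
        wkhtmltopdf --print-media-type \
            --page-width 210 --page-height 297 \
            --footer-center '[page] / [topage]' \
            "$1" "$2"
}
```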

11.

Thanks for the comments, Stormy.

Interesting note about the fonts - I didn't have that issue but maybe my distro had the package installed already for some reason.

Ciaran McNulty
21st June 2010, 11:03

12.

Cool, I'm kinda new to all of this. I'd really like to know how to start.
I have an in-house server with an app on it that is supposed to generate reports. Can u gimme a step by step approach?
Thank u so much in advance.
my email is oluwamayowa@steepe.org

Steepe
26th June 2010, 01:41

13.

I'm facing an issue with the generated PDF size. There is a huge difference between the size of the PDF generated on Windows and on Linux.
I think this is because the wrong fonts are being embedded.
The detailed problem statement can be found at http://stackoverflow.com/questions/3193805/wkhtmltopdf-generated-pdf-size-issues-in-cent-os-4-6

I'd appreciate it if anybody can provide some help.

Thanks,
Pradeep

Pradeep Pant
12th July 2010, 09:16

14.

Last time I looked at xvfb-run, it was a shell script that paused for 3 seconds waiting for the X server to start. You might find that this three-second pause accounts for most of the time you are seeing for the PDF to be generated. I'd recommend not using the xvfb-run script and instead rapidly polling for the socket to be created in the /tmp/.X11-unix directory. When it appears, the X server is ready to accept client connections and you can invoke wkhtmltopdf.

Actually, you have probably realized by now that the X server is not even required if you use the static binary (unless you are using web pages with certain plugins like flash). Try skipping the xvfb-run wrapper entirely.

Anonymous
16th September 2010, 17:11
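A sketch of the polling approach from the comment above: start Xvfb directly, wait until its socket appears, run wkhtmltopdf against that display, then stop the server. The display number :99 is an arbitrary assumption (unlike xvfb-run -a, this does not search for a free one), and it relies on the usual convention that the socket for display :N is /tmp/.X11-unix/XN. If, as the comment suggests, the static binary needs no X server at all, none of this is necessary.

```shell
#!/bin/sh
# Assumptions: display :99 is free and has no stale socket or lock file
# left over from an earlier run.

# wait_for_path PATH MAX_TRIES: poll every 0.1s, giving up after MAX_TRIES waits
wait_for_path() {
    tries=0
    while [ ! -e "$1" ]; do
        tries=$((tries + 1))
        [ "$tries" -gt "$2" ] && return 1
        sleep 0.1   # fractional seconds: fine with GNU and BSD sleep, not strict POSIX
    done
    return 0
}

# pdf_with_own_xvfb INPUT OUTPUT  (hypothetical helper name)
pdf_with_own_xvfb() {
    display=99
    Xvfb ":$display" -screen 0 640x480x16 >/dev/null 2>&1 &
    xvfb_pid=$!

    status=0
    # Allow up to ~5 seconds for the server to create its socket
    if wait_for_path "/tmp/.X11-unix/X$display" 50; then
        DISPLAY=":$display" wkhtmltopdf "$1" "$2" || status=$?
    else
        echo "Xvfb did not start on :$display" >&2
        status=1
    fi

    kill "$xvfb_pid" 2>/dev/null
    wait "$xvfb_pid" 2>/dev/null || true   # reap the background server
    return $status
}
```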

15.

THANK YOU!!!!!

Your install procedure was the ONLY one that worked for me for the Amazon Ubuntu AMIs for EC2 by Canonical (http://alestic.com/2010/08/ec2-ami-canonical).

Yours worked perfectly out of the box. I was pulling my freakin' hair out for 6 hrs prior!!

Eric
21st November 2010, 16:41

16.

Another tool for converting html to pdf (or html to excel) that's worth checking out is DocRaptor. http://docraptor.com/

Brennan
21st April 2011, 19:04

17.

Did anyone get past the font-spacing issue when using Qt rather than X?

S.B
12th July 2011, 11:55

18.

Hi,

I am using this tool to create a PDF.

My web page has dimensions of: 972px X 687px (29.7cm/13.5" at 72dpi, 21.0cm/9.55" at 72dpi).

I put a border on my centralized div to check the edges.

I used the following syntax to call the PDF tool:

wkhtmltopdf runme.html runmeA4L.pdf --page-size A4 --orientation landscape --dpi 300

The resulting PDF says its properties are: 11.69" x 8.26" which is NOT A4!

What is going on?

I am viewing the PDF in Foxit Reader 3.3

Many thanks for your help
Jeremy C.

Jeremy
20th September 2011, 14:34

19.

...my mistake, it's 2.54cm = 1" :)
The dimensions in the PDF are correct: 8.26" x 11.69" :)

Jeremy
20th September 2011, 14:44

20.

...ok, having fixed my silly mistake I now have a web page size of 841px (11.69"@72dpi) by 594px (8.26"@72dpi).

And still the resulting PDF has the border in it covering roughly half the page... It looks like the conversion has squashed the page...

Any ideas?

Many thanks again
Jeremy C.

Jeremy
20th September 2011, 14:51
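For reference, here is the arithmetic behind the A4 numbers in this thread, using the exact definition 1 in = 25.4 mm and the A4 size of 210 x 297 mm (swap width and height for landscape). The 72 px/in figures are the ones used in the comments; they also match PDF's default unit of 1/72 inch, which is why A4 is usually quoted as roughly 595 x 842 pt. The 8.26" reported in the PDF properties differs from 8.27" by less than 0.01 in.

```latex
\begin{aligned}
w &= \frac{210\ \text{mm}}{25.4\ \text{mm/in}} \approx 8.27\ \text{in},
  &\quad w \times 72\ \text{px/in} &\approx 595\ \text{px} \\
h &= \frac{297\ \text{mm}}{25.4\ \text{mm/in}} \approx 11.69\ \text{in},
  &\quad h \times 72\ \text{px/in} &\approx 842\ \text{px}
\end{aligned}
```

One hedged observation: CSS defines 1in as 96px, so measured in CSS pixels the same sheet is about 794 x 1123 px (w x 96 is about 793.7, h x 96 about 1122.5) rather than 595 x 842. That gap may be worth checking when a layout sized at 72 px/in comes out at an unexpected scale, though it is not a confirmed diagnosis of the problem above.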
