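To convert HTML to PDF you could use a front-end plugin, but as a lowly Python dev whose front-end skills are hopeless, I had to find another way.....

After some digging, I settled on wkhtmltopdf, a very powerful tool.

After some more digging, I decided to use PDFKit, a Python package that wraps wkhtmltopdf.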

install wkhtmltopdf

https://www.cnblogs.com/Neeo/articles/11566990.html

install pdfkit

pip install pdfkit

usage wkhtmltopdf

API overview

We commonly use three PDFKit APIs:

  • from_url: export a remote URL page to PDF.
  • from_file: export an HTML file to PDF.
  • from_string: export a string to PDF.
python
import pdfkit
 
pdfkit.from_url('https://www.google.com.hk','out1.pdf')   
pdfkit.from_file('123.html','out2.pdf')  
pdfkit.from_string('Hello!','out3.pdf')

from_url

python
def from_url(url, output_path, options=None, toc=None, cover=None,
             configuration=None, cover_first=False):
    """
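    Convert file or files from URLs to PDF document

    :param url: URL or list of URLs to export as PDF
    :param output_path: path to output PDF file. False means file will be returned as string.
    :param options: (optional) dict of wkhtmltopdf options, e.g. to set the encoding
    :param toc: (optional) generate a table of contents for the PDF
    :param cover: (optional) HTML file to use as the cover page; cover_first controls whether it precedes the TOC
    :param configuration: (optional) settings, an instance of pdfkit.configuration.Configuration()
    :param cover_first: (optional) if True, cover always precedes TOC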
    Returns: True on success
    """
    r = PDFKit(url, 'url', options=options, toc=toc, cover=cover,
               configuration=configuration, cover_first=cover_first)

    return r.to_pdf(output_path)

Example:

python
import pdfkit

# In my setup the path to wkhtmltopdf.exe had to be given explicitly, even after adding it to PATH.....
config_pdf = pdfkit.configuration(wkhtmltopdf=r'C:\Program Files\wkhtmltopdf\bin\wkhtmltopdf.exe')
pdfkit.from_url(url='https://www.cnblogs.com/Neeo/articles/11566990.html', output_path='p3.pdf', configuration=config_pdf)
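
The hard-coded path above only fits one particular Windows install. Here is a minimal sketch, assuming you would rather probe PATH first and only then fall back to known locations. `find_wkhtmltopdf`, `make_config` and `DEFAULT_CANDIDATES` are hypothetical helper names of mine, not part of pdfkit.

```python
import os
import shutil

# Fallback locations to probe; the Windows path is the one used in this post.
DEFAULT_CANDIDATES = [
    r"C:\Program Files\wkhtmltopdf\bin\wkhtmltopdf.exe",
]


def find_wkhtmltopdf(candidates=DEFAULT_CANDIDATES):
    """Return a path to the wkhtmltopdf binary, or None if none is found."""
    on_path = shutil.which("wkhtmltopdf")  # searches the directories on PATH
    if on_path:
        return on_path
    for path in candidates:
        if os.path.isfile(path):
            return path
    return None


def make_config():
    """Build a pdfkit configuration from whichever binary was located."""
    import pdfkit  # imported lazily so find_wkhtmltopdf works without pdfkit

    path = find_wkhtmltopdf()
    if path is None:
        raise FileNotFoundError("wkhtmltopdf not found on PATH or in the fallback list")
    return pdfkit.configuration(wkhtmltopdf=path)
```

With this in place, `configuration=make_config()` could stand in for the `config_pdf` line, and a missing binary fails early with a readable message.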

from_file

python
def from_file(input, output_path, options=None, toc=None, cover=None, css=None,
              configuration=None, cover_first=False):
    """
    Convert HTML file or files to PDF document

    :param input: path to HTML file or list with paths or file-like object
    :param output_path: path to output PDF file. False means file will be returned as string.
    :param options: (optional) dict with wkhtmltopdf options, with or w/o '--'
    :param toc: (optional) dict with toc-specific wkhtmltopdf options, with or w/o '--'
    :param cover: (optional) string with url/filename with a cover html page
    :param css: (optional) string with path to css file which will be added to a single input file
    :param configuration: (optional) instance of pdfkit.configuration.Configuration()
    :param cover_first: (optional) if True, cover always precedes TOC

    Returns: True on success
    """

    r = PDFKit(input, 'file', options=options, toc=toc, cover=cover, css=css,
               configuration=configuration, cover_first=cover_first)

    return r.to_pdf(output_path)

Example:

python
import pdfkit

config_pdf = pdfkit.configuration(wkhtmltopdf=r'C:\Program Files\wkhtmltopdf\bin\wkhtmltopdf.exe')
pdfkit.from_file(input='h.html', output_path='p2.pdf', configuration=config_pdf)

Multiple input files are also accepted:

python
import pdfkit

options = {
    'encoding': 'UTF-8',
    'custom-header': [('Accept-Encoding', 'gzip')],
    'page-size': 'Letter',
    'margin-top': '0.75in',
    'margin-right': '0.75in',
    'margin-bottom': '0.75in',
    'margin-left': '0.75in',
    'no-outline': None  # flag without a value; pdfkit's README passes None here
}

config_pdf = pdfkit.configuration(wkhtmltopdf=r'C:\Program Files\wkhtmltopdf\bin\wkhtmltopdf.exe')
pdfkit.from_file(input=['h.html', 'w.html'], output_path='p2.pdf', configuration=config_pdf, options=options)
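
The docstring says option keys work "with or w/o '--'". To make that concrete, here is a rough sketch of how such a dict could end up as wkhtmltopdf command-line arguments. `to_cli_args` is a hypothetical function written for illustration; it approximates the behaviour as I understand it and is not pdfkit's actual implementation.

```python
def to_cli_args(options):
    """Approximate how an options dict becomes wkhtmltopdf arguments."""
    args = []
    for key, value in options.items():
        # Keys may be given with or without the leading '--'.
        flag = key if key.startswith("--") else "--" + key
        # A list value repeats the flag once per item.
        values = value if isinstance(value, list) else [value]
        for item in values:
            args.append(flag)
            if isinstance(item, tuple):
                # e.g. ('Accept-Encoding', 'gzip') becomes two arguments
                args.extend(str(part) for part in item)
            elif item is not None and item != "":
                args.append(str(item))
            # None or "" means a bare flag with no value
    return args


print(to_cli_args({
    "page-size": "Letter",
    "no-outline": None,
    "custom-header": [("Accept-Encoding", "gzip")],
}))
```

The point is only that every key maps to one `--flag`, optionally followed by its value or values, so anything listed in the wkhtmltopdf help can be expressed in the dict.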

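CSS files can be added:

python
import pdfkit

# 'out.pdf' is a placeholder output path; options is the dict defined above
css = 'example.css'
pdfkit.from_file('file.html', 'out.pdf', options=options, css=css)
# Multiple CSS files
css = ['example.css', 'example2.css']
pdfkit.from_file('file.html', 'out.pdf', options=options, css=css)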

from_string

python
def from_string(input, output_path, options=None, toc=None, cover=None, css=None,
                configuration=None, cover_first=False):
    """
    Convert given string or strings to PDF document

    :param input: string with the desired text. Could be raw text or HTML
    :param output_path: path to output PDF file. False means file will be returned as string.
    :param options: (optional) dict with wkhtmltopdf options, with or w/o '--'
    :param toc: (optional) dict with toc-specific wkhtmltopdf options, with or w/o '--'
    :param cover: (optional) string with url/filename with a cover html page
    :param css: (optional) path to a css file which will be added to the input string
    :param configuration: (optional) instance of pdfkit.configuration.Configuration()
    :param cover_first: (optional) if True, cover always precedes TOC

    Returns: True on success
    """

    r = PDFKit(input, 'string', options=options, toc=toc, cover=cover, css=css,
               configuration=configuration, cover_first=cover_first)

    return r.to_pdf(output_path)

Example:

python
import pdfkit

config_pdf = pdfkit.configuration(wkhtmltopdf=r'C:\Program Files\wkhtmltopdf\bin\wkhtmltopdf.exe')

pdfkit.from_string(input='hello pdfkit wkhtmltopdf', output_path='p4.pdf', configuration=config_pdf)
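
In practice the string passed to from_string is usually generated HTML rather than a bare sentence. A small sketch under that assumption: `build_html` and `render_pdf` are hypothetical helpers of mine, and passing `False` as the output path follows the docstring above (the PDF is returned instead of written).

```python
import html


def build_html(title, lines):
    """Return a small UTF-8 HTML document with a heading and one <p> per line."""
    body = "\n".join("<p>{}</p>".format(html.escape(line)) for line in lines)
    return (
        "<!DOCTYPE html>\n"
        '<html><head><meta charset="utf-8">'
        "<title>{0}</title></head>\n"
        "<body><h1>{0}</h1>\n{1}\n</body></html>"
    ).format(html.escape(title), body)


def render_pdf(document, output_path, wkhtmltopdf_path):
    """Render an HTML string; output_path=False returns the PDF instead of writing it."""
    import pdfkit  # lazy import: build_html stays usable without pdfkit

    config = pdfkit.configuration(wkhtmltopdf=wkhtmltopdf_path)
    return pdfkit.from_string(document, output_path,
                              options={"encoding": "UTF-8"},
                              configuration=config)
```

Escaping the text and declaring the charset up front keeps characters such as `<` or non-ASCII text from breaking the rendered page.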

Corrections are welcome. That's all. See also: PDF之pdfkit | Python抓取网页并保存为PDF