
Convert IPYNB to PDF in Python: A Programmatic Guide with nbconvert

Beyond the command line: convert Jupyter Notebooks to PDF directly in Python using nbconvert's API. Includes batch conversion, custom templates, and embedding in Flask/Django apps.

  • #python
  • #nbconvert
  • #api
  • #guide

The jupyter nbconvert command-line tool is great for one-off conversions, but sometimes you need to convert notebooks programmatically — batch processing, embedded in a web app, or as part of a larger Python pipeline. nbconvert exposes a full Python API for exactly this.

This guide covers the practical patterns: single conversion, batch conversion, custom templates, execution, and integration with Flask/Django.

Setup

pip install nbconvert nbformat
# For webpdf (no LaTeX needed)
pip install "nbconvert[webpdf]"
playwright install chromium

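nbformat, which reads the notebook files, is pulled in by the first command. The LaTeX-based PDFExporter also needs a TeX distribution that provides xelatex, plus Pandoc, installed on the system.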

Pattern 1 — Single Notebook Conversion

The simplest programmatic conversion:

import nbformat
from nbconvert import PDFExporter

# Load the notebook
with open("notebook.ipynb") as f:
    nb = nbformat.read(f, as_version=4)

# Configure the exporter
pdf_exporter = PDFExporter()
pdf_exporter.latex_command = ["xelatex", "{filename}", "-quiet"]  # Unicode-safe; must be a list, not a string

# Convert
body, resources = pdf_exporter.from_notebook_node(nb)

# Save
with open("notebook.pdf", "wb") as f:
    f.write(body)

The body is the PDF as bytes; resources contains metadata and any side files (like embedded images extracted separately).
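Because body is plain bytes, a cheap sanity check before writing it to disk is to confirm that it starts with the PDF header. The sketch below shows one way to do that; looks_like_pdf is a hypothetical helper name, not part of nbconvert.

```python
PDF_MAGIC = b"%PDF-"  # PDF files start with this header


def looks_like_pdf(data: bytes) -> bool:
    """Return True if `data` starts with the PDF header (hypothetical helper)."""
    return data.startswith(PDF_MAGIC)


if __name__ == "__main__":
    print(looks_like_pdf(b"%PDF-1.7\n..."))    # PDF-style header
    print(looks_like_pdf(b"<!DOCTYPE html>"))  # HTML, not a PDF
```

Calling looks_like_pdf(body) right after from_notebook_node catches the case where something other than a PDF came back, before a broken file reaches a user.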

Pattern 2 — Using webpdf (No LaTeX)

If you don't have LaTeX installed, use WebPDFExporter:

import nbformat
from nbconvert import WebPDFExporter

with open("notebook.ipynb") as f:
    nb = nbformat.read(f, as_version=4)

exporter = WebPDFExporter()
body, resources = exporter.from_notebook_node(nb)

with open("notebook.pdf", "wb") as f:
    f.write(body)

This uses headless Chromium under the hood. It is slower than plain HTML export, but usually faster than a LaTeX build.
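If the same code has to run on machines with and without LaTeX, the choice between the two exporters can be made at runtime. The following is a minimal sketch, assuming that finding xelatex on PATH is a good enough signal; choose_pdf_format and its parameters are hypothetical names.

```python
import shutil


def choose_pdf_format(prefer_latex=False, which=shutil.which):
    """Return the nbconvert export format name to use: "pdf" or "webpdf".

    `which` is injectable so the PATH lookup can be faked in tests.
    """
    if prefer_latex and which("xelatex") is not None:
        return "pdf"     # LaTeX route (PDFExporter)
    return "webpdf"      # headless-Chromium route (WebPDFExporter)


if __name__ == "__main__":
    print(choose_pdf_format(prefer_latex=True))
```

The returned string matches nbconvert's export format names, so it should be usable with nbconvert.get_exporter(...) to obtain the exporter class.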

Pattern 3 — Executing the Notebook During Conversion

If the notebook doesn't have outputs saved, execute it as part of conversion:

import nbformat
from nbconvert import PDFExporter
from nbconvert.preprocessors import ExecutePreprocessor

with open("notebook.ipynb") as f:
    nb = nbformat.read(f, as_version=4)

# Execute first
executor = ExecutePreprocessor(timeout=600, kernel_name="python3")
executor.preprocess(nb, {"metadata": {"path": "."}})

# Then convert
exporter = PDFExporter()
body, resources = exporter.from_notebook_node(nb)

with open("notebook.pdf", "wb") as f:
    f.write(body)

timeout is the per-cell limit in seconds. Set it so that a cell that never finishes, such as an infinite loop, fails with an error instead of hanging the conversion.
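Execution is the slow part, so it can be worth checking first whether a notebook needs it at all. The sketch below is a heuristic, not an nbconvert feature: it treats a notebook as unexecuted when it has code cells and none of them carries an execution_count. needs_execution is a hypothetical helper that works on a notebook loaded as a plain dict (for example via json.load).

```python
def needs_execution(nb_dict):
    """Heuristic: True if the notebook has code cells and none has been run."""
    code_cells = [
        cell for cell in nb_dict.get("cells", [])
        if cell.get("cell_type") == "code"
    ]
    # A cell that was run has a non-null execution_count, even with no output
    return bool(code_cells) and all(
        cell.get("execution_count") is None for cell in code_cells
    )


if __name__ == "__main__":
    fresh = {"cells": [{"cell_type": "code", "execution_count": None, "outputs": []}]}
    print(needs_execution(fresh))
```

A partially executed notebook counts as executed here; tighten the rule if that matters for your pipeline.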

Pattern 4 — Batch Conversion

Convert every notebook in a directory:

import nbformat
from nbconvert import WebPDFExporter
from pathlib import Path

exporter = WebPDFExporter()

for nb_path in Path("notebooks").glob("*.ipynb"):
    with open(nb_path) as f:
        nb = nbformat.read(f, as_version=4)

    try:
        body, _ = exporter.from_notebook_node(nb)
        out_path = nb_path.with_suffix(".pdf")
        with open(out_path, "wb") as f:
            f.write(body)
        print(f"Converted: {nb_path.name}")
    except Exception as e:
        print(f"Failed: {nb_path.name}: {e}")

Wrap each conversion in a try/except so one bad notebook doesn't kill the batch.
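For repeated batch runs, it often pays to skip notebooks whose PDF is already up to date. The sketch below searches recursively, ignores Jupyter's .ipynb_checkpoints folders, and yields only the notebooks whose PDF is missing or older than the notebook. pending_conversions and demo are hypothetical names, and comparing modification times is an assumption about how you want to detect staleness.

```python
import os
import tempfile
from pathlib import Path


def pending_conversions(root):
    """Yield (notebook, pdf) path pairs whose PDF is missing or out of date."""
    for nb_path in sorted(Path(root).rglob("*.ipynb")):
        if ".ipynb_checkpoints" in nb_path.parts:
            continue  # skip Jupyter's checkpoint copies
        pdf_path = nb_path.with_suffix(".pdf")
        if not pdf_path.exists() or pdf_path.stat().st_mtime < nb_path.stat().st_mtime:
            yield nb_path, pdf_path


def demo():
    """Build a throwaway tree and return the notebook names still to convert."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "new.ipynb").write_text("{}")    # has no PDF yet
        (root / "done.ipynb").write_text("{}")   # its PDF is newer
        (root / "done.pdf").write_bytes(b"%PDF-")
        os.utime(root / "done.ipynb", (1000, 1000))
        os.utime(root / "done.pdf", (2000, 2000))
        (root / ".ipynb_checkpoints").mkdir()
        (root / ".ipynb_checkpoints" / "new-checkpoint.ipynb").write_text("{}")
        return [nb.name for nb, _ in pending_conversions(root)]


if __name__ == "__main__":
    print(demo())
```

In the batch loop, iterating over pending_conversions("notebooks") instead of the plain glob gives you both the input and the output path for each notebook that still needs work.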

Pattern 5 — Custom Templates

nbconvert uses Jinja2 templates. You can customize them to change the PDF's appearance:

import nbformat
from nbconvert import WebPDFExporter

with open("notebook.ipynb") as f:
    nb = nbformat.read(f, as_version=4)

# Use a built-in template. "classic" is an HTML template, so it pairs with
# WebPDFExporter; the LaTeX-based PDFExporter uses the "latex" template.
exporter = WebPDFExporter(template_name="classic")
# Or point to a custom template file
# exporter.template_file = "my_template.html.j2"

body, resources = exporter.from_notebook_node(nb)

with open("notebook.pdf", "wb") as f:
    f.write(body)

Built-in HTML templates include lab, classic, and basic; they apply to HTMLExporter and WebPDFExporter, while PDFExporter uses the latex template. You can write your own by extending one of these.

Pattern 6 — Excluding Specific Cells

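Use cell tags to exclude cells from the PDF. Tag a cell with remove_cell in the notebook editor, then tell the TagRemovePreprocessor which tags to act on:

import nbformat
from nbconvert import PDFExporter
from traitlets.config import Config

# Tags are not removed by default; the preprocessor has to be configured
c = Config()
c.TagRemovePreprocessor.enabled = True
c.TagRemovePreprocessor.remove_cell_tags = ("remove_cell",)
exporter = PDFExporter(config=c)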

with open("notebook.ipynb") as f:
    nb = nbformat.read(f, as_version=4)

body, _ = exporter.from_notebook_node(nb)

with open("notebook.pdf", "wb") as f:
    f.write(body)

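Other useful tag conventions: remove_input (hide code, keep output) and remove_output (hide output, keep code). They need the same kind of configuration, through the remove_input_tags and remove_all_outputs_tags options of TagRemovePreprocessor.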
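Tags live in each cell's metadata under the tags key, so they can also be applied from code instead of the notebook editor. The sketch below tags every code cell whose source starts with a marker comment. tag_cells is a hypothetical helper and the "# hide" marker is an illustrative convention, not something nbconvert recognizes on its own.

```python
def tag_cells(nb_dict, marker="# hide", tag="remove_cell"):
    """Add `tag` to every code cell whose source starts with `marker`.

    Works on a v4 notebook held as a dict and returns how many cells
    were newly tagged.
    """
    tagged = 0
    for cell in nb_dict.get("cells", []):
        source = cell.get("source", "")
        if isinstance(source, list):  # on disk, source may be a list of lines
            source = "".join(source)
        if cell.get("cell_type") == "code" and source.lstrip().startswith(marker):
            tags = cell.setdefault("metadata", {}).setdefault("tags", [])
            if tag not in tags:
                tags.append(tag)
                tagged += 1
    return tagged


if __name__ == "__main__":
    nb = {"cells": [{"cell_type": "code", "source": "# hide\nimport os", "metadata": {}}]}
    print(tag_cells(nb), nb["cells"][0]["metadata"]["tags"])
```

The tags only affect the PDF once the exporter is configured to act on them.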

Pattern 7 — Integration with Flask

Expose conversion as a web endpoint:

# app.py
from flask import Flask, request, send_file
import nbformat
from nbconvert import WebPDFExporter
import io

app = Flask(__name__)

@app.route("/convert", methods=["POST"])
def convert():
    if "notebook" not in request.files:
        return "No notebook uploaded", 400

    file = request.files["notebook"]
    nb = nbformat.reads(file.read().decode("utf-8"), as_version=4)

    exporter = WebPDFExporter()
    body, _ = exporter.from_notebook_node(nb)

    return send_file(
        io.BytesIO(body),
        mimetype="application/pdf",
        as_attachment=True,
        download_name="converted.pdf",
    )

if __name__ == "__main__":
    app.run(debug=True)

Test with:

curl -F "notebook=@notebook.ipynb" http://localhost:5000/convert -o output.pdf
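The endpoint above hands the upload straight to nbformat.reads, so a malformed or oversized file surfaces as a server error. A small pre-check lets the view answer with a 400 instead. This is a minimal sketch: check_upload is a hypothetical helper, and the 10 MiB cap is an arbitrary illustrative limit.

```python
import json

MAX_NOTEBOOK_BYTES = 10 * 1024 * 1024  # illustrative 10 MiB cap


def check_upload(raw, max_bytes=MAX_NOTEBOOK_BYTES):
    """Return (notebook_dict, None) if `raw` is plausible, else (None, reason)."""
    if len(raw) > max_bytes:
        return None, "notebook too large"
    try:
        data = json.loads(raw.decode("utf-8"))
    except ValueError:  # covers UnicodeDecodeError and json.JSONDecodeError
        return None, "not valid UTF-8 JSON"
    if not isinstance(data, dict) or not isinstance(data.get("cells"), list):
        return None, "JSON does not look like a v4 notebook"
    return data, None


if __name__ == "__main__":
    print(check_upload(b"not json"))
```

In the Flask view, something like `data, error = check_upload(file.read())` followed by `return error, 400` when error is set keeps bad uploads away from the exporter.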

Pattern 8 — Integration with Django

A Django view doing the same thing:

# views.py
from django.http import HttpResponse
import nbformat
from nbconvert import WebPDFExporter

def convert_notebook(request):
    if request.method != "POST":
        return HttpResponse(status=405)

    uploaded = request.FILES.get("notebook")
    if not uploaded:
        return HttpResponse("No notebook", status=400)

    nb = nbformat.reads(uploaded.read().decode("utf-8"), as_version=4)
    exporter = WebPDFExporter()
    body, _ = exporter.from_notebook_node(nb)

    response = HttpResponse(body, content_type="application/pdf")
    response["Content-Disposition"] = 'attachment; filename="converted.pdf"'
    return response
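Both web examples send every file back as converted.pdf. If you would rather name the download after the uploaded notebook, the name should be sanitized before it goes into a header. The sketch below is one way to do that; pdf_download_name is a hypothetical helper, and the allowed character set is a conservative assumption.

```python
import re
from pathlib import PurePath


def pdf_download_name(upload_name, default="converted.pdf"):
    """Derive a header-safe PDF file name from an uploaded notebook's name."""
    stem = PurePath(upload_name or "").stem  # drops directories and the suffix
    # Keep letters, digits, dot, underscore, hyphen; collapse the rest to "_"
    safe = re.sub(r"[^A-Za-z0-9._-]+", "_", stem).strip("._")
    return f"{safe}.pdf" if safe else default


if __name__ == "__main__":
    print(pdf_download_name("my report (v2).ipynb"))
```

In the Django view, the result could replace the hard-coded name in the Content-Disposition header, for example with pdf_download_name(uploaded.name).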

Pattern 9 — Asynchronous Batch with Celery

For high-volume conversion, offload to a worker:

# tasks.py
from celery import Celery
import nbformat
from nbconvert import WebPDFExporter

app = Celery("tasks", broker="redis://localhost:6379")

@app.task
def convert_notebook(nb_path, output_path):
    with open(nb_path) as f:
        nb = nbformat.read(f, as_version=4)

    exporter = WebPDFExporter()
    body, _ = exporter.from_notebook_node(nb)

    with open(output_path, "wb") as f:
        f.write(body)

    return output_path

This lets your web app return immediately and email the user when the PDF is ready.

Pattern 10 — Streaming Conversion for Large Notebooks

nbconvert converts a notebook as a whole, so there is no true streaming mode. For huge notebooks, a practical workaround is to export to HTML first, which is lighter than PDF, and then print that HTML to PDF with a headless browser, in pieces if needed:

import nbformat
from nbconvert import HTMLExporter

with open("huge.ipynb") as f:
    nb = nbformat.read(f, as_version=4)

# Convert to HTML first (lighter than PDF)
exporter = HTMLExporter()
body, _ = exporter.from_notebook_node(nb)

# Then print the HTML to PDF via a headless browser, in chunks if needed
# (Implementation depends on your browser automation tool)
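Another way to bound the work per conversion is to split the notebook itself into smaller notebooks and convert each one separately, merging the resulting PDFs afterwards. The sketch below only covers the splitting step and works on a notebook loaded as a plain dict; split_notebook and the default of 50 cells per chunk are hypothetical choices.

```python
def split_notebook(nb_dict, cells_per_chunk=50):
    """Split a v4 notebook dict into a list of smaller notebook dicts.

    Each chunk keeps the original metadata and format version. The cell
    dicts are shared with the input, not copied.
    """
    if cells_per_chunk < 1:
        raise ValueError("cells_per_chunk must be at least 1")
    cells = nb_dict.get("cells", [])
    shared = {key: value for key, value in nb_dict.items() if key != "cells"}
    return [
        {**shared, "cells": cells[start:start + cells_per_chunk]}
        for start in range(0, len(cells), cells_per_chunk)
    ]


if __name__ == "__main__":
    nb = {"nbformat": 4, "nbformat_minor": 5, "metadata": {},
          "cells": [{"cell_type": "markdown", "source": str(i), "metadata": {}} for i in range(5)]}
    print([len(chunk["cells"]) for chunk in split_notebook(nb, 2)])
```

Each chunk can be turned back into a notebook node with nbformat.from_dict before it is passed to an exporter. The whole notebook is still read into memory once; what shrinks is the size of each individual render.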

Performance Tips

  1. Reuse exporter instances. Creating a new PDFExporter for every conversion is expensive — it reloads templates. Cache it.

    # Module-level singleton
    from nbconvert import WebPDFExporter

    _exporter = None
    def get_exporter():
        global _exporter
        if _exporter is None:
            _exporter = WebPDFExporter()
        return _exporter
    
  2. Use WebPDFExporter over PDFExporter. No LaTeX dependency, faster, and the output is just as good for most use cases.

  3. Set timeouts. Notebooks with input() calls or infinite loops will hang forever. Always use ExecutePreprocessor(timeout=600).

  4. Process in parallel. For batch conversion, use multiprocessing.Pool:

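    from multiprocessing import Pool
    from pathlib import Path

    import nbformat
    from nbconvert import WebPDFExporter

    def convert_one(nb_path):
        nb = nbformat.read(str(nb_path), as_version=4)
        body, _ = WebPDFExporter().from_notebook_node(nb)
        nb_path.with_suffix(".pdf").write_bytes(body)

    if __name__ == "__main__":  # needed where worker processes are spawned
        with Pool(processes=4) as pool:
            pool.map(convert_one, sorted(Path("notebooks").glob("*.ipynb")))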
    
  5. Watch memory with LaTeX. PDFExporter launches a separate xelatex process per conversion. Under load, this can OOM your server. WebPDFExporter is gentler.
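One way to keep conversions from piling up under load is to cap how many run at once. The sketch below does this with a semaphore; run_limited, the cap of two, and the 30-second wait are all illustrative assumptions. It only limits threads inside a single process, so a multi-process deployment would still need a limit at the worker or queue level.

```python
import threading

# Illustrative cap: at most two conversions at once within this process
_conversion_slots = threading.BoundedSemaphore(2)


def run_limited(convert, *args, wait_seconds=30.0, **kwargs):
    """Run convert(*args, **kwargs) once a slot is free.

    Raises TimeoutError if no slot frees up within `wait_seconds`.
    """
    if not _conversion_slots.acquire(timeout=wait_seconds):
        raise TimeoutError("too many conversions in progress")
    try:
        return convert(*args, **kwargs)
    finally:
        _conversion_slots.release()  # always free the slot, even on errors


if __name__ == "__main__":
    print(run_limited(len, "stand-in for a real conversion call"))
```

In a web view, the call would wrap the exporter, for example run_limited(exporter.from_notebook_node, nb), and a TimeoutError could be mapped to a 503 response.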

Conclusion

The nbconvert Python API unlocks everything from simple batch scripts to fully integrated web apps. For most use cases, WebPDFExporter is the right choice — no LaTeX dependency, fast, and produces clean output. Reach for PDFExporter only when you need maximum fidelity or LaTeX-specific features like custom document classes.

Once you've internalized the patterns above, you can stop thinking about "converting notebooks" as a manual step and start treating it as just another function call in your pipeline.
