PDF Page Extractor Using Python

2025-11-27 python pdf extractor pypdf2 utilities automation pdf-tools beginner-project

A Python-based PDF page extraction tool built with the PyPDF2 library. It lets users select a range of pages from a PDF file and save them as a new PDF, which is useful for splitting long documents, pulling sections out of study notes, and extracting important chapters.

Explanation

A PDF Page Extractor is a utility that allows users to extract a specific range of pages from a large PDF file and save them as a new PDF document. This is especially useful when you need only selected sections from large books, reports, question papers, or e-books.

This program uses the third-party PyPDF2 library to read a PDF file, select a range of pages, and write those pages into a new PDF file. The user chooses the start and end page numbers (counting the first page as 1, with both ends included), and the selected pages are copied in their original order.

Requirements

  • pip install PyPDF2

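    Installs the PyPDF2 library used for reading and writing PDF pages. PyPDF2 3.0.x is the last release under that name; development continues in the pypdf package, which provides the same PdfReader and PdfWriter classes.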

  • Python 3
  • A valid PDF file to extract pages from
  • Start and end page numbers, entered by the user when prompted
  • An output filename for the new PDF (a default is used if left blank)
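The requirements above can also be checked before the script runs. The sketch below is not part of the original program: preflight() is a hypothetical helper, importlib.metadata needs Python 3.8 or newer, and testing for the %PDF- header is only a quick heuristic, since a file can start with those bytes and still be damaged.

```python
from importlib.metadata import PackageNotFoundError, version
from pathlib import Path


def preflight(pdf_path):
    """Return a list of problems found; an empty list means ready to run.

    Hypothetical helper: checks that PyPDF2 is installed and that the
    input file exists and starts with the usual %PDF- header.
    """
    problems = []
    try:
        version("PyPDF2")  # looks up installed package metadata only
    except PackageNotFoundError:
        problems.append("PyPDF2 is not installed (run: pip install PyPDF2)")

    path = Path(pdf_path)
    if not path.is_file():
        problems.append(f"{pdf_path} does not exist")
    else:
        with path.open("rb") as fh:
            # PDF files normally begin with a header such as %PDF-1.7
            if fh.read(5) != b"%PDF-":
                problems.append(f"{pdf_path} does not look like a PDF")
    return problems
```

Printing each entry of preflight("sample.pdf") before calling PdfReader gives the user a readable message instead of a traceback.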

Code Explanation

from PyPDF2 import PdfReader, PdfWriter

Imports the required classes for reading an existing PDF and writing a new one.

print("=== PDF Splitter ===")

Displays a title to make the program more user-friendly.

input_pdf = input("Enter input PDF path: ").strip() or "output.pdf"

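Reads the input PDF path from the user. If the answer is left blank, .strip() returns an empty string, which is falsy, so the or expression falls back to the default file, output.pdf.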
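The fallback works because "" or "output.pdf" evaluates to "output.pdf", while any non-empty answer is kept as typed. A minimal sketch of the same idea as a reusable prompt; ask() and its input_fn parameter are hypothetical names, and input_fn exists only so the helper can be exercised without a keyboard.

```python
def ask(prompt, default, input_fn=input):
    """Prompt once; return the stripped answer, or default if it is blank."""
    answer = input_fn(f"{prompt} [{default}]: ").strip()
    # An empty string is falsy, so `or` substitutes the default.
    return answer or default
```

For example, input_pdf = ask("Enter input PDF path", "output.pdf") shows the default in brackets and returns it when the user just presses Enter.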

start_page = int(input("Enter start page number: ").strip())

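Asks for the first page to extract, counting the first page of the file as page 1. int() converts the typed text to a number and raises ValueError if it is not a whole number.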

end_page = int(input("Enter end page number: ").strip())

Asks for the last page to extract. This page is included in the output.
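Because int() raises ValueError on input such as "two" or "2.5", a typo ends the program with a traceback. A sketch of a more forgiving prompt that keeps asking until it gets a positive whole number; ask_page_number and its input_fn/print_fn parameters are hypothetical names, the latter two only there to make the helper testable.

```python
def ask_page_number(prompt, input_fn=input, print_fn=print):
    """Keep prompting until the user enters a positive whole number."""
    while True:
        raw = input_fn(prompt).strip()
        try:
            number = int(raw)
        except ValueError:
            print_fn(f"'{raw}' is not a whole number, please try again.")
            continue
        if number < 1:
            # Pages are counted from 1 from the user's point of view.
            print_fn("Page numbers start at 1, please try again.")
            continue
        return number
```

start_page = ask_page_number("Enter start page number: ") could replace the int(input(...)) line. Checking the end page against the length of the document still has to happen after the PDF is opened.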

output_pdf = input("Enter output PDF name (e.g., output.pdf): ").strip() or "outputs.pdf"

Takes the output PDF filename from the user, or uses outputs.pdf if the answer is left blank.
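If the user types a name without an extension, the file is still written, but it lacks the .pdf extension that file managers and viewers usually rely on. A small guard can append .pdf and refuse to overwrite the source file. normalize_output_name is a hypothetical helper, not part of the original script.

```python
from pathlib import Path


def normalize_output_name(output_name, input_path):
    """Ensure a .pdf extension and avoid overwriting the source PDF."""
    out = Path(output_name)
    if out.suffix.lower() != ".pdf":
        # "chapter2" -> "chapter2.pdf"; "notes.txt" -> "notes.txt.pdf"
        out = out.with_name(out.name + ".pdf")
    # resolve() makes "sample.pdf" and "./sample.pdf" compare equal
    if out.resolve() == Path(input_path).resolve():
        raise ValueError("output file must differ from the input PDF")
    return str(out)
```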

reader = PdfReader(input_pdf)
writer = PdfWriter()

Loads the input PDF and prepares a new PDF writer object.

start = start_page - 1
end = end_page

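Converts the user's 1-based page numbers to the zero-based indices that reader.pages uses. Only the start needs the - 1: range() stops before its end value, so range(start, end_page) ends at index end_page - 1, which is the user's last page.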
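The mapping can be checked with a concrete case: pages 2 to 5 become indices 1, 2, 3 and 4. These two lines do no bounds checking on their own, so an end page beyond the last page raises IndexError before anything is saved, and a start page greater than the end page produces an empty PDF. A sketch that validates first; page_indices is a hypothetical name.

```python
def page_indices(start_page, end_page, total_pages):
    """Map a 1-based inclusive page range to 0-based indices for reader.pages."""
    if not 1 <= start_page <= end_page <= total_pages:
        raise ValueError(
            f"Invalid range {start_page}-{end_page}; "
            f"the document has pages 1-{total_pages}."
        )
    # range() excludes its stop value, so stopping at end_page
    # includes index end_page - 1, i.e. the user's last page.
    return range(start_page - 1, end_page)
```

Inside the program, the loop would then iterate over page_indices(start_page, end_page, len(reader.pages)).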

for page_num in range(start, end):
    writer.add_page(reader.pages[page_num])

Adds each selected page from the input PDF into the new output PDF.

with open(output_pdf, "wb") as out_pdf:
    writer.write(out_pdf)

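Writes the collected pages to the output file. The file is opened in binary write mode ("wb"), so an existing file with the same name is overwritten.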
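Opening with "wb" truncates an existing file immediately, so if writing fails partway the old file is gone and the new one is incomplete. One common remedy is to write to a temporary file in the same directory and move it into place with os.replace(). A sketch, where save_pdf_atomically is a hypothetical helper that relies only on the writer.write(stream) call used above.

```python
import os
import tempfile


def save_pdf_atomically(writer, output_path):
    """Write via a temp file, then move it over output_path in one step."""
    directory = os.path.dirname(os.path.abspath(output_path))
    # Same directory, so the final rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(suffix=".pdf", dir=directory)
    try:
        with os.fdopen(fd, "wb") as tmp_file:
            writer.write(tmp_file)
        os.replace(tmp_path, output_path)
    except BaseException:
        # Remove the partial temp file and re-raise the original error.
        os.remove(tmp_path)
        raise
```

With this helper, the with open(...) block becomes a single call: save_pdf_atomically(writer, output_pdf).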

print(f"PDF split successfully! Saved as: {output_pdf}")

Prints a success message after generating the PDF.

Key Points

  • Extracts only selected pages from a PDF file.
  • Uses PyPDF2, a pure-Python PDF library.
  • Page numbers are entered starting from 1 and converted to zero-based indices internally.
  • User chooses start page, end page, input file, and output filename.
  • The same logic can be reused to automate splitting PDFs for study or work.
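For automation, the start and end pages can come from a loop instead of being typed by hand. As one example (an assumption, not a feature of the original script), splitting a document into fixed-size parts only requires computing the 1-based page ranges; chunk_ranges is a hypothetical helper, and each pair it returns could serve as start_page and end_page for the extraction loop.

```python
def chunk_ranges(total_pages, pages_per_chunk):
    """Return 1-based inclusive (start, end) pairs covering every page."""
    if total_pages < 1 or pages_per_chunk < 1:
        raise ValueError("total_pages and pages_per_chunk must be positive")
    ranges = []
    for start in range(1, total_pages + 1, pages_per_chunk):
        # The last chunk may be shorter than pages_per_chunk.
        end = min(start + pages_per_chunk - 1, total_pages)
        ranges.append((start, end))
    return ranges
```

For a 10-page file, chunk_ranges(10, 4) returns [(1, 4), (5, 8), (9, 10)].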

Full Python Program

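from PyPDF2 import PdfReader, PdfWriter

print("=== PDF Splitter ===")

input_pdf = input("Enter input PDF path: ").strip() or "output.pdf"
start_page = int(input("Enter start page number: ").strip())
end_page = int(input("Enter end page number: ").strip())
output_pdf = input("Enter output PDF name (e.g., output.pdf): ").strip() or "outputs.pdf"

reader = PdfReader(input_pdf)
writer = PdfWriter()

# Reject ranges the PDF cannot satisfy before indexing reader.pages
total_pages = len(reader.pages)
if not 1 <= start_page <= end_page <= total_pages:
    raise SystemExit(
        f"Invalid page range {start_page}-{end_page}: "
        f"{input_pdf} has {total_pages} pages."
    )

# Convert 1-based page numbers to zero-based indices; range() excludes
# its stop value, so stopping at end_page still includes that page.
start = start_page - 1
end = end_page

for page_num in range(start, end):
    writer.add_page(reader.pages[page_num])

with open(output_pdf, "wb") as out_pdf:
    writer.write(out_pdf)

print(f"PDF split successfully! Saved as: {output_pdf}")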
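Because the program reads everything through input(), it cannot be called from a batch file or another script without someone at the keyboard. A sketch of the same four inputs taken as command-line arguments with the standard argparse module; the argument names, the default, and the parse_args function are assumptions, and the PyPDF2 part of the program would stay unchanged.

```python
import argparse


def parse_args(argv=None):
    """Parse input path, page range and output name from the command line."""
    parser = argparse.ArgumentParser(
        description="Extract a range of pages from a PDF."
    )
    parser.add_argument("input_pdf", help="path to the source PDF")
    parser.add_argument("start_page", type=int, help="first page (1-based)")
    parser.add_argument("end_page", type=int, help="last page (inclusive)")
    parser.add_argument(
        "-o", "--output", default="outputs.pdf",
        help="output file name (default: outputs.pdf)",
    )
    # argv=None makes argparse read sys.argv[1:]
    return parser.parse_args(argv)
```

After this change the script could be run as py pdfExtractor.py sample.pdf 2 5 -o extracted.pdf, with args.input_pdf, args.start_page, args.end_page and args.output taking the place of the four input() calls.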

Output:

> py pdfExtractor.py

=== PDF Splitter ===
Enter input PDF path: sample.pdf
Enter start page number: 2
Enter end page number: 5
Enter output PDF name (e.g., output.pdf): extracted.pdf

PDF split successfully! Saved as: extracted.pdf