PDF Page Extractor Using Python

2025-11-27 python pdf extractor pypdf2 utilities automation pdf-tools beginner-project

A Python-based PDF page extraction tool built using the PyPDF2 library. It allows users to select a specific range of pages from any PDF file and save them as a new PDF. Ideal for splitting documents, trimming study notes, and extracting important sections.

Explanation

A PDF page extractor is a utility that copies a chosen range of pages out of a PDF file and saves them as a new PDF document. This is especially useful when you need only selected sections of long books, reports, question papers, or e-books.

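This program uses Python's PyPDF2 library to read a PDF file, select a range of pages, and write those pages into a new PDF file. The start and end page numbers are entered as 1-based values, and both the start page and the end page are included in the output. The program does not check the range against the length of the document, so the page numbers must be valid for the input file.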

Requirements

pip install PyPDF2

Installs the PyPDF2 library, which is required for reading and writing PDF pages.
Python 3 (PyPDF2 3.x requires Python 3.6 or later)
A valid PDF file to extract pages from
Start and end page numbers, entered by the user
An output filename, entered by the user, under which the new PDF is saved

Code Explanation

from PyPDF2 import PdfReader, PdfWriter

Imports the required classes for reading an existing PDF and writing a new one.

print("=== PDF Splitter ===")

Displays a title to make the program more user-friendly.

input_pdf = input("Enter input PDF path: ").strip() or "output.pdf"

Takes the input PDF file path from the user. If the input is left blank, the path defaults to output.pdf, so a file with that name must exist in the current directory.

start_page = int(input("Enter start page number: ").strip())

Asks the user for the first page to extract, counted from 1.

end_page = int(input("Enter end page number: ").strip())

Asks the user for the final page to extract. This page is included in the output.
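The two prompts above pass the typed text straight to int(), which raises ValueError on anything that is not a whole number. A minimal sketch of a prompt that retries instead is shown here. The helper name read_page_number and its input_fn parameter are hypothetical and not part of the original program.

```python
def read_page_number(prompt, input_fn=input):
    """Keep asking until the user enters a whole number of at least 1.

    read_page_number and input_fn are illustrative names (assumptions),
    not part of the original script. input_fn exists only so the prompt
    can be swapped out when testing.
    """
    while True:
        raw = input_fn(prompt).strip()
        try:
            value = int(raw)
        except ValueError:
            print(f"'{raw}' is not a whole number, please try again.")
            continue
        if value < 1:
            print("Page numbers start at 1, please try again.")
            continue
        return value
```

In the program, start_page = read_page_number("Enter start page number: ") could then stand in for the int(input(...)) line.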

output_pdf = input("Enter output PDF name (e.g., output.pdf): ").strip() or "outputs.pdf"

Takes the output PDF filename from the user. If the input is left blank, the file is saved as outputs.pdf.
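Both filename prompts rely on the same idiom: an empty string is falsy in Python, so the expression after "or" is used when the user types nothing. A small sketch of that behavior, using a hypothetical helper named value_or_default that is not part of the original program:

```python
def value_or_default(raw, default):
    """Return the stripped input, or default when nothing was typed.

    value_or_default is an illustrative name (an assumption), not part
    of the original script.
    """
    # strip() turns whitespace-only input into "", which is falsy,
    # so "or" falls through to the default.
    return raw.strip() or default


print(value_or_default("   ", "outputs.pdf"))          # outputs.pdf
print(value_or_default(" notes.pdf ", "outputs.pdf"))  # notes.pdf
```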

reader = PdfReader(input_pdf)
writer = PdfWriter()

Loads the input PDF and prepares a new PDF writer object.

start = start_page - 1
end = end_page

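Converts the user's 1-based page numbers to the zero-based indices PyPDF2 uses. Only the start value is reduced by one. The end value is left unchanged because range() stops just before its upper bound, so the end page itself is still included.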
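The index arithmetic can be checked on its own, without opening a PDF. The following sketch wraps it in a hypothetical helper named page_indices (not part of the original program) and prints the indices for the range used in the sample run, pages 2 to 5.

```python
def page_indices(start_page, end_page):
    """Map 1-based, inclusive page numbers to zero-based indices.

    page_indices is an illustrative name (an assumption), not part of
    the original script.
    """
    # start_page - 1 shifts to zero-based indexing; end_page needs no
    # adjustment because range() excludes its upper bound.
    return range(start_page - 1, end_page)


print(list(page_indices(2, 5)))  # [1, 2, 3, 4]
```

Pages 2 to 5 therefore map to four indices, one per extracted page.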

for page_num in range(start, end):
    writer.add_page(reader.pages[page_num])

Adds each selected page from the input PDF into the new output PDF.
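The loop trusts the page numbers it is given. An end page beyond the last page of the document makes reader.pages[page_num] fail partway through, and a start page greater than the end page copies nothing. A minimal sketch of a check that could run before the loop is shown here, assuming the page count comes from len(reader.pages). The function name check_page_range is hypothetical.

```python
def check_page_range(start_page, end_page, total_pages):
    """Raise ValueError unless 1 <= start_page <= end_page <= total_pages.

    check_page_range is an illustrative name (an assumption), not part
    of the original script. All page numbers are 1-based.
    """
    if total_pages < 1:
        raise ValueError("The PDF has no pages.")
    if not 1 <= start_page <= total_pages:
        raise ValueError(
            f"Start page must be between 1 and {total_pages}, got {start_page}."
        )
    if not start_page <= end_page <= total_pages:
        raise ValueError(
            f"End page must be between {start_page} and {total_pages}, got {end_page}."
        )
```

Calling check_page_range(start_page, end_page, len(reader.pages)) right after the reader is created would stop the program with a clear message before any page is copied.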

with open(output_pdf, "wb") as out_pdf:
    writer.write(out_pdf)

Saves the extracted pages into a new PDF file.

print(f"PDF split successfully! Saved as: {output_pdf}")

Prints a success message after generating the PDF.

Key Points

Extracts only selected pages from a PDF file.
Uses Python’s lightweight PyPDF2 library.
Page numbers are entered 1-based and converted to zero-based indices internally.
User chooses start page, end page, input file, and output filename.
Easy to automate splitting PDFs or pulling out page ranges for study/work.

Full Python Program

from PyPDF2 import PdfReader, PdfWriter
print("=== PDF Splitter ===")

input_pdf = input("Enter input PDF path: ").strip() or "output.pdf"
start_page = int(input("Enter start page number: ").strip())
end_page = int(input("Enter end page number: ").strip())
output_pdf = input("Enter output PDF name (e.g., output.pdf): ").strip() or "outputs.pdf"

reader = PdfReader(input_pdf)
writer = PdfWriter()

start = start_page - 1
end = end_page

for page_num in range(start, end):
    writer.add_page(reader.pages[page_num])

with open(output_pdf, "wb") as out_pdf:
    writer.write(out_pdf)

print(f"PDF split successfully! Saved as: {output_pdf}")
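For automation, the same steps can be wrapped in a function so that they run without prompts. This is a sketch under stated assumptions rather than a replacement for the program above: the function name extract_pages, its range checks, and its return value are illustrative additions. It uses only the PyPDF2 calls already shown (PdfReader, PdfWriter, add_page, write) plus len(reader.pages).

```python
def extract_pages(input_pdf, start_page, end_page, output_pdf):
    """Copy pages start_page..end_page (1-based, inclusive) into output_pdf.

    Returns the number of pages written. The function name, the checks,
    and the return value are illustrative assumptions, not part of the
    original script.
    """
    if start_page < 1 or end_page < start_page:
        raise ValueError("Expected 1 <= start_page <= end_page.")

    # Imported inside the function so the basic range check above can
    # fail fast before the library or the input file is touched.
    from PyPDF2 import PdfReader, PdfWriter

    reader = PdfReader(input_pdf)
    total_pages = len(reader.pages)
    if end_page > total_pages:
        raise ValueError(
            f"End page {end_page} is beyond the last page ({total_pages})."
        )

    writer = PdfWriter()
    # Shift only the start to zero-based; range() excludes its upper bound.
    for page_num in range(start_page - 1, end_page):
        writer.add_page(reader.pages[page_num])

    with open(output_pdf, "wb") as out_pdf:
        writer.write(out_pdf)

    return end_page - start_page + 1
```

A call such as extract_pages("sample.pdf", 2, 5, "extracted.pdf") would then mirror the interactive run, provided sample.pdf exists and has at least five pages.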

Output:

> py pdfExtractor.py

=== PDF Splitter ===
Enter input PDF path: sample.pdf
Enter start page number: 2
Enter end page number: 5
Enter output PDF name (e.g., output.pdf): extracted.pdf

PDF split successfully! Saved as: extracted.pdf