
Clean up PDF page images using imagemagick and poppler-utils

Dependencies

; nix shell n\#imagemagick n\#poppler-utils

For viewing images in the terminal:

; nix shell n\#timg

Decompose the PDF into PPM images

; mkdir output
; pdftoppm source.pdf "output/source"
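The `output/source-*.ppm` and `output/result-*.ppm` globs below depend on the shell sorting file names alphabetically. The minimal Python sketch below models the names pdftoppm is assumed to write. The assumption is that page numbers are zero-padded to the digit width of the document's page count. It shows why alphabetical order then matches page order. The helper `expected_page_names` is hypothetical and not part of this repo.

```python
def expected_page_names(prefix: str, page_count: int, ext: str = "ppm") -> list[str]:
    """Names pdftoppm is assumed to write: page numbers padded to the width of page_count."""
    width = len(str(page_count))
    return [f"{prefix}-{page:0{width}d}.{ext}" for page in range(1, page_count + 1)]


names = expected_page_names("output/source", 12)
print(names[0], names[-1])  # output/source-01.ppm output/source-12.ppm

# Zero-padding makes alphabetical order (what the shell glob uses) equal page order...
assert sorted(names) == names
# ...whereas unpadded names would sort page 10 before page 2.
print(sorted(["p-1.ppm", "p-2.ppm", "p-10.ppm"]))  # ['p-1.ppm', 'p-10.ppm', 'p-2.ppm']
```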

Run magick with the default threshold (53%):

; for f in output/source-*.ppm; do echo "$f -> ${f/source/result}"; magick "$f" -white-threshold 53% "${f/source/result}"; done
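`-white-threshold 53%` forces every pixel brighter than 53% of the maximum intensity to white and leaves darker pixels unchanged. This removes light background noise while keeping dark text. The function below is a simplified Python model of that rule and is not ImageMagick's implementation. It assumes 8-bit grayscale values compared one at a time, while ImageMagick's handling of colour channels may differ. The name `white_threshold` is hypothetical.

```python
def white_threshold(values, percent, max_value=255):
    """Simplified `-white-threshold`: values above the limit become white, the rest stay."""
    limit = max_value * percent / 100
    return [max_value if v > limit else v for v in values]


# With 53% of 255 (about 135.15), light-grey noise turns white while dark text survives.
print(white_threshold([0, 120, 135, 136, 200, 255], 53))
# [0, 120, 135, 255, 255, 255]
```

Under this model, a lower percentage whitens more of the page and risks erasing faint strokes. A higher percentage keeps more grey.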

Convert the resulting images to PDF:

; magick output/result-*.ppm result.pdf

Okular can reload the PDF automatically when you regenerate it:

; nix run n\#kdePackages.okular -- result.pdf

Regenerate individual pages

; python ./converter.py

Choose pages interactively and set the threshold as a percentage.

Apply or discard the changes by typing "yes" or "no".
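converter.py's internals aren't described here, so the following is only a hypothetical Python sketch of the pieces such an interactive step needs. It parses a threshold typed as `53` or `53%`, accepts a "yes"/"no" reply, and builds the same `magick -white-threshold` call as the batch loop for a single page. The function names and page file names are illustrative assumptions.

```python
import shlex


def parse_threshold(text: str) -> int:
    """Accept "53" or "53%" and return the percentage as an int in 0..100."""
    value = int(text.strip().rstrip("%"))
    if not 0 <= value <= 100:
        raise ValueError(f"threshold must be between 0 and 100, got {value}")
    return value


def parse_answer(text: str) -> bool:
    """Map a "yes"/"no" reply to True/False; anything else is rejected."""
    answer = text.strip().lower()
    if answer not in ("yes", "no"):
        raise ValueError('please type "yes" or "no"')
    return answer == "yes"


def threshold_command(source: str, result: str, percent: int) -> list[str]:
    """Build the same magick call as the batch loop, for a single page."""
    return ["magick", source, "-white-threshold", f"{percent}%", result]


cmd = threshold_command("output/source-03.ppm", "output/result-03.ppm", parse_threshold("60%"))
print(shlex.join(cmd))
# magick output/source-03.ppm -white-threshold 60% output/result-03.ppm
```

The command is built as an argument list rather than a string, so it can be passed straight to `subprocess.run` without shell quoting issues.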

OCR the resulting PDF

; nix shell n\#ocrmypdf
; ocrmypdf result.pdf result-ocr.pdf

Size optimisations

Converting the PPM files to PNG before assembling the final PDF is recommended:

; for f in output/result-*.ppm; do echo "$f -> ${f/ppm/png}"; magick "$f" "${f/ppm/png}"; done

Then convert the PNG files to PDF:

; magick output/result-*.png result.pdf

This can make the PDF up to twice as small.

Converting to JPEG doesn't work as well and may even produce a larger output.
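The PNG route from this section can also be scripted. The minimal Python sketch below assumes `magick` is on `PATH` and that the pages live in `output/` as in the commands above. It converts each `result-*.ppm` to PNG and then assembles only those PNGs into `result.pdf`. The function names are hypothetical, and `Path.with_suffix` stands in for the shell's `${f/ppm/png}`.

```python
import shutil
import subprocess
from pathlib import Path


def png_conversions(ppm_files):
    """Pair each PPM page with the PNG path it will be converted to."""
    return [(ppm, ppm.with_suffix(".png")) for ppm in ppm_files]


def build_small_pdf(output_dir="output", pdf="result.pdf"):
    """Convert result-*.ppm pages to PNG, then assemble the PNGs into one PDF."""
    if shutil.which("magick") is None:
        raise RuntimeError("ImageMagick's `magick` binary is not on PATH")
    # Sorting keeps page order, relying on pdftoppm's zero-padded page numbers.
    pages = png_conversions(sorted(Path(output_dir).glob("result-*.ppm")))
    if not pages:
        raise FileNotFoundError(f"no result-*.ppm files in {output_dir}")
    for ppm, png in pages:
        print(f"{ppm} -> {png}")
        subprocess.run(["magick", str(ppm), str(png)], check=True)
    # Equivalent to: magick output/result-*.png result.pdf
    subprocess.run(["magick", *(str(png) for _, png in pages), pdf], check=True)
```

Calling `build_small_pdf()` from the directory that contains `output/` should reproduce the two commands above. Unlike the `output/result-*.png` glob, it ignores stale PNG files left over from earlier runs.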

Retrieve pdf metadata

; nix shell n\#exiftool
; exiftool -a -G1 blah.pdf