Andrei Borzenkov 1e743ddad5 Add information about size optimisations		2026-02-12 11:37:16 +04:00

Clean up PDF images using ImageMagick and poppler-utils

Dependencies

; nix shell n\#imagemagick n\#poppler-utils

To view images in the console:

; nix shell n\#timg

; mkdir output
; pdftoppm source.pdf "output/source"

; for f in output/source-*.ppm; do echo "$f -> ${f/source/result}"; magick "$f" -white-threshold 53% "${f/source/result}"; done

; magick output/result-*.ppm result.pdf

; nix run n\#kdePackages.okular -- result.pdf
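
The manual steps above can be wrapped into a small script. This is only a sketch: the function names `result_name` and `cleanup_pdf`, the argument order, and the defaults are made up for illustration; only the `pdftoppm` and `magick` calls come from the commands above.

```shell
#!/bin/sh
# Sketch of the manual pipeline as reusable functions (hypothetical names).

# Map output/source-01.ppm -> output/result-01.ppm.
# Only the file name is rewritten, never the directory part.
result_name() {
    dir=$(dirname "$1")
    base=$(basename "$1")
    printf '%s/result%s\n' "$dir" "${base#source}"
}

# Render every page of a PDF, force near-white pixels to white,
# and combine the cleaned pages back into a single PDF.
cleanup_pdf() {
    src=$1                 # input PDF
    out=${2:-result.pdf}   # output PDF (assumed default)
    threshold=${3:-53%}    # value passed to -white-threshold
    mkdir -p output
    pdftoppm "$src" output/source || return 1
    for f in output/source-*.ppm; do
        target=$(result_name "$f")
        echo "$f -> $target"
        magick "$f" -white-threshold "$threshold" "$target" || return 1
    done
    magick output/result-*.ppm "$out"
}
```

After sourcing the file, `cleanup_pdf source.pdf result.pdf 53%` would run the whole sequence. Unlike `${f/source/result}`, which replaces the first "source" anywhere in the path, `result_name` touches only the file name, so an output directory whose name contains "source" is not renamed by accident.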

; python ./converter.py

Choose pages interactively and set the threshold as a percentage (%).

Confirm or reject the changes by typing "yes" or "no".

; nix shell n\#ocrmypdf

It is recommended to convert the ppm files to png before building the final PDF:

; for f in output/result-*.ppm; do echo "$f -> ${f/ppm/png}"; magick "$f" "${f/ppm/png}"; done

Then combine the png files into a pdf:

; magick output/result-*.png result.pdf
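
The ppm to png step can also be written as a function. A minimal sketch, assuming the hypothetical helper names `png_name` and `ppm_to_png_all`; the `magick` call is the same one used above.

```shell
#!/bin/sh
# Sketch: convert a list of ppm files to png (hypothetical helper names).

# Map output/result-01.ppm -> output/result-01.png by swapping the extension.
png_name() {
    printf '%s.png\n' "${1%.ppm}"
}

# Convert every file given as an argument, stopping at the first failure.
ppm_to_png_all() {
    for f in "$@"; do
        target=$(png_name "$f")
        echo "$f -> $target"
        magick "$f" "$target" || return 1
    done
}
```

It would be called as `ppm_to_png_all output/result-*.ppm`. Stripping the `.ppm` suffix changes only the extension, whereas `${f/ppm/png}` replaces the first "ppm" anywhere in the path.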

This can shrink the resulting PDF by up to a factor of two.

Converting to JPEG does not work as well and may even produce a larger file.
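
Since the gain depends on the document, it is worth comparing the outputs directly. A small sketch using only `wc`; the function names and the file names `result-png.pdf` and `result-jpg.pdf` are hypothetical.

```shell
#!/bin/sh
# Sketch: compare the size of two candidate PDFs (hypothetical helper names).

# Print the size of a file in bytes.
file_size() {
    wc -c < "$1" | tr -d ' '
}

# Print the path of the smaller of two files (the first one on a tie).
smaller_file() {
    a=$(file_size "$1")
    b=$(file_size "$2")
    if [ "$a" -le "$b" ]; then
        printf '%s\n' "$1"
    else
        printf '%s\n' "$2"
    fi
}
```

For example, `smaller_file result-png.pdf result-jpg.pdf` prints whichever of the two files takes less space.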

; nix shell n\#exiftool
; exiftool -a -G1 blah.pdf