Clean up PDF images using ImageMagick and poppler-utils
Dependencies
; nix shell n\#imagemagick n\#poppler-utils
For viewing images in the console:
; nix shell n\#timg
Decompose the PDF into PPM images (one file per page):
; mkdir output
; pdftoppm source.pdf "output/source"
Run magick with the default threshold (53%):
; for f in output/source-*.ppm; do echo "$f -> ${f/source/result}"; magick "$f" -white-threshold 53% "${f/source/result}"; done
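`-white-threshold` forces every pixel brighter than the threshold to white and leaves the rest unchanged, which is what wipes out grey scanner noise and paper texture while keeping darker text. The sketch below only models that rule on 8-bit grayscale values; the function name `white_threshold` is hypothetical, and ImageMagick itself works on full-colour pixels, so treat this as an illustration of the idea, not a reimplementation.

```python
def white_threshold(pixels, percent):
    """Hypothetical model of `magick -white-threshold N%` on 8-bit grayscale values.

    Pixels strictly above percent% of the maximum (255) become white (255);
    pixels at or below the cutoff are left unchanged.
    """
    cutoff = 255 * percent / 100
    return [255 if p > cutoff else p for p in pixels]


# A dark text pixel, darker noise, lighter noise and a light background pixel.
row = [20, 120, 140, 230]
print(white_threshold(row, 53))  # cutoff is 135.15 → [20, 120, 255, 255]
```

Lowering the percentage whitens more of the page (and can start eating faint text); raising it keeps more grey.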
Convert the result images to PDF:
; magick output/result-*.ppm result.pdf
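The glob hands pages to `magick` in lexical order, so page order in the PDF depends on the file names. pdftoppm zero-pads page numbers for documents with ten or more pages (e.g. `source-01.ppm`), which keeps lexical and numeric order in sync; if you ever end up with unpadded names, a numeric sort like this sketch stops `-10` from landing before `-2`. The helper names and the `<prefix>-<n>.<ext>` naming pattern are assumptions.

```python
import re
from pathlib import Path


def page_number(path):
    """Extract the trailing page number from names like 'output/result-7.ppm'."""
    match = re.search(r"-(\d+)\.[^.]+$", Path(path).name)
    if match is None:
        raise ValueError(f"no page number in {path!r}")
    return int(match.group(1))


def sorted_pages(paths):
    """Sort page images numerically rather than lexically."""
    return sorted(paths, key=page_number)


files = ["output/result-10.ppm", "output/result-2.ppm", "output/result-1.ppm"]
print(sorted(files))        # lexical: result-1, result-10, result-2
print(sorted_pages(files))  # numeric: result-1, result-2, result-10
```

The sorted list can then be passed straight to ImageMagick, e.g. `subprocess.run(["magick", *sorted_pages(files), "result.pdf"], check=True)`.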
Okular can reload the PDF automatically when you regenerate it, so you can keep it open while tweaking:
; nix run n\#kdePackages.okular -- result.pdf
Regenerate individual pages
; python ./converter.py
The script lets you choose pages interactively and set the threshold as a percentage.
Apply or discard the changes by typing "yes" or "no".
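converter.py's actual prompt format isn't documented here. As a purely hypothetical sketch of the same idea (pick pages, pick a threshold, rebuild only those pages), the snippet below parses a page spec like `1,3-5` and builds the per-page `magick` call used by the threshold loop. The spec syntax, the padding width and the names `parse_pages` / `threshold_command` are assumptions, not converter.py's interface.

```python
def parse_pages(spec):
    """Parse a hypothetical page spec such as '1,3-5' into sorted page numbers."""
    pages = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            start, end = (int(x) for x in part.split("-", 1))
            pages.update(range(start, end + 1))
        else:
            pages.add(int(part))
    return sorted(pages)


def threshold_command(page, threshold, width=2):
    """Build the magick call for one page, mirroring the shell loop.

    `width` is the zero-padding pdftoppm used for page numbers (an assumption:
    it depends on the document's page count).
    """
    name = f"{page:0{width}d}"
    return [
        "magick",
        f"output/source-{name}.ppm",
        "-white-threshold", f"{threshold}%",
        f"output/result-{name}.ppm",
    ]


print(parse_pages("1, 3-5"))
print(threshold_command(3, 60))
```

Each list can be run with `subprocess.run(cmd, check=True)`; rebuilding the PDF afterwards is the same `magick output/result-*.ppm result.pdf` step.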
OCR the resulting PDF:
; nix shell n\#ocrmypdf
; ocrmypdf result.pdf result-ocr.pdf
Size optimisations
It's recommended to convert the PPM files to PNG before building the final result:
; for f in output/result-*.ppm; do echo "$f -> ${f%.ppm}.png"; magick "$f" "${f%.ppm}.png"; done
Then convert them to PDF:
; magick output/result-*.png result.pdf
This can shrink the PDF by up to 2x.
Converting to JPEG doesn't work as well and may even produce larger output.
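The saving depends on the document, so it's worth measuring. A minimal sketch for comparing two builds; the file names in the usage line (`result-ppm.pdf` for the PPM-based build, `result.pdf` for the PNG-based one) are hypothetical.

```python
import os


def human_size(n_bytes):
    """Format a byte count with binary units, e.g. 1536 -> '1.5 KiB'."""
    if n_bytes < 1024:
        return f"{n_bytes} B"
    size = n_bytes / 1024
    for unit in ("KiB", "MiB", "GiB"):
        if size < 1024 or unit == "GiB":
            return f"{size:.1f} {unit}"
        size /= 1024


def describe(before_bytes, after_bytes):
    """Summarise a size change, e.g. '2.0 KiB -> 1.0 KiB (2.00x smaller)'."""
    ratio = before_bytes / after_bytes
    return f"{human_size(before_bytes)} -> {human_size(after_bytes)} ({ratio:.2f}x smaller)"


def report(before_path, after_path):
    """Compare the sizes of two files on disk."""
    return describe(os.path.getsize(before_path), os.path.getsize(after_path))
```

Usage: `print(report("result-ppm.pdf", "result.pdf"))`.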
Retrieve PDF metadata
; nix shell n\#exiftool
; exiftool -a -G1 blah.pdf
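With `-G1`, exiftool prefixes each tag with its group in brackets, one `[Group] Tag Name : Value` line per tag. If you want those values in a script (say, to check the page count after regenerating), a rough parser could look like this. The exact column padding is an assumption, so the sketch only relies on the leading `[...]` and the first colon after the tag name; the sample lines are made up. exiftool's `-j` (JSON) output is usually the sturdier choice for scripting.

```python
import re

# '[Group]   Tag Name   : value' (padding widths are not relied upon)
LINE = re.compile(r"^\[(?P<group>[^\]]+)\]\s+(?P<tag>[^:]+?)\s*:\s?(?P<value>.*)$")


def parse_exiftool(text):
    """Parse `exiftool -a -G1` output into (group, tag, value) tuples.

    A rough sketch: lines that don't look like '[Group] Tag : Value' are skipped.
    Values may themselves contain colons (dates, times); only the first colon
    after the tag name is treated as the separator.
    """
    rows = []
    for line in text.splitlines():
        m = LINE.match(line)
        if m:
            rows.append((m["group"], m["tag"], m["value"]))
    return rows


# Made-up sample lines in the -G1 layout.
sample = """\
[PDF]           Page Count                      : 12
[PDF]           Create Date                     : 2024:05:01 10:00:00+02:00
"""
for group, tag, value in parse_exiftool(sample):
    print(f"{group}/{tag} = {value}")
```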