

Everything related to maintaining a paperless office running on free software.
Discussions include image processing tools like GIMP, ImageMagick, unpaper, pdf2djvu, etc.
Converting from XCF to PNG from the CLI -- is it possible?
As part of my build process I need to convert a nu...
The linked thread shows a couple of bash scripts that use GIMP to export to another file format. Both scripts are broken for me; perhaps they worked 14 years ago, but not today.
Anyone got something that works?
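For what it's worth, here is a sketch that should work with GIMP 2.10's Script-Fu batch mode; file names are placeholders, and the trailing numbers are file-png-save's interlace/compression/chunk flags (interlace off, compression 9):

$ gimp -i -b '(let* ((image    (car (gimp-file-load RUN-NONINTERACTIVE "in.xcf" "in.xcf")))
                     (drawable (car (gimp-image-flatten image))))
                (file-png-save RUN-NONINTERACTIVE image drawable "out.png" "out.png"
                               0 9 1 1 1 1 1)
                (gimp-quit 0))'

For simple files, ImageMagick can sometimes do it in one line (convert in.xcf -flatten out.png), and the xcftools package ships a dedicated xcf2png.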
No access via WebUI, App works fine
Hi. Since [email protected] seems dead... maybe someone here can help me. I installed Paperless-ngx on TrueNAS SCALE via the built-in Apps catalog (so Docker-based). It seems to be working on the server side, and even with an app from F-Droid, but login via browser always leads to an error 500.
Any idea how to debug this? I could provide some logs if helpful.
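A sketch of a starting point, assuming shell access to the server; on TrueNAS SCALE the app may run under k3s rather than plain Docker, so adapt as needed. An error 500 means the server side logged a traceback, and for browser-only failures a PAPERLESS_URL that does not match the browser origin is a common culprit:

$ sudo docker ps | grep -i paperless    # find the webserver container
$ sudo docker logs <container-id>       # the 500 should appear as a Python traceback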
Expats: when you receive documents that are not in your language, how do you handle them? (PDF vs DjVu, Tesseract, Argos, etc)
When I receive a non-English document, I scan it and run OCR (Tesseract). Then I use pdftotext to dump the text to a text file and run Argos Translate (a locally installed translation app). That gives me the text in English without a cloud dependency. What next?
Up until now, I save the file as (original basename)_en.txt. Then when I want to read the doc in the future, I open that text file in Emacs. But that’s not enough: I still want to see the original letter, so I open the PDF (or DjVu) file anyway.
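For concreteness, the pipeline as commands (a sketch: file names are placeholders, the source language is assumed to be German here, and the argos-translate invocation may differ between versions, so check its --help):

$ tesseract scan.png letter -l deu pdf    # OCR; writes letter.pdf with a text layer
$ pdftotext letter.pdf letter.txt
$ argos-translate --from-lang de --to-lang en < letter.txt > letter_en.txt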
That workflow is a bit cumbersome. So another option: use pdfjam --no-tidy to import the PDF into the skeleton of LaTeX code, then modify the LaTeX to add a \pdfcomment, which puts the English text in an annotation. Then the PDF merely needs to be opened, and mousing over the annotation icon shows the English. This is labor-intensive up front, but it can be scripted.
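A sketch of what the edited skeleton might look like; pdfjam emits a pdfpages-based file, and the picturecommand hook plus the coordinates here are my own addition, not pdfjam output:

\documentclass{article}
\usepackage{pdfpages}
\usepackage{pdfcomment}
\begin{document}
\includepdf[pages=1,
  picturecommand={\put(40,760){%
    \pdfcomment[icon=Note]{English translation of page 1 goes here.}}}]
  {letter.pdf}
\end{document}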
Works great until pdf2djvu runs on it. Both evince and djview render the document with annotation icons s…
Q: How to produce a PDF with deliberate blank pages that don’t count when a print shop prints them
Suppose you are printing a book or some compilation of several shorter documents. You would do a duplex print (printing on both sides) but you don’t generally want the backside of the last page of a chapter/section/episode to contain the first page of the next piece.
In LaTeX we would add a \cleardoublepage or \cleartooddpage before every section. The compiler then adds a blank page only on an as-needed basis. It works as expected and prints correctly, but it’s a waste of money because the print shop counts blank pages like any other page.
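For reference, a minimal version of the standard approach (\cleartooddpage comes from the nextpage package; section names are placeholders):

\documentclass{book}
\usepackage{nextpage}          % provides \cleartooddpage
\begin{document}
\section{First piece}
Some text.
\cleartooddpage                % emits a full-size blank verso when needed
\section{Second piece}
More text.
\end{document}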
My hack is this:
\newcommand{\tinyblankifeven}{{%
  \KOMAoptions{paper=a8}%
  \recalctypearea
  \thispagestyle{empty}%
  \cleartooddpage
}}
Calling \tinyblankifeven between pieces (instead of \cleartooddpage) inserts an A8-formatted blank page whenever a blank is needed. That tiny page then serves as a marker for this shell script:
make_batches_pdf() {
    local -r src=$1
    local start=1
    local batch=1
    while read pg
    do
        fn_dest=${src%.pdf}_b$(printf '%0.2d' $batch).pdf
        batch=$((batch+1))
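The preview cuts the script off at that point. For completeness, a sketch of how such a splitter could be finished; everything from the pdftk call onward is my assumption (pdftk for page extraction, pdfinfo plus awk to find the marker pages by their A8 size of roughly 147 × 210 pt):

make_batches_pdf() {
    local -r src=$1
    local start=1
    local batch=1
    while read pg
    do
        fn_dest=${src%.pdf}_b$(printf '%0.2d' $batch).pdf
        batch=$((batch+1))
        # copy everything up to, but not including, the A8 marker page
        pdftk "$src" cat "$start"-"$((pg-1))" output "$fn_dest"
        start=$((pg+1))              # resume after the marker
    done
    # pages after the last marker form the final batch
    pdftk "$src" cat "$start"-end output "${src%.pdf}_b$(printf '%0.2d' $batch).pdf"
}

# feed it the page numbers of the tiny marker pages, e.g.:
$ last=$(pdfinfo book.pdf | awk '/^Pages:/ {print $2}')
$ pdfinfo -f 1 -l "$last" book.pdf | awk '/size: 147/ {print $2}' |
      make_batches_pdf book.pdf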
Preparing a PDF for a lawyer and other orgs to use in court. PDF bookmarks, evidence labels, etc. Using LaTeX.
I’ve been using LaTeX to prepare legal documents in PDF format with a tree of PDF bookmarks like this:...
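The example tree is cut off in the preview; as a generic sketch, such a tree can be produced with the bookmark package (file name and titles hypothetical):

\documentclass{article}
\usepackage{pdfpages}
\usepackage{bookmark}
\begin{document}
\includepdf[pages=-]{exhibit_a.pdf}
\bookmark[page=1, level=0]{Exhibit A: contract}
\bookmark[page=3, level=1]{Signature page}
\end{document}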
Store the URL of a PDF you downloaded in the PDF’s metadata
Create ~/.ExifTool_config:
%Image::ExifTool::UserDefined = (
    'Image::ExifTool::XMP::xmp' => {
        # SRCURL tag (simple string, no checking; we specify the name
        # explicitly so it stays all uppercase)
        SRCURL => { Name => 'SRCURL' },
        PUBURL => { Name => 'PUBURL' },
        # Text tag (can be specified in alternative languages)
        Text => { },
    },
);
1;
Then after fetching a PDF, run this:
$ exiftool -config ~/.ExifTool_config -xmp-xmp:srcurl="$URL" "$PDF"
To see the URL, simply run:
$ exiftool "$PDF"
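To print just that one field, the config has to be loaded again, since the tag is user-defined:

$ exiftool -config ~/.ExifTool_config -XMP-xmp:SRCURL "$PDF"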
It is a bit ugly that we need a complicated config file just to add an attribute to the metadata. But at least it works. I also have a PUBURL field to store URLs of PDFs I have published so I can keep track of where they were published.
Note that “srcurl” is an arbitrary identifier of my choosing, so use whatever tag suits you. I could not find a standard field name for this.
(PDF neutering) Not all PDFs are documents; some are apps! Insurance company sent me a form to sign as a PDF with JavaScript/Java. Is it a tracker?
They emailed me a PDF. It opened fine with evince and looked like a simple doc at first. Then I clicked on a field in the form. Strangely, instead of simply populating the field with my text, a PDF note window popped up so my text entry went into a PDF note, which many viewers present as a sticky note icon.
If I were to fax this PDF, the PDF comments would just get lost. So to fill out the form I fed it to LaTeX and used the overpic package to write text wherever I choose. LaTeX rejected the file; it could not handle this PDF. Then I used the file command to see what I am dealing with:
$ file signature_page.pdf
signature_page.pdf: Java serialization data, version 5
WTF is that? I know PDF supports JavaScript (shitty indeed). Is that what this is? “Java” is not JavaScript, so I’m baffled. Why is Java in a PDF? (edit: explainer on Java serialization, and [some analysis](https://superuser.com/questions/1212097/can-a-val
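One way to see what file is keying on: dump the first bytes. A Java serialization stream begins with the magic AC ED 00 05 (the 00 05 being the “version 5”), whereas a real PDF begins with the ASCII signature %PDF-:

$ xxd signature_page.pdf | head -n 1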
How to obtain the density (DPI / PPI) of a PGM file -- anyone know? ImageMagick does not cut it.
Running this gives the geometry but not the density:
$ identify -verbose myfile.pgm | grep -iE 'geometry|pixel|dens|size|dimen|inch|unit'
There is also a “Pixels per second” attribute, which means nothing to me. No density, and not even a physical canvas/page dimension (which would make it possible to compute the density). The “Units” attribute on my source images is “undefined”.
Suggestions?
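Possibly relevant background: a PGM header carries only a magic number (P2 or P5), width, height and maxval, so the format itself seems to have no field where a density could be stored:

$ head -c 32 myfile.pgm    # header is just: magic, width height, maxval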
Safe enough for public webserver?
I just discovered this software and like it very much.
Would you consider it safe enough to use it with my personal documents on a public webserver?
PDF renders radically differently in Adobe Acrobat® vs. evince & okular (Poppler-based)
The linked doc is a PDF which looks very different in Adobe Acrobat than it does in evince and okular, which I believe are both based on the same Poppler rendering library.
So the question is: is there an alternative free PDF viewer that does not rely on the Poppler library for rendering?
#AskFedi
TIFF → DjVu conversion produces bigger file from bilevel doc than color
I would like to get to the bottom of what I am doing wrong that leads to black-and-white documents having a bigger file size than color ones.
My process for a color TIFF is like this:
① tiff2pdf
② ocrmypdf
③ pdf2djvu
The resulting color DjVu file is ~56k. When pdfimages -all runs on the intermediate PDF file, it shows that a CCITT (fax) image is inside.
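Spelled out as commands (a sketch; file names hypothetical, flags per the respective man pages):

$ tiff2pdf -o scan.pdf scan.tiff          # step ①
$ ocrmypdf scan.pdf scan_ocr.pdf          # step ②
$ pdf2djvu -o scan.djvu scan_ocr.pdf      # step ③
$ pdfimages -list scan_ocr.pdf            # shows the codec of each embedded image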
My process for a black and white TIFF is the same:
① tiff2pdf
② ocrmypdf
③ pdf2djvu
The resulting black-and-white DjVu file is ~145k (almost 3× the color size). When pdfimages -all runs on the intermediate PDF file, it shows that a PNG file is inside. If I replace step ① with ImageMagick’s convert, the first PDF is 10 MB, but in the end the resulting djvu file is still ~145k, and PNG is still inside the intermediate PDF.
I can get the bitonal (bilevel) image smaller by using cjb2 -clean, which goes straight from TIFF to DjVu, but then I can’t OCR it due to the lack of a PDF intermediate version. And the size is still bigger than t…