

Everything related to maintaining a paperless office running on free software.
Discussions include image processing tools like GIMP, ImageMagick, unpaper, pdf2djvu, etc.
Converting from XCF to PNG from the CLI -- is it possible?
As part of my build process I need to convert a nu...
The linked thread shows a couple of bash scripts that use GIMP to export to another file format. Both scripts are broken for me; perhaps they worked 14 years ago, but not today.
Anyone got something that works?
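For what it's worth, here is a sketch that should work with GIMP 2.10's Script-Fu batch mode; file names are placeholders, and the trailing numbers are file-png-save's interlace/compression/chunk flags (interlace off, compression 9):

$ gimp -i -b '(let* ((image    (car (gimp-file-load RUN-NONINTERACTIVE "in.xcf" "in.xcf")))
                     (drawable (car (gimp-image-flatten image))))
                (file-png-save RUN-NONINTERACTIVE image drawable "out.png" "out.png"
                               0 9 1 1 1 1 1)
                (gimp-quit 0))'

For simple files, ImageMagick can sometimes do it in one line (convert in.xcf -flatten out.png), and the xcftools package ships a dedicated xcf2png.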
No access via WebUI, App works fine
Hi. Since [email protected] seems dead... maybe someone here can help me. I installed Paperless-ngx on TrueNAS SCALE via the built-in Apps catalog (so Docker-based). It seems to be working on the server side, and even with an app from F-Droid, but login via browser always leads to an error 500.
Any idea how to debug this? I could provide some logs if helpful.
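A sketch of a starting point, assuming shell access to the server; on TrueNAS SCALE the app may run under k3s rather than plain Docker, so adapt as needed. An error 500 means the server side logged a traceback, and for browser-only failures a PAPERLESS_URL that does not match the browser origin is a common culprit:

$ sudo docker ps | grep -i paperless    # find the webserver container
$ sudo docker logs <container-id>       # the 500 should appear as a Python traceback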
Expats: when you receive documents that are not in your language, how do you handle them? (PDF vs DjVu, Tesseract, Argos, etc)
When I receive a non-English document, I scan it and run OCR (Tesseract). Then I use pdftotext to dump the text to a text file and run Argos Translate (a locally installed translation app). That gives me the text in English without a cloud dependency. What next?
Up until now, I save the file as (original basename)_en.txt. Then when I want to read the doc in the future, I open that text file in Emacs. But that’s not enough: I still want to see the original letter, so I open the PDF (or DjVu) file anyway.
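For concreteness, the pipeline as commands (a sketch: file names are placeholders, the source language is assumed to be German here, and the argos-translate invocation may differ between versions, so check its --help):

$ tesseract scan.png letter -l deu pdf    # OCR; writes letter.pdf with a text layer
$ pdftotext letter.pdf letter.txt
$ argos-translate --from-lang de --to-lang en < letter.txt > letter_en.txt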
That workflow is a bit cumbersome. So another option: use pdfjam --no-tidy to import the PDF into the skeleton of LaTeX code, then modify the LaTeX to add a \pdfcomment, which puts the English text in an annotation. Then the PDF merely needs to be opened, and mousing over the annotation icon shows the English. This is labor-intensive up front, but it can be scripted.
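A sketch of what the edited skeleton might look like; pdfjam emits a pdfpages-based file, and the picturecommand hook plus the coordinates here are my own addition, not pdfjam output:

\documentclass{article}
\usepackage{pdfpages}
\usepackage{pdfcomment}
\begin{document}
\includepdf[pages=1,
  picturecommand={\put(40,760){%
    \pdfcomment[icon=Note]{English translation of page 1 goes here.}}}]
  {letter.pdf}
\end{document}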
Works great until pdf2djvu runs on it. Both evince and djview render the document with annotation icons s…
Q: How to produce a PDF with deliberate blank pages that don’t count when a print shop prints them
Suppose you are printing a book or some compilation of several shorter documents. You would do a duplex print (printing on both sides) but you don’t generally want the backside of the last page of a chapter/section/episode to contain the first page of the next piece.
In LaTeX we would add a \cleardoublepage or \cleartooddpage before every section. The compiler then adds a blank page only on an as-needed basis. It works as expected and prints correctly, but it’s a waste of money because the print shop counts blank pages like any other page.
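For reference, a minimal version of the standard approach (\cleartooddpage comes from the nextpage package; section names are placeholders):

\documentclass{book}
\usepackage{nextpage}          % provides \cleartooddpage
\begin{document}
\section{First piece}
Some text.
\cleartooddpage                % emits a full-size blank verso when needed
\section{Second piece}
More text.
\end{document}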
My hack is this:
\newcommand{\tinyblankifeven}{{%
  \KOMAoptions{paper=a8}%
  \recalctypearea
  \thispagestyle{empty}%
  \cleartooddpage
}}
Calling \tinyblankifeven between pieces (instead of \cleartooddpage) inserts an A8-formatted blank page whenever a blank is needed. That tiny page then serves as a marker for this shell script:
make_batches_pdf() {
    local -r src=$1
    local start=1
    local batch=1
    while read pg
    do
        fn_dest=${src%.pdf}_b$(printf '%0.2d' $batch).pdf
        batch=$((batch+1))
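The preview cuts the script off at that point. For completeness, a sketch of how such a splitter could be finished; everything from the pdftk call onward is my assumption (pdftk for page extraction, pdfinfo plus awk to find the marker pages by their A8 size of roughly 147 × 210 pt):

make_batches_pdf() {
    local -r src=$1
    local start=1
    local batch=1
    while read pg
    do
        fn_dest=${src%.pdf}_b$(printf '%0.2d' $batch).pdf
        batch=$((batch+1))
        # copy everything up to, but not including, the A8 marker page
        pdftk "$src" cat "$start"-"$((pg-1))" output "$fn_dest"
        start=$((pg+1))              # resume after the marker
    done
    # pages after the last marker form the final batch
    pdftk "$src" cat "$start"-end output "${src%.pdf}_b$(printf '%0.2d' $batch).pdf"
}

# feed it the page numbers of the tiny marker pages, e.g.:
$ last=$(pdfinfo book.pdf | awk '/^Pages:/ {print $2}')
$ pdfinfo -f 1 -l "$last" book.pdf | awk '/size: 147/ {print $2}' |
      make_batches_pdf book.pdf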
Preparing a PDF for a lawyer and other orgs to use in court. PDF bookmarks, evidence labels, etc. Using LaTeX.
I’ve been using LaTeX to prepare legal documents in PDF format with a tree of PDF bookmarks like this:...
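The example tree is cut off in the preview; as a generic sketch, such a tree can be produced with the bookmark package (file name and titles hypothetical):

\documentclass{article}
\usepackage{pdfpages}
\usepackage{bookmark}
\begin{document}
\includepdf[pages=-]{exhibit_a.pdf}
\bookmark[page=1, level=0]{Exhibit A: contract}
\bookmark[page=3, level=1]{Signature page}
\end{document}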
Store the URL of a PDF you downloaded in the PDF’s metadata
Create ~/.ExifTool_config:
%Image::ExifTool::UserDefined = (
    'Image::ExifTool::XMP::xmp' => {
        # SRCURL tag (simple string, no checking; we specify the name
        # explicitly so it stays all uppercase)
        SRCURL => { Name => 'SRCURL' },
        PUBURL => { Name => 'PUBURL' },
        # Text tag (can be specified in alternative languages)
        Text => { },
    },
);
1;
Then after fetching a PDF, run this:
$ exiftool -config ~/.ExifTool_config -xmp-xmp:srcurl="$URL" "$PDF"
To see the URL, simply run:
$ exiftool "$PDF"
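To print just that one field, the config has to be loaded again, since the tag is user-defined:

$ exiftool -config ~/.ExifTool_config -XMP-xmp:SRCURL "$PDF"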
It is a bit ugly that we need a complicated config file just to add an attribute to the metadata. But at least it works. I also have a PUBURL field to store URLs of PDFs I have published so I can keep track of where they were published.
Note that “srcurl” is an arbitrary identifier of my choosing, so use whatever tag suits you. I could not find a standard field name for this.
(PDF neutering) Not all PDFs are documents; some are apps! Insurance company sent me a form to sign as a PDF with JavaScript/Java. Is it a tracker?
They emailed me a PDF. It opened fine with evince and looked like a simple doc at first. Then I clicked on a field in the form. Strangely, instead of simply populating the field with my text, a PDF note window popped up so my text entry went into a PDF note, which many viewers present as a sticky note icon.
If I were to fax this PDF, the PDF comments would just get lost. So to fill out the form I fed it to LaTeX and used the overpic package to write text wherever I choose. LaTeX rejected the file; it could not handle this PDF. Then I used the file command to see what I am dealing with:
$ file signature_page.pdf
signature_page.pdf: Java serialization data, version 5
WTF is that? I know PDF supports JavaScript (shitty indeed). Is that what this is? “Java” is not JavaScript, so I’m baffled. Why is Java in a PDF? (edit: explainer on Java serialization, and [some analysis](https://superuser.com/questions/1212097/can-a-val
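One way to see what file is keying on: dump the first bytes. A Java serialization stream begins with the magic AC ED 00 05 (the 00 05 being the “version 5”), whereas a real PDF begins with the ASCII signature %PDF-:

$ xxd signature_page.pdf | head -n 1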
How to obtain the density (DPI / PPI) of a PGM file -- anyone know? ImageMagick does not cut it.
Running this gives the geometry but not the density:
$ identify -verbose myfile.pgm | grep -iE 'geometry|pixel|dens|size|dimen|inch|unit'
There is also a “Pixels per second” attribute, which means nothing to me. No density, and not even a physical canvas/page dimension (which would make it possible to compute the density). The “Units” attribute on my source images is “undefined”.
Suggestions?
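Possibly relevant background: a PGM header carries only a magic number (P2 or P5), width, height and maxval, so the format itself seems to have no field where a density could be stored:

$ head -c 32 myfile.pgm    # header is just: magic, width height, maxval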
Safe enough for public webserver?
I just discovered this software and like it very much.
Would you consider it safe enough to use it with my personal documents on a public webserver?
PDF renders radically differently in Adobe Acrobat® vs. evince & okular (Poppler-based)
The linked doc is a PDF which looks very different in Adobe Acrobat than it does in evince and okular, which I believe are both based on the same Poppler rendering library.
So the question is: is there an alternative free PDF viewer that does not rely on the Poppler library for rendering?
#AskFedi
TIFF → DjVu conversion produces bigger file from bilevel doc than color
I would like to get to the bottom of what I am doing wrong that leads to black-and-white documents having a bigger file size than color ones.
My process for a color TIFF is like this:
① tiff2pdf
② ocrmypdf
③ pdf2djvu
The resulting color DjVu file is ~56k. When pdfimages -all runs on the intermediate PDF file, it shows that a CCITT (fax) image is inside.
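Spelled out as commands (a sketch; file names hypothetical, flags per the respective man pages):

$ tiff2pdf -o scan.pdf scan.tiff          # step ①
$ ocrmypdf scan.pdf scan_ocr.pdf          # step ②
$ pdf2djvu -o scan.djvu scan_ocr.pdf      # step ③
$ pdfimages -list scan_ocr.pdf            # shows the codec of each embedded image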
My process for a black and white TIFF is the same:
① tiff2pdf
② ocrmypdf
③ pdf2djvu
The resulting black-and-white DjVu file is ~145k (almost 3× the color size). When pdfimages -all runs on the intermediate PDF file, it shows that a PNG file is inside. If I replace step ① with ImageMagick’s convert, the first PDF is 10 MB, but in the end the resulting djvu file is still ~145k, and PNG is still inside the intermediate PDF.
I can get the bitonal (bilevel) image smaller by using cjb2 -clean, which goes straight from TIFF to DjVu, but then I can’t OCR it due to the lack of a PDF intermediate version. And the size is still bigger than t…