
Hello!
Today I’m happy to show you a simple way to add support for Office file formats, such as .doc, .xlsx, or .odt, to Paperless-ngx. With a few extra lines of configuration, you will be able to upload and manage documents in popular office formats alongside your PDFs.
Introduction
What is Paperless-ngx?
Paperless-ngx is a modern open source solution for paperless document management. It is a fork of the Paperless project, which is designed to scan, tag, search and manage digital copies of paper documents to minimize the need to store physical copies. Paperless-ngx offers a variety of improvements over the original design, including better user interface support, more advanced search options, automatic tagging of documents based on their content, and OCR (optical character recognition) support in multiple languages, allowing for easier management and retrieval of documents in the database.
This project is particularly useful for individuals and companies seeking to reduce the amount of paper in their work and daily lives. It offers a simple and efficient way to organize digital documents.
Genesis of the issue
Out of the box, Paperless-ngx accepts PDF files (as well as images and plain text), but not Office documents. To extend the system’s capabilities, it is worth configuring Gotenberg and Apache Tika support. Gotenberg is a tool that converts documents in various formats to PDF through HTTP requests. Apache Tika extracts the text and metadata from files in office formats such as .doc, .xlsx, or .odt, so that Paperless-ngx can index their contents, while Gotenberg produces the PDF version of the document. It’s worth noting that despite the conversion, the original file is kept, so users can still retrieve documents from the archive in their original formats.
The following tutorial was developed using version: Paperless-ngx v2.8.1.
Step 1 – Installing Paperless-ngx on a Synology server
If you have not yet installed Paperless-ngx on your Synology device, I strongly encourage you to read this article. You will learn from it how to set up a paperless document management system. This is a great opportunity to start taking advantage of Paperless-ngx’s advanced capabilities for easy organization and secure document storage.
Step 2 – Edit parameters in Docker Compose
In step 2, we’ll look at editing the parameters in the Docker Compose file to enable advanced options in Paperless-ngx. This is a key part of the setup, allowing the process to be tailored to specific needs, including processing documents in Office formats.
- Log in to your account and go to Portainer’s administrative interface.
- If you are using the Authentik single sign-on system, you can make the login process easier by integrating Portainer with Authentik. For integration instructions, see the dedicated guide.
- Under Environments, select the environment that hosts your Paperless-ngx Docker Compose configuration. Then go to Stacks.
- Select the stack named paperless-ngx (or whatever name you gave your Paperless stack).
Gotenberg and Tika
- Add the following Docker Compose code to your existing stack to configure Gotenberg and Tika:
  gotenberg:
    image: gotenberg/gotenberg:7.9.2
    restart: always
    container_name: paperless_gotenberg
    networks:
      - paperless_network
    command:
      - "gotenberg"
      - "--log-level=debug"
      - "--chromium-disable-javascript=true"
      - "--chromium-disable-routes=true"
      - "--chromium-allow-list=file:///tmp/.*"
      - "--api-timeout=600s"
  tika:
    image: ghcr.io/paperless-ngx/tika
    container_name: paperless_tika
    networks:
      - paperless_network
    restart: always
For Gotenberg, I’m pinning the Docker image to version 7.9.2 because I’ve seen many bug reports in the GitHub repository about problems with converting documents to PDF. In my experience, this version has been the most stable with Paperless. The command block configures Gotenberg: it sets the log level to debug, disables JavaScript in Chromium, disables the Chromium routes, restricts the URLs Chromium may load to files under /tmp, and sets the API timeout to 600 seconds. Together, these settings limit what Gotenberg will process and give larger documents enough time to convert.
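Both services above attach to paperless_network, and Docker Compose rejects a stack in which a service references a network that is not declared at the top level of the file. If the stack from Step 1 already declares this network, nothing needs to change here. Otherwise, a minimal sketch of the declaration could look like the fragment below. The bridge driver is an assumption, since the network settings of the original stack are not shown in this guide.

```yaml
# Top-level key of the same stack file, a sibling of "services:".
# Assumption: the stack creates the network itself with the default bridge driver.
# A network created outside the stack would instead be declared with "external: true".
networks:
  paperless_network:
    driver: bridge
```

Because all three containers share this network, Paperless-ngx can reach the new services by their service names, gotenberg and tika.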
Paperless-ngx
- Add the following variables to the environment section of the Paperless-ngx web server service:
      PAPERLESS_TIKA_ENABLED: 1
      PAPERLESS_TIKA_GOTENBERG_ENDPOINT: http://gotenberg:3000
      PAPERLESS_TIKA_ENDPOINT: http://tika:9998
- Click Deploy the stack, then wait until Portainer applies the changes and recreates the containers.
- Done! 🚀
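To show where these variables live, here is a hedged sketch of the Paperless-ngx service after the change. The service name webserver and the depends_on entries are assumptions based on the common Paperless-ngx Compose layout. Your stack may use a different name, and any existing depends_on entries (for example the broker or database) should be kept and extended rather than replaced.

```yaml
services:
  webserver:                 # assumption: your Paperless-ngx service may be named differently
    # image, ports, volumes and other existing settings stay as they are
    depends_on:              # optional: start the helper services before Paperless-ngx
      - gotenberg
      - tika
    networks:
      - paperless_network    # same network as gotenberg and tika
    environment:
      # keep your existing variables and append the three below
      PAPERLESS_TIKA_ENABLED: 1
      PAPERLESS_TIKA_GOTENBERG_ENDPOINT: http://gotenberg:3000
      PAPERLESS_TIKA_ENDPOINT: http://tika:9998
```

The host names in both endpoints are the Compose service names, so they only resolve if the three services are attached to the same network.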
Step 3 – Upload the file to Paperless
To test the service, pick an Office file (for example a .docx) that you want to upload to Paperless.
- Go to the Paperless panel and use the file upload option.
- Once the file has been uploaded, check the Paperless logs to confirm that the document was processed correctly.
- The logs should show the document being parsed by Tika and converted to PDF. Example below:
[2024-05-07 18:42:03,800] [DEBUG] [paperless.consumer] Detected mime type: application/vnd.openxmlformats-officedocument.wordprocessingml.document
[2024-05-07 18:42:04,142] [DEBUG] [paperless.consumer] Parser: TikaDocumentParser
[2024-05-07 18:42:04,147] [DEBUG] [paperless.consumer] Parsing 2024-04-06 XXXX.docx...
[2024-05-07 18:42:04,148] [INFO] [paperless.parsing.tika] Sending /tmp/paperless/paperless-ngx97r_6y6x/2024-04-06 XXXX.docx to Tika server
[2024-05-07 18:42:06,377] [INFO] [paperless.parsing.tika] Converting /tmp/paperless/paperless-ngx97r_6y6x/2024-04-06 XXXX.docx to PDF as /tmp/paperless/paperless-e0xc5cx4/convert.pdf
[2024-05-07 18:42:11,666] [DEBUG] [paperless.consumer] Generating thumbnail for 2024-04-06 XXXX.docx...
[2024-05-07 18:42:11,673] [DEBUG] [paperless.parsing] Execute: convert -density 300 -scale 500x5000> -alpha remove -strip -auto-orient -define pdf:use-cropbox=true /tmp/paperless/paperless-e0xc5cx4/convert.pdf[0] /tmp/paperless/paperless-e0xc5cx4/convert.webp
[2024-05-07 18:42:13,766] [INFO] [paperless.parsing] convert exited 0
[2024-05-07 18:42:13,984] [DEBUG] [paperless.consumer] Saving record to database
[2024-05-07 18:42:13,985] [DEBUG] [paperless.consumer] Creation date from parse_date: 2024-04-06 18:29:00+00:00
[2024-05-07 18:42:14,120] [INFO] [paperless.handlers] Assigning document type Filip / Sprawy publiczne, to 2024-04-06 2024-04-06 XXXX.docx
[2024-05-07 18:42:14,177] [INFO] [paperless.handlers] Assigning storage path Sprawy publiczne: 2024 Warszawa, Zarząd Dróg Miejskich
[2024-05-07 18:42:14,592] [DEBUG] [paperless.consumer] Deleting file /tmp/paperless/paperless-ngx97r_6y6x/2024-04-06 XXXX.docx
[2024-05-07 18:42:14,605] [DEBUG] [paperless.parsing.tika] Deleting directory /tmp/paperless/paperless-e0xc5cx4
[2024-05-07 18:42:14,607] [INFO] [paperless.consumer] Document 2024-04-06 XXXX consumption finished
[2024-05-07 18:42:14,627] [INFO] [paperless.tasks] ConsumeTaskPlugin completed with: Success. New document id 4151 created
If you have additional questions about the setup, go ahead and leave a comment under this article or contact me directly. I will be happy to answer any concerns and help solve any problems. Your questions can help improve this guide for other users.
Additional sources and information
For further exploration and more information, I recommend checking out the links below. They are valuable sources that were used in the development of this guide.
- Paperless-ngx, Optional services: https://docs.paperless-ngx.com/configuration/#tika
- Github, Paperless-ngx setup with Gotenberg & Tika not working?: https://github.com/paperless-ngx/paperless-ngx/discussions/3017
- Gotenberg, Troubleshooting: https://gotenberg.dev/docs/troubleshooting