diff --git a/README.md b/README.md
index d0871c5..d56ada2 100644
--- a/README.md
+++ b/README.md
@@ -36,37 +36,49 @@ rga --rga-list-adapters
 
 Adapters:
 
-- ffmpeg
+Adapters:
+
+- **ffmpeg**
 
   Uses ffmpeg to extract video metadata/chapters and subtitles
 
   Extensions: .mkv, .mp4, .avi
 
-- pandoc
+* **pandoc**
 
   Uses pandoc to convert binary/unreadable text documents to plain markdown-like text
 
   Extensions: .epub, .odt, .docx, .fb2, .ipynb
 
-- poppler
+- **poppler**
 
   Uses pdftotext (from poppler-utils) to extract plain text from PDF files
 
   Extensions: .pdf
 
-- zip
+* **zip**
 
   Reads a zip file as a stream and recurses down into its contents
 
   Extensions: .zip
 
-- tar
+  Mime Types: application/zip
+
+* **decompress**
+
+  Reads compressed file as a stream and runs a different extractor on the contents.
+
+  Extensions: .tgz, .tbz, .tbz2, .gz, .bz2, .xz, .zst
+
+  Mime Types: application/gzip, application/x-bzip, application/x-xz, application/zstd
+
+* **tar**
 
   Reads a tar file as a stream and recurses down into its contents
 
-  Extensions: .tar, .tar.gz, .tar.bz2, .tar.xz, .tar.zst
+  Extensions: .tar
 
-- sqlite
+- **sqlite**
 
   Uses sqlite bindings to convert sqlite databases into a simple plain text format
 
@@ -74,14 +86,16 @@ Adapters:
 
   Mime Types: application/x-sqlite3
 
-The following adapters are disabled by default, and can be enabled using `--rga-adapters=+pdfpages,tesseract`:
+The following adapters are disabled by default, and can be enabled using '--rga-adapters=+pdfpages,tesseract':
+
+- **pdfpages**
 
-- pdfpages
   Converts a pdf to it's individual pages as png files. Only useful in combination with tesseract
 
   Extensions: .pdf
 
-- tesseract
+* **tesseract**
+
   Uses tesseract to run OCR on images to make them searchable. May need -j1 to prevent overloading the system. Make sure you have tesseract installed.
 
   Extensions: .jpg, .png