# rga: ripgrep, but also search in PDFs, E-Books, Office documents, zip, tar.gz, etc.

rga is a line-oriented search tool that allows you to look for a regex in a multitude of file types. rga wraps the awesome [ripgrep] and enables it to search in pdf, docx, sqlite, jpg, movie subtitles (mkv, mp4), etc.

[![github repo](https://img.shields.io/badge/repo-github.com%2Fphiresky%2Fripgrep--all-informational.svg)](https://github.com/phiresky/ripgrep-all)
[![Crates.io](https://img.shields.io/crates/v/ripgrep-all.svg)](https://crates.io/crates/ripgrep-all)
[![fearless concurrency](https://img.shields.io/badge/concurrency-fearless-success.svg)](https://www.reddit.com/r/rustjerk/top/?sort=top&t=all)

For more detail, see this introductory blogpost: https://phiresky.github.io/blog/2019/rga--ripgrep-for-zip-targz-docx-odt-epub-jpg/

rga will recursively descend into archives and match text in every file type it knows.

Here is an [example directory](https://github.com/phiresky/ripgrep-all/tree/master/exampledir/demo) with different file types:
```
demo/
├── greeting.mkv
├── hello.odt
├── hello.sqlite3
└── somearchive.zip
    ├── dir
    │   ├── greeting.docx
    │   └── inner.tar.gz
    │       └── greeting.pdf
    └── greeting.epub
```
![rga output](doc/demodir.png)

## Integration with fzf

![rga-fzf](doc/rga-fzf.gif)

You can use rga interactively. Add the following to your ~/.{bash,zsh}rc:
```bash
rga-fzf() {
    # Base command: list only the files that contain a match.
    RG_PREFIX="rga --files-with-matches --rga-cache-max-blob-len=10M $RGA_ARGS"
    local file
    # fzf re-runs the search whenever the query changes and previews the matches.
    file="$(
        FZF_DEFAULT_COMMAND="$RG_PREFIX '$1'" \
            fzf --sort --preview="rga --pretty --context 5 {q} {}" \
                --phony -q "$1" \
                --bind "change:reload:$RG_PREFIX {q}" \
                --preview-window="70%:wrap"
    )" &&
    echo "opening $file" &&
    xdg-open "$file"
}
```
## INSTALLATION

Linux x64, macOS and Windows binaries are available [in GitHub Releases][latestrelease].

[latestrelease]: https://github.com/phiresky/ripgrep-all/releases/latest

### Linux
On Arch Linux, you can simply install from AUR: `yay -S ripgrep-all`.

On Debian-based distributions, you can download the [rga binary][latestrelease] and get the dependencies like this:

`apt install ripgrep pandoc poppler-utils ffmpeg cargo`

If ripgrep is not included in your package sources, get it from [the ripgrep releases page](https://github.com/BurntSushi/ripgrep/releases).

rga searches for the binaries it calls in `$PATH` and in the directory that contains the rga binary itself.

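Because rga looks up its helper programs at run time, it can help to check which of them are reachable before searching. The following is a minimal sketch, not part of rga: the function name `rga_missing_deps` is made up for this example, the binary names assume the packages listed above (`pdftotext` ships with poppler-utils), and it only inspects `$PATH`, not the directory rga itself is in.

```bash
# Print the name of every given binary that is not reachable via $PATH.
# `rga_missing_deps` is a hypothetical helper, not something rga provides.
rga_missing_deps() {
    local bin
    for bin in "$@"; do
        command -v "$bin" >/dev/null 2>&1 || printf '%s\n' "$bin"
    done
}
```

For example, `rga_missing_deps rg pandoc pdftotext ffmpeg` prints one line per missing tool and nothing if all of them are installed.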
### Windows
Install ripgrep-all via [Chocolatey](https://chocolatey.org/packages/ripgrep-all):
```
choco install ripgrep-all
```
If you get an error like `VCRUNTIME140.DLL could not be found`, you need to install [vc_redist.x64.exe](https://support.microsoft.com/en-us/help/2977003/the-latest-supported-visual-c-downloads).

### Homebrew/Linuxbrew

`rga` can be installed with [Homebrew](https://formulae.brew.sh/formula/ripgrep-all#default):

`brew install rga`

To install the optional dependencies, which are not strictly necessary but very useful:

`brew install pandoc poppler tesseract ffmpeg`

### Compile from source
rga should compile with stable Rust (v1.36.0+, check with `rustc --version`). To build it, run the following (or the equivalent for your OS):
```
~$ apt install build-essential pandoc poppler-utils ffmpeg ripgrep cargo
~$ cargo install ripgrep_all
~$ rga --version # this should work now
```
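
If the build fails, the Rust version is a common first thing to rule out. The check against the 1.36.0 minimum mentioned above could be scripted roughly as follows. This is only a sketch: `version_ge` is a hypothetical helper, and it assumes a `sort` that supports `-V` (version sort), as GNU coreutils does.

```bash
# Succeeds if version $1 is greater than or equal to version $2.
# `version_ge` is a hypothetical helper; it relies on `sort -V`.
version_ge() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

if command -v rustc >/dev/null 2>&1; then
    # The second field of `rustc --version` is the version number.
    installed="$(rustc --version | awk '{print $2}')"
    if version_ge "$installed" 1.36.0; then
        echo "rustc $installed is new enough"
    else
        echo "rustc $installed is too old, 1.36.0 or newer is needed"
    fi
else
    echo "rustc not found"
fi
```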
## Available Adapters
```
rga --rga-list-adapters
```
<!-- this part generated by update-readme.sh -->

Adapters:

- **ffmpeg**
  Uses ffmpeg to extract video metadata/chapters and subtitles
  Extensions: .mkv, .mp4, .avi

- **pandoc**
  Uses pandoc to convert binary/unreadable text documents to plain markdown-like text
  Extensions: .epub, .odt, .docx, .fb2, .ipynb

- **poppler**
  Uses pdftotext (from poppler-utils) to extract plain text from PDF files
  Extensions: .pdf
  Mime Types: application/pdf

- **zip**
  Reads a zip file as a stream and recurses down into its contents
  Extensions: .zip
  Mime Types: application/zip

- **decompress**
  Reads compressed file as a stream and runs a different extractor on the contents.
  Extensions: .tgz, .tbz, .tbz2, .gz, .bz2, .xz, .zst
  Mime Types: application/gzip, application/x-bzip, application/x-xz, application/zstd

- **tar**
  Reads a tar file as a stream and recurses down into its contents
  Extensions: .tar

- **sqlite**
  Uses sqlite bindings to convert sqlite databases into a simple plain text format
  Extensions: .db, .db3, .sqlite, .sqlite3
  Mime Types: application/x-sqlite3

The following adapters are disabled by default, and can be enabled using '--rga-adapters=+pdfpages,tesseract':

- **pdfpages**
  Converts a pdf to its individual pages as png files. Only useful in combination with tesseract
  Extensions: .pdf
  Mime Types: application/pdf

- **tesseract**
  Uses tesseract to run OCR on images to make them searchable. May need -j1 to prevent overloading the system. Make sure you have tesseract installed.
  Extensions: .jpg, .png

## USAGE:
> rga \[RGA OPTIONS\] \[RG OPTIONS\] PATTERN \[PATH \...\]
## FLAGS:

**\--rga-accurate**

> Use more accurate but slower matching by mime type
>
> By default, rga will match files using file extensions. Some programs,
> such as sqlite3, don\'t care about the file extension at all, so users
> sometimes use any or no extension at all. With this flag, rga will try
> to detect the mime type of input files using the magic bytes (similar
> to the \`file\` utility), and use that to choose the adapter.
> Detection is only done on the first 8KiB of the file, since we can\'t
> always seek on the input (in archives).

**-h**, **\--help**

> Prints help information

**\--rga-list-adapters**

> List all known adapters

**\--rga-no-cache**

> Disable caching of results
>
> By default, rga caches the extracted text, if it is small enough, to a
> database in \~/.cache/rga on Linux, _\~/Library/Caches/rga_ on macOS,
> or C:\\Users\\username\\AppData\\Local\\rga on Windows. This way,
> repeated searches on the same set of files will be much faster. If you
> pass this flag, all caching will be disabled.

**\--rg-help**

> Show help for ripgrep itself

**\--rg-version**

> Show version of ripgrep itself

**-V**, **\--version**

> Prints version information

## OPTIONS:

**\--rga-adapters=**\<adapters\>\...

> Change which adapters to use and in which priority order (descending)
>
> \"foo,bar\" means use only adapters foo and bar. \"-bar,baz\" means
> use all default adapters except for bar and baz. \"+bar,baz\" means
> use all default adapters and also bar and baz.

**\--rga-cache-compression-level=**\<cache-compression-level\>

> ZSTD compression level to apply to adapter outputs before storing in
> cache db
>
> Ranges from 1 - 22 \[default: 12\]

**\--rga-cache-max-blob-len=**\<cache-max-blob-len\>

> Max compressed size to cache
>
> Longest byte length (after compression) to store in cache. Longer
> adapter outputs will not be cached and will be recomputed every time.
> Allowed suffixes: k M G \[default: 2000000\]

**\--rga-max-archive-recursion=**\<max-archive-recursion\>

> Maximum nestedness of archives to recurse into \[default: 4\]

**-h** shows a concise overview, **\--help** shows more detail and
advanced options.

All other options not shown here are passed directly to rg, especially
\[PATTERN\] and \[PATH \...\]

<!-- end of part generated by update-readme.sh -->

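The `--rga-adapters` syntax described above is easy to mistype, so small wrapper functions for recurring combinations may be convenient. The sketch below is one way to do that; the function names `rga_ocr` and `rga_no_video` are invented for this example and are not part of rga. The `-j1` option is passed through to ripgrep, following the tesseract adapter's note about not overloading the system.

```bash
# Hypothetical wrapper: default adapters plus the two OCR-related ones,
# which are disabled by default.
rga_ocr() {
    # "+pdfpages,tesseract" adds both adapters to the defaults;
    # -j1 limits ripgrep to a single thread.
    rga --rga-adapters=+pdfpages,tesseract -j1 "$@"
}

# Hypothetical wrapper: all default adapters except ffmpeg.
rga_no_video() {
    rga --rga-adapters=-ffmpeg "$@"
}
```

Both functions forward their arguments unchanged, so `rga_ocr 'invoice' ./scans` takes the same pattern and paths as a plain `rga` call.
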
## Development
To enable debug logging:
```bash
export RUST_LOG=debug
export RUST_BACKTRACE=1
```
Also remember to disable caching with `--rga-no-cache` or clear the cache
(`~/Library/Caches/rga` on macOS, `~/.cache/rga` on other Unixes,
or `C:\Users\username\AppData\Local\rga` on Windows)
to debug the adapters.
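
Clearing the cache on a Unix-like system could be scripted along these lines. This is a sketch under stated assumptions: `rga_cache_dir` and `clear_rga_cache` are hypothetical helper names, and the paths are the default locations listed above, so a setup that relocates the cache directory is not covered.

```bash
# Print the default rga cache directory for the current Unix-like OS.
# `rga_cache_dir` is a hypothetical helper using the paths listed above.
rga_cache_dir() {
    case "$(uname -s)" in
        Darwin) printf '%s\n' "$HOME/Library/Caches/rga" ;;
        *)      printf '%s\n' "$HOME/.cache/rga" ;;
    esac
}

# Remove the cache so that the adapters run again on the next search.
clear_rga_cache() {
    local dir
    dir="$(rga_cache_dir)"
    # ${dir:?} aborts if the variable is empty instead of deleting the wrong path.
    rm -rf -- "${dir:?}"
}
```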