# rga: ripgrep, but also search in PDFs, E-Books, Office documents, zip, tar.gz, etc.
rga is a line-oriented search tool that allows you to look for a regex in a multitude of file types. rga wraps the awesome [ripgrep] and enables it to search in pdf, docx, sqlite, jpg, movie subtitles (mkv, mp4), etc.

[ripgrep]: https://github.com/BurntSushi/ripgrep

[![github repo](https://img.shields.io/badge/repo-github.com%2Fphiresky%2Fripgrep--all-informational.svg)](https://github.com/phiresky/ripgrep-all)
[![Crates.io](https://img.shields.io/crates/v/ripgrep-all.svg)](https://crates.io/crates/ripgrep-all)
[![fearless concurrency](https://img.shields.io/badge/concurrency-fearless-success.svg)](https://www.reddit.com/r/rustjerk/top/?sort=top&t=all)

For more detail, see this introductory blogpost: https://phiresky.github.io/blog/2019/rga--ripgrep-for-zip-targz-docx-odt-epub-jpg/

rga will recursively descend into archives and match text in every file type it knows.

Here is an [example directory](https://github.com/phiresky/ripgrep-all/tree/master/exampledir/demo) with different file types:
```
demo/
├── greeting.mkv
├── hello.odt
├── hello.sqlite3
└── somearchive.zip
    ├── dir
    │   ├── greeting.docx
    │   └── inner.tar.gz
    │       └── greeting.pdf
    └── greeting.epub
```
![rga output](doc/demodir.png)

## Integration with fzf

![rga-fzf](doc/rga-fzf.gif)

See [the wiki](https://github.com/phiresky/ripgrep-all/wiki/fzf-Integration) for instructions on integrating rga with fzf.

## INSTALLATION
Linux x64, macOS and Windows binaries are available [in GitHub Releases][latestrelease].

[latestrelease]: https://github.com/phiresky/ripgrep-all/releases/latest

### Linux

#### Arch Linux

`pacman -S ripgrep-all`

#### Nix

`nix-env -iA nixpkgs.ripgrep-all`

#### Debian-based

Download the [rga binary][latestrelease] and install the dependencies like this:

`apt install ripgrep pandoc poppler-utils ffmpeg`

If ripgrep is not included in your package sources, get it from [here](https://github.com/BurntSushi/ripgrep/releases).

rga will search for all binaries it calls in `$PATH` and in the directory the rga binary itself is in.
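
Because rga looks up its helper binaries at run time, it can be useful to check which of them are reachable before searching. The following is a minimal sketch and not part of rga: `missing_tools` is a hypothetical helper name, and the tool list (`rg`, `pandoc`, `pdftotext`, `ffmpeg`) is an assumption based on the packages in the `apt install` line above. It only checks `$PATH`, not the directory of the rga binary.

```shell
#!/bin/sh
# Print the name of every given command that is not found in $PATH.
# (missing_tools is a made-up name for this example.)
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Assumed helper binaries: rg (ripgrep), pandoc, pdftotext (poppler-utils), ffmpeg.
missing=$(missing_tools rg pandoc pdftotext ffmpeg)
if [ -n "$missing" ]; then
    echo "not found in PATH:" $missing
else
    echo "all helper binaries found"
fi
```
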
### Windows
Install ripgrep-all via [Chocolatey](https://chocolatey.org/packages/ripgrep-all):
```
choco install ripgrep-all
```

Note that installing via Chocolatey or Scoop is the only supported download method. If you download the binary from releases manually, you will not get the dependencies (for example pdftotext from poppler).

If you get an error like `VCRUNTIME140.DLL could not be found`, you need to install [vc_redist.x64.exe](https://support.microsoft.com/en-us/help/2977003/the-latest-supported-visual-c-downloads).

### Homebrew/Linuxbrew
`rga` can be installed with [Homebrew](https://formulae.brew.sh/formula/ripgrep-all#default):

`brew install rga`

To install the dependencies, each of which is optional but very useful:

`brew install pandoc poppler ffmpeg`

### Compile from source
rga should compile with stable Rust (v1.36.0+, check with `rustc --version`). To build it, run the following (or the equivalent in your OS):
```
~$ apt install build-essential pandoc poppler-utils ffmpeg ripgrep cargo
~$ cargo install --locked ripgrep_all
~$ rga --version # this should work now
```
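
The minimum Rust version mentioned above can be checked before building. This is a hedged sketch, not something shipped with rga: `version_ge` is a hypothetical helper, and it relies on `sort -V` (version sort), which is available in GNU coreutils but may be missing on other systems.

```shell
#!/bin/sh
# Succeeds if version $1 is at least version $2.
# Assumes `sort -V` (version sort) is available.
version_ge() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

if command -v rustc >/dev/null 2>&1; then
    # `rustc --version` prints "rustc <version> (<hash> <date>)";
    # the second field is the version number.
    installed=$(rustc --version | cut -d ' ' -f 2)
    if version_ge "$installed" 1.36.0; then
        echo "rustc $installed is new enough"
    else
        echo "rustc $installed is older than 1.36.0"
    fi
else
    echo "rustc not found in PATH"
fi
```
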
## Available Adapters

rga works with _adapters_ that adapt various file formats. It comes with a few adapters integrated:

```
rga --rga-list-adapters
```

You can also add **custom adapters**. See [the wiki](https://github.com/phiresky/ripgrep-all/wiki) for more information.

<!-- this part generated by update-readme.sh -->

Adapters:

- **pandoc**
  Uses pandoc to convert binary/unreadable text documents to plain markdown-like text
  Runs: pandoc --from= --to=plain --wrap=none --markdown-headings=atx
  Extensions: .epub, .odt, .docx, .fb2, .ipynb

- **poppler**
  Uses pdftotext (from poppler-utils) to extract plain text from PDF files
  Runs: pdftotext - -
  Extensions: .pdf
  Mime Types: application/pdf

- **postprocpagebreaks**
  Adds the page number to each line for an input file that specifies page breaks as ascii page break character.
  Mainly to be used internally by the poppler adapter.
  Extensions: .asciipagebreaks

- **ffmpeg**
  Uses ffmpeg to extract video metadata/chapters, subtitles, lyrics, and other metadata
  Extensions: .mkv, .mp4, .avi, .mp3, .ogg, .flac, .webm

- **zip**
  Reads a zip file as a stream and recurses down into its contents
  Extensions: .zip, .jar
  Mime Types: application/zip

- **decompress**
  Reads compressed file as a stream and runs a different extractor on the contents.
  Extensions: .tgz, .tbz, .tbz2, .gz, .bz2, .xz, .zst
  Mime Types: application/gzip, application/x-bzip, application/x-xz, application/zstd

- **tar**
  Reads a tar file as a stream and recurses down into its contents
  Extensions: .tar

- **sqlite**
  Uses sqlite bindings to convert sqlite databases into a simple plain text format
  Extensions: .db, .db3, .sqlite, .sqlite3
  Mime Types: application/x-sqlite3

The following adapters are disabled by default, and can be enabled using '--rga-adapters=+foo,bar':

## USAGE:

> rga \[RGA OPTIONS\] \[RG OPTIONS\] PATTERN \[PATH \...\]

## FLAGS:

**\--rga-accurate**

> Use more accurate but slower matching by mime type
>
> By default, rga will match files using file extensions. Some programs,
> such as sqlite3, don\'t care about the file extension at all, so users
> sometimes use any or no extension at all. With this flag, rga will try
> to detect the mime type of input files using the magic bytes (similar
> to the \`file\` utility), and use that to choose the adapter.
> Detection is only done on the first 8KiB of the file, since we can\'t
> always seek on the input (in archives).

**\--rga-no-cache**

> Disable caching of results
>
> By default, rga caches the extracted text, if it is small enough, to a
> database in \${XDG_CACHE_DIR-\~/.cache}/ripgrep-all on Linux,
> \~/Library/Caches/ripgrep-all on macOS, or
> C:\\Users\\username\\AppData\\Local\\ripgrep-all on Windows. This way,
> repeated searches on the same set of files will be much faster. If you
> pass this flag, all caching will be disabled.

**-h**, **\--help**

> Prints help information

**\--rga-list-adapters**

> List all known adapters

**\--rga-print-config-schema**

> Print the JSON Schema of the configuration file

**\--rg-help**

> Show help for ripgrep itself

**\--rg-version**

> Show version of ripgrep itself

**-V**, **\--version**

> Prints version information

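
To illustrate the idea behind `--rga-accurate`, the sketch below recognizes one format by its leading bytes instead of its extension. It is not rga's detection code: `looks_like_sqlite` is a hypothetical helper that only compares the start of the file with the SQLite 3 header string (`SQLite format 3` followed by a NUL byte), and it assumes a `head` that supports `-c`.

```shell
#!/bin/sh
# Succeeds if the file at $1 starts with the SQLite 3 header string.
# Only the first 15 bytes are compared; the 16th header byte is a NUL,
# which `tr` strips so the shell never has to handle it.
looks_like_sqlite() {
    [ "$(head -c 15 "$1" 2>/dev/null | tr -d '\0')" = "SQLite format 3" ]
}

# Example: report SQLite databases in the current directory,
# whatever their file extension is.
for f in ./*; do
    if [ -f "$f" ] && looks_like_sqlite "$f"; then
        echo "$f looks like an SQLite database"
    fi
done
```
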
## OPTIONS:
**\--rga-adapters=**\<adapters\>\...

> Change which adapters to use and in which priority order (descending)
>
> \"foo,bar\" means use only adapters foo and bar. \"-bar,baz\" means
> use all default adapters except for bar and baz. \"+bar,baz\" means
> use all default adapters and also bar and baz.

**\--rga-cache-compression-level=**\<compression-level\>

> ZSTD compression level to apply to adapter outputs before storing in
> cache db
>
> Ranges from 1 - 22 \[default: 12\]

**\--rga-config-file=**\<config-file-path\>

**\--rga-max-archive-recursion=**\<max-archive-recursion\>

> Maximum nestedness of archives to recurse into \[default: 4\]

**\--rga-cache-max-blob-len=**\<max-blob-len\>

> Max compressed size to cache
>
> Longest byte length (after compression) to store in cache. Longer
> adapter outputs will not be cached and recomputed every time.
>
> Allowed suffixes on command line: k M G \[default: 2000000\]

**\--rga-cache-path=**\<path\>

> Path to store cache db \[default: /home/phire/.cache/ripgrep-all\]

**-h** shows a concise overview, **\--help** shows more detail and
advanced options.

All other options not shown here are passed directly to rg, especially
\[PATTERN\] and \[PATH \...\]

<!-- end of part generated by update-readme.sh -->

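
The three forms accepted by `--rga-adapters` can be modelled with a small shell function. This is an illustrative sketch of the semantics described above, not rga's implementation: `resolve_adapters` is a hypothetical name, the default list is passed in by hand, and placing added adapters at the end of the priority order is an assumption.

```shell
#!/bin/sh
# resolve_adapters DEFAULTS SPEC
#   DEFAULTS: comma-separated default adapter names
#   SPEC:     value as it would be passed to --rga-adapters
# Prints the resulting comma-separated adapter list.
resolve_adapters() {
    defaults=$1
    spec=$2
    case $spec in
        +*) # "+bar,baz": all default adapters and also bar and baz
            printf '%s,%s\n' "$defaults" "${spec#+}" ;;
        -*) # "-bar,baz": all default adapters except bar and baz
            result=
            old_ifs=$IFS
            IFS=,
            for name in $defaults; do
                case ",${spec#-}," in
                    *",$name,"*) ;;  # excluded by the spec
                    *) result="${result:+$result,}$name" ;;
                esac
            done
            IFS=$old_ifs
            printf '%s\n' "$result" ;;
        *)  # "foo,bar": only foo and bar
            printf '%s\n' "$spec" ;;
    esac
}

resolve_adapters "pandoc,poppler,zip,tar" "poppler"    # prints: poppler
resolve_adapters "pandoc,poppler,zip,tar" "-zip,tar"   # prints: pandoc,poppler
resolve_adapters "pandoc,poppler,zip,tar" "+foo,bar"   # prints: pandoc,poppler,zip,tar,foo,bar
```
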
## Development
To enable debug logging:
```bash
export RUST_LOG=debug
export RUST_BACKTRACE=1
```
Also remember to disable caching with `--rga-no-cache` or clear the cache
(`~/Library/Caches/ripgrep-all` on macOS, `~/.cache/ripgrep-all` on other Unixes,
or `C:\Users\username\AppData\Local\ripgrep-all` on Windows)
to debug the adapters.
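
The debug settings and the cache flag can be combined in one wrapper so they apply to a single invocation only. This is a small convenience sketch; the function name `rga_debug` is made up for this example and is not provided by rga.

```shell
#!/bin/sh
# Hypothetical helper: run rga with debug logging and backtraces enabled
# and with caching disabled, passing all arguments through unchanged.
rga_debug() {
    RUST_LOG=debug RUST_BACKTRACE=1 rga --rga-no-cache "$@"
}

# Example invocation (requires rga to be installed):
#   rga_debug 'hello' demo/
```
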
### Nix and Direnv
You can use the provided [`flake.nix`](./flake.nix) to set up all build- and
run-time dependencies:

1. Enable [Flakes](https://nixos.wiki/wiki/Flakes) in your Nix configuration.
1. Add [`direnv`](https://direnv.net/) to your profile:
   `nix profile install nixpkgs#direnv`
1. `cd` into the directory where you have cloned this repository.
1. Allow use of [`.envrc`](./.envrc): `direnv allow`
1. After the dependencies have been installed, your shell will have all of
   the necessary development dependencies.