# rga - ripgrep, but also search in PDFs, E-Books, Office documents, zip, tar.gz, etc

rga is a tool to recursively search for text in many different types of files. It is based on the awesome [ripgrep](https://github.com/BurntSushi/ripgrep).

[![Linux build status](https://api.travis-ci.org/phiresky/ripgrep_all.svg)](https://travis-ci.org/phiresky/ripgrep_all)
[![Crates.io](https://img.shields.io/crates/v/ripgrep_all.svg)](https://crates.io/crates/ripgrep_all)

# todo

- jpg adapter (based on object classification / detection (yolo?)) for fun
- 7z adapter (couldn't find a nice-to-use Rust library)

# considerations

- matching on mime (magic bytes) instead of filename
- allow per-adapter configuration options
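
To illustrate the first consideration, here is a minimal sketch of how matching on magic bytes could work. This is not rga's actual implementation: `sniff_mime` and `sniff_file` are hypothetical names, and the signature list is a small illustrative subset (PDF, zip, gzip, tar), not a complete MIME database.

```rust
use std::fs::File;
use std::io::{self, Read};
use std::path::Path;

/// Hypothetical helper (not part of rga): guess a MIME type from the
/// first bytes of a file instead of trusting its filename.
fn sniff_mime(header: &[u8]) -> Option<&'static str> {
    if header.starts_with(b"%PDF-") {
        Some("application/pdf")
    } else if header.starts_with(b"PK\x03\x04") {
        // Local file header of a non-empty zip archive.
        Some("application/zip")
    } else if header.starts_with(b"\x1f\x8b") {
        Some("application/gzip")
    } else if header.get(257..262) == Some(&b"ustar"[..]) {
        // POSIX tar stores its magic at offset 257, not at the start.
        Some("application/x-tar")
    } else {
        None
    }
}

/// Hypothetical helper: read at most the first 512 bytes of `path`
/// (enough to reach the tar magic) and sniff them.
fn sniff_file(path: &Path) -> io::Result<Option<&'static str>> {
    let mut header = Vec::with_capacity(512);
    File::open(path)?.take(512).read_to_end(&mut header)?;
    Ok(sniff_mime(&header))
}
```

An adapter could then be chosen from the returned MIME type, with the filename extension kept as a fallback when `sniff_mime` returns `None`. One caveat of this approach is that many formats (for example Office documents and E-Books) are zip containers, so they would all be reported as `application/zip` without further inspection.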

# Setup

rga should compile with stable Rust. To install it on a Debian-based system, run:

```bash
apt install build-essential pandoc poppler-utils
cargo install ripgrep_all

rga --help
```

Some rga adapters run external binaries, which is why the command above also installs `pandoc` and `poppler-utils`.

# Development

To enable debug logging:

```bash
export RUST_LOG=debug
export RUST_BACKTRACE=1
```

Also remember to disable caching with `--rga-no-cache`, or clear the cache in `~/.cache/rga`, when debugging the adapters.

# Similar tools

- [pdfgrep](https://pdfgrep.org/)
- [this gist](https://gist.github.com/phiresky/5025490526ba70663ab3b8af6c40a8db) has my proof of concept version of a caching extractor to use ripgrep as a replacement for pdfgrep.
- [this gist](https://gist.github.com/ColonolBuendia/314826e37ec35c616d70506c38dc65aa) is a more extensive preprocessing script by [@ColonolBuendia](https://github.com/ColonolBuendia)