# rga - ripgrep, but also search in PDFs, E-Books, Office documents, zip, tar.gz, etc
rga is a tool to recursively search for text in many different types of files. It is based on the awesome [ripgrep](https://github.com/BurntSushi/ripgrep).

[![Linux build status](https://api.travis-ci.org/phiresky/ripgrep_all.svg)](https://travis-ci.org/phiresky/ripgrep_all)
[![Crates.io](https://img.shields.io/crates/v/ripgrep_all.svg)](https://crates.io/crates/ripgrep_all)

# Future Work
- photograph adapter (based on object classification / detection (yolo?)) for fun, built on something [like this](https://github.com/aimagelab/show-control-and-tell). Tried it, but it is very hard to integrate (especially state-of-the-art approaches).
- 7z adapter (couldn't find a nice-to-use Rust library)

# Considerations
- matching on mime (magic bytes) instead of filename
- allow per-adapter configuration options
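
To illustrate the first point, here is a minimal sketch of matching on magic bytes rather than on the filename. The helper name `sniff_type` is hypothetical and is not part of rga. It only recognizes three well-known signatures, so treat it as an illustration of the idea, not as how rga would implement it.

```bash
# Hypothetical helper, not part of rga: guess a file type from its first bytes.
sniff_type() {
  # Read the first four bytes as lowercase hex without separators.
  magic=$(head -c 4 "$1" | od -An -tx1 | tr -d ' \n')
  case "$magic" in
    25504446) echo "application/pdf" ;;   # "%PDF"
    504b0304) echo "application/zip" ;;   # "PK\003\004"
    1f8b*)    echo "application/gzip" ;;  # gzip header
    *)        echo "unknown" ;;
  esac
}
```

With this approach a PDF saved as `report.txt` would still be recognized as a PDF, while a plain text file named `notes.pdf` would not.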
# Setup
rga should compile with stable Rust. To install it on a Debian-based system, run:
```bash
# Build tools plus the external programs used by some adapters
apt install build-essential pandoc poppler-utils
# Build and install rga from crates.io
cargo install ripgrep_all
rga --help
```
Some rga adapters run external binaries such as `pandoc` and `pdftotext` (from poppler-utils), which is why these packages are installed above.

# Development
To enable debug logging:
```bash
export RUST_LOG=debug
export RUST_BACKTRACE=1
```
Also remember to disable caching with `--rga-no-cache` or clear the cache in `~/.cache/rga` when debugging the adapters.
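
The settings above can be bundled into small shell helpers so they do not have to be retyped for every debugging run. This is only a sketch: the function names `rga_debug` and `rga_clear_cache` are hypothetical and not shipped with rga, and the cache path is the one named above.

```bash
# Hypothetical wrappers for debugging sessions; not shipped with rga.

# Run rga with debug logging and backtraces, bypassing the cache.
# The subshell body keeps the exported variables from leaking into the caller.
rga_debug() (
  export RUST_LOG=debug RUST_BACKTRACE=1
  rga --rga-no-cache "$@"
)

# Remove the cache directory so adapters run again on the next search.
rga_clear_cache() {
  rm -rf "$HOME/.cache/rga"
}
```

For example, `rga_debug "some pattern" document.pdf` forwards all of its arguments to `rga` unchanged.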
# Similar tools
- [pdfgrep](https://pdfgrep.org/)
- [this gist](https://gist.github.com/phiresky/5025490526ba70663ab3b8af6c40a8db) has my proof-of-concept version of a caching extractor to use ripgrep as a replacement for pdfgrep.
- [this gist](https://gist.github.com/ColonolBuendia/314826e37ec35c616d70506c38dc65aa) is a more extensive preprocessing script by [@ColonolBuendia](https://github.com/ColonolBuendia)