Add detected file language to code search (#10256)

Move langauge detection to separate module to be more reusable

Add option to disable vendored file exclusion from file search

Allways show all language stats for search
This commit is contained in:
Lauris BH 2020-02-20 21:53:55 +02:00 committed by GitHub
parent efbd7ca39b
commit 3c45cf8494
No known key found for this signature in database
GPG key ID: 4AEE18F83AFDEB23
20 changed files with 346 additions and 63 deletions

View file

@ -242,6 +242,7 @@ relation to port exhaustion.
- `REPO_INDEXER_PATH`: **indexers/repos.bleve**: Index file used for code search.
- `REPO_INDEXER_INCLUDE`: **empty**: A comma separated list of glob patterns (see https://github.com/gobwas/glob) to **include** in the index. Use `**.txt` to match any files with .txt extension. An empty list means include all files.
- `REPO_INDEXER_EXCLUDE`: **empty**: A comma separated list of glob patterns (see https://github.com/gobwas/glob) to **exclude** from the index. Files that match this list will not be indexed, even if they match in `REPO_INDEXER_INCLUDE`.
- `REPO_INDEXER_EXCLUDE_VENDORED`: **true**: Exclude vendored files from index.
- `UPDATE_BUFFER_LEN`: **20**: Buffer length of index request.
- `MAX_FILE_SIZE`: **1048576**: Maximum size in bytes of files to be indexed.
- `STARTUP_TIMEOUT`: **30s**: If the indexer takes longer than this timeout to start - fail. (This timeout will be added to the hammer time above for child processes - as bleve will not start until the previous parent is shutdown.) Set to zero to never timeout.

View file

@ -42,6 +42,8 @@ Gitea applies glob pattern matching from the [`gobwas/glob` library](https://git
Limiting the list of files prevents the indexes from becoming polluted with derived or irrelevant files (e.g. lss, sym, map, etc.), so the search results are more relevant. It can also help reduce the index size.
`REPO_INDEXER_EXCLUDE_VENDORED` (default: true) excludes vendored files from index.
`REPO_INDEXER_INCLUDE` (default: empty) is a comma separated list of glob patterns to **include** in the index. An empty list means "_include all files_".
`REPO_INDEXER_EXCLUDE` (default: empty) is a comma separated list of glob patterns to **exclude** from the index. Files that match this list will not be indexed. `REPO_INDEXER_EXCLUDE` takes precedence over `REPO_INDEXER_INCLUDE`.