Working with big log files
Logdy utils command
A new mode has recently been added to Logdy: utils. Its purpose is to make working with large files easier (all kinds of files, not only those containing logs). The motivation behind adding this mode is that Logdy currently doesn't handle big files well. Of course, you can stream the contents of a multi-gigabyte file to Logdy, but finding a needle in that haystack can be cumbersome.
Very often you know what to look for, whether it's a phrase, a timestamp range or a line number range. The utils mode was added exactly for that purpose. It currently consists of 3 commands:
- cut-by-line-number - this command can help answer questions like:
  - How to grep a file by a line number?
  - How to filter a file by a line number range?
  - How to extract specific lines from a file by line number?
- cut-by-string - similar to the above, except that string phrases are used instead of line numbers
- cut-by-date - similar to filtering by line numbers, this utility can help you filter lines in a file by a date range:
  - Best way to filter output by date range?
  - How to filter output lines from a bash command, based on dates at the start of the line?
  - Filter rows based on date
Unix tools can work too
The utils mode is no magic, it's a handy shortcut for actions that would otherwise take more steps using the standard Unix tools grep, awk, head and tail.
Logdy utils - examples
The examples below are based on a sample MongoDB log file (mongod.log), grab it here. Here's a preview of that sample:
2024-04-29T17:10:06.995+0200 I COMMAND [conn4] command mailer.eventlogs command: find { find: "eventlogs", filter: { processed_at: { $exists: false } }, projection: {}, limit: 1, singleBatch: true, batchSize: 1, returnKey: false, showRecordId: false } planSummary: IXSCAN { processed_at: 1 } keysExamined:0 docsExamined:0 cursorExhausted:1 numYields:0 nreturned:0 reslen:104 locks:{ Global: { acquireCount: { r: 2 } }, Database: { acquireCount: { r: 1 } }, Collection: { acquireCount: { r: 1 } } } protocol:op_query 1344ms
2024-04-29T17:10:10.357+0200 I NETWORK [thread1] connection accepted from 127.0.0.1:33006 #33 (20 connections now open)
2024-04-29T17:10:10.709+0200 I COMMAND [conn27] command admin.$cmd command: isMaster { ismaster: true } numYields:0 reslen:189 locks:{} protocol:op_query 361ms
2024-04-29T17:10:10.709+0200 I COMMAND [conn15] command admin.$cmd command: isMaster { ismaster: true } numYields:0 reslen:189 locks:{} protocol:op_query 361ms
2024-04-29T17:10:10.709+0200 I COMMAND [conn26] command admin.$cmd command: isMaster { ismaster: true } numYields:0 reslen:189 locks:{} protocol:op_query 361ms
2024-04-29T17:10:10.709+0200 I COMMAND [conn30] command admin.$cmd command: isMaster { ismaster: 1 } numYields:0 reslen:189 locks:{} protocol:op_query 361ms
2024-04-29T17:10:10.709+0200 I COMMAND [conn28] command admin.$cmd command: isMaster { ismaster: true } numYields:0 reslen:189 locks:{} protocol:op_query 361ms
2024-04-29T17:10:10.709+0200 I COMMAND [conn4] command admin.$cmd command: isMaster { ismaster: true } numYields:0 reslen:189 locks:{} protocol:op_query 361ms
2024-04-29T17:10:12.564+0200 I COMMAND [conn32] command admin.$cmd command: ping { ping: 1 } numYields:0 reslen:37 locks:{} protocol:op_query 135ms
Let's start using logdy utils on that file.
Cutting by a line number - How to filter a file based on line numbers?
The command below will grab 5 lines after skipping the first 100 lines.
$ logdy utils cut-by-line-number mongod.log 5 100
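For comparison, the same five lines (101 through 105) can be extracted with standard Unix tools; either of these one-liners does the job:
$ tail -n +101 mongod.log | head -n 5
$ sed -n '101,105p' mongod.log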
Cutting by string - How to filter a file based on string match?
The command below will start passing lines once the string 2024-04-29T17:10:16.629+0200 is encountered, and will stop passing lines once the string 2024-04-29T17:10:23.645+0200 is encountered.
$ logdy utils cut-by-string mongod.log 2024-04-29T17:10:16.629+0200 2024-04-29T17:10:23.645+0200
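A rough equivalent with standard tools is a sed address range, which prints everything from the first line matching the start pattern through the next line matching the end pattern (both boundary lines included, which may differ slightly from Logdy's semantics; the dots are escaped because the addresses are regular expressions):
$ sed -n '/2024-04-29T17:10:16\.629+0200/,/2024-04-29T17:10:23\.645+0200/p' mongod.log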
Cutting by date - How to find lines in a file for a time range?
The command below will parse the timestamp format 2006-01-02T15:04 (the odd-looking format string is Go's reference-time layout, since Logdy is written in Go) at offset 0 of each line. Pay attention that even though the full timestamp is 2024-04-29T17:10:10.709+0200 (it also contains seconds, milliseconds and a timezone), we only ask the utility to parse the date and time up to the minutes. The utility will ignore the rest of the timestamp and successfully parse the part we are interested in.
$ logdy utils cut-by-date mongod.log 2024-04-29T17:10 2024-04-29T17:20 2006-01-02T15:04 0
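A rough awk equivalent, assuming the timestamp sits at offset 0 as in this sample: since ISO-8601 timestamps sort lexicographically in chronological order, a plain string comparison selects the range without any date parsing (unlike cut-by-date, this trick breaks if the timestamp format or position varies):
$ awk '$0 >= "2024-04-29T17:10" && $0 < "2024-04-29T17:20"' mongod.log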
A fast grep alternative
If you are looking for a faster alternative to grep, I can recommend Sift, a fast and powerful grep replacement. It's really fast, requires no installation (just like Logdy, since it's also written in Go), is much more flexible than standard grep and has nice documentation.
How to find the number of lines in a file?
Use the wc -l file.log command to get the number of lines in a file. You can alternatively stream lines to the command, for example:
$ cat file.log | wc -l
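If you only need the number of lines matching a pattern, grep can count them directly with the -c flag:
$ grep -c COMMAND mongod.log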
TIP
If you're looking for a great resource for learning the basics of Unix filters, I recommend this website. It's simple and concise, and provides a great introduction to the Unix commands used to filter files.
How to split a large file into chunks?
Use the split command. It's a simple and effective command for splitting files of any size into smaller chunks.
$ split [options] filename prefix
- -l linenumber - number of lines per chunk
- -b bytes - size of each chunk in bytes
It splits the file either by size (in bytes) or by line count. The last chunk will contain whatever is left after chunking the whole file into equal portions. split can also read from STDIN, just pass - as the input filename. For more info refer to man split.
Examples
Split the file by a certain line number
The command will split the mongod.log file into chunks of 1000 lines each. Each chunk will be named mongod-chunk with a suffix aa, ab, ac and so on.
$ split -l 1000 mongod.log mongod-chunk
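If you prefer numeric suffixes (mongod-chunk00, mongod-chunk01 and so on), GNU split supports the -d flag:
$ split -d -l 1000 mongod.log mongod-chunk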
Split the file by a certain size
The command will split the mongod.log file into chunks of exactly 50 kilobytes each; note, however, that the file may be split in the middle of a line.
$ split -b 50k mongod.log mongod-chunk
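If cutting a line in half is a problem, GNU split offers the -C (--line-bytes) option, which caps each chunk at the given size while keeping whole lines together:
$ split -C 50k mongod.log mongod-chunk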
How to deduplicate lines in a file?
Using the sort and uniq commands you can easily deduplicate any number of lines in a file.
$ sort filename | uniq
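Since sort can deduplicate on its own, the same result can be achieved with a single command:
$ sort -u filename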
If you are interested in how many occurrences of each line uniq has processed, you can use the -c flag. Using this option will print the count of each line at the beginning of the line.
$ sort filename | uniq -c
123 foo
12 bar
8 baz
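A common extension of this idiom is sorting the counted lines by frequency, most frequent first:
$ sort filename | uniq -c | sort -rn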
How to combine multiple files into one?
The command below will create a new file called file or overwrite an existing one:
$ cat file1 file2 file3 > file
The command below will create a new file called file if it doesn't exist, or append the contents to the end of the existing file:
$ cat file1 file2 file3 >> file
How to merge all of the files in a directory into one?
The command below will read all of the files in the current directory (cat will print an error for any directory it encounters) and append their contents to file.
$ cat * >> file
You can also filter the files by a pattern:
$ cat *.go >> file
A more powerful alternative is to use the find command along with the -exec option, which will feed every file that find discovers (recursively, including subdirectories) to cat:
$ find . -name '*.go' -exec cat {} + >> file
How to view a file with line numbers?
A non-interactive way is to use cat; this command will display the contents of the file with a line number in front of each line.
$ cat -n file
An interactive alternative is the file viewer less with line numbers; this command will display the contents of the file interactively (you can scroll through a large file within a single viewport) along with line numbers.
$ less -N file
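If you only need line numbers for the lines matching a pattern, grep prints them with the -n flag:
$ grep -n NETWORK mongod.log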