Working with big log files
Logdy utils command
A new mode was recently added to Logdy, called utils. It serves the purpose of making it easier to work with large files (of all kinds, not only ones containing logs). The motivation behind adding this mode is that Logdy currently doesn't handle very large files well. Of course, you can stream the contents of a multi-gigabyte file to Logdy, however finding a needle in a haystack can be cumbersome.
Very often, you know what to look for, whether it's a phrase, a timestamp range or a line number range. Exactly for that purpose the utils mode has been added. It currently comprises 3 commands:
cut-by-line-number - this command can help answer questions like:
- How to grep a file by a line number?
- How to filter a file by a line number range?
- How to extract specific lines from a file by line number?

cut-by-string - similar to the above, except that string phrases are used instead of line numbers.

cut-by-date - similar to filtering by line numbers, this utility can help you filter lines in a file by a date range:
- Best way to filter output by date range
- How to filter output lines from a bash command, based on dates at the start of the line?
- Filter rows based on date
Unix tools can work too
The utils mode is no magic, it's a handy shortcut for actions that could take more steps using standard Unix tools like grep, awk, head and tail.
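For instance, a rough equivalent of "grab 5 lines after skipping the first 100" can be sketched with sed or awk; the sample.log file below is just a generated stand-in for a real log:

```shell
# Generate a 200-line stand-in for a real log file
seq 1 200 > sample.log

# Grab 5 lines after skipping the first 100 (i.e. lines 101-105)
sed -n '101,105p' sample.log

# The same with awk
awk 'NR > 100 && NR <= 105' sample.log
```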
Logdy utils - examples
The examples below are based on a sample MongoDB log file (mongod.log), grab it here. Here's a preview of that sample:
2024-04-29T17:10:06.995+0200 I COMMAND [conn4] command mailer.eventlogs command: find { find: "eventlogs", filter: { processed_at: { $exists: false } }, projection: {}, limit: 1, singleBatch: true, batchSize: 1, returnKey: false, showRecordId: false } planSummary: IXSCAN { processed_at: 1 } keysExamined:0 docsExamined:0 cursorExhausted:1 numYields:0 nreturned:0 reslen:104 locks:{ Global: { acquireCount: { r: 2 } }, Database: { acquireCount: { r: 1 } }, Collection: { acquireCount: { r: 1 } } } protocol:op_query 1344ms
2024-04-29T17:10:10.357+0200 I NETWORK [thread1] connection accepted from 127.0.0.1:33006 #33 (20 connections now open)
2024-04-29T17:10:10.709+0200 I COMMAND [conn27] command admin.$cmd command: isMaster { ismaster: true } numYields:0 reslen:189 locks:{} protocol:op_query 361ms
2024-04-29T17:10:10.709+0200 I COMMAND [conn15] command admin.$cmd command: isMaster { ismaster: true } numYields:0 reslen:189 locks:{} protocol:op_query 361ms
2024-04-29T17:10:10.709+0200 I COMMAND [conn26] command admin.$cmd command: isMaster { ismaster: true } numYields:0 reslen:189 locks:{} protocol:op_query 361ms
2024-04-29T17:10:10.709+0200 I COMMAND [conn30] command admin.$cmd command: isMaster { ismaster: 1 } numYields:0 reslen:189 locks:{} protocol:op_query 361ms
2024-04-29T17:10:10.709+0200 I COMMAND [conn28] command admin.$cmd command: isMaster { ismaster: true } numYields:0 reslen:189 locks:{} protocol:op_query 361ms
2024-04-29T17:10:10.709+0200 I COMMAND [conn4] command admin.$cmd command: isMaster { ismaster: true } numYields:0 reslen:189 locks:{} protocol:op_query 361ms
2024-04-29T17:10:12.564+0200 I COMMAND [conn32] command admin.$cmd command: ping { ping: 1 } numYields:0 reslen:37 locks:{} protocol:op_query 135ms

Let's start using logdy utils on that file.
Cutting by a line number - How to filter a file based on line numbers?
The command below will grab 5 lines after skipping the first 100 lines.
$ logdy utils cut-by-line-number mongod.log 5 100

Cutting by string - How to filter a file based on string match?
The command below will start passing lines once the string 2024-04-29T17:10:16.629+0200 is encountered, and will stop passing lines once the string 2024-04-29T17:10:23.645+0200 is encountered.
$ logdy utils cut-by-string mongod.log 2024-04-29T17:10:16.629+0200 2024-04-29T17:10:23.645+0200

Cutting by date - How to find lines in a file for a time range?
The command below will parse the timestamp 2006-01-02T15:04 at offset 0 of each line (the layout string follows Go's reference time). Note that even though the full format is 2024-04-29T17:10:10.709+0200 (it also contains seconds, milliseconds and a timezone), we only ask the utility to parse the date and time up to minutes. The utility will ignore the rest of the timestamp and successfully parse the part we are interested in.
$ logdy utils cut-by-date mongod.log 2024-04-29T17:10 2024-04-29T17:20 2006-01-02T15:04 0

A fast grep alternative
If you are looking for a faster alternative to grep, I can recommend Sift. It's really fast, requires no installation (just like Logdy, since it's also written in Go), is much more flexible than standard grep and has nice documentation.
How to find the number of lines in a file?
Use the wc -l file.log command to get the number of lines in a file. You can alternatively stream lines to this command, for example:
$ cat file.log | wc -l

TIP
If you're looking for a great resource for learning the basics of Unix filters, I recommend this website. It's simple and concise, and provides a great introduction to the Unix commands used to filter files.
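Note that wc -l also accepts multiple filenames at once, printing a per-file count followed by a total line (the file names below are just illustrative):

```shell
# Create two small sample files
seq 1 10 > a.log
seq 1 20 > b.log

# wc prints one count per file plus a final "total" line
wc -l a.log b.log
```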
How to split a large file into chunks?
Use the split command. It's a simple and effective command for splitting files of any size into smaller chunks.
$ split [options] filename prefix
-l lines_per_chunk
-b bytes_per_chunk

It splits the file either by size (in bytes) or by line count. The last chunk will contain whatever is left after chunking the whole file into equal portions. split can also read from STDIN, just pass - as the input filename. For more info refer to man split.
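The STDIN behavior can be sketched like this, piping a generated 100-line stream into split (the part- prefix is arbitrary):

```shell
# Read from STDIN ("-") and cut the stream into 25-line chunks
# named part-aa, part-ab, part-ac, part-ad
seq 1 100 | split -l 25 - part-

ls part-*
wc -l part-aa
```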
Examples
Split the file by a certain line number
The command will split the mongod.log file into chunks containing 1000 lines each; each chunk will be named with the mongod-chunk prefix and a suffix aa, ab, ac, etc.
$ split -l 1000 mongod.log mongod-chunk

Split the file by a certain size
The command will split the mongod.log file into chunks of exactly 50kb each, however, the file may be split in the middle of a line.
$ split -b 50k mongod.log mongod-chunk

How to deduplicate lines in a file?
Using the sort and uniq commands you can easily deduplicate the lines in a file.
$ sort filename | uniq

If you are interested in how many occurrences of each line uniq processed, you can use the -c flag. Using this option will print the count of each line at the beginning of the line.
$ sort filename | uniq -c
123 foo
12 bar
8 baz

How to combine multiple files into one?
The command below will create a new file called file, or overwrite the existing one.
$ cat file1 file2 file3 > file

The command below will create a new file called file if it doesn't exist, or will append the contents to the end of the file.

$ cat file1 file2 file3 >> file

How to merge all of the files in a directory into one?
The command below will read all of the files (including directories, which will cause an error) and append their contents to a file.
$ cat * >> file

You can also filter the files by a pattern:

$ cat *.go >> file

A more powerful alternative is to use the find command along with the -exec option, which feeds the output of find to cat:

$ find . -name '*.go' -exec cat {} +

How to view a file with line numbers?
A non-interactive way is to use cat; the command below will display the contents of the file with a number in front of each line.

$ cat -n file

An interactive alternative is the less pager with the -N flag; this command will display the contents of the file interactively (you can scroll through a large file within a single viewport) along with line numbers.
$ less -N file
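When you only need line numbers for matching lines (rather than the whole file), grep -n prefixes each match with its line number; the file below is a made-up sample:

```shell
# Create a tiny sample file
printf 'alpha\nbeta\ngamma\n' > view.log

# Prefix each matching line with its line number
grep -n 'gam' view.log
# 3:gamma
```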