Working with big log files

Logdy utils command

A new mode called utils was recently added to Logdy. Its purpose is to make it easier to work with large files (of all kinds, not only those containing logs). The motivation behind it is that Logdy currently doesn't handle big files well. You can, of course, stream the contents of a multi-gigabyte file to Logdy, but finding a needle in that haystack can be cumbersome.

Very often you know what to look for, whether it's a phrase, a timestamp range, or a line number range. The utils mode was added exactly for that purpose. It currently consists of three commands:

  • cut-by-line-number - this command helps answer questions such as:
    • How to grep a file by a line number?
    • How to filter a file by line numbers range?
    • How to extract specific lines from a file by line number?
  • cut-by-string - similar to the above, except that string phrases are used instead of line numbers
  • cut-by-date - similar to filtering by line numbers, this utility helps you filter the lines in a file by a date range:
    • Best way to filter output by date range
    • How to filter output lines from bash command, based on dates in start of the line?
    • Filter rows based on date

Unix tools can work too

The utils mode is no magic; it's a handy shortcut for actions that would otherwise take more steps with standard Unix tools such as grep, awk, head, and tail (rough equivalents are shown alongside the examples below).

Logdy utils - examples

The examples below are based on a sample MongoDB log file (mongod.log), grab it here. Here's a preview of that sample:

text
2024-04-29T17:10:06.995+0200 I COMMAND  [conn4] command mailer.eventlogs command: find { find: "eventlogs", filter: { processed_at: { $exists: false } }, projection: {}, limit: 1, singleBatch: true, batchSize: 1, returnKey: false, showRecordId: false } planSummary: IXSCAN { processed_at: 1 } keysExamined:0 docsExamined:0 cursorExhausted:1 numYields:0 nreturned:0 reslen:104 locks:{ Global: { acquireCount: { r: 2 } }, Database: { acquireCount: { r: 1 } }, Collection: { acquireCount: { r: 1 } } } protocol:op_query 1344ms
2024-04-29T17:10:10.357+0200 I NETWORK  [thread1] connection accepted from 127.0.0.1:33006 #33 (20 connections now open)
2024-04-29T17:10:10.709+0200 I COMMAND  [conn27] command admin.$cmd command: isMaster { ismaster: true } numYields:0 reslen:189 locks:{} protocol:op_query 361ms
2024-04-29T17:10:10.709+0200 I COMMAND  [conn15] command admin.$cmd command: isMaster { ismaster: true } numYields:0 reslen:189 locks:{} protocol:op_query 361ms
2024-04-29T17:10:10.709+0200 I COMMAND  [conn26] command admin.$cmd command: isMaster { ismaster: true } numYields:0 reslen:189 locks:{} protocol:op_query 361ms
2024-04-29T17:10:10.709+0200 I COMMAND  [conn30] command admin.$cmd command: isMaster { ismaster: 1 } numYields:0 reslen:189 locks:{} protocol:op_query 361ms
2024-04-29T17:10:10.709+0200 I COMMAND  [conn28] command admin.$cmd command: isMaster { ismaster: true } numYields:0 reslen:189 locks:{} protocol:op_query 361ms
2024-04-29T17:10:10.709+0200 I COMMAND  [conn4] command admin.$cmd command: isMaster { ismaster: true } numYields:0 reslen:189 locks:{} protocol:op_query 361ms
2024-04-29T17:10:12.564+0200 I COMMAND  [conn32] command admin.$cmd command: ping { ping: 1 } numYields:0 reslen:37 locks:{} protocol:op_query 135ms

Let's start using logdy utils on that file.

Cutting by a line number - How to filter a file based on line numbers?

The command below will grab 5 lines after skipping the first 100 lines (i.e., lines 101 through 105).

bash
$ logdy utils cut-by-line-number mongod.log 5 100
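
For comparison, roughly the same result can be produced with standard Unix tools; the exact boundary semantics of cut-by-line-number may differ slightly, so treat this as a sketch:

bash
# start at line 101, then take 5 lines (lines 101-105)
$ tail -n +101 mongod.log | head -n 5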

Cutting by string - How to filter a file based on string match?

The command below will start passing lines through once the string 2024-04-29T17:10:16.629+0200 is encountered, and will stop passing lines once the string 2024-04-29T17:10:23.645+0200 is encountered.

bash
$ logdy utils cut-by-string mongod.log 2024-04-29T17:10:16.629+0200 2024-04-29T17:10:23.645+0200
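
A similar effect can be achieved with a sed pattern range. Note that sed treats both arguments as regular expressions (hence the escaped dots) and includes both boundary lines in the output, which may or may not match cut-by-string's behavior:

bash
$ sed -n '/2024-04-29T17:10:16\.629+0200/,/2024-04-29T17:10:23\.645+0200/p' mongod.log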

Cutting by date - How to find lines in a file for a time range?

The command below will parse the timestamp 2006-01-02T15:04 at offset 0 of each line (the layout is written as Go's reference time, since Logdy is written in Go). Note that even though the full format is 2024-04-29T17:10:10.709+0200 (it also contains seconds, milliseconds, and a timezone), we only ask the utility to parse the date and time up to the minutes. The utility will ignore the rest of the timestamp and successfully parse the part we are interested in.

bash
$ logdy utils cut-by-date mongod.log 2024-04-29T17:10 2024-04-29T17:20 2006-01-02T15:04 0
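
Because ISO-8601 timestamps sort lexicographically, a rough equivalent is a plain string comparison in awk (whether cut-by-date includes the range boundaries may differ):

bash
# keep lines whose leading timestamp falls within [17:10, 17:20)
$ awk '$1 >= "2024-04-29T17:10" && $1 < "2024-04-29T17:20"' mongod.log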

A fast grep alternative

If you are looking for a faster alternative to grep, I can recommend Sift, a fast and powerful grep replacement. It's really fast, requires no installation (just like Logdy, since it's also written in Go), is much more flexible than standard grep, and has nice documentation.

How to find the number of lines in a file?

Use the wc -l file.log command to get the number of lines in a file. You can also stream lines to the command, for example:

bash
$ cat file.log | wc -l
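
If you only need to count the lines matching a pattern, grep can do the counting itself:

bash
# count lines containing the word COMMAND
$ grep -c 'COMMAND' mongod.log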

TIP

If you're looking for a great resource for learning the basics of Unix filters, I recommend this website. It's simple and concise, and it provides a great introduction to the Unix commands used for filtering files.

How to split a large file into chunks?

Use the split command. It's a simple and effective command for splitting files of any size into smaller chunks.

bash
$ split [options] filename prefix

  -l lines - number of lines per chunk

  -b bytes - size of each chunk (in bytes)

It splits the file either by size (in bytes) or by number of lines. The last chunk will contain whatever is left after chunking the rest of the file into equal portions. split can also read from STDIN; just put - as the input filename. For more info, refer to man split.

Examples

Split the file by a certain line number

The command will split the mongod.log file into chunks of 1000 lines each. Each chunk will be named mongod-chunk with a suffix aa, ab, ac, and so on.

bash
$ split -l 1000 mongod.log mongod-chunk
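
You can verify the result by counting the lines in each chunk; every chunk except possibly the last should contain exactly 1000 lines:

bash
$ wc -l mongod-chunk*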

Split the file by a certain size

The command will split the mongod.log file into chunks of exactly 50 kB each; note, however, that the file may be split in the middle of a line, since split -b counts bytes, not lines.

bash
$ split -b 50k mongod.log mongod-chunk
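
Because the shell expands mongod-chunk* in the same alphabetical order in which split assigned the suffixes, the chunks can be concatenated back into the original file; cmp prints nothing when the two files are identical:

bash
$ cat mongod-chunk* > mongod-restored.log
$ cmp mongod.log mongod-restored.log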

How to deduplicate lines in a file?

Using the sort and uniq commands, you can easily deduplicate the lines in a file.

bash
$ sort filename | uniq
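
The same result can be achieved with sort alone, using its built-in uniqueness flag:

bash
$ sort -u filename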

If you are interested in how many occurrences of each line uniq processed, you can use the -c flag. With this option, the count is printed at the beginning of each line.

bash
$ sort filename | uniq -c
123 foo
 12 bar
  8 baz
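
Combined with a numeric reverse sort, this makes a classic pipeline for finding the most frequent lines in a file:

bash
# top 10 most frequent lines, most frequent first
$ sort filename | uniq -c | sort -rn | head -n 10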

How to combine multiple files into one?

The command below will create a new file called file, or overwrite the existing one:

bash
$ cat file1 file2 file3 > file

The command below will create a new file called file if it doesn't exist, or append the contents to the end of the existing file:

bash
$ cat file1 file2 file3 >> file

How to merge all of the files in a directory into one?

The command below will read all of the files in the directory (directories themselves will produce an error) and append their contents to a file.

bash
$ cat * >> file

You can also filter the files by a pattern:

bash
$ cat *.go >> file

A more powerful alternative is to use the find command with the -exec option, which runs cat on every file that find locates; the trailing + batches many files into a single cat invocation. Unlike the shell glob, this also descends into subdirectories:

bash
$ find . -name '*.go' -exec cat {} + > file

How to view a file with line numbers?

A non-interactive way is to use cat; the command below will display the contents of the file with a line number in front of each line.

bash
$ cat -n file

For an interactive viewer with line numbers, use less; the command below will display the contents of the file interactively (you can scroll through a large file within a single viewport) along with line numbers.

bash
$ less -N file
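
less also accepts an initial command after +, which is handy for big log files, for example to jump straight to a line or to follow the file as it grows (similar to tail -f):

bash
# open the file at line 100, with line numbers
$ less -N +100g file
# follow the file as new lines are appended; Ctrl-C stops following, q quits
$ less +F file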