
Watch out! You're reading a series of articles

  • Logdy Pro Part I: The Problem

    The article clarifies the problems Logdy Pro solves and the approach it takes to solve them

  • Logdy Pro Part II: Storage

    The article uncovers how Logdy Pro achieves a 15-20x compression ratio while still maintaining fast search and query capabilities

  • (currently reading) Logdy Pro Part III: Benchmark

    The article presents comprehensive benchmark results demonstrating Logdy Pro's performance metrics and storage efficiency compared to industry alternatives.

Logdy Pro Part III: Advanced Logs Compression Benchmark Results

In Part I, we explored the log management challenges that Logdy Pro addresses, and in Part II, we examined the innovative storage architecture that enables its exceptional logs compression efficiency. In this final article, we'll present real-world benchmark results that demonstrate how Logdy Pro's high-performance logs compression technology performs in practical scenarios, significantly reducing storage requirements while maintaining query capabilities.

Logs Compression Benchmark Methodology

To evaluate Logdy Pro's logs compression performance in realistic conditions, we conducted a series of benchmarks using both synthetic and production log datasets. Our testing focused on three key metrics:

  1. Logs compression efficiency — How effectively does Logdy Pro reduce log storage requirements compared to standard compression algorithms?
  2. Query performance on compressed logs — How quickly can Logdy Pro execute different types of queries on compressed log data?
  3. Resource utilization during compression — What are the CPU and memory requirements during logs compression and querying operations?

Synthetic Logs Test Dataset for Compression Benchmarking

For our primary logs compression benchmark, we generated a synthetic dataset with characteristics that mirror real-world production logs. To exercise compression against patterns typical of enterprise logging environments, we carefully designed the dataset to:

  • Include common log patterns and structures
  • Maintain realistic value distributions
  • Incorporate appropriate levels of repetition and uniqueness
  • Avoid excessive randomness that would artificially reduce compression efficiency

The synthetic test file had the following characteristics:

Test data
42,949,672 lines of JSON (approximately 43 million entries)
Total size: 11,187,102,133 bytes (10.6 GB)
Average line size: 260 bytes

This dataset size was chosen to represent several days of logs from a medium-sized production environment, providing a realistic benchmark scenario for our logs compression testing.
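
The dataset characteristics above are easy to reproduce for your own files. A minimal Python sketch (the file name is a placeholder) that reports the same three numbers:

python
# stats.py -- report line count, total size, and average line size of a
# newline-delimited JSON log file (the file name below is a placeholder).
import os

PATH = "logs.jsonl"  # hypothetical input file

def dataset_stats(path: str) -> tuple[int, int, float]:
    total_bytes = os.path.getsize(path)
    lines = 0
    with open(path, "rb") as f:
        for _ in f:
            lines += 1
    return lines, total_bytes, total_bytes / lines

if __name__ == "__main__":
    lines, size, avg = dataset_stats(PATH)
    print(f"{lines:,} lines, {size:,} bytes, {avg:.0f} bytes/line on average")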

Sample Data Structure for Logs Compression Analysis

The synthetic dataset contained log entries with structures similar to those found in production environments, including:

json
{"ts":"2025-02-24T11:58:02.006Z","service":"payment-gateway","lvl":"INFO","msg":"Merchant payout initiated","ctx":{"amount":67.26258211055193,"dispute_id":"dp_202","evidence_due":"2024-02-15","reason":"unauthorized","risk_score":17.6637638202057}}
{"ts":"2025-02-24T11:58:02.029Z","service":"web-server","lvl":"ERROR","msg":"Caching response with TTL","correlation_id":"c57407f0-4323-4148-b6bd-9239a44b28fa","ctx":{"method":"POST","recipients":100,"request_size_bytes":4798,"response_size_bytes":5087,"response_time_ms":1567,"route":"/api/notifications","status_code":232,"type":"push"}}
{"ts":"2025-02-24T11:58:02.039Z","service":"web-server","lvl":"ERROR","msg":"Preparing JSON response payload","correlation_id":"c57407f0-4323-4148-b6bd-9239a44b28fa","ctx":{"changes":"email","method":"PUT","request_size_bytes":3468,"response_size_bytes":39029,"response_time_ms":1326,"route":"/api/profile","status_code":231,"user_id":"u_456"}}
{"ts":"2025-02-24T11:58:02.041Z","service":"web-server","lvl":"ERROR","msg":"Rate limit exceeded for IP","correlation_id":"c57407f0-4323-4148-b6bd-9239a44b28fa","ctx":{"method":"GET","metrics_count":150,"period":"1h","request_size_bytes":2422,"response_size_bytes":15254,"response_time_ms":158,"route":"/api/metrics","status_code":207}}
{"ts":"2025-02-24T11:58:02.062Z","service":"authentication","lvl":"WARN","msg":"Password reset initiated for user","correlation_id":"c57407f0-4323-4148-b6bd-9239a44b28fa","ctx":{"action":"two_factor_auth","attempts":6,"method":"sms","phone":"+1234567890","provider":"twilio","session_duration_sec":3598}}
{"ts":"2025-02-24T11:58:02.072Z","service":"web-server","lvl":"DEBUG","msg":"Caching response with TTL","correlation_id":"c57407f0-4323-4148-b6bd-9239a44b28fa","ctx":{"email_sent":true,"method":"POST","provider":"smtp","request_size_bytes":1770,"response_size_bytes":28197,"response_time_ms":696,"route":"/api/reset-password","status_code":286}}

The dataset included elements that challenge traditional logs compression techniques (a simplified generator sketch follows the list):

  • Various service names (web-server, authentication, payment-gateway, etc.)
  • Different log levels (INFO, DEBUG, WARN, ERROR)
  • Timestamps with millisecond precision
  • Nested JSON structures in context fields
  • Correlation IDs for tracing requests
  • Numeric and string values with realistic distributions
  • Common patterns found in web server, authentication, and payment processing logs
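
For illustration, a dataset of this shape can be produced with a short generator script. The sketch below mirrors the field names from the samples above; the services, messages, and value ranges are simplified placeholders, not the exact generator used for the benchmark.

python
# generate_logs.py -- emit newline-delimited JSON entries shaped like the
# samples above. Simplified illustration only: services, messages, and value
# ranges are invented placeholders, not the benchmark's actual generator.
import json
import random
import uuid
from datetime import datetime, timedelta, timezone

SERVICES = ["web-server", "authentication", "payment-gateway"]
LEVELS = ["DEBUG", "INFO", "WARN", "ERROR"]
MESSAGES = ["Caching response with TTL", "Rate limit exceeded for IP",
            "Preparing JSON response payload", "Merchant payout initiated"]

def generate(count: int, start: datetime):
    ts = start
    for _ in range(count):
        ts += timedelta(milliseconds=random.randint(1, 50))
        yield json.dumps({
            "ts": ts.strftime("%Y-%m-%dT%H:%M:%S.") + f"{ts.microsecond // 1000:03d}Z",
            "service": random.choice(SERVICES),
            "lvl": random.choice(LEVELS),
            "msg": random.choice(MESSAGES),
            "correlation_id": str(uuid.uuid4()),
            "ctx": {
                "method": random.choice(["GET", "POST", "PUT"]),
                "route": random.choice(["/api/notifications", "/api/profile", "/api/metrics"]),
                "status_code": random.randint(200, 299),
                "request_size_bytes": random.randint(500, 5_000),
                "response_size_bytes": random.randint(1_000, 40_000),
                "response_time_ms": random.randint(10, 2_000),
            },
        })

if __name__ == "__main__":
    start = datetime(2025, 2, 24, 11, 58, tzinfo=timezone.utc)
    for line in generate(10, start):
        print(line)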

Logs Compression Test Environment

For comparison purposes, we also tested the same datasets with other popular logs compression methods (a sketch for reproducing these baselines follows the list):

  • Gzip compression (level 9) - widely used for log file compression
  • Zstandard compression (level 19) - modern compression algorithm gaining popularity for log compression
  • Elasticsearch 8.11.0 (single-node configuration) - common log management solution with built-in compression
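
The Gzip and Zstandard baselines can be reproduced with a short script. A minimal sketch using Python's built-in zlib module (in gzip mode) and the third-party zstandard package, streaming the file in chunks so the 10.6GB input never has to fit in memory (the file name is a placeholder):

python
# baselines.py -- measure gzip(9) and zstd(19) compressed sizes of a log file
# without writing the compressed output to disk (file name is a placeholder).
import os
import zlib
import zstandard  # pip install zstandard

PATH = "logs.jsonl"   # hypothetical input file
CHUNK = 1 << 20       # read in 1 MiB chunks to keep memory usage flat

def compressed_size(path: str, compressor) -> int:
    total = 0
    with open(path, "rb") as f:
        while chunk := f.read(CHUNK):
            total += len(compressor.compress(chunk))
    total += len(compressor.flush())
    return total

if __name__ == "__main__":
    raw = os.path.getsize(PATH)
    gz = compressed_size(PATH, zlib.compressobj(level=9, wbits=31))  # gzip container
    zs = compressed_size(PATH, zstandard.ZstdCompressor(level=19).compressobj())
    print(f"raw {raw:,} B | gzip-9 {gz:,} B ({raw / gz:.1f}x) | zstd-19 {zs:,} B ({raw / zs:.1f}x)")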

Logs Compression Benchmark Results

Synthetic Dataset Logs Compression Performance

We processed the 10.6GB synthetic log dataset with Logdy Pro's specialized logs compression technology, resulting in:

450 MB for the index file
526 MB for the data file
----------------------------
976 MB total compressed size

This represents a 10.9x logs compression ratio (or a 90.8% reduction in size) compared to the original 10.6GB of raw log data.
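
The ratio above follows directly from the byte counts. A quick check (the rounded 976 MB total is taken as mebibytes, since the exact compressed byte counts aren't published):

python
# Sanity check of the reported compression ratio for the synthetic dataset.
original_bytes = 11_187_102_133        # raw size from the "Test data" block above
compressed_bytes = 976 * 1024 * 1024   # 450 MB index + 526 MB data, taken as MiB

ratio = original_bytes / compressed_bytes
print(f"{ratio:.1f}x compression, {1 - 1 / ratio:.0%} size reduction")  # 10.9x, ~91%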

For comparison with other logs compression methods:

  • Gzip (level 9): 2.1GB (5.0x logs compression ratio)
  • Zstandard (level 19): 1.8GB (5.9x logs compression ratio)
  • Elasticsearch: 3.2GB (3.3x logs compression ratio) plus 1.1GB for indices

Logdy Pro's specialized logs compression format achieves significantly better compression than general-purpose compression algorithms or traditional log storage systems, making it ideal for organizations dealing with massive log volumes.

Production Dataset Logs Compression Performance

To validate our logs compression results in real-world conditions, we also tested Logdy Pro with an actual production log dataset from an enterprise environment:

Production test data
8,430,233 lines of JSON (approximately 8.4 million entries)
Total size: 2,038,578,943 bytes (1.94 GB)
Average line size: 241 bytes

After processing with Logdy Pro's logs compression:

42 MB for the index file
94 MB for the data file
------------------------------
136 MB total compressed size

This represents a 14.3x logs compression ratio (or a 93.0% reduction in size) compared to the original 1.94GB of raw log data.

The higher logs compression ratio on the production dataset demonstrates that Logdy Pro's compression technology performs even better on real-world log data, which typically contains more repetitive patterns than our synthetic dataset. This makes Logdy Pro particularly effective for long-term log storage and archiving needs.

Query Performance Benchmark on Compressed Logs

INFO

Important Note on Query Performance with Compressed Logs

Logdy Pro is designed with a "cold query" model for compressed logs in mind:

  • It does not maintain an in-memory cache of compressed log data (though OS file caching may occur)
  • All queries access compressed logs directly from disk
  • The logs compression system is optimized for environments with limited resources (single CPU core, 2-4 GB RAM)
  • Currently, query execution on compressed logs is single-threaded (multi-core support planned for future releases)

Query Types Tested on Compressed Logs

We benchmarked three representative query types that cover most common log analysis scenarios on compressed log data (a plain-Python illustration of each shape follows the list):

  1. Index-Only Query on Compressed Logs — Queries that can be satisfied using only the index file

    • Example: service="web-server" AND lvl="ERROR"
    • These queries filter on fields that are stored in the compressed index file
  2. Full Data Scan Query on Compressed Logs — Queries that require scanning the compressed data file

    • Example: ctx.response_time_ms > 1000 OR msg includes "timeout"
    • These queries need to access fields stored in the compressed data file or perform text searches
  3. Facet Generation Query on Compressed Logs — Queries that generate aggregations

    • Example: Count of logs by service and level
    • These queries build distributions of values across the compressed dataset
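
To make the three shapes concrete, here is what each looks like expressed as plain Python over decoded JSON records. This is purely a semantic illustration, not Logdy Pro's query language or API; Logdy Pro evaluates these against its compressed index and data files rather than against decoded records.

python
# Semantic illustration of the three benchmarked query shapes over plain
# decoded records (not Logdy Pro's query API).
import json
from collections import Counter
from typing import Iterable

def records(path: str) -> Iterable[dict]:
    with open(path, "r", encoding="utf-8") as f:
        for line in f:
            yield json.loads(line)

# 1. Index-only shape: equality filters on indexed fields.
def web_server_errors(recs):
    return [r for r in recs if r["service"] == "web-server" and r["lvl"] == "ERROR"]

# 2. Full-scan shape: numeric comparison plus free-text match.
def slow_or_timeout(recs):
    return [r for r in recs
            if r.get("ctx", {}).get("response_time_ms", 0) > 1000
            or "timeout" in r.get("msg", "")]

# 3. Facet shape: value distribution across two fields.
def service_level_facets(recs):
    return Counter((r["service"], r["lvl"]) for r in recs)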

Benchmark Methodology for Compressed Logs Queries

For each query type on compressed logs, we followed this procedure (a minimal harness sketch follows the list):

  • We ran the query 10 times
  • We discarded the first and last runs to eliminate warm-up and outlier effects
  • We calculated the average execution time from the remaining runs
  • All tests were run with a cold start (cleared OS cache between test series)
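
A minimal harness following this procedure might look like the sketch below. The run_query callable is a placeholder for whatever executes the query under test; on Linux, clearing the OS cache requires root and is typically done with sync followed by writing 3 to /proc/sys/vm/drop_caches, as in the sketch.

python
# bench.py -- sketch of the timing procedure above: run a query 10 times,
# discard the first and last runs, and average the remaining eight.
# run_query is a placeholder for whatever executes the query under test.
import statistics
import subprocess
import time

def drop_os_cache() -> None:
    # Cold start: flush dirty pages, then drop the Linux page cache (needs root).
    subprocess.run(["sync"], check=True)
    subprocess.run(["sh", "-c", "echo 3 > /proc/sys/vm/drop_caches"], check=True)

def benchmark(run_query, runs: int = 10) -> float:
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        run_query()
        timings.append(time.perf_counter() - start)
    trimmed = timings[1:-1]  # drop the first (warm-up) and last runs
    return statistics.mean(trimmed)

if __name__ == "__main__":
    drop_os_cache()                     # cleared OS cache before the test series
    avg = benchmark(lambda: None)       # replace the lambda with the real query call
    print(f"average over trimmed runs: {avg * 1000:.1f} ms")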

Query Performance Results on Compressed Logs

Results for the synthetic dataset (10.6GB original size, 976MB after logs compression):

Query Type #1 (Index-Only): 970ms average execution time
Query Type #2 (Full Data Scan): 2.3 seconds average execution time
Query Type #3 (Facet Generation): 750ms average execution time

For comparison with Elasticsearch (same dataset):

  • Index-Only Query: 210ms
  • Full Data Scan: 3.2 seconds
  • Facet Generation: 420ms

Performance Analysis of Compressed Logs Queries

  • Index-Only Queries perform well (sub-second) due to the efficient compressed index structure
  • Full Data Scan Queries are currently the slowest, as they require reading and decompressing portions of the data file
  • Facet Generation is surprisingly fast due to the columnar storage format optimized for logs compression

Performance Scaling with Compressed Logs

We also tested how query performance scales with the size of the compressed logs dataset. Our findings show that:

  1. Index-Only Queries scale almost linearly with compressed logs dataset size
  2. Full Data Scan Queries scale linearly but with a steeper slope when working with compressed logs
  3. Facet Generation scales slightly better than linearly due to the columnar format used in our logs compression

The production dataset (1.94GB original, 136MB after logs compression) showed proportionally faster query times across all query types, confirming that performance scales roughly linearly with compressed data size.

Future Logs Compression Optimizations

Advanced Logs Compression Techniques

We're currently developing additional logs compression optimizations that will further reduce storage requirements. In preliminary testing with our production dataset, we've achieved:

13 MB for the index file (down from 42 MB)
94 MB for the data file (unchanged)
------------------------------
107 MB total compressed size (1.94 GB original)

This represents an 18.1x logs compression ratio (or a 94.5% reduction) compared to the original 1.94GB.

The key improvements in our logs compression technology come from the following (a sketch of the double-delta idea follows the list):

  1. Enhanced dictionary encoding for logs — More efficient representation of repeated patterns in log data
  2. Improved bit-packing for log fields — Better utilization of available bits in integer storage
  3. Optimized timestamp encoding for logs — Further refinement of the double-delta encoding specifically for log timestamps
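
The third item refers to double-delta (delta-of-delta) encoding: store the first timestamp, the first delta, and from then on only the change between consecutive deltas, which is near zero for regularly spaced log entries and therefore packs into very few bits. A generic sketch of the idea (an illustration, not Logdy Pro's actual implementation):

python
# Generic double-delta encoding for millisecond timestamps (illustration only).
from typing import List

def double_delta_encode(ts: List[int]) -> List[int]:
    if len(ts) < 2:
        return list(ts)
    deltas = [b - a for a, b in zip(ts, ts[1:])]
    dod = [b - a for a, b in zip(deltas, deltas[1:])]
    return [ts[0], deltas[0]] + dod       # mostly tiny values for regular logs

def double_delta_decode(enc: List[int]) -> List[int]:
    if len(enc) < 2:
        return list(enc)
    out = [enc[0]]
    delta = enc[1]
    out.append(out[-1] + delta)
    for dod in enc[2:]:
        delta += dod
        out.append(out[-1] + delta)
    return out

assert double_delta_decode(double_delta_encode([1000, 1010, 1021, 1030])) == [1000, 1010, 1021, 1030]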

Performance Optimizations for Compressed Logs Queries

We're also working on several performance enhancements for querying compressed logs:

  1. Multi-threaded query execution on compressed logs — Utilizing multiple CPU cores for parallel processing
  2. Memory allocation optimizations for logs compression — Reducing allocations during query execution
  3. Improved query planning for compressed logs — Better utilization of index structures
  4. Selective decompression of logs — Only decompressing the portions of log data needed for a query

Our internal benchmarks suggest these optimizations could improve full data scan query performance on compressed logs by 5-7x, bringing execution times for the 10.6GB dataset down well below the current 2.3 seconds.

Comparison with Alternative Logs Compression Solutions

To provide context for Logdy Pro's logs compression performance, we compared it with several alternative log management approaches:

Solution           | Logs Compression Efficiency         | Query Performance on Compressed Logs | Operational Complexity    | Resource Requirements
Logdy Pro          | Excellent (10-15x logs compression) | Good (sub-second to seconds)         | Low (single binary)       | Low (2-4GB RAM, 1 CPU)
ELK Stack          | Moderate (3-5x logs compression)    | Excellent (milliseconds)             | High (cluster management) | High (16-32GB RAM per node)
Log Files + Grep   | Poor (1-3x with gzip)               | Poor (minutes for large files)       | Low (standard tools)      | Low (depends on file size)
Database Storage   | Moderate (depends on DB)            | Good (seconds)                       | Moderate (DB management)  | Moderate (8-16GB RAM)
Cloud Log Services | N/A (managed service)               | Excellent (milliseconds)             | Low (managed service)     | N/A (managed service)

This comparison highlights Logdy Pro's unique position in the logs compression market:

  • Best-in-class logs compression efficiency — 2-5x better than alternative compression methods
  • Competitive query performance on compressed logs — Especially for index-only queries
  • Minimal operational complexity — Single binary, no cluster to manage
  • Low resource requirements — Runs on modest hardware

Conclusion: Balancing the Logs Compression Trade-offs

The benchmark results confirm Logdy Pro's ability to deliver on its promise of balancing the database trade-off triangle with superior logs compression:

  1. Exceptional logs compression efficiency — 10-15x compression ratios significantly reduce storage costs and enable longer log retention periods

  2. Acceptable query performance on compressed logs — Sub-second to seconds response times for most queries, with ongoing optimizations to improve full data scan performance

  3. Minimal operational complexity — Simple deployment, low resource requirements, and no complex cluster management

For organizations generating large volumes of log data who want to:

  • Reduce log storage costs through advanced logs compression
  • Simplify their log management infrastructure
  • Maintain control over their compressed log data
  • Achieve reasonable query performance on compressed logs

Logdy Pro offers a compelling alternative to both complex distributed systems like the ELK stack and basic approaches like grep on compressed log files.

When to Choose Logdy Pro for Logs Compression

Logdy Pro's logs compression technology is particularly well-suited for:

  • Mid-size organizations with significant log volumes but limited infrastructure resources
  • Self-hosted environments where data sovereignty is important
  • Cost-sensitive deployments where logs compression and storage efficiency are prioritized
  • Environments with modest query performance requirements where sub-second to seconds response times are acceptable
  • Organizations needing long-term log retention where advanced logs compression provides significant cost savings

Getting Started with Logdy Pro Logs Compression

To try Logdy Pro's advanced logs compression with your own log data, simply contact me at peter(at)logdy.dev or use the contact form below.

Interested in using Logdy Pro? Let's get in touch!



