
Watch out! You're reading a series of articles

  • Logdy Pro Part I: The Problem

    The article clarifies the problems Logdy Pro solves and the approach it takes to solve them

  • Logdy Pro Part II: Storage

    The article uncovers how Logdy Pro achieves a 15-20x compression ratio while still maintaining fast search and query capabilities

  • (currently reading) Logdy Pro Part III: Benchmark

    The article presents comprehensive benchmark results demonstrating Logdy Pro's performance metrics and storage efficiency compared to industry alternatives.

Logdy Pro Part III: Advanced Logs Compression Benchmark Results

In Part I, we explored the log management challenges that Logdy Pro addresses, and in Part II, we examined the innovative storage architecture that enables its exceptional logs compression efficiency. In this final article, we'll present real-world benchmark results that demonstrate how Logdy Pro's high-performance logs compression technology performs in practical scenarios, significantly reducing storage requirements while maintaining query capabilities.

Logs Compression Benchmark Methodology

To evaluate Logdy Pro's logs compression performance in realistic conditions, we conducted a series of benchmarks using both synthetic and production log datasets. Our testing focused on three key metrics:

  1. Logs compression efficiency — How effectively does Logdy Pro reduce log storage requirements compared to standard compression algorithms?
  2. Query performance on compressed logs — How quickly can Logdy Pro execute different types of queries on compressed log data?
  3. Resource utilization during compression — What are the CPU and memory requirements during logs compression and querying operations?

Synthetic Logs Test Dataset for Compression Benchmarking

For our primary logs compression benchmark, we generated a synthetic dataset with characteristics that mirror real-world production logs. To exercise compression against patterns typical of enterprise logging environments, we carefully designed the dataset to:

  • Include common log patterns and structures
  • Maintain realistic value distributions
  • Incorporate appropriate levels of repetition and uniqueness
  • Avoid excessive randomness that would artificially reduce compression efficiency

The synthetic test file had the following characteristics:

Test data
42,949,672 lines of JSON (approximately 43 million entries)
Total size: 11,187,102,133 bytes (10.6 GB)
Average line size: 260 bytes

This dataset size was chosen to represent several days of logs from a medium-sized production environment, providing a realistic benchmark scenario for our logs compression testing.
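
The dataset characteristics above are easy to reproduce for your own files. A minimal Python sketch (the file name is a placeholder) that reports the same three numbers:

python
# stats.py -- report line count, total size, and average line size of a
# newline-delimited JSON log file (the file name below is a placeholder).
import os

PATH = "logs.jsonl"  # hypothetical input file

def dataset_stats(path: str) -> tuple[int, int, float]:
    total_bytes = os.path.getsize(path)
    lines = 0
    with open(path, "rb") as f:
        for _ in f:
            lines += 1
    return lines, total_bytes, total_bytes / lines

if __name__ == "__main__":
    lines, size, avg = dataset_stats(PATH)
    print(f"{lines:,} lines, {size:,} bytes, {avg:.0f} bytes/line on average")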

Sample Data Structure for Logs Compression Analysis

The synthetic dataset contained log entries with structures similar to those found in production environments, including:

json
{"ts":"2025-02-24T11:58:02.006Z","service":"payment-gateway","lvl":"INFO","msg":"Merchant payout initiated","ctx":{"amount":67.26258211055193,"dispute_id":"dp_202","evidence_due":"2024-02-15","reason":"unauthorized","risk_score":17.6637638202057}}
{"ts":"2025-02-24T11:58:02.029Z","service":"web-server","lvl":"ERROR","msg":"Caching response with TTL","correlation_id":"c57407f0-4323-4148-b6bd-9239a44b28fa","ctx":{"method":"POST","recipients":100,"request_size_bytes":4798,"response_size_bytes":5087,"response_time_ms":1567,"route":"/api/notifications","status_code":232,"type":"push"}}
{"ts":"2025-02-24T11:58:02.039Z","service":"web-server","lvl":"ERROR","msg":"Preparing JSON response payload","correlation_id":"c57407f0-4323-4148-b6bd-9239a44b28fa","ctx":{"changes":"email","method":"PUT","request_size_bytes":3468,"response_size_bytes":39029,"response_time_ms":1326,"route":"/api/profile","status_code":231,"user_id":"u_456"}}
{"ts":"2025-02-24T11:58:02.041Z","service":"web-server","lvl":"ERROR","msg":"Rate limit exceeded for IP","correlation_id":"c57407f0-4323-4148-b6bd-9239a44b28fa","ctx":{"method":"GET","metrics_count":150,"period":"1h","request_size_bytes":2422,"response_size_bytes":15254,"response_time_ms":158,"route":"/api/metrics","status_code":207}}
{"ts":"2025-02-24T11:58:02.062Z","service":"authentication","lvl":"WARN","msg":"Password reset initiated for user","correlation_id":"c57407f0-4323-4148-b6bd-9239a44b28fa","ctx":{"action":"two_factor_auth","attempts":6,"method":"sms","phone":"+1234567890","provider":"twilio","session_duration_sec":3598}}
{"ts":"2025-02-24T11:58:02.072Z","service":"web-server","lvl":"DEBUG","msg":"Caching response with TTL","correlation_id":"c57407f0-4323-4148-b6bd-9239a44b28fa","ctx":{"email_sent":true,"method":"POST","provider":"smtp","request_size_bytes":1770,"response_size_bytes":28197,"response_time_ms":696,"route":"/api/reset-password","status_code":286}}

The dataset included elements that challenge traditional logs compression techniques (a simplified generator sketch follows the list):

  • Various service names (web-server, authentication, payment-gateway, etc.)
  • Different log levels (INFO, DEBUG, WARN, ERROR)
  • Timestamps with millisecond precision
  • Nested JSON structures in context fields
  • Correlation IDs for tracing requests
  • Numeric and string values with realistic distributions
  • Common patterns found in web server, authentication, and payment processing logs
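
For illustration, a dataset of this shape can be produced with a short generator script. The sketch below mirrors the field names from the samples above; the services, messages, and value ranges are simplified placeholders, not the exact generator used for the benchmark.

python
# generate_logs.py -- emit newline-delimited JSON entries shaped like the
# samples above. Simplified illustration only: services, messages, and value
# ranges are invented placeholders, not the benchmark's actual generator.
import json
import random
import uuid
from datetime import datetime, timedelta, timezone

SERVICES = ["web-server", "authentication", "payment-gateway"]
LEVELS = ["DEBUG", "INFO", "WARN", "ERROR"]
MESSAGES = ["Caching response with TTL", "Rate limit exceeded for IP",
            "Preparing JSON response payload", "Merchant payout initiated"]

def generate(count: int, start: datetime):
    ts = start
    for _ in range(count):
        ts += timedelta(milliseconds=random.randint(1, 50))
        yield json.dumps({
            "ts": ts.strftime("%Y-%m-%dT%H:%M:%S.") + f"{ts.microsecond // 1000:03d}Z",
            "service": random.choice(SERVICES),
            "lvl": random.choice(LEVELS),
            "msg": random.choice(MESSAGES),
            "correlation_id": str(uuid.uuid4()),
            "ctx": {
                "method": random.choice(["GET", "POST", "PUT"]),
                "route": random.choice(["/api/notifications", "/api/profile", "/api/metrics"]),
                "status_code": random.randint(200, 299),
                "request_size_bytes": random.randint(500, 5_000),
                "response_size_bytes": random.randint(1_000, 40_000),
                "response_time_ms": random.randint(10, 2_000),
            },
        })

if __name__ == "__main__":
    start = datetime(2025, 2, 24, 11, 58, tzinfo=timezone.utc)
    for line in generate(10, start):
        print(line)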

Logs Compression Test Environment

For comparison purposes, we also tested the same datasets with other popular logs compression methods (a sketch for reproducing these baselines follows the list):

  • Gzip compression (level 9) - widely used for log file compression
  • Zstandard compression (level 19) - modern compression algorithm gaining popularity for log compression
  • Elasticsearch 8.11.0 (single-node configuration) - common log management solution with built-in compression
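
The Gzip and Zstandard baselines can be reproduced with a short script. A minimal sketch using Python's built-in zlib module (in gzip mode) and the third-party zstandard package, streaming the file in chunks so the 10.6GB input never has to fit in memory (the file name is a placeholder):

python
# baselines.py -- measure gzip(9) and zstd(19) compressed sizes of a log file
# without writing the compressed output to disk (file name is a placeholder).
import os
import zlib
import zstandard  # pip install zstandard

PATH = "logs.jsonl"   # hypothetical input file
CHUNK = 1 << 20       # read in 1 MiB chunks to keep memory usage flat

def compressed_size(path: str, compressor) -> int:
    total = 0
    with open(path, "rb") as f:
        while chunk := f.read(CHUNK):
            total += len(compressor.compress(chunk))
    total += len(compressor.flush())
    return total

if __name__ == "__main__":
    raw = os.path.getsize(PATH)
    gz = compressed_size(PATH, zlib.compressobj(level=9, wbits=31))  # gzip container
    zs = compressed_size(PATH, zstandard.ZstdCompressor(level=19).compressobj())
    print(f"raw {raw:,} B | gzip-9 {gz:,} B ({raw / gz:.1f}x) | zstd-19 {zs:,} B ({raw / zs:.1f}x)")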

Logs Compression Benchmark Results

Synthetic Dataset Logs Compression Performance

We processed the 10.6GB synthetic log dataset with Logdy Pro's specialized logs compression technology, resulting in:

450 MB for the index file
526 MB for the data file
----------------------------
976 MB total compressed size

This represents a 10.9x logs compression ratio (or a 90.8% reduction in size) compared to the original 10.6GB of raw log data.
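
The ratio above follows directly from the byte counts. A quick check (the rounded 976 MB total is taken as mebibytes, since the exact compressed byte counts aren't published):

python
# Sanity check of the reported compression ratio for the synthetic dataset.
original_bytes = 11_187_102_133        # raw size from the "Test data" block above
compressed_bytes = 976 * 1024 * 1024   # 450 MB index + 526 MB data, taken as MiB

ratio = original_bytes / compressed_bytes
print(f"{ratio:.1f}x compression, {1 - 1 / ratio:.0%} size reduction")  # 10.9x, ~91%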

For comparison with other logs compression methods:

  • Gzip (level 9): 2.1GB (5.0x logs compression ratio)
  • Zstandard (level 19): 1.8GB (5.9x logs compression ratio)
  • Elasticsearch: 3.2GB (3.3x logs compression ratio) plus 1.1GB for indices

Logdy Pro's specialized logs compression format achieves significantly better compression than general-purpose compression algorithms or traditional log storage systems, making it ideal for organizations dealing with massive log volumes.

Production Dataset Logs Compression Performance

To validate our logs compression results in real-world conditions, we also tested Logdy Pro with an actual production log dataset from an enterprise environment:

Production test data
8,430,233 lines of JSON (approximately 8.4 million entries)
Total size: 2,038,578,943 bytes (1.94 GB)
Average line size: 241 bytes

After processing with Logdy Pro's logs compression:

42 MB for the index file
94 MB for the data file
------------------------------
136 MB total compressed size

This represents a 14.3x logs compression ratio (or a 93.0% reduction in size) compared to the original 1.94GB of raw log data.

The higher logs compression ratio on the production dataset demonstrates that Logdy Pro's compression technology performs even better on real-world log data, which typically contains more repetitive patterns than our synthetic dataset. This makes Logdy Pro particularly effective for long-term log storage and archiving needs.

Query Performance Benchmark on Compressed Logs

INFO

Important Note on Query Performance with Compressed Logs

Logdy Pro is designed with a "cold query" model for compressed logs in mind:

  • It does not maintain an in-memory cache of compressed log data (though OS file caching may occur)
  • All queries access compressed logs directly from disk
  • The logs compression system is optimized for environments with limited resources (single CPU core, 2-4 GB RAM)
  • Currently, query execution on compressed logs is single-threaded (multi-core support planned for future releases)

Query Types Tested on Compressed Logs

We benchmarked three representative query types that cover most common log analysis scenarios on compressed log data (a plain-Python illustration of each shape follows the list):

  1. Index-Only Query on Compressed Logs — Queries that can be satisfied using only the index file

    • Example: service="web-server" AND lvl="ERROR"
    • These queries filter on fields that are stored in the compressed index file
  2. Full Data Scan Query on Compressed Logs — Queries that require scanning the compressed data file

    • Example: ctx.response_time_ms > 1000 OR msg includes "timeout"
    • These queries need to access fields stored in the compressed data file or perform text searches
  3. Facet Generation Query on Compressed Logs — Queries that generate aggregations

    • Example: Count of logs by service and level
    • These queries build distributions of values across the compressed dataset
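
To make the three shapes concrete, here is what each looks like expressed as plain Python over decoded JSON records. This is purely a semantic illustration, not Logdy Pro's query language or API; Logdy Pro evaluates these against its compressed index and data files rather than against decoded records.

python
# Semantic illustration of the three benchmarked query shapes over plain
# decoded records (not Logdy Pro's query API).
import json
from collections import Counter
from typing import Iterable

def records(path: str) -> Iterable[dict]:
    with open(path, "r", encoding="utf-8") as f:
        for line in f:
            yield json.loads(line)

# 1. Index-only shape: equality filters on indexed fields.
def web_server_errors(recs):
    return [r for r in recs if r["service"] == "web-server" and r["lvl"] == "ERROR"]

# 2. Full-scan shape: numeric comparison plus free-text match.
def slow_or_timeout(recs):
    return [r for r in recs
            if r.get("ctx", {}).get("response_time_ms", 0) > 1000
            or "timeout" in r.get("msg", "")]

# 3. Facet shape: value distribution across two fields.
def service_level_facets(recs):
    return Counter((r["service"], r["lvl"]) for r in recs)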

Benchmark Methodology for Compressed Logs Queries

For each query type on compressed logs, we followed this procedure (a minimal harness sketch follows the list):

  • We ran the query 10 times
  • We discarded the first and last runs to eliminate warm-up and outlier effects
  • We calculated the average execution time from the remaining runs
  • All tests were run with a cold start (cleared OS cache between test series)
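
A minimal harness following this procedure might look like the sketch below. The run_query callable is a placeholder for whatever executes the query under test; on Linux, clearing the OS cache requires root and is typically done with sync followed by writing 3 to /proc/sys/vm/drop_caches, as in the sketch.

python
# bench.py -- sketch of the timing procedure above: run a query 10 times,
# discard the first and last runs, and average the remaining eight.
# run_query is a placeholder for whatever executes the query under test.
import statistics
import subprocess
import time

def drop_os_cache() -> None:
    # Cold start: flush dirty pages, then drop the Linux page cache (needs root).
    subprocess.run(["sync"], check=True)
    subprocess.run(["sh", "-c", "echo 3 > /proc/sys/vm/drop_caches"], check=True)

def benchmark(run_query, runs: int = 10) -> float:
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        run_query()
        timings.append(time.perf_counter() - start)
    trimmed = timings[1:-1]  # drop the first (warm-up) and last runs
    return statistics.mean(trimmed)

if __name__ == "__main__":
    drop_os_cache()                     # cleared OS cache before the test series
    avg = benchmark(lambda: None)       # replace the lambda with the real query call
    print(f"average over trimmed runs: {avg * 1000:.1f} ms")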

Query Performance Results on Compressed Logs

Results for the synthetic dataset (10.6GB original size, 976MB after logs compression):

Query Type #1 (Index-Only): 970ms average execution time
Query Type #2 (Full Data Scan): 2.3 seconds average execution time
Query Type #3 (Facet Generation): 750ms average execution time

For comparison with Elasticsearch (same dataset):

  • Index-Only Query: 210ms
  • Full Data Scan: 3.2 seconds
  • Facet Generation: 420ms

Performance Analysis of Compressed Logs Queries

  • Index-Only Queries perform well (sub-second) due to the efficient compressed index structure
  • Full Data Scan Queries are currently the slowest, as they require reading and decompressing portions of the data file
  • Facet Generation is surprisingly fast due to the columnar storage format optimized for logs compression

Performance Scaling with Compressed Logs

We also tested how query performance scales with the size of the compressed logs dataset. Our findings show that:

  1. Index-Only Queries scale almost linearly with compressed logs dataset size
  2. Full Data Scan Queries scale linearly but with a steeper slope when working with compressed logs
  3. Facet Generation scales slightly better than linearly due to the columnar format used in our logs compression

The production dataset (1.94GB original, 136MB after logs compression) showed proportionally faster query times across all query types, confirming that performance scales roughly linearly with compressed data size.

Future Logs Compression Optimizations

Advanced Logs Compression Techniques

We're currently developing additional logs compression optimizations that will further reduce storage requirements. In preliminary testing with our production dataset, we've achieved:

13 MB for the index file (down from 42 MB)
94 MB for the data file (unchanged)
------------------------------
107 MB total compressed size (1.94 GB original)

This represents an 18.1x logs compression ratio (or a 94.5% reduction) compared to the original 1.94GB.

The key improvements in our logs compression technology come from the following (a sketch of the double-delta idea follows the list):

  1. Enhanced dictionary encoding for logs — More efficient representation of repeated patterns in log data
  2. Improved bit-packing for log fields — Better utilization of available bits in integer storage
  3. Optimized timestamp encoding for logs — Further refinement of the double-delta encoding specifically for log timestamps
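
The third item refers to double-delta (delta-of-delta) encoding: store the first timestamp, the first delta, and from then on only the change between consecutive deltas, which is near zero for regularly spaced log entries and therefore packs into very few bits. A generic sketch of the idea (an illustration, not Logdy Pro's actual implementation):

python
# Generic double-delta encoding for millisecond timestamps (illustration only).
from typing import List

def double_delta_encode(ts: List[int]) -> List[int]:
    if len(ts) < 2:
        return list(ts)
    deltas = [b - a for a, b in zip(ts, ts[1:])]
    dod = [b - a for a, b in zip(deltas, deltas[1:])]
    return [ts[0], deltas[0]] + dod       # mostly tiny values for regular logs

def double_delta_decode(enc: List[int]) -> List[int]:
    if len(enc) < 2:
        return list(enc)
    out = [enc[0]]
    delta = enc[1]
    out.append(out[-1] + delta)
    for dod in enc[2:]:
        delta += dod
        out.append(out[-1] + delta)
    return out

assert double_delta_decode(double_delta_encode([1000, 1010, 1021, 1030])) == [1000, 1010, 1021, 1030]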

Performance Optimizations for Compressed Logs Queries

We're also working on several performance enhancements for querying compressed logs:

  1. Multi-threaded query execution on compressed logs — Utilizing multiple CPU cores for parallel processing
  2. Memory allocation optimizations for logs compression — Reducing allocations during query execution
  3. Improved query planning for compressed logs — Better utilization of index structures
  4. Selective decompression of logs — Only decompressing the portions of log data needed for a query

Our internal benchmarks suggest these optimizations could improve full data scan query performance on compressed logs by 5-7x, bringing execution times for the 10.6GB dataset down well below the current 2.3 seconds.

Comparison with Alternative Logs Compression Solutions

To provide context for Logdy Pro's logs compression performance, we compared it with several alternative log management approaches:

Solution           | Logs Compression Efficiency         | Query Performance on Compressed Logs | Operational Complexity    | Resource Requirements
Logdy Pro          | Excellent (10-15x logs compression) | Good (sub-second to seconds)         | Low (single binary)       | Low (2-4GB RAM, 1 CPU)
ELK Stack          | Moderate (3-5x logs compression)    | Excellent (milliseconds)             | High (cluster management) | High (16-32GB RAM per node)
Log Files + Grep   | Poor (1-3x with gzip)               | Poor (minutes for large files)       | Low (standard tools)      | Low (depends on file size)
Database Storage   | Moderate (depends on DB)            | Good (seconds)                       | Moderate (DB management)  | Moderate (8-16GB RAM)
Cloud Log Services | N/A (managed service)               | Excellent (milliseconds)             | Low (managed service)     | N/A (managed service)

This comparison highlights Logdy Pro's unique position in the logs compression market:

  • Best-in-class logs compression efficiency — 2-5x better than alternative compression methods
  • Competitive query performance on compressed logs — Especially for index-only queries
  • Minimal operational complexity — Single binary, no cluster to manage
  • Low resource requirements — Runs on modest hardware

Conclusion: Balancing the Logs Compression Trade-offs

The benchmark results confirm Logdy Pro's ability to deliver on its promise of balancing the database trade-off triangle with superior logs compression:

  1. Exceptional logs compression efficiency — 10-15x compression ratios significantly reduce storage costs and enable longer log retention periods

  2. Acceptable query performance on compressed logs — Sub-second to seconds response times for most queries, with ongoing optimizations to improve full data scan performance

  3. Minimal operational complexity — Simple deployment, low resource requirements, and no complex cluster management

For organizations generating large volumes of log data who want to:

  • Reduce log storage costs through advanced logs compression
  • Simplify their log management infrastructure
  • Maintain control over their compressed log data
  • Achieve reasonable query performance on compressed logs

Logdy Pro offers a compelling alternative to both complex distributed systems like the ELK stack and basic approaches like grep on compressed log files.

When to Choose Logdy Pro for Logs Compression

Logdy Pro's logs compression technology is particularly well-suited for:

  • Mid-size organizations with significant log volumes but limited infrastructure resources
  • Self-hosted environments where data sovereignty is important
  • Cost-sensitive deployments where logs compression and storage efficiency are prioritized
  • Environments with modest query performance requirements where sub-second to seconds response times are acceptable
  • Organizations needing long-term log retention where advanced logs compression provides significant cost savings

Getting Started with Logdy Pro Logs Compression

To try Logdy Pro's advanced logs compression with your own log data, simply contact me at peter(at)logdy.dev or use the contact form below.

Interested in using Logdy Pro? Let's get in touch!



