Documentation
Learn how to use learned indexes in your applications
Installation
OmenDB provides two ways to use learned indexes: as a PostgreSQL extension or as a standalone database.
Prerequisites: Rust 1.70+, PostgreSQL 12+ (for extension), Git
PostgreSQL Extension
The PostgreSQL extension adds learned index functions to your existing database.
Build and Install
Usage
Query Optimization
Best practices for maximizing performance with learned indexes.
Key Insight
Learned indexes work best with sequential data patterns like timestamps, auto-incrementing IDs, and sorted datasets.
Optimization Examples
Understanding Learned Indexes
Learned indexes replace traditional B-tree data structures with machine learning models that predict where a key sits in sorted data.
How They Work
- Training Phase: Analyze your data's distribution (~100ms)
- Model Building: Create a function that maps keys to positions
- Prediction: Use the model to predict a key's position (1-2 CPU instructions for a simple linear model)
- Refinement: Binary search within a small range (±100 positions) around the prediction to find the exact match
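The four steps above can be sketched in a few lines of Python. This is a minimal illustration and not OmenDB's implementation: the class name `LinearLearnedIndex` and its methods are hypothetical, the model is a plain least-squares line, and the search window is derived from the measured worst-case error rather than a fixed ±100.

```python
import bisect


class LinearLearnedIndex:
    """Hypothetical single-model learned index over numeric keys.

    Assumes a non-empty collection of keys; not OmenDB's actual code.
    """

    def __init__(self, keys):
        self.keys = sorted(keys)
        n = len(self.keys)
        # Training phase: least-squares fit of position ~ slope * key + intercept.
        mean_k = sum(self.keys) / n
        mean_p = (n - 1) / 2
        var = sum((k - mean_k) ** 2 for k in self.keys)
        cov = sum((k - mean_k) * (i - mean_p) for i, k in enumerate(self.keys))
        self.slope = cov / var if var else 0.0
        self.intercept = mean_p - self.slope * mean_k
        # Record the worst-case error so lookups can bound their search window.
        self.max_err = max(
            abs(i - self._predict(k)) for i, k in enumerate(self.keys)
        )

    def _predict(self, key):
        # Prediction: one multiply and one add, then clamp to a valid position.
        guess = int(round(self.slope * key + self.intercept))
        return min(max(guess, 0), len(self.keys) - 1)

    def lookup(self, key):
        """Return the position of key, or None if it is absent."""
        guess = self._predict(key)
        lo = max(guess - self.max_err, 0)
        hi = min(guess + self.max_err + 1, len(self.keys))
        # Refinement: binary search only inside the error window.
        i = bisect.bisect_left(self.keys, key, lo, hi)
        if i < len(self.keys) and self.keys[i] == key:
            return i
        return None
```

Because the window is sized from the largest error seen during training, every stored key is guaranteed to fall inside it; the tighter the model fits the data, the less work the refinement step has to do.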
Key Insight
Most real-world data follows predictable patterns. Sequential IDs, timestamps, and user IDs all have distributions that machine learning models can learn and predict efficiently.
Types of Learned Indexes
Linear Index
A single linear regression over the keys. Trains quickly and works well for uniformly distributed data.
RMI (Recursive Model Index)
A two-stage hierarchy: the root model selects a leaf model, and the leaf model predicts the position.
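To make the two-stage idea concrete, here is a small Python sketch. It assumes linear models at both stages and a fixed number of leaves; the names `TwoStageRMI`, `fit_line`, and `num_leaves` are hypothetical and chosen for illustration, not taken from OmenDB.

```python
import bisect


def fit_line(xs, ys):
    """Least-squares line through (xs, ys); returns (slope, intercept)."""
    n = len(xs)
    if n == 0:
        return 0.0, 0.0
    mx, my = sum(xs) / n, sum(ys) / n
    var = sum((x - mx) ** 2 for x in xs)
    if var == 0:
        return 0.0, my
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / var
    return slope, my - slope * mx


class TwoStageRMI:
    """Hypothetical two-stage RMI; assumes a non-empty set of numeric keys."""

    def __init__(self, keys, num_leaves=16):
        self.keys = sorted(keys)
        self.num_leaves = num_leaves
        n = len(self.keys)
        # Stage 1: the root predicts a position, rescaled to a leaf id.
        s, b = fit_line(self.keys, list(range(n)))
        self.root = (s * num_leaves / n, b * num_leaves / n)
        # Group keys by the leaf the root routes them to.
        buckets = [[] for _ in range(num_leaves)]
        for pos, key in enumerate(self.keys):
            buckets[self._leaf_id(key)].append((key, pos))
        # Stage 2: one linear model per leaf, plus its worst-case error.
        self.leaves = []
        for bucket in buckets:
            slope, intercept = fit_line(
                [k for k, _ in bucket], [p for _, p in bucket]
            )
            err = max(
                (abs(p - self._clamp(slope * k + intercept)) for k, p in bucket),
                default=0,
            )
            self.leaves.append((slope, intercept, err))

    def _clamp(self, x):
        return min(max(int(round(x)), 0), len(self.keys) - 1)

    def _leaf_id(self, key):
        s, b = self.root
        return min(max(int(s * key + b), 0), self.num_leaves - 1)

    def lookup(self, key):
        """Return the position of key, or None if it is absent."""
        slope, intercept, err = self.leaves[self._leaf_id(key)]
        guess = self._clamp(slope * key + intercept)
        lo, hi = max(guess - err, 0), min(guess + err + 1, len(self.keys))
        i = bisect.bisect_left(self.keys, key, lo, hi)
        return i if i < len(self.keys) and self.keys[i] == key else None
```

Each leaf only has to model a slice of the key space, so its error bound is typically smaller than a single global model's on data that is not uniformly distributed. The routing function is deterministic, so a lookup always consults the same leaf the key was trained under.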
Performance Characteristics
| Operation | B-tree | Learned Index | Improvement |
|---|---|---|---|
| Point Lookup | O(log n) | O(1) prediction + bounded search | 2-3x faster |
| Range Query | O(log n + k) | O(1 + k) | Up to 16x faster |
| Insert | O(log n) | O(log n)* | Similar |
*Inserts require periodic model retraining for optimal performance
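The range-query row can be illustrated with a short Python sketch: the model locates the start of the range with a bounded search, and the remaining k results are read sequentially. The helpers `train` and `range_query` are hypothetical names, the model is a simple least-squares line, and the sketch assumes a non-empty sorted list of numeric keys.

```python
import bisect


def train(keys):
    """Fit position ~ slope * key + intercept over sorted keys.

    Returns (predict, max_err). Hypothetical helper for illustration.
    """
    n = len(keys)
    mean_k, mean_p = sum(keys) / n, (n - 1) / 2
    var = sum((k - mean_k) ** 2 for k in keys)
    cov = sum((k - mean_k) * (i - mean_p) for i, k in enumerate(keys))
    slope = cov / var if var else 0.0
    intercept = mean_p - slope * mean_k

    def predict(key):
        return min(max(int(round(slope * key + intercept)), 0), n - 1)

    max_err = max(abs(i - predict(k)) for i, k in enumerate(keys))
    return predict, max_err


def range_query(keys, predict, max_err, low, high):
    """Return all keys k with low <= k <= high."""
    guess = predict(low)
    lo = max(guess - max_err, 0)
    hi = min(guess + max_err + 1, len(keys))
    # Bounded search for the first key >= low.
    start = bisect.bisect_left(keys, low, lo, hi)
    result = []
    # Sequential scan over the k matching keys.
    for i in range(start, len(keys)):
        if keys[i] > high:
            break
        result.append(keys[i])
    return result
```

The bounded search is valid for keys that are not stored because a line fitted to sorted keys has a non-negative slope, so predictions never decrease as the key grows and the true start position stays within one slot of the error window.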
Best Use Cases
Excellent For
- Time-series data (timestamps)
- Sequential IDs
- Financial data
- IoT sensor data
- Log data
- Read-heavy workloads
Consider Carefully
- Completely random data
- Very high write rates
- Frequent data updates
- Small datasets (<1,000 records)
- String primary keys
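One rough way to judge whether a key column falls in the first or second group is to fit a line to its sorted values and look at the worst-case prediction error. The sketch below is an assumption-laden illustration: `max_prediction_error` is a hypothetical helper, not part of the extension, and the two sample key sets are synthetic.

```python
def max_prediction_error(keys):
    """Fit position ~ slope * key + intercept and return the worst-case
    error, measured in positions.

    Hypothetical helper; assumes at least one numeric key.
    """
    keys = sorted(keys)
    n = len(keys)
    mean_k = sum(keys) / n
    mean_p = (n - 1) / 2
    var = sum((k - mean_k) ** 2 for k in keys)
    cov = sum((k - mean_k) * (i - mean_p) for i, k in enumerate(keys))
    slope = cov / var if var else 0.0
    intercept = mean_p - slope * mean_k
    return max(abs(i - (slope * k + intercept)) for i, k in enumerate(keys))


# Synthetic examples: auto-incrementing IDs versus exponentially growing keys.
sequential_ids = list(range(1, 10_001))
skewed_keys = [2 ** i for i in range(40)]

seq_err = max_prediction_error(sequential_ids)   # close to zero
skew_err = max_prediction_error(skewed_keys)     # a large share of the 40 positions
```

A small error relative to the number of rows suggests a single linear model will need very little refinement; a large error suggests a multi-stage model, or a traditional index, may be the better fit.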
PostgreSQL Functions
learned_index_benchmark(num_keys)
Compare learned index performance against B-trees
- Parameters: num_keys (integer, 1 to 1,000,000)
- Returns: Formatted benchmark results
learned_index_version()
Get extension version information
- Returns: Version string
learned_index_info()
Get background information on learned index technology
- Returns: Educational text about learned indexes
Advanced SQL Examples
Examples of using learned indexes with real-world queries.
Time-Series Queries
Best Practices
- Sequential keys: Use sequential primary keys (timestamps, auto-incrementing IDs)
- Range queries: Use BETWEEN for maximum speedup
- Batch inserts: Insert data in chronological order when possible
- Monitor performance: Use learned_index_benchmark() to validate improvements