Documentation

Learn how to use learned indexes in your applications

Installation

OmenDB provides two ways to use learned indexes: as a PostgreSQL extension or as a standalone database.

Prerequisites: Rust 1.70+, PostgreSQL 12+ (for extension), Git

PostgreSQL Extension

The PostgreSQL extension adds learned index functions to your existing database.

Build and Install

# Install pgrx (PostgreSQL Rust extension framework)
cargo install cargo-pgrx
cargo pgrx init
# Clone and build OmenDB extension
git clone https://github.com/omendb/pg-learned
cd pg-learned
cargo pgrx install
# Enable in PostgreSQL
CREATE EXTENSION omendb;

Usage

-- Check version
SELECT learned_index_version();
-- Run benchmark with 10,000 keys
SELECT learned_index_benchmark(10000);
-- Learn about the technology
SELECT learned_index_info();

Query Optimization

Best practices for maximizing performance with learned indexes.

Key Insight

Learned indexes work best with sequential data patterns like timestamps, auto-incrementing IDs, and sorted datasets.

Optimization Examples

-- Time-series data (ideal for learned indexes)
CREATE TABLE metrics (
timestamp BIGINT PRIMARY KEY,
value DOUBLE PRECISION
);
-- Sequential IDs (also ideal)
CREATE TABLE orders (
order_id SERIAL PRIMARY KEY,
customer_data JSONB
);
-- Test performance improvement
SELECT learned_index_benchmark(100000);

Understanding Learned Indexes

Learned indexes replace traditional B-tree data structures with machine learning models.

How They Work

  1. Training Phase: Analyze your data's distribution (~100ms)
  2. Model Building: Create a function that maps keys to positions
  3. Prediction: Use the model to predict where data lives (1-2 CPU instructions)
  4. Refinement: Binary search within a small range (±100 positions) for exact match

Key Insight

Most real-world data follows predictable patterns. Sequential IDs, timestamps, and user IDs all have distributions that machine learning models can learn and predict efficiently.

Types of Learned Indexes

Linear Index

Simple linear regression. Fast training, good for uniformly distributed data.

RMI (Recursive Model Index)

Two-stage hierarchy. Root model selects leaf model, leaf model predicts position.

Performance Characteristics

Operation B-tree Learned Index Improvement
Point Lookup O(log n) O(1) + small search 2-3x faster
Range Query O(log n + k) O(1 + k) Up to 16x faster
Insert O(log n) O(log n)* Similar

*Inserts require periodic model retraining for optimal performance

Best Use Cases

Excellent For

  • • Time-series data (timestamps)
  • • Sequential IDs
  • • Financial data
  • • IoT sensor data
  • • Log data
  • • Read-heavy workloads

Consider Carefully

  • • Completely random data
  • • Very high write rates
  • • Frequent data updates
  • • Small datasets (<1000 records)
  • • String primary keys

PostgreSQL Functions

learned_index_benchmark(num_keys)

Compare learned index performance against B-trees

SELECT learned_index_benchmark(50000);
  • Parameters: num_keys (integer, 1-1,000,000)
  • Returns: Formatted benchmark results

learned_index_version()

Get extension version information

SELECT learned_index_version();
  • Returns: Version string

learned_index_info()

Learn about learned index technology

SELECT learned_index_info();
  • Returns: Educational text about learned indexes

Advanced SQL Examples

Examples of using learned indexes with real-world queries.

Time-Series Queries

-- Timestamp-based data benefits most from learned indexes
CREATE TABLE metrics (
timestamp BIGINT PRIMARY KEY,
value DOUBLE PRECISION
);
-- Range queries are up to 16x faster
SELECT AVG(value) FROM metrics WHERE timestamp BETWEEN ? AND ?;

Best Practices

  • Sequential keys - Primary keys should be sequential (timestamps, IDs)
  • Range queries - Use BETWEEN for maximum speedup
  • Batch inserts - Insert data in chronological order when possible
  • Monitor performance - Use benchmark function to validate improvements

Need help or have questions?