Documentation - OmenDB

Installation

OmenDB provides two ways to use learned indexes: as a PostgreSQL extension or as a standalone database.

Prerequisites: Rust 1.70+, PostgreSQL 12+ (for extension), Git

PostgreSQL Extension

The PostgreSQL extension adds learned index functions to your existing database.

Build and Install

# Install pgrx (PostgreSQL Rust extension framework)

cargo install cargo-pgrx

cargo pgrx init

# Clone and build OmenDB extension

git clone https://github.com/omendb/pg-learned

cd pg-learned

cargo pgrx install

# Enable in PostgreSQL

CREATE EXTENSION omendb;

Usage

-- Check version

SELECT learned_index_version();

-- Run benchmark with 10,000 keys

SELECT learned_index_benchmark(10000);

-- Learn about the technology

SELECT learned_index_info();

Query Optimization

Best practices for maximizing performance with learned indexes.

Key Insight

Learned indexes work best with sequential data patterns like timestamps, auto-incrementing IDs, and sorted datasets.

Optimization Examples

-- Time-series data (ideal for learned indexes)

CREATE TABLE metrics (

timestamp BIGINT PRIMARY KEY,

value DOUBLE PRECISION

);

-- Sequential IDs (also ideal)

CREATE TABLE orders (

order_id SERIAL PRIMARY KEY,

customer_data JSONB

);

-- Test performance improvement

SELECT learned_index_benchmark(100000);

Understanding Learned Indexes

Learned indexes replace traditional B-tree data structures with machine learning models.

How They Work

Training Phase: Analyze your data's distribution (~100ms)
Model Building: Create a function that maps keys to positions
Prediction: Use the model to predict where data lives (1-2 CPU instructions)
Refinement: Binary search within a small range (±100 positions) for exact match

Key Insight

Most real-world data follows predictable patterns. Sequential IDs, timestamps, and user IDs all have distributions that machine learning models can learn and predict efficiently.

Types of Learned Indexes

Linear Index

Simple linear regression. Fast training, good for uniformly distributed data.

RMI (Recursive Model Index)

Two-stage hierarchy. Root model selects leaf model, leaf model predicts position.

Performance Characteristics

Operation	B-tree	Learned Index	Improvement
Point Lookup	O(log n)	O(1) + small search	2-3x faster
Range Query	O(log n + k)	O(1 + k)	Up to 16x faster
Insert	O(log n)	O(log n)*	Similar

*Inserts require periodic model retraining for optimal performance

Best Use Cases

Excellent For

• Time-series data (timestamps)
• Sequential IDs
• Financial data
• IoT sensor data
• Log data
• Read-heavy workloads

Consider Carefully

• Completely random data
• Very high write rates
• Frequent data updates
• Small datasets (<1000 records)
• String primary keys

PostgreSQL Functions

learned_index_benchmark(num_keys)

Compare learned index performance against B-trees

SELECT learned_index_benchmark(50000);

Parameters: num_keys (integer, 1-1,000,000)
Returns: Formatted benchmark results

learned_index_version()

Get extension version information

SELECT learned_index_version();

Returns: Version string

learned_index_info()

Learn about learned index technology

SELECT learned_index_info();

Returns: Educational text about learned indexes

Advanced SQL Examples

Examples of using learned indexes with real-world queries.

Time-Series Queries

-- Timestamp-based data benefits most from learned indexes

CREATE TABLE metrics (

timestamp BIGINT PRIMARY KEY,

value DOUBLE PRECISION

);

-- Range queries are up to 16x faster

SELECT AVG(value) FROM metrics WHERE timestamp BETWEEN ? AND ?;

Best Practices

Sequential keys - Primary keys should be sequential (timestamps, IDs)
Range queries - Use BETWEEN for maximum speedup
Batch inserts - Insert data in chronological order when possible
Monitor performance - Use benchmark function to validate improvements

Need help or have questions?

Open an Issue → Contact Support →