bloom

bloom

bloom : bloom access method - signature file based index

Overview

ID Extension Package Version Category License Language
2990
bloom
bloom
1.0
FEAT
PostgreSQL
C
Attribute Has Binary Has Library Need Load Has DDL Relocatable Trusted
--s-d--
No
Yes
No
Yes
no
no
Relationships
See Also
hll
age
rum
pg_graphql
pg_jsonschema
jsquery
pg_hint_plan
hypopg

Packages

PG18 PG17 PG16 PG15 PG14
1.0
1.0
1.0
1.0
1.0

This is a built-in contrib extension ship with the PostgreSQL kernel

Install

Create this extension with:

CREATE EXTENSION bloom;

Usage

bloom: bloom access method - signature file based index

The bloom extension provides an index access method based on Bloom filters. A Bloom filter is a space-efficient data structure that tests whether an element is a member of a set, with possible false positives but no false negatives.

Bloom indexes are particularly useful for tables with many columns where queries test arbitrary combinations of columns. A single bloom index can replace multiple btree indexes while using significantly less space.

Create Bloom Index

CREATE INDEX bloomidx ON tbloom USING bloom (i1, i2, i3, i4, i5, i6);

With custom parameters:

CREATE INDEX bloomidx ON tbloom USING bloom (i1, i2, i3)
       WITH (length=80, col1=2, col2=2, col3=4);

Parameters

Parameter Default Max Description
length 80 4096 Length of each signature in bits (rounded to nearest 16)
col1 - col32 2 4095 Number of bits generated for each index column

Operator Class Support

Bloom indexes only support the equality operator (=). Built-in operator classes exist for int4 and text. You can define custom operator classes for other types that have a hash function:

CREATE OPERATOR CLASS text_ops
DEFAULT FOR TYPE text USING bloom AS
    OPERATOR    1   =(text, text),
    FUNCTION    1   hashtext(text);

Example

-- Create a table with many columns
CREATE TABLE tbloom AS
   SELECT
     (random() * 1000000)::int as i1,
     (random() * 1000000)::int as i2,
     (random() * 1000000)::int as i3,
     (random() * 1000000)::int as i4,
     (random() * 1000000)::int as i5,
     (random() * 1000000)::int as i6
   FROM generate_series(1, 10000000);

-- A single bloom index covers all column combinations
CREATE INDEX bloomidx ON tbloom USING bloom (i1, i2, i3, i4, i5, i6);

-- Queries on any subset of columns can use the index
SELECT * FROM tbloom WHERE i2 = 898732 AND i5 = 123451;

Limitations

  • Only equality (=) queries are supported (no range queries)
  • Does not support UNIQUE indexes
  • Does not support searching for NULL values
  • Results require recheck against heap due to false positives
Last updated on