pg_bestmatch

pg_bestmatch

pg_bestmatch : Generate BM25 sparse vector inside PostgreSQL

Overview

ID Extension Package Version Category License Language
2140
pg_bestmatch
pg_bestmatch
0.0.2
FTS
Apache-2.0
Rust
Attribute Has Binary Has Library Need Load Has DDL Relocatable Trusted
--sLd--
No
Yes
Yes
Yes
no
no
Relationships
See Also
vector
pg_search
vchord_bm25
vchord
vectorscale
zhparser
pg_tokenizer
vectorize

manual updated pgrx by Vonng

Packages

Type Repo Version PG Major Compatibility Package Pattern Dependencies
EXT
PIGSTY
0.0.2
18
17
16
15
14
13
pg_bestmatch -
RPM
PIGSTY
0.0.2
18
17
16
15
14
13
pg_bestmatch_$v -
DEB
PIGSTY
0.0.2
18
17
16
15
14
13
postgresql-$v-pg-bestmatch -
Linux / PG PG18 PG17 PG16 PG15 PG14 PG13
el8.x86_64
PIGSTY 0.0.2
PIGSTY 0.0.2
PIGSTY 0.0.2
PIGSTY 0.0.2
PIGSTY 0.0.2
PIGSTY 0.0.2
el8.aarch64
PIGSTY 0.0.2
PIGSTY 0.0.2
PIGSTY 0.0.2
PIGSTY 0.0.2
PIGSTY 0.0.2
PIGSTY 0.0.2
el9.x86_64
PIGSTY 0.0.2
PIGSTY 0.0.2
PIGSTY 0.0.2
PIGSTY 0.0.2
PIGSTY 0.0.2
PIGSTY 0.0.2
el9.aarch64
PIGSTY 0.0.2
PIGSTY 0.0.2
PIGSTY 0.0.2
PIGSTY 0.0.2
PIGSTY 0.0.2
PIGSTY 0.0.2
el10.x86_64
PIGSTY 0.0.2
PIGSTY 0.0.2
PIGSTY 0.0.2
PIGSTY 0.0.2
PIGSTY 0.0.2
PIGSTY 0.0.2
el10.aarch64
PIGSTY 0.0.2
PIGSTY 0.0.2
PIGSTY 0.0.2
PIGSTY 0.0.2
PIGSTY 0.0.2
PIGSTY 0.0.2
d12.x86_64
PIGSTY 0.0.2
PIGSTY 0.0.2
PIGSTY 0.0.2
PIGSTY 0.0.2
PIGSTY 0.0.2
PIGSTY 0.0.2
d12.aarch64
PIGSTY 0.0.2
PIGSTY 0.0.2
PIGSTY 0.0.2
PIGSTY 0.0.2
PIGSTY 0.0.2
PIGSTY 0.0.2
d13.x86_64
PIGSTY 0.0.2
PIGSTY 0.0.2
PIGSTY 0.0.2
PIGSTY 0.0.2
PIGSTY 0.0.2
PIGSTY 0.0.2
d13.aarch64
PIGSTY 0.0.2
PIGSTY 0.0.2
PIGSTY 0.0.2
PIGSTY 0.0.2
PIGSTY 0.0.2
PIGSTY 0.0.2
u22.x86_64
PIGSTY 0.0.2
PIGSTY 0.0.2
PIGSTY 0.0.2
PIGSTY 0.0.2
PIGSTY 0.0.2
PIGSTY 0.0.2
u22.aarch64
PIGSTY 0.0.2
PIGSTY 0.0.2
PIGSTY 0.0.2
PIGSTY 0.0.2
PIGSTY 0.0.2
PIGSTY 0.0.2
u24.x86_64
PIGSTY 0.0.2
PIGSTY 0.0.2
PIGSTY 0.0.2
PIGSTY 0.0.2
PIGSTY 0.0.2
PIGSTY 0.0.2
u24.aarch64
PIGSTY 0.0.2
PIGSTY 0.0.2
PIGSTY 0.0.2
PIGSTY 0.0.2
PIGSTY 0.0.2
PIGSTY 0.0.2
Package Version OS ORG SIZE File URL
pg_bestmatch_18 0.0.2 el8.x86_64 pigsty 7.2 MiB pg_bestmatch_18-0.0.2-1PIGSTY.el8.x86_64.rpm
pg_bestmatch_18 0.0.2 el8.aarch64 pigsty 7.0 MiB pg_bestmatch_18-0.0.2-1PIGSTY.el8.aarch64.rpm
pg_bestmatch_18 0.0.2 el9.x86_64 pigsty 7.0 MiB pg_bestmatch_18-0.0.2-1PIGSTY.el9.x86_64.rpm
pg_bestmatch_18 0.0.2 el9.aarch64 pigsty 6.9 MiB pg_bestmatch_18-0.0.2-1PIGSTY.el9.aarch64.rpm
pg_bestmatch_18 0.0.2 el10.x86_64 pigsty 6.9 MiB pg_bestmatch_18-0.0.2-1PIGSTY.el10.x86_64.rpm
pg_bestmatch_18 0.0.2 el10.aarch64 pigsty 7.0 MiB pg_bestmatch_18-0.0.2-1PIGSTY.el10.aarch64.rpm
postgresql-18-pg-bestmatch 0.0.2 d12.x86_64 pigsty 6.1 MiB postgresql-18-pg-bestmatch_0.0.2-2PIGSTY~bookworm_amd64.deb
postgresql-18-pg-bestmatch 0.0.2 d12.aarch64 pigsty 5.8 MiB postgresql-18-pg-bestmatch_0.0.2-2PIGSTY~bookworm_arm64.deb
postgresql-18-pg-bestmatch 0.0.2 d13.x86_64 pigsty 6.1 MiB postgresql-18-pg-bestmatch_0.0.2-2PIGSTY~trixie_amd64.deb
postgresql-18-pg-bestmatch 0.0.2 d13.aarch64 pigsty 5.8 MiB postgresql-18-pg-bestmatch_0.0.2-2PIGSTY~trixie_arm64.deb
postgresql-18-pg-bestmatch 0.0.2 u22.x86_64 pigsty 6.7 MiB postgresql-18-pg-bestmatch_0.0.2-2PIGSTY~jammy_amd64.deb
postgresql-18-pg-bestmatch 0.0.2 u22.aarch64 pigsty 6.5 MiB postgresql-18-pg-bestmatch_0.0.2-2PIGSTY~jammy_arm64.deb
postgresql-18-pg-bestmatch 0.0.2 u24.x86_64 pigsty 6.7 MiB postgresql-18-pg-bestmatch_0.0.2-2PIGSTY~noble_amd64.deb
postgresql-18-pg-bestmatch 0.0.2 u24.aarch64 pigsty 6.5 MiB postgresql-18-pg-bestmatch_0.0.2-2PIGSTY~noble_arm64.deb
Package Version OS ORG SIZE File URL
pg_bestmatch_17 0.0.2 el8.x86_64 pigsty 7.2 MiB pg_bestmatch_17-0.0.2-1PIGSTY.el8.x86_64.rpm
pg_bestmatch_17 0.0.1 el8.x86_64 pigsty 6.7 MiB pg_bestmatch_17-0.0.1-1PIGSTY.el8.x86_64.rpm
pg_bestmatch_17 0.0.2 el8.aarch64 pigsty 7.0 MiB pg_bestmatch_17-0.0.2-1PIGSTY.el8.aarch64.rpm
pg_bestmatch_17 0.0.1 el8.aarch64 pigsty 6.5 MiB pg_bestmatch_17-0.0.1-1PIGSTY.el8.aarch64.rpm
pg_bestmatch_17 0.0.2 el9.x86_64 pigsty 7.0 MiB pg_bestmatch_17-0.0.2-1PIGSTY.el9.x86_64.rpm
pg_bestmatch_17 0.0.1 el9.x86_64 pigsty 6.4 MiB pg_bestmatch_17-0.0.1-1PIGSTY.el9.x86_64.rpm
pg_bestmatch_17 0.0.2 el9.aarch64 pigsty 6.9 MiB pg_bestmatch_17-0.0.2-1PIGSTY.el9.aarch64.rpm
pg_bestmatch_17 0.0.2 el10.x86_64 pigsty 6.9 MiB pg_bestmatch_17-0.0.2-1PIGSTY.el10.x86_64.rpm
pg_bestmatch_17 0.0.2 el10.aarch64 pigsty 7.0 MiB pg_bestmatch_17-0.0.2-1PIGSTY.el10.aarch64.rpm
postgresql-17-pg-bestmatch 0.0.2 d12.x86_64 pigsty 6.1 MiB postgresql-17-pg-bestmatch_0.0.2-2PIGSTY~bookworm_amd64.deb
postgresql-17-pg-bestmatch 0.0.2 d12.aarch64 pigsty 5.8 MiB postgresql-17-pg-bestmatch_0.0.2-2PIGSTY~bookworm_arm64.deb
postgresql-17-pg-bestmatch 0.0.2 d13.x86_64 pigsty 6.1 MiB postgresql-17-pg-bestmatch_0.0.2-2PIGSTY~trixie_amd64.deb
postgresql-17-pg-bestmatch 0.0.2 d13.aarch64 pigsty 5.9 MiB postgresql-17-pg-bestmatch_0.0.2-2PIGSTY~trixie_arm64.deb
postgresql-17-pg-bestmatch 0.0.2 u22.x86_64 pigsty 6.7 MiB postgresql-17-pg-bestmatch_0.0.2-2PIGSTY~jammy_amd64.deb
postgresql-17-pg-bestmatch 0.0.2 u22.aarch64 pigsty 6.6 MiB postgresql-17-pg-bestmatch_0.0.2-2PIGSTY~jammy_arm64.deb
postgresql-17-pg-bestmatch 0.0.2 u24.x86_64 pigsty 6.7 MiB postgresql-17-pg-bestmatch_0.0.2-2PIGSTY~noble_amd64.deb
postgresql-17-pg-bestmatch 0.0.2 u24.aarch64 pigsty 6.5 MiB postgresql-17-pg-bestmatch_0.0.2-2PIGSTY~noble_arm64.deb
Package Version OS ORG SIZE File URL
pg_bestmatch_16 0.0.2 el8.x86_64 pigsty 7.2 MiB pg_bestmatch_16-0.0.2-1PIGSTY.el8.x86_64.rpm
pg_bestmatch_16 0.0.1 el8.x86_64 pigsty 6.7 MiB pg_bestmatch_16-0.0.1-1PIGSTY.el8.x86_64.rpm
pg_bestmatch_16 0.0.2 el8.aarch64 pigsty 7.0 MiB pg_bestmatch_16-0.0.2-1PIGSTY.el8.aarch64.rpm
pg_bestmatch_16 0.0.1 el8.aarch64 pigsty 6.5 MiB pg_bestmatch_16-0.0.1-1PIGSTY.el8.aarch64.rpm
pg_bestmatch_16 0.0.2 el9.x86_64 pigsty 7.0 MiB pg_bestmatch_16-0.0.2-1PIGSTY.el9.x86_64.rpm
pg_bestmatch_16 0.0.1 el9.x86_64 pigsty 6.4 MiB pg_bestmatch_16-0.0.1-1PIGSTY.el9.x86_64.rpm
pg_bestmatch_16 0.0.2 el9.aarch64 pigsty 6.9 MiB pg_bestmatch_16-0.0.2-1PIGSTY.el9.aarch64.rpm
pg_bestmatch_16 0.0.2 el10.x86_64 pigsty 6.9 MiB pg_bestmatch_16-0.0.2-1PIGSTY.el10.x86_64.rpm
pg_bestmatch_16 0.0.2 el10.aarch64 pigsty 7.0 MiB pg_bestmatch_16-0.0.2-1PIGSTY.el10.aarch64.rpm
postgresql-16-pg-bestmatch 0.0.2 d12.x86_64 pigsty 6.1 MiB postgresql-16-pg-bestmatch_0.0.2-2PIGSTY~bookworm_amd64.deb
postgresql-16-pg-bestmatch 0.0.2 d12.aarch64 pigsty 5.8 MiB postgresql-16-pg-bestmatch_0.0.2-2PIGSTY~bookworm_arm64.deb
postgresql-16-pg-bestmatch 0.0.2 d13.x86_64 pigsty 6.1 MiB postgresql-16-pg-bestmatch_0.0.2-2PIGSTY~trixie_amd64.deb
postgresql-16-pg-bestmatch 0.0.2 d13.aarch64 pigsty 5.9 MiB postgresql-16-pg-bestmatch_0.0.2-2PIGSTY~trixie_arm64.deb
postgresql-16-pg-bestmatch 0.0.2 u22.x86_64 pigsty 6.7 MiB postgresql-16-pg-bestmatch_0.0.2-2PIGSTY~jammy_amd64.deb
postgresql-16-pg-bestmatch 0.0.2 u22.aarch64 pigsty 6.6 MiB postgresql-16-pg-bestmatch_0.0.2-2PIGSTY~jammy_arm64.deb
postgresql-16-pg-bestmatch 0.0.2 u24.x86_64 pigsty 6.7 MiB postgresql-16-pg-bestmatch_0.0.2-2PIGSTY~noble_amd64.deb
postgresql-16-pg-bestmatch 0.0.2 u24.aarch64 pigsty 6.5 MiB postgresql-16-pg-bestmatch_0.0.2-2PIGSTY~noble_arm64.deb
Package Version OS ORG SIZE File URL
pg_bestmatch_15 0.0.2 el8.x86_64 pigsty 7.2 MiB pg_bestmatch_15-0.0.2-1PIGSTY.el8.x86_64.rpm
pg_bestmatch_15 0.0.1 el8.x86_64 pigsty 6.7 MiB pg_bestmatch_15-0.0.1-1PIGSTY.el8.x86_64.rpm
pg_bestmatch_15 0.0.2 el8.aarch64 pigsty 7.0 MiB pg_bestmatch_15-0.0.2-1PIGSTY.el8.aarch64.rpm
pg_bestmatch_15 0.0.1 el8.aarch64 pigsty 6.5 MiB pg_bestmatch_15-0.0.1-1PIGSTY.el8.aarch64.rpm
pg_bestmatch_15 0.0.2 el9.x86_64 pigsty 7.0 MiB pg_bestmatch_15-0.0.2-1PIGSTY.el9.x86_64.rpm
pg_bestmatch_15 0.0.1 el9.x86_64 pigsty 6.4 MiB pg_bestmatch_15-0.0.1-1PIGSTY.el9.x86_64.rpm
pg_bestmatch_15 0.0.2 el9.aarch64 pigsty 6.9 MiB pg_bestmatch_15-0.0.2-1PIGSTY.el9.aarch64.rpm
pg_bestmatch_15 0.0.2 el10.x86_64 pigsty 6.9 MiB pg_bestmatch_15-0.0.2-1PIGSTY.el10.x86_64.rpm
pg_bestmatch_15 0.0.2 el10.aarch64 pigsty 7.0 MiB pg_bestmatch_15-0.0.2-1PIGSTY.el10.aarch64.rpm
postgresql-15-pg-bestmatch 0.0.2 d12.x86_64 pigsty 6.1 MiB postgresql-15-pg-bestmatch_0.0.2-2PIGSTY~bookworm_amd64.deb
postgresql-15-pg-bestmatch 0.0.2 d12.aarch64 pigsty 5.9 MiB postgresql-15-pg-bestmatch_0.0.2-2PIGSTY~bookworm_arm64.deb
postgresql-15-pg-bestmatch 0.0.2 d13.x86_64 pigsty 6.1 MiB postgresql-15-pg-bestmatch_0.0.2-2PIGSTY~trixie_amd64.deb
postgresql-15-pg-bestmatch 0.0.2 d13.aarch64 pigsty 5.8 MiB postgresql-15-pg-bestmatch_0.0.2-2PIGSTY~trixie_arm64.deb
postgresql-15-pg-bestmatch 0.0.2 u22.x86_64 pigsty 6.7 MiB postgresql-15-pg-bestmatch_0.0.2-2PIGSTY~jammy_amd64.deb
postgresql-15-pg-bestmatch 0.0.2 u22.aarch64 pigsty 6.5 MiB postgresql-15-pg-bestmatch_0.0.2-2PIGSTY~jammy_arm64.deb
postgresql-15-pg-bestmatch 0.0.2 u24.x86_64 pigsty 6.7 MiB postgresql-15-pg-bestmatch_0.0.2-2PIGSTY~noble_amd64.deb
postgresql-15-pg-bestmatch 0.0.2 u24.aarch64 pigsty 6.5 MiB postgresql-15-pg-bestmatch_0.0.2-2PIGSTY~noble_arm64.deb
Package Version OS ORG SIZE File URL
pg_bestmatch_14 0.0.2 el8.x86_64 pigsty 7.2 MiB pg_bestmatch_14-0.0.2-1PIGSTY.el8.x86_64.rpm
pg_bestmatch_14 0.0.1 el8.x86_64 pigsty 6.7 MiB pg_bestmatch_14-0.0.1-1PIGSTY.el8.x86_64.rpm
pg_bestmatch_14 0.0.2 el8.aarch64 pigsty 7.0 MiB pg_bestmatch_14-0.0.2-1PIGSTY.el8.aarch64.rpm
pg_bestmatch_14 0.0.1 el8.aarch64 pigsty 6.5 MiB pg_bestmatch_14-0.0.1-1PIGSTY.el8.aarch64.rpm
pg_bestmatch_14 0.0.2 el9.x86_64 pigsty 6.9 MiB pg_bestmatch_14-0.0.2-1PIGSTY.el9.x86_64.rpm
pg_bestmatch_14 0.0.1 el9.x86_64 pigsty 6.4 MiB pg_bestmatch_14-0.0.1-1PIGSTY.el9.x86_64.rpm
pg_bestmatch_14 0.0.2 el9.aarch64 pigsty 6.9 MiB pg_bestmatch_14-0.0.2-1PIGSTY.el9.aarch64.rpm
pg_bestmatch_14 0.0.2 el10.x86_64 pigsty 6.9 MiB pg_bestmatch_14-0.0.2-1PIGSTY.el10.x86_64.rpm
pg_bestmatch_14 0.0.2 el10.aarch64 pigsty 7.0 MiB pg_bestmatch_14-0.0.2-1PIGSTY.el10.aarch64.rpm
postgresql-14-pg-bestmatch 0.0.2 d12.x86_64 pigsty 6.1 MiB postgresql-14-pg-bestmatch_0.0.2-2PIGSTY~bookworm_amd64.deb
postgresql-14-pg-bestmatch 0.0.2 d12.aarch64 pigsty 5.9 MiB postgresql-14-pg-bestmatch_0.0.2-2PIGSTY~bookworm_arm64.deb
postgresql-14-pg-bestmatch 0.0.2 d13.x86_64 pigsty 6.1 MiB postgresql-14-pg-bestmatch_0.0.2-2PIGSTY~trixie_amd64.deb
postgresql-14-pg-bestmatch 0.0.2 d13.aarch64 pigsty 5.8 MiB postgresql-14-pg-bestmatch_0.0.2-2PIGSTY~trixie_arm64.deb
postgresql-14-pg-bestmatch 0.0.2 u22.x86_64 pigsty 6.7 MiB postgresql-14-pg-bestmatch_0.0.2-2PIGSTY~jammy_amd64.deb
postgresql-14-pg-bestmatch 0.0.2 u22.aarch64 pigsty 6.6 MiB postgresql-14-pg-bestmatch_0.0.2-2PIGSTY~jammy_arm64.deb
postgresql-14-pg-bestmatch 0.0.2 u24.x86_64 pigsty 6.7 MiB postgresql-14-pg-bestmatch_0.0.2-2PIGSTY~noble_amd64.deb
postgresql-14-pg-bestmatch 0.0.2 u24.aarch64 pigsty 6.5 MiB postgresql-14-pg-bestmatch_0.0.2-2PIGSTY~noble_arm64.deb
Package Version OS ORG SIZE File URL
pg_bestmatch_13 0.0.2 el8.x86_64 pigsty 7.2 MiB pg_bestmatch_13-0.0.2-1PIGSTY.el8.x86_64.rpm
pg_bestmatch_13 0.0.1 el8.x86_64 pigsty 6.7 MiB pg_bestmatch_13-0.0.1-1PIGSTY.el8.x86_64.rpm
pg_bestmatch_13 0.0.2 el8.aarch64 pigsty 7.0 MiB pg_bestmatch_13-0.0.2-1PIGSTY.el8.aarch64.rpm
pg_bestmatch_13 0.0.1 el8.aarch64 pigsty 6.5 MiB pg_bestmatch_13-0.0.1-1PIGSTY.el8.aarch64.rpm
pg_bestmatch_13 0.0.2 el9.x86_64 pigsty 6.9 MiB pg_bestmatch_13-0.0.2-1PIGSTY.el9.x86_64.rpm
pg_bestmatch_13 0.0.1 el9.x86_64 pigsty 6.4 MiB pg_bestmatch_13-0.0.1-1PIGSTY.el9.x86_64.rpm
pg_bestmatch_13 0.0.2 el9.aarch64 pigsty 6.9 MiB pg_bestmatch_13-0.0.2-1PIGSTY.el9.aarch64.rpm
pg_bestmatch_13 0.0.2 el10.x86_64 pigsty 6.9 MiB pg_bestmatch_13-0.0.2-1PIGSTY.el10.x86_64.rpm
pg_bestmatch_13 0.0.2 el10.aarch64 pigsty 7.0 MiB pg_bestmatch_13-0.0.2-1PIGSTY.el10.aarch64.rpm
postgresql-13-pg-bestmatch 0.0.2 d12.x86_64 pigsty 6.1 MiB postgresql-13-pg-bestmatch_0.0.2-2PIGSTY~bookworm_amd64.deb
postgresql-13-pg-bestmatch 0.0.2 d12.aarch64 pigsty 5.8 MiB postgresql-13-pg-bestmatch_0.0.2-2PIGSTY~bookworm_arm64.deb
postgresql-13-pg-bestmatch 0.0.2 d13.x86_64 pigsty 6.1 MiB postgresql-13-pg-bestmatch_0.0.2-2PIGSTY~trixie_amd64.deb
postgresql-13-pg-bestmatch 0.0.2 d13.aarch64 pigsty 5.8 MiB postgresql-13-pg-bestmatch_0.0.2-2PIGSTY~trixie_arm64.deb
postgresql-13-pg-bestmatch 0.0.2 u22.x86_64 pigsty 6.7 MiB postgresql-13-pg-bestmatch_0.0.2-2PIGSTY~jammy_amd64.deb
postgresql-13-pg-bestmatch 0.0.2 u22.aarch64 pigsty 6.6 MiB postgresql-13-pg-bestmatch_0.0.2-2PIGSTY~jammy_arm64.deb
postgresql-13-pg-bestmatch 0.0.2 u24.x86_64 pigsty 6.7 MiB postgresql-13-pg-bestmatch_0.0.2-2PIGSTY~noble_amd64.deb
postgresql-13-pg-bestmatch 0.0.2 u24.aarch64 pigsty 6.5 MiB postgresql-13-pg-bestmatch_0.0.2-2PIGSTY~noble_arm64.deb

Source

pig build pkg pg_bestmatch;		# build rpm / deb with pig

Install

Make sure PGDG and PIGSTY repo available:

pig repo add pgdg pigsty -u   # add both repo and update cache

Install this extension with pig:

pig install pg_bestmatch;		# install via package name, for the active PG version

pig install pg_bestmatch -v 18;   # install for PG 18
pig install pg_bestmatch -v 17;   # install for PG 17
pig install pg_bestmatch -v 16;   # install for PG 16
pig install pg_bestmatch -v 15;   # install for PG 15
pig install pg_bestmatch -v 14;   # install for PG 14
pig install pg_bestmatch -v 13;   # install for PG 13

Config this extension to shared_preload_libraries:

shared_preload_libraries = 'pg_bestmatch';

Create this extension with:

CREATE EXTENSION pg_bestmatch;

Usage

How does it work?

  • Create an BM25 statistics based on your document set by bm25_create(table_name, column_name, statistic_name);. It will create a materilized view to record the stats.
  • Generate document sparse vector by bm25_document_to_svector(statistic_name, passage)
  • For query, generate query sparse vector bm25_query_to_svector(statistic_name, query)
  • Calculate the score by dot product between the query sparse vector and the document sparse vector
  • Currently we use huggingface tokenizer with bert-base-uncased vocabulary set to tokenize words. Might support more configuration on tokenizer in the future.

Install

CREATE EXTENSION pg_bestmatch;
SET search_path TO public, bm_catalog;

Example

Here is an example workflow demonstrating the usage of this extension with the example of Stanford LoCo benchmark.

  1. Load the dataset. Here is a script for you if you want to experience pg_bestmatch with the dataset.
wget https://huggingface.co/api/datasets/hazyresearch/LoCoV1-Documents/parquet/default/test/0.parquet -O documents.parquet
wget https://huggingface.co/api/datasets/hazyresearch/LoCoV1-Queries/parquet/default/test/0.parquet -O queries.parquet
import pandas as pd
from sqlalchemy import create_engine
import numpy as np
from psycopg2.extensions import register_adapter, AsIs

def adapter_numpy_float64(numpy_float64):
    return AsIs(numpy_float64)

def adapter_numpy_int64(numpy_int64):
    return AsIs(numpy_int64)

def adapter_numpy_float32(numpy_float32):
    return AsIs(numpy_float32)

def adapter_numpy_int32(numpy_int32):
    return AsIs(numpy_int32)

def adapter_numpy_array(numpy_array):
    return AsIs(tuple(numpy_array))

register_adapter(np.float64, adapter_numpy_float64)
register_adapter(np.int64, adapter_numpy_int64)
register_adapter(np.float32, adapter_numpy_float32)
register_adapter(np.int32, adapter_numpy_int32)
register_adapter(np.ndarray, adapter_numpy_array)

db_url = "postgresql://localhost:5432/pg_bestmatch_test"
engine = create_engine(db_url)

def load_documents():
    df = pd.read_parquet("documents.parquet")
    df.to_sql("documents", engine, if_exists='replace', index=False)

def load_queries():
    df = pd.read_parquet("queries.parquet")
    df['answer_pids'] = df['answer_pids'].apply(lambda x: str(x[0]))    
    df.to_sql("queries", engine, if_exists='replace', index=False)

load_documents()
load_queries()
  1. Create BM25 statistics for the documents table.
SELECT bm25_create('documents', 'passage', 'documents_passage_bm25', 0.75, 1.2);
  1. Add an embedding column to the documents and queries tables and update the embeddings for documents and queries.
ALTER TABLE documents ADD COLUMN embedding svector; -- for pgvecto.rs users
ALTER TABLE documents ADD COLUMN embedding sparsevec; -- for pgvector users

UPDATE documents SET embedding = bm25_document_to_svector('documents_passage_bm25', passage)::svector; -- for pgvecto.rs users
UPDATE documents SET embedding = bm25_document_to_svector('documents_passage_bm25', passage, 'pgvector')::sparsevec; -- for pgvector users
  1. (Optional) Create a vector index on the sparse vector column.
CREATE INDEX ON documents USING vectors (embedding svector_dot_ops); -- for pgvecto.rs users
CREATE INDEX ON documents USING ivfflat (embedding sparsevec_ip_ops); -- for pgvector users
  1. Perform a vector search to find the most relevant documents for each query.
ALTER TABLE queries ADD COLUMN embedding svector; -- for pgvecto.rs users
ALTER TABLE queries ADD COLUMN embedding sparsevec; -- for pgvector users

UPDATE queries SET embedding = bm25_query_to_svector('documents_passage_bm25', query)::svector; -- for pgvecto.rs users
UPDATE queries SET embedding = bm25_query_to_svector('documents_passage_bm25', query, 'pgvector')::sparsevec; -- for pgvector users

SELECT sum((array[answer_pids] = array(SELECT pid FROM documents WHERE queries.dataset = documents.dataset ORDER BY queries.embedding <#> documents.embedding LIMIT 1))::int) FROM queries;

This workflow showcases how to leverage BM25 text queries and vector search in PostgreSQL using this extension. The Top 1 recall of BM25 on this dataset is 0.77. If you reproduce the result, your operations are correct.


Comparison with pg_search

  • pg_bestmatch.rs only provides methods for generating sparse vectors and does not support index-based search (which can be achieved by pgvecto.rs or pgvector).
  • pg_search performs BM25 retrieval via the external tantivy engine, which may have limitations when combined with transactions, filters, or JOIN operations. Since pg_bestmatch.rs is entirely native to Postgres, it offers full compatibility with these operations inside postgres.

Reference

  • tokenize
    • Description: Tokenizes an input string into individual tokens.
    • Example:
      SELECT tokenize('i have an apple'); -- result: {i,have,an,apple}
  • bm25_create
    • Description: Creates BM25 statistics for a specified table and column.
    • Usage:
      SELECT bm25_create('documents', 'passage', 'documents_passage_bm25');
    • Parameters:
      • table_name: Name of the table.
      • column_name: Name of the column.
      • stat_name: Name of the BM25 statistics.
      • b: BM25 parameter (default 0.75).
      • k: BM25 parameter (default 1.2).
  • bm25_refresh
    • Description: Updates the BM25 statistics to reflect any changes in the underlying data.
    • Usage:
      SELECT bm25_refresh('documents_passage_bm25');
    • Parameters:
      • stat_name: Name of the BM25 statistics to update.
  • bm25_drop
    • Description: Deletes the BM25 statistics for a specified table and column.
    • Usage:
      SELECT bm25_drop('documents_passage_bm25');
    • Parameters:
      • stat_name: Name of the BM25 statistics to delete.
  • bm25_document_to_svector
    • Description: Converts document text into a sparse vector representation.
    • Usage:
      SELECT bm25_document_to_svector('documents_passage_bm25', 'document_text');
    • Parameters:
      • stat_name: Name of the BM25 statistics.
      • document_text: The text of the document.
      • style: Emits pgvecto.rs-style sparse vector or pgvector-style sparse vector.
  • bm25_query_to_svector
    • Description: Converts query text into a sparse vector representation.
    • Usage:
      SELECT bm25_query_to_svector('documents_passage_bm25', 'We begin, as always, with the text.');
    • Parameters:
      • stat_name: Name of the BM25 statistics.
      • query_text: The text of the query.
      • style: Emits pgvecto.rs-style sparse vector or pgvector-style sparse vector.
Last updated on