# pg_tokenizer

> pg_tokenizer: Tokenizers for full-text search

## Overview
| ID | Extension | Package | Version | Category | License | Language |
|---|---|---|---|---|---|---|
| 2160 | pg_tokenizer | pg_tokenizer | 0.1.1 | FTS | Apache-2.0 | Rust |
| Attribute | Has Binary | Has Library | Need Load | Has DDL | Relocatable | Trusted |
|---|---|---|---|---|---|---|
| --sLd-- | No | Yes | Yes | Yes | No | No |
| Relationships | |
|---|---|
| Schemas | tokenizer_catalog |
| See Also | pg_search, pgroonga, pg_bigm, zhparser, pgroonga_database, pg_bestmatch, vchord_bm25, pg_trgm |
PG18 fix by Vonng
## Packages
| Type | Repo | Version | PG Major Compatibility | Package Pattern | Dependencies |
|---|---|---|---|---|---|
| EXT | PIGSTY | 0.1.1 | 18, 17, 16, 15, 14 | pg_tokenizer | - |
| RPM | PIGSTY | 0.1.1 | 18, 17, 16, 15, 14 | pg_tokenizer_$v | - |
| DEB | PIGSTY | 0.1.1 | 18, 17, 16, 15, 14 | postgresql-$v-pg-tokenizer | - |
| Linux / PG | PG18 | PG17 | PG16 | PG15 | PG14 |
|---|---|---|---|---|---|
| el8.x86_64 | PIGSTY 0.1.1 | PIGSTY 0.1.1 | PIGSTY 0.1.1 | PIGSTY 0.1.1 | PIGSTY 0.1.1 |
| el8.aarch64 | PIGSTY 0.1.1 | PIGSTY 0.1.1 | PIGSTY 0.1.1 | PIGSTY 0.1.1 | PIGSTY 0.1.1 |
| el9.x86_64 | PIGSTY 0.1.1 | PIGSTY 0.1.1 | PIGSTY 0.1.1 | PIGSTY 0.1.1 | PIGSTY 0.1.1 |
| el9.aarch64 | PIGSTY 0.1.1 | PIGSTY 0.1.1 | PIGSTY 0.1.1 | PIGSTY 0.1.1 | PIGSTY 0.1.1 |
| el10.x86_64 | PIGSTY 0.1.1 | PIGSTY 0.1.1 | PIGSTY 0.1.1 | PIGSTY 0.1.1 | PIGSTY 0.1.1 |
| el10.aarch64 | PIGSTY 0.1.1 | PIGSTY 0.1.1 | PIGSTY 0.1.1 | PIGSTY 0.1.1 | PIGSTY 0.1.1 |
| d12.x86_64 | PIGSTY 0.1.1 | PIGSTY 0.1.1 | PIGSTY 0.1.1 | PIGSTY 0.1.1 | PIGSTY 0.1.1 |
| d12.aarch64 | PIGSTY 0.1.1 | PIGSTY 0.1.1 | PIGSTY 0.1.1 | PIGSTY 0.1.1 | PIGSTY 0.1.1 |
| d13.x86_64 | PIGSTY 0.1.1 | PIGSTY 0.1.1 | PIGSTY 0.1.1 | PIGSTY 0.1.1 | PIGSTY 0.1.1 |
| d13.aarch64 | PIGSTY 0.1.1 | PIGSTY 0.1.1 | PIGSTY 0.1.1 | PIGSTY 0.1.1 | PIGSTY 0.1.1 |
| u22.x86_64 | PIGSTY 0.1.1 | PIGSTY 0.1.1 | PIGSTY 0.1.1 | PIGSTY 0.1.1 | PIGSTY 0.1.1 |
| u22.aarch64 | PIGSTY 0.1.1 | PIGSTY 0.1.1 | PIGSTY 0.1.1 | PIGSTY 0.1.1 | PIGSTY 0.1.1 |
| u24.x86_64 | PIGSTY 0.1.1 | PIGSTY 0.1.1 | PIGSTY 0.1.1 | PIGSTY 0.1.1 | PIGSTY 0.1.1 |
| u24.aarch64 | PIGSTY 0.1.1 | PIGSTY 0.1.1 | PIGSTY 0.1.1 | PIGSTY 0.1.1 | PIGSTY 0.1.1 |
## Source

```bash
pig build pkg pg_tokenizer;   # build rpm/deb
```

## Install

Make sure the PGDG and PIGSTY repos are available:

```bash
pig repo add pgsql -u   # add both repos and update the cache
```

Install this extension with pig:

```bash
pig install pg_tokenizer;        # install via package name, for the active PG version
pig install pg_tokenizer -v 18;  # install for PG 18
pig install pg_tokenizer -v 17;  # install for PG 17
pig install pg_tokenizer -v 16;  # install for PG 16
pig install pg_tokenizer -v 15;  # install for PG 15
pig install pg_tokenizer -v 14;  # install for PG 14
```

Add this extension to shared_preload_libraries:

```ini
shared_preload_libraries = 'pg_tokenizer'
```

Create this extension with:

```sql
CREATE EXTENSION pg_tokenizer;
```

## Usage
pg_tokenizer is a PostgreSQL extension that provides tokenizers for full-text search. It is designed to work with VectorChord-bm25 for native BM25 ranking index support.
### Quick Start

```sql
CREATE EXTENSION pg_tokenizer;

-- Create a tokenizer using the LLMLingua2 model
SELECT create_tokenizer('tokenizer1', $$
model = "llmlingua2"
$$);

-- Tokenize text
SELECT tokenize('PostgreSQL is a powerful, open-source object-relational database system. It has over 15 years of active development.', 'tokenizer1');
```

### Tokenizer Models
pg_tokenizer supports multiple tokenizer models for different languages and use cases:
| Model | Language | Description |
|---|---|---|
| llmlingua2 | English | BERT-based tokenizer from LLMLingua2 |
| jieba | Chinese | Jieba Chinese text segmentation |
| lindera/ipadic | Japanese | Lindera tokenizer with IPADIC dictionary |
| Custom models | Any | User-trained models for domain-specific text |
### Creating Tokenizers

```sql
-- English tokenizer
SELECT create_tokenizer('en_tokenizer', $$
model = "llmlingua2"
$$);

-- Chinese tokenizer
SELECT create_tokenizer('zh_tokenizer', $$
model = "jieba"
$$);

-- Japanese tokenizer
SELECT create_tokenizer('ja_tokenizer', $$
model = "lindera/ipadic"
$$);
```

### Tokenizing Text
```sql
-- Tokenize English text
SELECT tokenize('full text search in PostgreSQL', 'en_tokenizer');

-- Tokenize Chinese text ("PostgreSQL is a powerful database system")
SELECT tokenize('PostgreSQL是一个强大的数据库系统', 'zh_tokenizer');
```

### Text Analyzer
pg_tokenizer also provides text analyzer functionality that combines tokenization with additional text processing steps. For detailed text analyzer usage, refer to the Text Analyzer documentation.
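As a hedged sketch of what such a pipeline can look like (the `create_text_analyzer` / `apply_text_analyzer` function names and the TOML options below follow the upstream pg_tokenizer README and should be verified against your installed version), a text analyzer might chain Unicode segmentation, lowercasing, and stemming:

```sql
-- Assumption: function names and TOML keys here are taken from the
-- upstream pg_tokenizer README; check them against your installed version.
SELECT create_text_analyzer('text_analyzer1', $$
pre_tokenizer = "unicode_segmentation"
[[character_filters]]
to_lowercase = {}
[[token_filters]]
stemmer = "english_porter2"
$$);

-- Apply the analyzer to raw text
SELECT apply_text_analyzer('Tokenization splits raw text into searchable tokens.', 'text_analyzer1');
```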
### Integration with VectorChord-BM25

pg_tokenizer is typically used together with VectorChord-BM25 for full BM25 ranking support:

```sql
CREATE EXTENSION IF NOT EXISTS pg_tokenizer CASCADE;
CREATE EXTENSION IF NOT EXISTS vchord_bm25 CASCADE;

-- Create a tokenizer
SELECT create_tokenizer('my_tokenizer', $$
model = "llmlingua2"
$$);

-- Tokenize text into bm25vectors for indexing and search
SELECT tokenize('your search query', 'my_tokenizer');
```

## Documentation
For more details, see the full documentation: