zhparser

zhparser: a parser for Chinese full-text search

Overview

ID Extension Package Version Category License Language
2130 zhparser zhparser 2.3 FTS PostgreSQL C

Attribute: --s-d-r
Has Binary: No
Has Library: Yes
Need Load: No
Has DDL: Yes
Relocatable: Yes
Trusted: No

Relationships
See Also: pg_trgm, rum, pg_search, pgroonga, pgroonga_database, pg_bigm, pg_tokenizer, vchord_bm25

Packages

Type Repo Version PG Major Compatibility Package Pattern Dependencies
EXT PIGSTY 2.3 18, 17, 16, 15, 14 zhparser -
RPM PIGSTY 2.3 18, 17, 16, 15, 14 zhparser_$v -
DEB PIGSTY 2.3 18, 17, 16, 15, 14 postgresql-$v-zhparser -

Linux / PG PG18 PG17 PG16 PG15 PG14
el8.x86_64 PIGSTY 2.3 PIGSTY 2.3 PIGSTY 2.3 PIGSTY 2.3 PIGSTY 2.3
el8.aarch64 PIGSTY 2.3 PIGSTY 2.3 PIGSTY 2.3 PIGSTY 2.3 PIGSTY 2.3
el9.x86_64 PIGSTY 2.3 PIGSTY 2.3 PIGSTY 2.3 PIGSTY 2.3 PIGSTY 2.3
el9.aarch64 PIGSTY 2.3 PIGSTY 2.3 PIGSTY 2.3 PIGSTY 2.3 PIGSTY 2.3
el10.x86_64 PIGSTY 2.3 PIGSTY 2.3 PIGSTY 2.3 PIGSTY 2.3 PIGSTY 2.3
el10.aarch64 PIGSTY 2.3 PIGSTY 2.3 PIGSTY 2.3 PIGSTY 2.3 PIGSTY 2.3
d12.x86_64 PIGSTY 2.3 PIGSTY 2.3 PIGSTY 2.3 PIGSTY 2.3 PIGSTY 2.3
d12.aarch64 PIGSTY 2.3 PIGSTY 2.3 PIGSTY 2.3 PIGSTY 2.3 PIGSTY 2.3
d13.x86_64 PIGSTY 2.3 PIGSTY 2.3 PIGSTY 2.3 PIGSTY 2.3 PIGSTY 2.3
d13.aarch64 PIGSTY 2.3 PIGSTY 2.3 PIGSTY 2.3 PIGSTY 2.3 PIGSTY 2.3
u22.x86_64 PIGSTY 2.3 PIGSTY 2.3 PIGSTY 2.3 PIGSTY 2.3 PIGSTY 2.3
u22.aarch64 PIGSTY 2.3 PIGSTY 2.3 PIGSTY 2.3 PIGSTY 2.3 PIGSTY 2.3
u24.x86_64 PIGSTY 2.3 PIGSTY 2.3 PIGSTY 2.3 PIGSTY 2.3 PIGSTY 2.3
u24.aarch64 PIGSTY 2.3 PIGSTY 2.3 PIGSTY 2.3 PIGSTY 2.3 PIGSTY 2.3

Package Version OS ORG SIZE File URL
zhparser_18 2.3 el8.x86_64 pigsty 4.7 MiB zhparser_18-2.3-1PIGSTY.el8.x86_64.rpm
zhparser_18 2.3 el8.aarch64 pigsty 4.7 MiB zhparser_18-2.3-1PIGSTY.el8.aarch64.rpm
zhparser_18 2.3 el9.x86_64 pigsty 4.3 MiB zhparser_18-2.3-1PIGSTY.el9.x86_64.rpm
zhparser_18 2.3 el9.aarch64 pigsty 4.3 MiB zhparser_18-2.3-1PIGSTY.el9.aarch64.rpm
zhparser_18 2.3 el10.x86_64 pigsty 4.3 MiB zhparser_18-2.3-1PIGSTY.el10.x86_64.rpm
zhparser_18 2.3 el10.aarch64 pigsty 4.3 MiB zhparser_18-2.3-1PIGSTY.el10.aarch64.rpm
postgresql-18-zhparser 2.3 d12.x86_64 pigsty 4.0 MiB postgresql-18-zhparser_2.3-1PIGSTY~bookworm_amd64.deb
postgresql-18-zhparser 2.3 d12.aarch64 pigsty 4.0 MiB postgresql-18-zhparser_2.3-1PIGSTY~bookworm_arm64.deb
postgresql-18-zhparser 2.3 d13.x86_64 pigsty 4.0 MiB postgresql-18-zhparser_2.3-1PIGSTY~trixie_amd64.deb
postgresql-18-zhparser 2.3 d13.aarch64 pigsty 4.0 MiB postgresql-18-zhparser_2.3-1PIGSTY~trixie_arm64.deb
postgresql-18-zhparser 2.3 u22.x86_64 pigsty 4.3 MiB postgresql-18-zhparser_2.3-1PIGSTY~jammy_amd64.deb
postgresql-18-zhparser 2.3 u22.aarch64 pigsty 4.3 MiB postgresql-18-zhparser_2.3-1PIGSTY~jammy_arm64.deb
postgresql-18-zhparser 2.3 u24.x86_64 pigsty 4.3 MiB postgresql-18-zhparser_2.3-1PIGSTY~noble_amd64.deb
postgresql-18-zhparser 2.3 u24.aarch64 pigsty 4.3 MiB postgresql-18-zhparser_2.3-1PIGSTY~noble_arm64.deb
Package Version OS ORG SIZE File URL
zhparser_17 2.3 el8.x86_64 pigsty 4.7 MiB zhparser_17-2.3-1PIGSTY.el8.x86_64.rpm
zhparser_17 2.3 el8.aarch64 pigsty 4.7 MiB zhparser_17-2.3-1PIGSTY.el8.aarch64.rpm
zhparser_17 2.3 el9.x86_64 pigsty 4.3 MiB zhparser_17-2.3-1PIGSTY.el9.x86_64.rpm
zhparser_17 2.3 el9.aarch64 pigsty 4.3 MiB zhparser_17-2.3-1PIGSTY.el9.aarch64.rpm
zhparser_17 2.3 el10.x86_64 pigsty 4.3 MiB zhparser_17-2.3-1PIGSTY.el10.x86_64.rpm
zhparser_17 2.3 el10.aarch64 pigsty 4.3 MiB zhparser_17-2.3-1PIGSTY.el10.aarch64.rpm
postgresql-17-zhparser 2.3 d12.x86_64 pigsty 4.0 MiB postgresql-17-zhparser_2.3-1PIGSTY~bookworm_amd64.deb
postgresql-17-zhparser 2.3 d12.aarch64 pigsty 4.0 MiB postgresql-17-zhparser_2.3-1PIGSTY~bookworm_arm64.deb
postgresql-17-zhparser 2.3 d13.x86_64 pigsty 4.0 MiB postgresql-17-zhparser_2.3-1PIGSTY~trixie_amd64.deb
postgresql-17-zhparser 2.3 d13.aarch64 pigsty 4.0 MiB postgresql-17-zhparser_2.3-1PIGSTY~trixie_arm64.deb
postgresql-17-zhparser 2.3 u22.x86_64 pigsty 4.3 MiB postgresql-17-zhparser_2.3-1PIGSTY~jammy_amd64.deb
postgresql-17-zhparser 2.3 u22.aarch64 pigsty 4.3 MiB postgresql-17-zhparser_2.3-1PIGSTY~jammy_arm64.deb
postgresql-17-zhparser 2.3 u24.x86_64 pigsty 4.3 MiB postgresql-17-zhparser_2.3-1PIGSTY~noble_amd64.deb
postgresql-17-zhparser 2.3 u24.aarch64 pigsty 4.3 MiB postgresql-17-zhparser_2.3-1PIGSTY~noble_arm64.deb
Package Version OS ORG SIZE File URL
zhparser_16 2.3 el8.x86_64 pigsty 4.7 MiB zhparser_16-2.3-1PIGSTY.el8.x86_64.rpm
zhparser_16 2.3 el8.aarch64 pigsty 4.7 MiB zhparser_16-2.3-1PIGSTY.el8.aarch64.rpm
zhparser_16 2.3 el9.x86_64 pigsty 4.3 MiB zhparser_16-2.3-1PIGSTY.el9.x86_64.rpm
zhparser_16 2.3 el9.aarch64 pigsty 4.3 MiB zhparser_16-2.3-1PIGSTY.el9.aarch64.rpm
zhparser_16 2.3 el10.x86_64 pigsty 4.3 MiB zhparser_16-2.3-1PIGSTY.el10.x86_64.rpm
zhparser_16 2.3 el10.aarch64 pigsty 4.3 MiB zhparser_16-2.3-1PIGSTY.el10.aarch64.rpm
postgresql-16-zhparser 2.3 d12.x86_64 pigsty 4.0 MiB postgresql-16-zhparser_2.3-1PIGSTY~bookworm_amd64.deb
postgresql-16-zhparser 2.3 d12.aarch64 pigsty 4.0 MiB postgresql-16-zhparser_2.3-1PIGSTY~bookworm_arm64.deb
postgresql-16-zhparser 2.3 d13.x86_64 pigsty 4.0 MiB postgresql-16-zhparser_2.3-1PIGSTY~trixie_amd64.deb
postgresql-16-zhparser 2.3 d13.aarch64 pigsty 4.0 MiB postgresql-16-zhparser_2.3-1PIGSTY~trixie_arm64.deb
postgresql-16-zhparser 2.3 u22.x86_64 pigsty 4.3 MiB postgresql-16-zhparser_2.3-1PIGSTY~jammy_amd64.deb
postgresql-16-zhparser 2.3 u22.aarch64 pigsty 4.3 MiB postgresql-16-zhparser_2.3-1PIGSTY~jammy_arm64.deb
postgresql-16-zhparser 2.3 u24.x86_64 pigsty 4.3 MiB postgresql-16-zhparser_2.3-1PIGSTY~noble_amd64.deb
postgresql-16-zhparser 2.3 u24.aarch64 pigsty 4.3 MiB postgresql-16-zhparser_2.3-1PIGSTY~noble_arm64.deb
Package Version OS ORG SIZE File URL
zhparser_15 2.3 el8.x86_64 pigsty 4.7 MiB zhparser_15-2.3-1PIGSTY.el8.x86_64.rpm
zhparser_15 2.3 el8.aarch64 pigsty 4.7 MiB zhparser_15-2.3-1PIGSTY.el8.aarch64.rpm
zhparser_15 2.3 el9.x86_64 pigsty 4.3 MiB zhparser_15-2.3-1PIGSTY.el9.x86_64.rpm
zhparser_15 2.3 el9.aarch64 pigsty 4.3 MiB zhparser_15-2.3-1PIGSTY.el9.aarch64.rpm
zhparser_15 2.3 el10.x86_64 pigsty 4.3 MiB zhparser_15-2.3-1PIGSTY.el10.x86_64.rpm
zhparser_15 2.3 el10.aarch64 pigsty 4.3 MiB zhparser_15-2.3-1PIGSTY.el10.aarch64.rpm
postgresql-15-zhparser 2.3 d12.x86_64 pigsty 4.0 MiB postgresql-15-zhparser_2.3-1PIGSTY~bookworm_amd64.deb
postgresql-15-zhparser 2.3 d12.aarch64 pigsty 4.0 MiB postgresql-15-zhparser_2.3-1PIGSTY~bookworm_arm64.deb
postgresql-15-zhparser 2.3 d13.x86_64 pigsty 4.0 MiB postgresql-15-zhparser_2.3-1PIGSTY~trixie_amd64.deb
postgresql-15-zhparser 2.3 d13.aarch64 pigsty 4.0 MiB postgresql-15-zhparser_2.3-1PIGSTY~trixie_arm64.deb
postgresql-15-zhparser 2.3 u22.x86_64 pigsty 4.3 MiB postgresql-15-zhparser_2.3-1PIGSTY~jammy_amd64.deb
postgresql-15-zhparser 2.3 u22.aarch64 pigsty 4.3 MiB postgresql-15-zhparser_2.3-1PIGSTY~jammy_arm64.deb
postgresql-15-zhparser 2.3 u24.x86_64 pigsty 4.3 MiB postgresql-15-zhparser_2.3-1PIGSTY~noble_amd64.deb
postgresql-15-zhparser 2.3 u24.aarch64 pigsty 4.3 MiB postgresql-15-zhparser_2.3-1PIGSTY~noble_arm64.deb
Package Version OS ORG SIZE File URL
zhparser_14 2.3 el8.x86_64 pigsty 4.7 MiB zhparser_14-2.3-1PIGSTY.el8.x86_64.rpm
zhparser_14 2.3 el8.aarch64 pigsty 4.7 MiB zhparser_14-2.3-1PIGSTY.el8.aarch64.rpm
zhparser_14 2.3 el9.x86_64 pigsty 4.3 MiB zhparser_14-2.3-1PIGSTY.el9.x86_64.rpm
zhparser_14 2.3 el9.aarch64 pigsty 4.3 MiB zhparser_14-2.3-1PIGSTY.el9.aarch64.rpm
zhparser_14 2.3 el10.x86_64 pigsty 4.3 MiB zhparser_14-2.3-1PIGSTY.el10.x86_64.rpm
zhparser_14 2.3 el10.aarch64 pigsty 4.3 MiB zhparser_14-2.3-1PIGSTY.el10.aarch64.rpm
postgresql-14-zhparser 2.3 d12.x86_64 pigsty 4.0 MiB postgresql-14-zhparser_2.3-1PIGSTY~bookworm_amd64.deb
postgresql-14-zhparser 2.3 d12.aarch64 pigsty 4.0 MiB postgresql-14-zhparser_2.3-1PIGSTY~bookworm_arm64.deb
postgresql-14-zhparser 2.3 d13.x86_64 pigsty 4.0 MiB postgresql-14-zhparser_2.3-1PIGSTY~trixie_amd64.deb
postgresql-14-zhparser 2.3 d13.aarch64 pigsty 4.0 MiB postgresql-14-zhparser_2.3-1PIGSTY~trixie_arm64.deb
postgresql-14-zhparser 2.3 u22.x86_64 pigsty 4.3 MiB postgresql-14-zhparser_2.3-1PIGSTY~jammy_amd64.deb
postgresql-14-zhparser 2.3 u22.aarch64 pigsty 4.3 MiB postgresql-14-zhparser_2.3-1PIGSTY~jammy_arm64.deb
postgresql-14-zhparser 2.3 u24.x86_64 pigsty 4.3 MiB postgresql-14-zhparser_2.3-1PIGSTY~noble_amd64.deb
postgresql-14-zhparser 2.3 u24.aarch64 pigsty 4.3 MiB postgresql-14-zhparser_2.3-1PIGSTY~noble_arm64.deb

Source

pig build pkg zhparser;   # build rpm/deb

Install

Make sure the PGDG and PIGSTY repositories are available:

pig repo add pgsql -u   # add both repos and update the cache

Install this extension with pig:

pig install zhparser;   # install via package name, for the active PG version

pig install zhparser -v 18;   # install for PG 18
pig install zhparser -v 17;   # install for PG 17
pig install zhparser -v 16;   # install for PG 16
pig install zhparser -v 15;   # install for PG 15
pig install zhparser -v 14;   # install for PG 14

Create this extension with:

CREATE EXTENSION zhparser;
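
To confirm that the extension was created in the current database, one option is to query the system catalogs. The sketch below relies only on the standard PostgreSQL catalogs pg_extension and pg_ts_parser; the parser name zhparser is assumed to match the one used in the Quick Start section.

```sql
-- Installed extension and its version
-- (no rows means CREATE EXTENSION has not been run in this database)
SELECT extname, extversion
  FROM pg_extension
 WHERE extname = 'zhparser';

-- Text search parser registered by the extension
SELECT prsname
  FROM pg_ts_parser
 WHERE prsname = 'zhparser';
```

Extensions are created per database, so the check needs to be repeated in each database where Chinese full-text search is required.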

Usage

GitHub: amutu/zhparser

zhparser is a PostgreSQL extension for Chinese full-text search, built on the Simple Chinese Word Segmentation (SCWS) library.

Features

  • Chinese text segmentation for PostgreSQL full-text search
  • Built on the SCWS (Simple Chinese Word Segmentation) library
  • Supports custom dictionaries (TXT and XDB formats)
  • Database-level custom word tables (since v2.1)
  • Multiple tunable parameters for segmentation behavior

Quick Start

-- Create the extension
CREATE EXTENSION zhparser;

-- Create a text search configuration using zhparser
CREATE TEXT SEARCH CONFIGURATION chinese (PARSER = zhparser);

-- Add token type mappings
ALTER TEXT SEARCH CONFIGURATION chinese ADD MAPPING FOR n,v,a,i,e,l WITH simple;

-- Test Chinese text segmentation
SELECT to_tsvector('chinese', '小明硕士毕业于中国科学院计算所,后在日本京都大学深造');

-- Create a table and index for Chinese full text search
CREATE TABLE articles (id serial PRIMARY KEY, title text, body text);

CREATE INDEX articles_body_idx ON articles
  USING gin (to_tsvector('chinese', body));

-- Query with Chinese full text search
SELECT * FROM articles
  WHERE to_tsvector('chinese', body) @@ to_tsquery('chinese', '中国');
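
Building on the query above, results can also be ordered by relevance. This is a minimal sketch using the standard PostgreSQL function ts_rank; it assumes the articles table and the chinese configuration created above, and the search terms are only illustrative.

```sql
-- Rank matching articles by relevance.
-- The WHERE expression repeats the indexed expression so the planner
-- can consider the GIN index articles_body_idx.
SELECT a.id,
       a.title,
       ts_rank(to_tsvector('chinese', a.body), q) AS rank
  FROM articles AS a,
       to_tsquery('chinese', '中国 & 大学') AS q
 WHERE to_tsvector('chinese', a.body) @@ q
 ORDER BY rank DESC
 LIMIT 10;
```

Calling to_tsquery once in the FROM clause keeps the query text in one place and lets the same value feed both the match condition and the ranking.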

Configuration Parameters

zhparser provides several GUC parameters to control segmentation behavior:

Parameter Default Description
zhparser.punctuation_ignore off Ignore all punctuation and other special symbols
zhparser.seg_with_duality off Aggregate stray single characters using two-character (duality) segmentation
zhparser.dict_in_memory off Load the whole dictionary into memory
zhparser.multi_short off Compound segmentation of short words
zhparser.multi_duality off Compound segmentation of stray characters into two-character pairs
zhparser.multi_zmain off Compound segmentation of important single characters
zhparser.multi_zall off Compound segmentation of all single characters
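
As an illustration of how these parameters might be used, the sketch below toggles one of them for the current session and re-runs the segmentation test. It assumes the chinese configuration from the Quick Start section and that the parameter can be changed at session level; parameters that affect dictionary loading may only take effect for new connections.

```sql
-- Enable short-word compound segmentation for the current session only
SET zhparser.multi_short = on;
SHOW zhparser.multi_short;

-- Re-run the segmentation test to compare the result with the default
SELECT to_tsvector('chinese', '小明硕士毕业于中国科学院计算所');

-- Return to the configured default
RESET zhparser.multi_short;
```

Stored tsvector values and expression indexes reflect the settings in effect when the rows were indexed, so changing segmentation options on a live system may call for rebuilding the affected indexes.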

Token Types

zhparser supports the following token types from SCWS:

Code Description
a Adjective
b Differentiation (区别词)
c Conjunction
d Adverb
e Exclamation
f Position word (方位词)
g Root word (词根)
h Prefix
i Idiom
j Abbreviation
k Suffix
l Temporary idiom
m Numeral
n Noun
o Onomatopoeia
p Preposition
q Classifier
r Pronoun
s Space word (处所词)
t Time word
u Auxiliary
v Verb
w Punctuation
x Unknown
y Modal particle
z Status word (状态词)
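
To see which token types a given text produces, the standard PostgreSQL functions ts_token_type, ts_parse, and ts_debug can be pointed at the parser. This sketch assumes the chinese configuration from the Quick Start section; the sample sentence is the one used there.

```sql
-- List the token types exposed by the parser (tokid, alias, description)
SELECT * FROM ts_token_type('zhparser');

-- Show the raw tokens the parser produces, before any dictionary mapping
SELECT * FROM ts_parse('zhparser', '小明硕士毕业于中国科学院计算所');

-- Show how each token is handled by the chinese configuration
SELECT alias, token, dictionaries, lexemes
  FROM ts_debug('chinese', '小明硕士毕业于中国科学院计算所');
```

Tokens whose type has no dictionary mapping are dropped from to_tsvector output, which is why the Quick Start configuration maps only n, v, a, i, e, and l. Further types can be added with another ALTER TEXT SEARCH CONFIGURATION ... ADD MAPPING statement if they matter for search.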

Custom Dictionaries

File-based Dictionaries

Place custom dictionary files in the share directory (typically $SHAREDIR/tsearch_data/):

  • TXT format: one word per line
  • XDB format: compiled SCWS dictionary format

Custom dictionaries take precedence over built-in dictionaries.

Database-level Custom Words (v2.1+)

-- Add custom words via zhparser's built-in table
INSERT INTO zhparser.zhprs_custom_word VALUES ('中国科学院计算所');

-- Reload custom dictionary (reconnect after sync to take effect)
SELECT sync_zhprs_custom_word();

-- Verify segmentation with custom word
SELECT to_tsvector('chinese', '小明硕士毕业于中国科学院计算所');

Docker Quick Start

docker run --name pgzhparser -d \
  -e POSTGRES_PASSWORD=somepassword \
  zhparser/zhparser:bookworm-16