pg_tiktoken
pg_tiktoken: tiktoken tokenizer for use with OpenAI models in PostgreSQL
Overview
| ID | Extension | Package | Version | Category | License | Language |
|---|---|---|---|---|---|---|
| 1870 | pg_tiktoken | pg_tiktoken | 0.0.1 | RAG | Apache-2.0 | Rust |
| Attribute | Has Binary | Has Library | Need Load | Has DDL | Relocatable | Trusted |
|---|---|---|---|---|---|---|
| --s-d-- | No | Yes | No | Yes | No | No |
| Relationships | |
|---|---|
| See Also | vectorize, pg_summarize, pg4ml, pgml, vector, vchord, vectorscale, pg_graphql |
Packages
| Type | Repo | Version | PG Major Compatibility | Package Pattern | Dependencies |
|---|---|---|---|---|---|
| EXT | PIGSTY | 0.0.1 | 18, 17, 16, 15, 14 | pg_tiktoken | - |
| RPM | PIGSTY | 0.0.1 | 18, 17, 16, 15, 14 | pg_tiktoken_$v | - |
| DEB | PIGSTY | 0.0.1 | 18, 17, 16, 15, 14 | postgresql-$v-pg-tiktoken | - |
| Linux / PG | PG18 | PG17 | PG16 | PG15 | PG14 |
|---|---|---|---|---|---|
| el8.x86_64 | PIGSTY 0.0.1 | PIGSTY 0.0.1 | PIGSTY 0.0.1 | PIGSTY 0.0.1 | PIGSTY 0.0.1 |
| el8.aarch64 | PIGSTY 0.0.1 | PIGSTY 0.0.1 | PIGSTY 0.0.1 | PIGSTY 0.0.1 | PIGSTY 0.0.1 |
| el9.x86_64 | PIGSTY 0.0.1 | PIGSTY 0.0.1 | PIGSTY 0.0.1 | PIGSTY 0.0.1 | PIGSTY 0.0.1 |
| el9.aarch64 | PIGSTY 0.0.1 | PIGSTY 0.0.1 | PIGSTY 0.0.1 | PIGSTY 0.0.1 | PIGSTY 0.0.1 |
| el10.x86_64 | PIGSTY 0.0.1 | PIGSTY 0.0.1 | PIGSTY 0.0.1 | PIGSTY 0.0.1 | PIGSTY 0.0.1 |
| el10.aarch64 | PIGSTY 0.0.1 | PIGSTY 0.0.1 | PIGSTY 0.0.1 | PIGSTY 0.0.1 | PIGSTY 0.0.1 |
| d12.x86_64 | PIGSTY 0.0.1 | PIGSTY 0.0.1 | PIGSTY 0.0.1 | PIGSTY 0.0.1 | PIGSTY 0.0.1 |
| d12.aarch64 | PIGSTY 0.0.1 | PIGSTY 0.0.1 | PIGSTY 0.0.1 | PIGSTY 0.0.1 | PIGSTY 0.0.1 |
| d13.x86_64 | PIGSTY 0.0.1 | PIGSTY 0.0.1 | PIGSTY 0.0.1 | PIGSTY 0.0.1 | PIGSTY 0.0.1 |
| d13.aarch64 | PIGSTY 0.0.1 | PIGSTY 0.0.1 | PIGSTY 0.0.1 | PIGSTY 0.0.1 | PIGSTY 0.0.1 |
| u22.x86_64 | PIGSTY 0.0.1 | PIGSTY 0.0.1 | PIGSTY 0.0.1 | PIGSTY 0.0.1 | PIGSTY 0.0.1 |
| u22.aarch64 | PIGSTY 0.0.1 | PIGSTY 0.0.1 | PIGSTY 0.0.1 | PIGSTY 0.0.1 | PIGSTY 0.0.1 |
| u24.x86_64 | PIGSTY 0.0.1 | PIGSTY 0.0.1 | PIGSTY 0.0.1 | PIGSTY 0.0.1 | PIGSTY 0.0.1 |
| u24.aarch64 | PIGSTY 0.0.1 | PIGSTY 0.0.1 | PIGSTY 0.0.1 | PIGSTY 0.0.1 | PIGSTY 0.0.1 |
Source
    pig build pkg pg_tiktoken;       # build rpm/deb

Install
Make sure the PGDG and PIGSTY repos are available:

    pig repo add pgsql -u            # add both repos and update the cache

Install this extension with pig:

    pig install pg_tiktoken;         # install via package name, for the active PG version
    pig install pg_tiktoken -v 18;   # install for PG 18
    pig install pg_tiktoken -v 17;   # install for PG 17
    pig install pg_tiktoken -v 16;   # install for PG 16
    pig install pg_tiktoken -v 15;   # install for PG 15
    pig install pg_tiktoken -v 14;   # install for PG 14

Create this extension with:

    CREATE EXTENSION pg_tiktoken;

Usage
pg_tiktoken: tiktoken tokenizer for use with OpenAI models in PostgreSQL. Source: README.md
pg_tiktoken is a PostgreSQL extension that provides input tokenization using OpenAI’s tiktoken library. It allows you to count and encode tokens directly in SQL, which is useful for managing input length limits when working with OpenAI models.
Functions
tiktoken_count
Count the number of tokens for a given encoding or model:

    SELECT tiktoken_count('p50k_edit', 'A long time ago in a galaxy far, far away');

     tiktoken_count
    ----------------
                 11
    (1 row)

tiktoken_encode
Get the token IDs for a given encoding or model:

    SELECT tiktoken_encode('cl100k_base', 'A long time ago in a galaxy far, far away');

                      tiktoken_encode
    ----------------------------------------------------
     {32,1317,892,4227,304,264,34261,3117,11,3117,3201}
    (1 row)

Both tiktoken_count and tiktoken_encode accept either an encoding name or an OpenAI model name as the first argument.
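
As a minimal sketch of how the two functions relate, the query below compares the count with the length of the encoded array, and also passes a model name instead of an encoding name. It assumes that `text-embedding-ada-002` resolves to `cl100k_base`, as the Supported Models table indicates; the column aliases are illustrative only.

```sql
-- Compare tiktoken_count with the number of IDs returned by tiktoken_encode.
-- The column aliases are hypothetical names chosen for this sketch.
SELECT
    tiktoken_count('cl100k_base', t.txt)                   AS count_by_encoding,
    array_length(tiktoken_encode('cl100k_base', t.txt), 1) AS ids_by_encoding,
    -- Assumption: this model name maps to the cl100k_base encoding.
    tiktoken_count('text-embedding-ada-002', t.txt)        AS count_by_model
FROM (VALUES ('A long time ago in a galaxy far, far away'::text)) AS t(txt);
```

If the mapping holds, all three columns should agree, since the count is the number of token IDs produced by the encoder.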
Supported Models
| Encoding name | OpenAI models |
|---|---|
| cl100k_base | ChatGPT models, text-embedding-ada-002 |
| p50k_base | Code models, text-davinci-002, text-davinci-003 |
| p50k_edit | Edit models like text-davinci-edit-001, code-davinci-edit-001 |
| r50k_base (or gpt2) | GPT-3 models like davinci |
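
One way to apply this to input length limits is to flag rows that exceed a token budget before sending them to a model. The following is a sketch under stated assumptions: the `documents` table and its columns are hypothetical, and the budget of 8000 tokens is a placeholder, not a documented limit for any model.

```sql
-- Hypothetical table, used only for illustration.
CREATE TABLE IF NOT EXISTS documents (
    id   bigserial PRIMARY KEY,
    body text NOT NULL
);

-- List rows whose body exceeds an assumed budget of 8000 tokens.
-- Replace 8000 with the actual input limit of your target model.
SELECT id, tokens
FROM (
    SELECT id, tiktoken_count('cl100k_base', body) AS tokens
    FROM documents
) AS counted
WHERE tokens > 8000
ORDER BY tokens DESC;
```

Computing the count once in a subquery avoids tokenizing each row twice, which matters because tokenization cost grows with text length.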