pg4ml
pg4ml : Machine learning framework for PostgreSQL
Overview
| ID | Extension | Package | Version | Category | License | Language |
|---|---|---|---|---|---|---|
| 1880 | pg4ml
|
pg4ml
|
2.0 |
RAG
|
AGPL-3.0
|
C
|
| Attribute | Has Binary | Has Library | Need Load | Has DDL | Relocatable | Trusted |
|---|---|---|---|---|---|---|
----dtr
|
No
|
No
|
No
|
Yes
|
yes
|
yes
|
| Relationships | |
|---|---|
| Requires | plpgsql
tablefunc
cube
plpython3u
|
| See Also | pgml
vectorize
pg_summarize
pg_tiktoken
vector
vchord
vectorscale
pg_strom
|
require python3
Packages
| Type | Repo | Version | PG Major Compatibility | Package Pattern | Dependencies |
|---|---|---|---|---|---|
| EXT | PIGSTY
|
2.0 |
18
17
16
15
14
|
pg4ml |
plpgsql, tablefunc, cube, plpython3u |
| RPM | PIGSTY
|
2.0 |
18
17
16
15
14
|
pg4ml_$v |
- |
| DEB | PIGSTY
|
2.0 |
18
17
16
15
14
|
postgresql-$v-pg4ml |
- |
| Linux / PG | PG18 | PG17 | PG16 | PG15 | PG14 |
|---|---|---|---|---|---|
el8.x86_64
|
PIGSTY 2.0
|
PIGSTY 2.0
|
PIGSTY 2.0
|
PIGSTY 2.0
|
PIGSTY 2.0
|
el8.aarch64
|
PIGSTY 2.0
|
PIGSTY 2.0
|
PIGSTY 2.0
|
PIGSTY 2.0
|
PIGSTY 2.0
|
el9.x86_64
|
PIGSTY 2.0
|
PIGSTY 2.0
|
PIGSTY 2.0
|
PIGSTY 2.0
|
PIGSTY 2.0
|
el9.aarch64
|
PIGSTY 2.0
|
PIGSTY 2.0
|
PIGSTY 2.0
|
PIGSTY 2.0
|
PIGSTY 2.0
|
el10.x86_64
|
PIGSTY 2.0
|
PIGSTY 2.0
|
PIGSTY 2.0
|
PIGSTY 2.0
|
PIGSTY 2.0
|
el10.aarch64
|
PIGSTY 2.0
|
PIGSTY 2.0
|
PIGSTY 2.0
|
PIGSTY 2.0
|
PIGSTY 2.0
|
d12.x86_64
|
PIGSTY 2.0
|
PIGSTY 2.0
|
PIGSTY 2.0
|
PIGSTY 2.0
|
PIGSTY 2.0
|
d12.aarch64
|
PIGSTY 2.0
|
PIGSTY 2.0
|
PIGSTY 2.0
|
PIGSTY 2.0
|
PIGSTY 2.0
|
d13.x86_64
|
PIGSTY 2.0
|
PIGSTY 2.0
|
PIGSTY 2.0
|
PIGSTY 2.0
|
PIGSTY 2.0
|
d13.aarch64
|
PIGSTY 2.0
|
PIGSTY 2.0
|
PIGSTY 2.0
|
PIGSTY 2.0
|
PIGSTY 2.0
|
u22.x86_64
|
PIGSTY 2.0
|
PIGSTY 2.0
|
PIGSTY 2.0
|
PIGSTY 2.0
|
PIGSTY 2.0
|
u22.aarch64
|
PIGSTY 2.0
|
PIGSTY 2.0
|
PIGSTY 2.0
|
PIGSTY 2.0
|
PIGSTY 2.0
|
u24.x86_64
|
PIGSTY 2.0
|
PIGSTY 2.0
|
PIGSTY 2.0
|
PIGSTY 2.0
|
PIGSTY 2.0
|
u24.aarch64
|
PIGSTY 2.0
|
PIGSTY 2.0
|
PIGSTY 2.0
|
PIGSTY 2.0
|
PIGSTY 2.0
|
Source
pig build pkg pg4ml; # build rpm/debInstall
Make sure PGDG and PIGSTY repo available:
pig repo add pgsql -u # add both repo and update cacheInstall this extension with pig:
pig install pg4ml; # install via package name, for the active PG version
pig install pg4ml -v 18; # install for PG 18
pig install pg4ml -v 17; # install for PG 17
pig install pg4ml -v 16; # install for PG 16
pig install pg4ml -v 15; # install for PG 15
pig install pg4ml -v 14; # install for PG 14Create this extension with:
CREATE EXTENSION pg4ml CASCADE; -- requires plpgsql, tablefunc, cube, plpython3uUsage
pg4ml: Machine learning framework for PostgreSQL. Source: README.md
pg4ml is a PostgreSQL extension that implements a machine learning framework entirely within the database using PL/pgSQL and PL/Python. It provides matrix operations, neural network construction and training, clustering algorithms, and scientific computing – all through SQL.
Prerequisites
- PostgreSQL >= 14 with Python3 support
- Required extensions:
plpgsql,tablefunc,cube,plpython3u
Getting Started
CREATE EXTENSION pg4ml CASCADE;
-- This will also create the required dependencies: plpgsql, tablefunc, cube, plpython3uFeatures
Matrix Operations
The framework provides a comprehensive matrix operation library under the sm_sc schema:
- Element-wise operations: arithmetic, comparison, rounding, concatenation, boolean, bitwise, complex number, and broadcast operations
- Matrix operations: multiplication, transpose, flip, rotate, concatenation
- Construction: sampling, replacement, padding, character matching, random generation
- Trigonometric functions: broadcast operations on matrices
- Aggregation: slice-level aggregation, matrix-level aggregation, sorting by slice values, locating extremum positions
Slice Aggregation Examples
Average over vertical slices (groups of 2):
SELECT sm_sc.fv_aggr_slice_avg(
array[[1.5, 11.5],
[2.1, 12.1],
[3.3, 13.3],
[4.3, 14.3],
[5.5, 15.5],
[6.1, 16.1]],
array[2, 1]
);
-- Returns: array[[1.8, 11.8],[3.8, 13.8],[5.8, 15.8]]Max pooling over 2x3 blocks:
SELECT sm_sc.fv_aggr_slice_max(
array[[2.3, 5.1, 8.2, 2.56, 3.33, -1.9],
[3.25, 6.4, 6.6, 6.9, -2.65, -4.6],
[-2.3, 5.1, -8.2, 2.56, -3.33, -1.9],
[3.25, -6.4, -6.6, 6.9, -2.65, -4.6]],
array[2, 3]
);
-- Returns: array[[8.2, 6.9],[5.1, 6.9]]Neural Networks
The framework supports deep neural network construction and training:
- Node and Path tables:
sm_sc.tb_nn_node/sm_sc.tb_nn_pathfor defining network structure - Training input buffer:
sm_sc.tb_nn_train_input_bufffor receiving training data - Task management:
sm_sc.tb_classify_taskfor deploying and managing training tasks - Activation functions, convolution, pooling, lambda operations
- Loss functions, derivative computation, backpropagation
- Inference:
sm_sc.ft_nn_in_outfor running test/validation data through a trained model
Clustering
- K-means++: via
sm_sc.prc_kmeans_ppprocedure - DBSCAN: via
sm_sc.prc_dbscan_ppprocedure
Both use sm_sc.tb_cluster_task for task deployment and management.
Scientific Computing
- Waveform processing
- Computational graph JSON serialization/deserialization
- Complex number operations
- Linear algebra
Performance Tips
- Enable debug mode with:
SET session pg4ml._v_is_debug_check = '1'; - Matrix multiplication uses
plpython3uto call numpy for optimization - Adjust PostgreSQL parallel parameters for multi-threaded training:
max_parallel_workers_per_gatherforce_parallel_modeparallel_setup_cost,parallel_tuple_cost
- Consider using
pg_stromextension for GPU acceleration