datasketches: Approximate analytics sketches and aggregates for PostgreSQL
Overview
| ID | Extension | Package | Version | Category | License | Language |
|---|---|---|---|---|---|---|
| 4690 | datasketches | datasketches | 1.7.0 | FUNC | Apache-2.0 | C++ |
| Attribute | Has Binary | Has Library | Need Load | Has DDL | Relocatable | Trusted |
|---|---|---|---|---|---|---|
| --s-d-r | No | Yes | No | Yes | Yes | No |
Built against Apache DataSketches C++ core 5.0.0.
Packages
| Type | Repo | Version | PG Major Compatibility | Package Pattern | Dependencies |
|---|---|---|---|---|---|
| EXT | PIGSTY | 1.7.0 | 18, 17, 16, 15, 14 | datasketches | - |
| RPM | PIGSTY | 1.7.0 | 18, 17, 16, 15, 14 | datasketches_$v | - |
| DEB | PIGSTY | 1.7.0 | 18, 17, 16, 15, 14 | postgresql-$v-datasketches | - |
| Linux / PG | PG18 | PG17 | PG16 | PG15 | PG14 |
|---|---|---|---|---|---|
| el8.x86_64 | PIGSTY 1.7.0 | PIGSTY 1.7.0 | PIGSTY 1.7.0 | PIGSTY 1.7.0 | PIGSTY 1.7.0 |
| el8.aarch64 | PIGSTY 1.7.0 | PIGSTY 1.7.0 | PIGSTY 1.7.0 | PIGSTY 1.7.0 | PIGSTY 1.7.0 |
| el9.x86_64 | PIGSTY 1.7.0 | PIGSTY 1.7.0 | PIGSTY 1.7.0 | PIGSTY 1.7.0 | PIGSTY 1.7.0 |
| el9.aarch64 | PIGSTY 1.7.0 | PIGSTY 1.7.0 | PIGSTY 1.7.0 | PIGSTY 1.7.0 | PIGSTY 1.7.0 |
| el10.x86_64 | PIGSTY 1.7.0 | PIGSTY 1.7.0 | PIGSTY 1.7.0 | PIGSTY 1.7.0 | PIGSTY 1.7.0 |
| el10.aarch64 | PIGSTY 1.7.0 | PIGSTY 1.7.0 | PIGSTY 1.7.0 | PIGSTY 1.7.0 | PIGSTY 1.7.0 |
| d12.x86_64 | PIGSTY 1.7.0 | PIGSTY 1.7.0 | PIGSTY 1.7.0 | PIGSTY 1.7.0 | PIGSTY 1.7.0 |
| d12.aarch64 | PIGSTY 1.7.0 | PIGSTY 1.7.0 | PIGSTY 1.7.0 | PIGSTY 1.7.0 | PIGSTY 1.7.0 |
| d13.x86_64 | PIGSTY 1.7.0 | PIGSTY 1.7.0 | PIGSTY 1.7.0 | PIGSTY 1.7.0 | PIGSTY 1.7.0 |
| d13.aarch64 | PIGSTY 1.7.0 | PIGSTY 1.7.0 | PIGSTY 1.7.0 | PIGSTY 1.7.0 | PIGSTY 1.7.0 |
| u22.x86_64 | PIGSTY 1.7.0 | PIGSTY 1.7.0 | PIGSTY 1.7.0 | PIGSTY 1.7.0 | PIGSTY 1.7.0 |
| u22.aarch64 | PIGSTY 1.7.0 | PIGSTY 1.7.0 | PIGSTY 1.7.0 | PIGSTY 1.7.0 | PIGSTY 1.7.0 |
| u24.x86_64 | PIGSTY 1.7.0 | PIGSTY 1.7.0 | PIGSTY 1.7.0 | PIGSTY 1.7.0 | PIGSTY 1.7.0 |
| u24.aarch64 | PIGSTY 1.7.0 | PIGSTY 1.7.0 | PIGSTY 1.7.0 | PIGSTY 1.7.0 | PIGSTY 1.7.0 |
Source
```bash
pig build pkg datasketches; # build rpm/deb
```
Install
Make sure the PGDG and PIGSTY repos are available:
```bash
pig repo add pgsql -u # add both repos and update the cache
```
Install this extension with pig:
```bash
pig install datasketches;       # install via package name, for the active PG version
pig install datasketches -v 18; # install for PG 18
pig install datasketches -v 17; # install for PG 17
pig install datasketches -v 16; # install for PG 16
pig install datasketches -v 15; # install for PG 15
pig install datasketches -v 14; # install for PG 14
```
Create this extension with:
```sql
CREATE EXTENSION datasketches;
```
Usage
Sources: README, Apache DataSketches site
PostgreSQL extension for approximate analytics sketches and aggregates.
```sql
CREATE EXTENSION datasketches;
```
The extension supports CPC, HLL, Theta, Array Of Doubles, KLL, Quantiles, and Frequent Strings sketches.
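The Frequent Strings family answers heavy-hitter questions with error bounds rather than exact counts. As a rough illustration only, the Python below implements the classic Misra-Gries algorithm with at most `k` counters. The class name `MisraGries`, its methods, and the `>=` comparison in `no_false_negatives` are assumptions made for this example, not the extension's API. It shows why a "no false negatives" query (in the extension, `frequent_strings_sketch_result_no_false_negatives`) can safely over-report: each retained count is a lower bound on the true count, and adding the total amount subtracted during evictions gives an upper bound.

```python
import random


class MisraGries:
    """Toy frequent-items sketch (Misra-Gries) holding at most k counters."""

    def __init__(self, k: int) -> None:
        self.k = k
        self.counters: dict[str, int] = {}
        self.offset = 0  # total subtracted from every tracked counter so far

    def update(self, item: str, weight: int = 1) -> None:
        self.counters[item] = self.counters.get(item, 0) + weight
        if len(self.counters) > self.k:
            # Evict: subtract the smallest count from all counters, drop zeros.
            m = min(self.counters.values())
            self.offset += m
            self.counters = {x: c - m for x, c in self.counters.items() if c > m}

    def lower_bound(self, item: str) -> int:
        return self.counters.get(item, 0)

    def upper_bound(self, item: str) -> int:
        return self.counters.get(item, 0) + self.offset

    def no_false_negatives(self, threshold: int) -> list[str]:
        # Report every tracked item whose true count could reach the threshold.
        # Untracked items have upper bound == offset <= N / (k + 1), so nothing
        # is missed as long as threshold > offset.
        return sorted(x for x in self.counters if self.upper_bound(x) >= threshold)


if __name__ == "__main__":
    stream = ["a"] * 500 + ["b"] * 300 + [f"x{i}" for i in range(1000)]
    random.Random(42).shuffle(stream)
    sk = MisraGries(k=10)
    for s in stream:
        sk.update(s)
    print(sk.no_false_negatives(250))  # ['a', 'b']
```

With 1,800 updates and `k=10`, the offset can be at most 1800 / 11, so the 1,000 singleton items can never reach the threshold of 250, while `a` and `b` always do.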
Sketch Families
- CPC for compact distinct counting.
- HLL for HyperLogLog-style distinct counting.
- Theta for distinct counting with set operations such as union, intersection, and A-not-B.
- Array Of Doubles for tuple sketches with arrays of double values per key.
- KLL for quantiles, ranks, PMF, and CDF estimation.
- Quantiles sketch for long-term support of distribution estimates.
- Frequent strings for tracking the heaviest items by count or weight.
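To make the Theta bullet concrete, here is a hedged Python illustration of the K-Minimum-Values idea that Theta sketches build on. The sketch keeps only hashes below a threshold `theta`, scales the retained count by `2**64 / theta`, and evaluates set operations on the hashes below the smaller of the two thresholds. `KMVSketch`, `union`, `intersection`, and `a_not_b` are hypothetical names, and BLAKE2b is an arbitrary hash choice. The extension uses its own C++ implementation and binary format.

```python
import hashlib

MAX_HASH = 1 << 64  # hash values lie in [0, 2**64)


def hash64(item: str) -> int:
    """Deterministically map an item to a pseudo-random 64-bit integer."""
    digest = hashlib.blake2b(item.encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


class KMVSketch:
    """Toy K-Minimum-Values sketch: retains every distinct hash below theta."""

    def __init__(self, k: int = 4096, theta: int = MAX_HASH, hashes=()) -> None:
        self.k = k
        self.theta = theta
        self.hashes = {h for h in hashes if h < theta}
        self._maybe_trim()

    def _maybe_trim(self) -> None:
        # Lazily shrink to k hashes once more than 2k are held, lowering theta
        # to the (k+1)-th smallest hash so "all hashes below theta" still holds.
        if len(self.hashes) > 2 * self.k:
            ordered = sorted(self.hashes)
            self.theta = ordered[self.k]
            self.hashes = set(ordered[: self.k])

    def update(self, item: str) -> None:
        h = hash64(item)
        if h < self.theta:
            self.hashes.add(h)
            self._maybe_trim()

    def estimate(self) -> float:
        # Retained hashes approximate a uniform sample at rate theta / 2**64.
        return len(self.hashes) * MAX_HASH / self.theta


def _combine(a: KMVSketch, b: KMVSketch, op) -> KMVSketch:
    # Both inputs retain every hash below min(theta_a, theta_b), so set
    # operations restricted to that common range are exact on the samples.
    theta = min(a.theta, b.theta)
    return KMVSketch(min(a.k, b.k), theta, op(a.hashes, b.hashes))


def union(a: KMVSketch, b: KMVSketch) -> KMVSketch:
    return _combine(a, b, lambda x, y: x | y)


def intersection(a: KMVSketch, b: KMVSketch) -> KMVSketch:
    return _combine(a, b, lambda x, y: x & y)


def a_not_b(a: KMVSketch, b: KMVSketch) -> KMVSketch:
    return _combine(a, b, lambda x, y: x - y)


if __name__ == "__main__":
    a, b = KMVSketch(), KMVSketch()
    for i in range(20_000):
        a.update(f"user{i}")
    for i in range(10_000, 30_000):
        b.update(f"user{i}")
    # True answers: union 30,000; intersection 10,000; A-not-B 10,000.
    for name, s in [("union", union(a, b)), ("intersection", intersection(a, b)), ("a_not_b", a_not_b(a, b))]:
        print(name, round(s.estimate()))
```

Below `k` distinct items the sketch is exact, because `theta` stays at `2**64`. Above that, each result is an estimate whose relative error shrinks roughly with the square root of the number of retained hashes.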
Examples
```sql
SELECT cpc_sketch_to_string(cpc_sketch_build(1));
SELECT cpc_sketch_distinct(id) FROM random_ints_100m;
SELECT cpc_sketch_get_estimate(cpc_sketch_union(sketch)) FROM cpc_sketch_test;
SELECT theta_sketch_get_estimate(theta_sketch_union(sketch)) FROM theta_sketch_test;
SELECT theta_sketch_get_estimate(theta_sketch_intersection(sketch1, sketch2)) FROM theta_set_op_test;
SELECT hll_sketch_get_estimate(hll_sketch_union(sketch)) FROM hll_sketch_test;
SELECT hll_sketch_get_estimate(hll_sketch_union(hll_sketch_build(1), hll_sketch_build(2)));
SELECT kll_float_sketch_get_quantile(kll_float_sketch_merge(sketch), 0.5) FROM kll_float_sketch_test;
SELECT frequent_strings_sketch_result_no_false_negatives(frequent_strings_sketch_build(9, value), 1000000) FROM zipf_1p1_8k_100m;
```
Core Operations
- Build sketches with `*_sketch_build(...)`.
- Merge or aggregate them with `*_sketch_union(...)`, `*_sketch_merge(...)`, and sketch-specific set-operation helpers.
- Read estimates with `*_sketch_get_estimate(...)` and distribution helpers such as `kll_float_sketch_get_quantile(...)`.
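The build, merge, and estimate lifecycle can be illustrated with a toy HyperLogLog in Python. This is an assumption-laden sketch, not the extension's HLL code: `ToyHLL`, the precision parameter `p`, and the use of BLAKE2b as the hash are choices made for this example. Merging takes the element-wise maximum of registers. As a result, merging per-partition sketches yields exactly the registers of one sketch built over all rows, and overlapping ids are not double-counted.

```python
import hashlib
import math


def hash64(item: str) -> int:
    """Deterministically map an item to a pseudo-random 64-bit integer."""
    digest = hashlib.blake2b(item.encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


class ToyHLL:
    """Toy HyperLogLog with 2**p registers (illustrative, not the extension's)."""

    def __init__(self, p: int = 12) -> None:
        self.p = p
        self.m = 1 << p
        self.registers = [0] * self.m

    def update(self, item: str) -> None:
        h = hash64(item)
        idx = h >> (64 - self.p)                 # top p bits choose a register
        rest = h & ((1 << (64 - self.p)) - 1)    # remaining 64 - p bits
        rank = (64 - self.p) - rest.bit_length() + 1  # position of leftmost 1-bit
        self.registers[idx] = max(self.registers[idx], rank)

    def merge(self, other: "ToyHLL") -> "ToyHLL":
        # Union of two sketches: element-wise max of registers (same p required).
        if other.p != self.p:
            raise ValueError("precision mismatch")
        out = ToyHLL(self.p)
        out.registers = [max(x, y) for x, y in zip(self.registers, other.registers)]
        return out

    def estimate(self) -> float:
        m = self.m
        alpha = 0.7213 / (1 + 1.079 / m)
        raw = alpha * m * m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * m and zeros:
            return m * math.log(m / zeros)  # linear counting for small ranges
        return raw


if __name__ == "__main__":
    # Build one sketch per partition (ids 40,000..59,999 overlap), then merge.
    p1, p2 = ToyHLL(), ToyHLL()
    for i in range(60_000):
        p1.update(f"u{i}")
    for i in range(40_000, 100_000):
        p2.update(f"u{i}")
    print(round(p1.merge(p2).estimate()))  # near 100,000 rather than 120,000
```

With `p=12` (4,096 registers), the standard error is about 1.04 / 64, or roughly 1.6%.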
Notes
- The README says the extension targets PostgreSQL 9.6 and higher and depends on Boost 1.75 and DataSketches C++ core 5.0.0 or later.
- The upstream examples emphasize additive (mergeable) analytics in data cubes; the sketches return approximate answers and are not an exact replacement for ordinary aggregates.
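In a data cube, each cell can hold a sketch instead of raw rows, and rollups merge the cell sketches. The merged answer is approximate, which is the trade-off noted above. The Python below illustrates this with a simplified compactor-based quantile sketch in the spirit of KLL: level `h` holds items of weight `2**h`, and compaction promotes every other sorted item using a random offset. `ToyQuantiles`, `k`, and this compaction rule are illustrative assumptions, not the extension's `kll_float_sketch` implementation.

```python
import random


class ToyQuantiles:
    """Simplified KLL-style quantile sketch: items at level h weigh 2**h."""

    def __init__(self, k: int = 200, seed: int = 0) -> None:
        self.k = k
        self.levels: list[list[float]] = [[]]
        self.rng = random.Random(seed)

    def update(self, x: float) -> None:
        self.levels[0].append(x)
        self._compress()

    def merge(self, other: "ToyQuantiles") -> None:
        # Merging is level-wise concatenation followed by compaction.
        while len(self.levels) < len(other.levels):
            self.levels.append([])
        for h, items in enumerate(other.levels):
            self.levels[h].extend(items)
        self._compress()

    def _compress(self) -> None:
        h = 0
        while h < len(self.levels):
            if len(self.levels[h]) > self.k:
                if h + 1 == len(self.levels):
                    self.levels.append([])
                buf = sorted(self.levels[h])
                # Keep one item back if the count is odd so pairs stay intact.
                self.levels[h] = [buf.pop()] if len(buf) % 2 else []
                # Promote every other item (random offset) with doubled weight.
                offset = self.rng.randint(0, 1)
                self.levels[h + 1].extend(buf[offset::2])
            h += 1

    def count(self) -> int:
        # Total weight; compaction preserves it exactly.
        return sum(len(items) << h for h, items in enumerate(self.levels))

    def quantile(self, q: float) -> float:
        weighted = sorted((x, 1 << h) for h, items in enumerate(self.levels) for x in items)
        target = q * self.count()
        cumulative = 0
        for x, w in weighted:
            cumulative += w
            if cumulative >= target:
                return x
        return weighted[-1][0]


if __name__ == "__main__":
    # Two cube cells sketched independently, then rolled up by merging.
    rng = random.Random(1)
    cell_a, cell_b = ToyQuantiles(seed=1), ToyQuantiles(seed=2)
    for _ in range(50_000):
        cell_a.update(rng.gauss(100.0, 15.0))
        cell_b.update(rng.gauss(130.0, 15.0))
    cell_a.merge(cell_b)
    # The rolled-up median should land near 115 (the mixture is symmetric there).
    print(cell_a.count(), round(cell_a.quantile(0.5), 1))
```

Each cell stores at most about `k` items per level regardless of how many rows it summarizes. The rank error grows roughly like `n / k`, so a larger `k` trades space for accuracy.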