hdfs_fdw
hdfs_fdw: foreign-data wrapper for remote HDFS servers
Overview
| ID | Extension | Package | Version | Category | License | Language |
|---|---|---|---|---|---|---|
| 8740 | hdfs_fdw | hdfs_fdw | 2.3.3 | FDW | BSD 3-Clause | C |
| Attribute | Has Binary | Has Library | Need Load | Has DDL | Relocatable | Trusted |
|---|---|---|---|---|---|---|
| --s-d-- | No | Yes | No | Yes | No | No |
| Relationships | |
|---|---|
| See Also | pg_parquet, mongo_fdw, kafka_fdw, wrappers, multicorn, jdbc_fdw, aws_s3, duckdb_fdw |
Packages
| Type | Repo | Version | PG Major Compatibility | Package Pattern | Dependencies |
|---|---|---|---|---|---|
| EXT | PGDG | 2.3.3 | 18, 17, 16, 15, 14 | hdfs_fdw | - |
| RPM | PGDG | 2.3.3 | 18, 17, 16, 15, 14 | hdfs_fdw_$v | - |
| Linux / PG | PG18 | PG17 | PG16 | PG15 | PG14 |
|---|---|---|---|---|---|
| el8.x86_64 | PGDG 2.3.3 | PGDG 2.3.3 | PGDG 2.3.3 | PGDG 2.3.3 | PGDG 2.3.3 |
| el8.aarch64 | PGDG 2.3.3 | PGDG 2.3.3 | PGDG 2.3.3 | PGDG 2.3.3 | PGDG 2.3.3 |
| el9.x86_64 | PGDG 2.3.3 | PGDG 2.3.3 | PGDG 2.3.3 | PGDG 2.3.3 | PGDG 2.3.3 |
| el9.aarch64 | PGDG 2.3.3 | PGDG 2.3.3 | PGDG 2.3.3 | PGDG 2.3.3 | PGDG 2.3.3 |
| el10.x86_64 | PGDG 2.3.3 | PGDG 2.3.3 | PGDG 2.3.3 | PGDG 2.3.3 | PGDG 2.3.3 |
| el10.aarch64 | PGDG 2.3.3 | PGDG 2.3.3 | PGDG 2.3.3 | PGDG 2.3.3 | PGDG 2.3.3 |
| d12.x86_64 | MISS | MISS | MISS | MISS | MISS |
| d12.aarch64 | MISS | MISS | MISS | MISS | MISS |
| d13.x86_64 | MISS | MISS | MISS | MISS | MISS |
| d13.aarch64 | MISS | MISS | MISS | MISS | MISS |
| u22.x86_64 | MISS | MISS | MISS | MISS | MISS |
| u22.aarch64 | MISS | MISS | MISS | MISS | MISS |
| u24.x86_64 | MISS | MISS | MISS | MISS | MISS |
| u24.aarch64 | MISS | MISS | MISS | MISS | MISS |
| Package | Version | OS | ORG | SIZE | File URL |
|---|---|---|---|---|---|
| hdfs_fdw_18 | 2.3.3 | el8.x86_64 | pgdg | 116.2 KiB | hdfs_fdw_18-2.3.3-1PGDG.rhel8.x86_64.rpm |
| hdfs_fdw_18 | 2.3.3 | el8.aarch64 | pgdg | 113.2 KiB | hdfs_fdw_18-2.3.3-1PGDG.rhel8.aarch64.rpm |
| hdfs_fdw_18 | 2.3.3 | el9.x86_64 | pgdg | 116.4 KiB | hdfs_fdw_18-2.3.3-1PGDG.rhel9.x86_64.rpm |
| hdfs_fdw_18 | 2.3.3 | el9.aarch64 | pgdg | 114.2 KiB | hdfs_fdw_18-2.3.3-1PGDG.rhel9.aarch64.rpm |
| hdfs_fdw_18 | 2.3.3 | el10.x86_64 | pgdg | 116.9 KiB | hdfs_fdw_18-2.3.3-1PGDG.rhel10.x86_64.rpm |
| hdfs_fdw_18 | 2.3.3 | el10.aarch64 | pgdg | 115.6 KiB | hdfs_fdw_18-2.3.3-1PGDG.rhel10.aarch64.rpm |
Source
Install
Make sure the PGDG repo is available:
pig repo add pgdg -u # add pgdg repo and update cache
Install this extension with pig:
pig install hdfs_fdw; # install via package name, for the active PG version
pig install hdfs_fdw -v 18; # install for PG 18
pig install hdfs_fdw -v 17; # install for PG 17
pig install hdfs_fdw -v 16; # install for PG 16
pig install hdfs_fdw -v 15; # install for PG 15
pig install hdfs_fdw -v 14; # install for PG 14
Create this extension with:
CREATE EXTENSION hdfs_fdw;
Usage
Create Server
CREATE EXTENSION hdfs_fdw;
CREATE SERVER hdfs_server FOREIGN DATA WRAPPER hdfs_fdw
OPTIONS (host '127.0.0.1', port '10000', client_type 'hiveserver2');
Server Options: host (default localhost), port (default 10000), client_type (hiveserver2 or spark, default hiveserver2), auth_type (NOSASL or LDAP), connect_timeout (default 300), fetch_size (default 10000), log_remote_sql (default false), use_remote_estimate (default false), enable_join_pushdown (default true), enable_aggregate_pushdown (default true), enable_order_by_pushdown (default true).
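Several of the server options listed above can be combined in one definition. The following is a minimal sketch, assuming an LDAP-secured HiveServer2 instance; the server name `hdfs_ldap_server`, the host address, and the chosen values are hypothetical, and only the option names come from the list above.

```sql
-- Hypothetical server definition: the name, host, and values are
-- illustrative assumptions; the option names are those listed above.
CREATE SERVER hdfs_ldap_server FOREIGN DATA WRAPPER hdfs_fdw
OPTIONS (
    host '192.0.2.10',        -- placeholder address
    port '10000',
    client_type 'hiveserver2',
    auth_type 'LDAP',
    fetch_size '5000',        -- overrides the default of 10000
    log_remote_sql 'true'     -- overrides the default of false
);

-- Options can be adjusted later with standard ALTER SERVER syntax:
-- SET changes an option already present, ADD introduces a new one.
ALTER SERVER hdfs_ldap_server
    OPTIONS (SET fetch_size '20000', ADD use_remote_estimate 'true');
```

Options that are not given keep the defaults listed above.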
Create User Mapping
CREATE USER MAPPING FOR postgres SERVER hdfs_server
OPTIONS (username 'hive_user', password 'hive_password');
For NOSASL authentication, omit the OPTIONS clause entirely.
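As a sketch of the NOSASL case, the mapping reduces to the statement below. It is an alternative to the mapping with credentials, so run one or the other for a given user and server, not both.

```sql
-- NOSASL: no username or password is supplied, so there is no OPTIONS clause.
CREATE USER MAPPING FOR postgres SERVER hdfs_server;
```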
Create Foreign Table
CREATE FOREIGN TABLE weblogs (
client_ip text,
http_status_code text,
uri text,
request_count bigint
)
SERVER hdfs_server
OPTIONS (dbname 'default', table_name 'weblogs');
Table Options: dbname (default default), table_name (defaults to foreign table name), enable_join_pushdown, enable_aggregate_pushdown, enable_order_by_pushdown.
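The table-level pushdown options can be managed with standard `ALTER FOREIGN TABLE` syntax. This is a minimal sketch against the `weblogs` table, assuming the options accept the boolean strings 'true' and 'false'; the choice of `enable_aggregate_pushdown` is only illustrative.

```sql
-- Disable aggregate pushdown for this one table
-- (assumes the option takes 'true'/'false' values).
ALTER FOREIGN TABLE weblogs OPTIONS (ADD enable_aggregate_pushdown 'false');

-- Re-enable it later; SET changes an option that is already present.
ALTER FOREIGN TABLE weblogs OPTIONS (SET enable_aggregate_pushdown 'true');

-- Remove the table-level setting again.
ALTER FOREIGN TABLE weblogs OPTIONS (DROP enable_aggregate_pushdown);
```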
Query
SELECT client_ip, count(*) FROM weblogs GROUP BY client_ip ORDER BY count(*) DESC LIMIT 10;
Spark Example
CREATE SERVER spark_server FOREIGN DATA WRAPPER hdfs_fdw
OPTIONS (host '127.0.0.1', port '10000', client_type 'spark');
CREATE USER MAPPING FOR postgres SERVER spark_server
OPTIONS (username 'spark_user', password 'spark_pass');
CREATE FOREIGN TABLE spark_table (
id int,
name text,
value double precision
)
SERVER spark_server
OPTIONS (dbname 'default', table_name 'my_table');
Pushdown Features
hdfs_fdw pushes down WHERE clauses, JOINs, aggregate functions, ORDER BY, and LIMIT/OFFSET to the remote Hive/Spark server. Control pushdown at the session level:
SET hdfs_fdw.enable_join_pushdown = on;
SET hdfs_fdw.enable_aggregate_pushdown = on;
SET hdfs_fdw.enable_order_by_pushdown = on;
SET hdfs_fdw.enable_limit_pushdown = on;
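One way to check whether a pushdown setting takes effect is to compare query plans with the setting on and off. The sketch below assumes the `weblogs` foreign table from the earlier example; the exact plan text depends on the PostgreSQL and hdfs_fdw versions, so no output is shown.

```sql
-- Plan with the current (default) settings.
EXPLAIN (VERBOSE, COSTS OFF)
SELECT client_ip, count(*) FROM weblogs GROUP BY client_ip;

-- Turn aggregate pushdown off for this session and compare the plan.
SET hdfs_fdw.enable_aggregate_pushdown = off;
EXPLAIN (VERBOSE, COSTS OFF)
SELECT client_ip, count(*) FROM weblogs GROUP BY client_ip;

-- Return the setting to its configured value.
RESET hdfs_fdw.enable_aggregate_pushdown;
```

With pushdown disabled, the aggregation is expected to appear as a local plan node above the foreign scan rather than inside the remote query.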