PyIceberg
PyIceberg is a Python implementation for accessing Iceberg tables, without the need for a JVM.
Install
You can install the latest release from PyPI:
pip3 install "pyiceberg[s3fs,hive]"
Or install it directly from GitHub (not recommended, but sometimes handy):
pip install "git+https://github.com/apache/iceberg.git#subdirectory=python&egg=pyiceberg[s3fs]"
Or clone the repository for local development:
git clone https://github.com/apache/iceberg.git
cd iceberg/python
pip3 install -e ".[s3fs,hive]"
You can mix and match optional dependencies depending on your needs:
Key | Description |
---|---|
hive | Support for the Hive metastore |
glue | Support for AWS Glue |
pyarrow | PyArrow as a FileIO implementation to interact with the object store |
duckdb | Installs both PyArrow and DuckDB |
s3fs | S3FS as a FileIO implementation to interact with the object store |
adlfs | ADLFS as a FileIO implementation to interact with the object store |
snappy | Support for snappy Avro compression |
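As a small illustration of mixing and matching, the sketch below composes a pip requirement string from a list of extras and rejects names that are not in the table above. The helper `requirement_with_extras` and the constant `KNOWN_EXTRAS` are hypothetical names introduced here for illustration; they are not part of PyIceberg.

```python
# Extras copied from the table above; this set is an illustration,
# not something PyIceberg exposes.
KNOWN_EXTRAS = {"hive", "glue", "pyarrow", "duckdb", "s3fs", "adlfs", "snappy"}


def requirement_with_extras(extras):
    """Return a pip requirement string such as 'pyiceberg[hive,s3fs]'."""
    unknown = set(extras) - KNOWN_EXTRAS
    if unknown:
        raise ValueError(f"Unknown extras: {sorted(unknown)}")
    # Deduplicate and sort so the result is stable regardless of input order.
    selected = sorted(set(extras))
    if not selected:
        return "pyiceberg"
    return f"pyiceberg[{','.join(selected)}]"


print(requirement_with_extras(["s3fs", "hive"]))  # prints pyiceberg[hive,s3fs]
```

The resulting string can be passed to pip in quotes, in the same way as the install commands shown earlier.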
You need to install at least one of s3fs, adlfs, or pyarrow to fetch files.
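To check which of these packages are present in the current environment, something like the following may help. It is a minimal sketch that only tests whether each module is importable, using the standard library's `importlib.util.find_spec`; it does not verify that PyIceberg can actually use the backend. The function name `available_fileio_backends` is an assumption made for this example.

```python
from importlib.util import find_spec

# Module names of the three packages named above.
FILEIO_MODULES = ("s3fs", "adlfs", "pyarrow")


def available_fileio_backends(candidates=FILEIO_MODULES):
    """Return the candidate modules that can be imported in this environment."""
    return [name for name in candidates if find_spec(name) is not None]


backends = available_fileio_backends()
if backends:
    print("Importable FileIO packages:", ", ".join(backends))
else:
    print("No FileIO package found; install s3fs, adlfs, or pyarrow.")
```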
Both a CLI and a Python API are available.