PyIceberg

PyIceberg is a Python implementation for accessing Iceberg tables without needing a JVM.

Install

You can install the latest release from PyPI:

pip3 install "pyiceberg[s3fs,hive]"

You can also install it directly from GitHub (not recommended, but sometimes handy):

pip install "git+https://github.com/apache/iceberg.git#subdirectory=python&egg=pyiceberg[s3fs]"

Or clone the repository for local development:

git clone https://github.com/apache/iceberg.git
cd iceberg/python
pip3 install -e ".[s3fs,hive]"

You can mix and match optional dependencies depending on your needs:

Key      Description
hive     Support for the Hive metastore
glue     Support for AWS Glue
pyarrow  PyArrow as a FileIO implementation to interact with the object store
duckdb   Installs both PyArrow and DuckDB
s3fs     S3FS as a FileIO implementation to interact with the object store
adlfs    ADLFS as a FileIO implementation to interact with the object store
snappy   Support for snappy Avro compression
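
As a rough illustration of mixing and matching, the sketch below builds the requirement string you would pass to pip from a list of extras. The helper name `requirement_for` and the `KNOWN_EXTRAS` set are hypothetical and not part of PyIceberg; the set simply mirrors the keys listed above.

```python
# Hypothetical helper, not part of PyIceberg: builds a pip requirement
# string from the optional dependency keys listed in this README.
KNOWN_EXTRAS = {"hive", "glue", "pyarrow", "duckdb", "s3fs", "adlfs", "snappy"}


def requirement_for(extras):
    """Return a requirement string such as 'pyiceberg[hive,s3fs]'."""
    unknown = sorted(set(extras) - KNOWN_EXTRAS)
    if unknown:
        raise ValueError(f"Unknown extras: {', '.join(unknown)}")
    if not extras:
        return "pyiceberg"
    # Sort and de-duplicate so the same selection always yields the same string.
    return f"pyiceberg[{','.join(sorted(set(extras)))}]"


print(requirement_for(["s3fs", "hive"]))  # prints pyiceberg[hive,s3fs]
```

The resulting string can be quoted and passed to pip3 install in the same way as the commands above.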

You need to install at least one of s3fs, adlfs, or pyarrow to fetch files.
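
To check whether one of these is present in the current environment, something like the following sketch may help. It uses only the standard library and assumes that each extra's importable module name matches its key (s3fs, adlfs, pyarrow); the function name `available_fileio_backends` is hypothetical and not part of PyIceberg.

```python
import importlib.util

# Assumption: the importable module names match the extras keys.
FILEIO_MODULES = ("s3fs", "adlfs", "pyarrow")


def available_fileio_backends(find_spec=importlib.util.find_spec):
    """Return the FileIO-capable modules that can be imported here."""
    # find_spec returns None for a top-level module that is not installed.
    return [name for name in FILEIO_MODULES if find_spec(name) is not None]


backends = available_fileio_backends()
if backends:
    print("FileIO backends found:", ", ".join(backends))
else:
    print("No FileIO backend found; install s3fs, adlfs, or pyarrow.")
```

The `find_spec` parameter is only there so the lookup can be swapped out when testing; in normal use you would call the function without arguments.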

Both a CLI and a Python API are available.