Catalogs

PyIceberg currently has native support for REST, Hive, and Glue catalogs.

There are three ways to pass in configuration:

  • Using the ~/.pyiceberg.yaml configuration file
  • Through environment variables
  • By passing in credentials through the CLI or the Python API

The configuration file is recommended because it is the most transparent option. If you prefer to configure through environment variables:

export PYICEBERG_CATALOG__DEFAULT__URI=thrift://localhost:9083

Environment variables picked up by PyIceberg start with PYICEBERG_ and then follow the YAML structure shown below, where a double underscore (__) represents a nested field.
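To make the naming rule concrete, here is a minimal sketch of how a PYICEBERG_ variable name could be mapped onto the nested YAML layout. It is an illustration only, not PyIceberg's actual parsing code: the helper name `env_to_nested` is hypothetical, and lowercasing the key segments is an assumption made so the result matches the YAML keys.

```python
PREFIX = "PYICEBERG_"


def env_to_nested(environ):
    """Map PYICEBERG_* variables onto a nested dict mirroring the YAML layout.

    Hypothetical helper for illustration; not part of PyIceberg.
    """
    config = {}
    for name, value in environ.items():
        if not name.startswith(PREFIX):
            continue  # ignore unrelated variables
        # Drop the prefix, lowercase (assumption), and split on the double underscore
        parts = name[len(PREFIX):].lower().split("__")
        node = config
        for part in parts[:-1]:
            node = node.setdefault(part, {})  # descend, creating levels as needed
        node[parts[-1]] = value
    return config


example = env_to_nested(
    {
        "PYICEBERG_CATALOG__DEFAULT__URI": "thrift://localhost:9083",
        "HOME": "/home/user",
    }
)
print(example)
# → {'catalog': {'default': {'uri': 'thrift://localhost:9083'}}}
```

The resulting dictionary has the same shape as a configuration file with a `catalog` section containing a `default` entry.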

For the FileIO there are several configuration options available:

| Key | Example | Description |
|-----|---------|-------------|
| s3.endpoint | https://10.0.19.25/ | Configure an alternative endpoint of the S3 service for the FileIO to access. This can be used to point S3FileIO at any S3-compatible object storage service that has a different endpoint, or to access a private S3 endpoint in a virtual private cloud. |
| s3.access-key-id | admin | Configure the static access key ID used to access the FileIO. |
| s3.secret-access-key | password | Configure the static secret access key used to access the FileIO. |
| s3.signer | bearer | Configure the signature version of the FileIO. |
| adlfs.endpoint | http://127.0.0.1/ | Configure an alternative endpoint of the ADLFS service for the FileIO to access. This can be used to point the FileIO at any ADLFS-compatible object storage service that has a different endpoint (such as Azurite). |
| adlfs.account-name | devstoreaccount1 | Configure the static storage account name used to access the FileIO. |
| adlfs.account-key | Eby8vdM02xNOcqF... | Configure the static storage account key used to access the FileIO. |

REST Catalog

catalog:
  default:
    uri: http://rest-catalog/ws/
    credential: t-1234:secret

  default-mtls-secured-catalog:
    uri: https://rest-catalog/ws/
    ssl:
      client:
        cert: /absolute/path/to/client.crt
        key: /absolute/path/to/client.key
      cabundle: /absolute/path/to/cabundle.pem

Hive Catalog

catalog:
  default:
    uri: thrift://localhost:9083
    s3.endpoint: http://localhost:9000
    s3.access-key-id: admin
    s3.secret-access-key: password
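
The same Hive settings can also be passed through the Python API instead of the configuration file. The sketch below assumes `pyiceberg` is installed and that a Hive metastore and an S3-compatible store are reachable at the example addresses. The helper names `hive_catalog_properties` and `open_hive_catalog` are hypothetical. `load_catalog` is imported lazily so that the properties can be inspected without connecting to anything.

```python
def hive_catalog_properties():
    """Return the catalog properties from the YAML example as a plain dict."""
    return {
        "uri": "thrift://localhost:9083",
        "s3.endpoint": "http://localhost:9000",
        "s3.access-key-id": "admin",
        "s3.secret-access-key": "password",
    }


def open_hive_catalog(name="default"):
    """Load the catalog by passing the properties as keyword arguments.

    Calling this needs a running metastore at the example URI (assumption).
    """
    # Imported here so the module can be loaded without pyiceberg installed
    from pyiceberg.catalog import load_catalog

    # Keys contain dots and hyphens, so they are unpacked from a dict
    return load_catalog(name, **hive_catalog_properties())


props = hive_catalog_properties()
print(sorted(props))
```

Because the property keys are not valid Python identifiers, unpacking a dictionary is the practical way to hand them to `load_catalog`.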

Glue Catalog

To use AWS Glue as the catalog, configure PyIceberg with either of the last two methods (environment variables, or the CLI or Python API), and refer to "How to configure AWS credentials" to set up your AWS account credentials locally.

catalog:
  default:
    type: glue
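
As a rough sketch of those two methods for Glue, the snippet below shows the environment variable that corresponds to `type: glue`, and the equivalent call through the Python API. It assumes `pyiceberg` is installed with Glue support and that AWS credentials are already available through the standard AWS mechanisms. The helper names `glue_env` and `open_glue_catalog` are hypothetical.

```python
def glue_env(catalog_name="default"):
    """Environment variable equivalent of `type: glue` for one catalog."""
    return {f"PYICEBERG_CATALOG__{catalog_name.upper()}__TYPE": "glue"}


def open_glue_catalog(name="default"):
    """Load a Glue catalog through the Python API.

    Calling this needs valid AWS credentials in the environment (assumption).
    """
    # Imported here so the module can be loaded without pyiceberg installed
    from pyiceberg.catalog import load_catalog

    return load_catalog(name, type="glue")


print(glue_env())
# → {'PYICEBERG_CATALOG__DEFAULT__TYPE': 'glue'}
```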