Context
Recently, we’re planing to use Pulumi to manage all current existing AWS Glue Datacatalog tables which are Iceberg format. For the Iceberg tables, I have post a blog before to talk about what is iceberg and what’s feature of it. Here is post link: https://stonefishy.github.io/2020/05/23/what-is-apache-iceberg/
To manage the AWS Glue Iceberg tables with Pulumi, due to our catalog table schemas are continue changes base on requirements. We need to do some technical POC whethere the pulumi can also support to update the iceberg metadata schema as well.
Create Glue Iceberg Table
We’re using Pulumi to manage the AWS Cloud Infrastructure. Before create glue table, a glue database is indeed.
1 | import pulumi |
Above code is to create a glue database named pulumi_database_test. Next Step is to create a glue table with Iceberg format.
1 | import pulumi |
There is important thing to notice here is that we need to set open_table_format_input with iceberg_input and set metadata_operation as CREATE. This is because we want to create a new Iceberg table with new schema.
Below is glue iceberg table created screenshot. You can see the 4 fields is added in schema and table format is Apache Iceberg.

Next, let’s check the important file that is Apache Iceberg metadata file which is located in s3://xxx/pulumi_external_table_test/metadata/. Download this json file 00006-fd122b03-a7aa-42cf-8fec-001535a9fcf5.metadata.json from S3. The 4 fields are defined in metadata json file. That is good. The metadata json is created as well when creating glue table.

Insert new data in Glue iceberg table
Let’s using AWS Athena to insert a test data in the table.
1 | INSERT INTO pulumi_database_test.pulumi_external_table_test(test1,test2,test3,test4) VALUES('1a', '2a', true, '4a') |
The data is insert success and we can use SELECT sql to query the data.

In Iceberg table, we can insert, update, delete data as well.
Update Glue Iceberg table schema
Let’s add a new field test5 in the glue iceberg table base on previous code.
1 | import pulumi |
Execute pulumi up command to update the glue table schema.

After that, we can check the glue table schema is updated to add a new field test5.

Let’s insert new data in the table with new field test5 and run it in AWS Athena.
1 | INSERT INTO pulumi_database_test.pulumi_external_table_test(test1,test2,test3,test4,test5) VALUES('1b', '2b', true, '4b', '5b') |
The Athena execute show below errors:
1 | COLUMN_NOT_FOUND: Insert column name does not exist in target table: test5. If a data manifest file was generated at 's3://xxxxxx/4c346103-60d2-45ea-9813-d7060bd5efe9/Unsaved/2024/10/09/37f67a67-4604-43cb-b113-af351c363a51-manifest.csv', you may need to manually clean the data from locations specified in the manifest. Athena will not delete data in your account. |

But when we check the Apache Iceberg metadata file again. The new field test5 is not added in the new metadata file. That’s why the insert new data with new field failed.

Conclusion
In Pulumi documentation. The metadata_operation of iceberg_input in open_table_format_input is only support CREATE value. It seems it only can create the iceberg metadata file when glue table created.

It seems this is pulumi issue. It is not updating the iceberg metadata file when the glue table schema is updated. I’ve raised a issue to pulumi, here is issue link: https://github.com/pulumi/pulumi/issues/17516. Hope this issue can be fixed soon.
Mean while, I found there is same issue in Terraform which also can not update the iceberg metadata file when the glue table schema is updated. Terraform issue link here https://github.com/hashicorp/terraform-provider-aws/issues/36641.