AWS · Open Table Format / Data Lakehouse
Apache Iceberg
Built for cloud teams
Recover Apache Iceberg tables to any point in time — without managing backup infrastructure. Iceberg-aware, air-gapped, and API-first from day one.
Storage tier
Restore type
Cross-region
Cross-account
01 · Why Clumio
Why pick Clumio for Iceberg
Native Iceberg snapshots solve versioning. They don’t solve cyber resilience, multi-year retention, or cross-account recovery. Clumio offers the protection model that the lakehouse leaves to you.
ICEBERG-AWARE
Restores are queryable on arrival
Backups are Iceberg-aware, capturing data files, metadata, and the snapshot chain together while preserving schema specifications end-to-end. Restored tables come back queryable as soon as the catalog updates, with no metadata repair or configuration changes.
INCREMENTAL
Incremental-forever on snapshot lineage
The seed backup transfers the full snapshot chain; subsequent runs carry only deltas. A tighter RPO allows for increased snapshot churn.
BACKTRACK
In-place restore, no table rewiring
Roll the source Iceberg table back to an earlier snapshot in place. Table identity (catalog name, ARN, IAM, downstream references) stays intact. Operates on the Iceberg manifest chain rather than recreating the table.
FLEXIBLE RECOVERY
Pick the right recovery shape
Restore a full table, one or more selected snapshots, or a specific point in time. Preview the schema of every snapshot so you can validate that the restore target matches what you expect. You can even set the table’s head snapshot to one of your choosing.
NO LAKE IMPACT
Years of retention, zero source overhead
Vault retention is independent of the source table’s snapshot history. Drop snapshot history in the lake without losing the ability to restore from a year ago.
MIGRATION PATH
Migrate Glue and S3 Tables in either direction
Restore Glue-managed Iceberg backups into Amazon S3 Tables, or restore S3 Tables backups into a Glue catalog. Clumio helps update metadata to the target convention during restore; the table becomes queryable quickly, with minimal to no manual reconciliation needed.
New to Clumio?
Set up your AWS account first
This page assumes a connected AWS account with at least one Iceberg table. If you haven’t done that yet, the Getting Started guide will walk you through sign-up, account connection, and first backup quickly.
02 · Backup
How to back up Iceberg
Apply one Clumio policy to your Iceberg tables, whether they live in the AWS Glue Data Catalog or in Amazon S3 Tables. The policy defines schedule, retention, tier, and region. (For initial AWS account setup, see the Getting Started guide.)
Create a backup policy
A policy defines schedule, retention, target region, and tier. Iceberg policies operate at the table level. Each backup captures the table’s data files, metadata files, and manifest list together, so the recovery point represents a transactionally consistent Iceberg snapshot rather than a smeared crawl.
Protect → Backup policies
Pick the right RPO
The policy schedule sets the cadence (e.g., daily, weekly), with an optional start time. The seed backup transfers the full snapshot chain; subsequent runs are incremental-forever relative to Iceberg snapshot lineage. A tighter RPO allows for increased snapshot churn.
Choose a tier (SecureVault Standard)
Iceberg backups land in SecureVault Standard, an air-gapped tier outside your source account, with fast restore and the full set of restore granularities. The same tier covers Iceberg tables stored in either of two AWS surfaces. Apply one policy to both surfaces; the workload model is identical, only the underlying metadata location differs.
AWS Glue Data Catalog
Iceberg tables registered to a Glue database, with data and manifest files in general-purpose S3. Common shape for teams already on Glue + Athena.
Amazon S3 Tables
Fully managed Iceberg in S3 Tables buckets. AWS handles compaction, snapshot management, and unreferenced-file cleanup; Clumio doesn’t disturb that maintenance.
Choose a region (in-region or out-of-region)
By default, Clumio stores backups in the same region as the source table. You can target a different region for cross-region durability, which adds AWS data transfer cost. The destination is set on the policy.
Apply the policy with protection rules
Once the policy is saved, use protection rules to apply it to Iceberg tables. For Glue Data Catalog tables, target individual tables directly. For Amazon S3 Tables, target by AWS tag, account, name, or region. Tables can also be excluded by tag for fine-grained control. The seed backup runs first; subsequent backups are incremental.
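The rule matching described above can be sketched in a few lines of Python. This is an illustrative model only, not Clumio's API: the `Table` and `Rule` classes, their field names, and the `matches` function are all hypothetical, and the sketch assumes that an exclusion tag always wins over an inclusion criterion.

```python
from dataclasses import dataclass, field


@dataclass
class Table:
    """Hypothetical inventory record for an Iceberg table."""
    name: str
    account: str
    region: str
    tags: dict = field(default_factory=dict)


@dataclass
class Rule:
    """Hypothetical protection rule; empty criteria match everything."""
    include_tags: dict = field(default_factory=dict)
    exclude_tags: dict = field(default_factory=dict)
    accounts: set = field(default_factory=set)
    regions: set = field(default_factory=set)
    name_prefix: str = ""


def matches(rule, table):
    """Return True if the rule would apply its policy to the table."""
    # Assumption: any matching exclusion tag removes the table from scope.
    if any(table.tags.get(k) == v for k, v in rule.exclude_tags.items()):
        return False
    if rule.accounts and table.account not in rule.accounts:
        return False
    if rule.regions and table.region not in rule.regions:
        return False
    if not table.name.startswith(rule.name_prefix):
        return False
    # Every inclusion tag must be present with the expected value.
    return all(table.tags.get(k) == v for k, v in rule.include_tags.items())


tables = [
    Table("orders", "111111111111", "us-east-1", {"env": "prod"}),
    Table("orders_scratch", "111111111111", "us-east-1",
          {"env": "prod", "backup": "skip"}),
    Table("events", "222222222222", "eu-west-1", {"env": "dev"}),
]
rule = Rule(include_tags={"env": "prod"},
            exclude_tags={"backup": "skip"},
            regions={"us-east-1"})
protected = [t.name for t in tables if matches(rule, t)]
print(protected)  # ['orders']
```

Here `orders_scratch` is dropped by the exclusion tag and `events` by the region filter, leaving only `orders` in scope.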
03 · Restore
How to restore Iceberg
An Iceberg restore comes down to three choices: when to recover from, what to recover, and where it lands.
WHEN
Pick the recovery point
An exact timestamp from the retention window, or a discrete snapshot from the calendar.
Pick any timestamp within the retention window. Clumio walks back to the Iceberg snapshot that was current at that instant. Useful for “rewind to right before the bad data” recoveries; no need to know snapshot IDs.
Pick a discrete backup from the table’s calendar and drill into it. Inspect the Iceberg snapshots captured in that backup, plus prior snapshots still retained in air-gapped storage; preview the schema, size, and the operation that produced each one (append, overwrite, delete).
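To make the point-in-time option concrete, here is a minimal sketch of resolving a timestamp to the snapshot that was current at that instant. It assumes a snapshot log sorted oldest-first, using the `snapshot-id` and `timestamp-ms` field names from Iceberg table metadata; the `operation` field is attached to each entry purely for illustration, and `snapshot_as_of` is a hypothetical helper, not Clumio's implementation.

```python
from bisect import bisect_right
from datetime import datetime, timezone

# Toy snapshot log, oldest first. Values are made up for the example.
SNAPSHOT_LOG = [
    {"snapshot-id": 101, "timestamp-ms": 1_700_000_000_000, "operation": "append"},
    {"snapshot-id": 102, "timestamp-ms": 1_700_003_600_000, "operation": "overwrite"},
    {"snapshot-id": 103, "timestamp-ms": 1_700_007_200_000, "operation": "delete"},
]


def snapshot_as_of(log, when):
    """Return the newest log entry committed at or before `when`, or None."""
    target_ms = int(when.timestamp() * 1000)
    commit_times = [entry["timestamp-ms"] for entry in log]
    # bisect_right counts the entries with timestamp <= target_ms.
    idx = bisect_right(commit_times, target_ms)
    return log[idx - 1] if idx > 0 else None


# A moment between snapshots 102 and 103, i.e. right before the delete.
moment = datetime.fromtimestamp(1_700_005_000, tz=timezone.utc)
print(snapshot_as_of(SNAPSHOT_LOG, moment)["snapshot-id"])  # 102
```

A timestamp earlier than the first retained snapshot returns `None`, which corresponds to picking a time outside the retention window.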
WHAT
Pick the granularity
From a full table down to a single snapshot, with the same workflow.
All retained, air-gapped snapshots for the table land together, with data files, metadata, and the full snapshot manifest chain preserved end-to-end. The default and most common shape.
Restore a single snapshot or a chosen subset of the lineage. Useful for rebuilding the table at a specific point, replaying forward without older history, or updating the head snapshot of an existing table to roll it forward or back.
WHERE
Pick the destination
Restore back into the source table or land somewhere else entirely.
Roll the source Iceberg table back to an earlier snapshot in the same catalog, same account. Table identity (catalog name, ARN, IAM, downstream references) stays intact. Operates on the Iceberg manifest chain rather than recreating the table.
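Conceptually, an in-place rollback moves the table's current-snapshot pointer rather than creating a new table. The toy model below illustrates why table identity survives. It uses field names from Iceberg table metadata (`table-uuid`, `location`, `current-snapshot-id`, `snapshots`), but the `rollback` function and all values are hypothetical; a real rollback is committed through the catalog as a new metadata file, and this is not a description of how Clumio performs it.

```python
import copy


def rollback(metadata, snapshot_id):
    """Return new metadata whose current snapshot is `snapshot_id`."""
    known = {s["snapshot-id"] for s in metadata["snapshots"]}
    if snapshot_id not in known:
        raise ValueError(f"snapshot {snapshot_id} is not in the table's history")
    updated = copy.deepcopy(metadata)  # leave the input untouched
    updated["current-snapshot-id"] = snapshot_id
    return updated


table = {
    "table-uuid": "hypothetical-uuid-0001",
    "location": "s3://example-bucket/warehouse/orders",
    "current-snapshot-id": 103,
    "snapshots": [
        {"snapshot-id": 101, "parent-snapshot-id": None},
        {"snapshot-id": 102, "parent-snapshot-id": 101},
        {"snapshot-id": 103, "parent-snapshot-id": 102},
    ],
}

rolled = rollback(table, 101)
print(rolled["current-snapshot-id"])                # 101
print(rolled["table-uuid"] == table["table-uuid"])  # True
```

Only the pointer changes; the UUID, location, and snapshot list are the same before and after, which is why downstream references keep working.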
Land the table in a different AWS account, region, Glue database, or S3 Tables bucket. Cross-account is the recovery path after a source-account compromise, since the restore target sits outside the blast radius. Same workflow can double as a migration path between Glue and S3 Tables in either direction.
05 · Common questions
Frequently asked questions
Questions from engineers setting up Iceberg protection or troubleshooting restores.
Are Clumio Iceberg backups incremental, or does every backup capture the full table?
Backups are incremental at the snapshot level. Each backup walks the table’s snapshot lineage, finds the snapshots that landed since the prior backup, and captures their new data and metadata files. Older snapshots already in air-gapped storage are referenced rather than re-uploaded, and the full manifest chain is preserved so every backup remains a complete, restorable point in time.
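The lineage walk in this answer can be sketched as follows. This is a simplified illustration under stated assumptions, not Clumio's implementation: each snapshot carries a hypothetical `files` list (real Iceberg snapshots reference a manifest list instead), and `snapshots_since` and `files_to_upload` are made-up helper names.

```python
def snapshots_since(snapshots, head_id, last_backed_up_id):
    """Walk parent pointers from the head back to the last backed-up snapshot.

    Returns the new snapshots oldest-first. Passing None as
    last_backed_up_id models the seed backup and returns the full chain.
    """
    by_id = {s["snapshot-id"]: s for s in snapshots}
    new = []
    cursor = head_id
    while cursor is not None and cursor != last_backed_up_id:
        snap = by_id[cursor]
        new.append(snap)
        cursor = snap.get("parent-snapshot-id")
    new.reverse()
    return new


def files_to_upload(new_snapshots, already_in_vault):
    """Files referenced by the new snapshots that the vault does not yet hold."""
    needed = set()
    for snap in new_snapshots:
        needed.update(snap["files"])
    return sorted(needed - already_in_vault)


snapshots = [
    {"snapshot-id": 1, "parent-snapshot-id": None, "files": ["a.parquet"]},
    {"snapshot-id": 2, "parent-snapshot-id": 1, "files": ["a.parquet", "b.parquet"]},
    {"snapshot-id": 3, "parent-snapshot-id": 2,
     "files": ["a.parquet", "b.parquet", "c.parquet"]},
    {"snapshot-id": 4, "parent-snapshot-id": 3,
     "files": ["b.parquet", "c.parquet", "d.parquet"]},
]

new = snapshots_since(snapshots, head_id=4, last_backed_up_id=2)
print([s["snapshot-id"] for s in new])                   # [3, 4]
print(files_to_upload(new, {"a.parquet", "b.parquet"}))  # ['c.parquet', 'd.parquet']
```

Files already in the vault (`a.parquet`, `b.parquet`) are referenced rather than transferred again, so only the delta moves.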
Does Clumio support Iceberg tables in both AWS Glue Data Catalog and Amazon S3 Tables, and which engines can query a restored table?
Yes, for Iceberg tables specifically (non-Iceberg Glue tables are not in scope). Both surfaces are first-class: the same backup policy applies to either, and tables can be restored across them as a migration path. After restore, the table comes back as a valid Iceberg table at the destination catalog, queryable by Athena, EMR Spark, Trino, Snowflake, and any other Iceberg-compatible engine with minimal to no manual metadata repair.
Which Apache Iceberg spec versions does Clumio support?
Clumio supports Iceberg format v1 and v2 tables today. Position deletes, equality deletes, and the row-level operations introduced in v2 are captured in backups and restored intact. Support for v3 is coming soon.
Can I back up only compacted snapshots and skip the intermediate write churn?
By default, each backup captures all available snapshots in the table’s lineage, so the protected history mirrors the source. Filtering modes now in development are designed to let policies capture only the latest snapshot at backup time, or only the compacted snapshot (skipping the intermediate write-heavy ones), for teams that don’t need every intermediate state retained.
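As a rough illustration of what such filtering could look like, the sketch below selects snapshots by mode. It is speculative: the modes are not released, the `select_snapshots` function and mode names are hypothetical, and it assumes compaction commits can be recognized by the `replace` operation that the Iceberg spec records in a snapshot's summary.

```python
def select_snapshots(snapshots, mode="all"):
    """Pick which snapshots (ordered oldest-first) a backup would capture."""
    if mode == "all":
        return list(snapshots)
    if mode == "latest":
        return list(snapshots[-1:])
    if mode == "compacted":
        # Assumption: compaction rewrites files without changing data,
        # which Iceberg records as a "replace" operation.
        return [s for s in snapshots if s["summary"]["operation"] == "replace"]
    raise ValueError(f"unknown mode: {mode}")


history = [
    {"snapshot-id": 1, "summary": {"operation": "append"}},
    {"snapshot-id": 2, "summary": {"operation": "append"}},
    {"snapshot-id": 3, "summary": {"operation": "replace"}},
    {"snapshot-id": 4, "summary": {"operation": "append"}},
]

print([s["snapshot-id"] for s in select_snapshots(history, "latest")])     # [4]
print([s["snapshot-id"] for s in select_snapshots(history, "compacted")])  # [3]
```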
Where do the restored data files land?
You choose at restore time. In-place restore writes data files back into the source table’s S3 location and updates the source Glue or S3 Tables catalog entry. Out-of-place restore writes to a destination bucket and catalog you specify (different table name, different database, different account, or different region). The destination must be reachable from a Clumio connector in the target account.
06 · Related resources
Go deeper
Blog posts and reference material for teams building on Clumio Iceberg protection.
Blog
Simplifying Your Migration to Amazon S3 Tables with Clumio
How Clumio’s restore flow doubles as an Iceberg-aware migration path from Glue-managed Iceberg into fully managed Amazon S3 Tables. Preserves metadata, schema, and snapshot lineage.
Solution brief
Clumio for Apache Iceberg on AWS
Two-page positioning brief. Why native Iceberg snapshots fall short for cyber resilience, and how Clumio’s Iceberg-aware, air-gapped architecture closes those gaps. Includes the IDC analyst take.
Demo
Protect Apache Iceberg Tables with Air-Gapped Clumio Backups
Backup-side demo. Walks through Iceberg-aware backup architecture, the case for air-gapped protection over native snapshots, policy creation, inventory and calendar views, and snapshot inspection.
Demo
Clumio – Apache Iceberg Restore
Companion restore demo. Walks through restoring AWS Glue-based and S3-based Iceberg tables so they come back fully operational and queryable immediately. Minimal to no manual metadata repair or extra reconciliation.