Questions tagged [iceberg]

Apache Iceberg is an open table format for huge analytic datasets. Iceberg adds tables to Presto and Spark that use a high-performance format that works just like a SQL table. Use this tag for any questions relating to support for or usage of Iceberg.

134 questions
5
votes
2 answers

Missing Hive dependency issues with Apache Iceberg

I'm trying to use Apache Iceberg to write data to a specified location (S3/local). The configuration used is as follows. SBT: libraryDependencies += "org.apache.spark" %% "spark-sql" % "3.2.1" % "provided", libraryDependencies +=…
Leroy Mikenzi
  • 792
  • 6
  • 22
  • 46
4
votes
0 answers

How to deploy Iceberg tables to AWS through Terraform

Trying to determine the best way to deploy some Iceberg tables into our AWS environment. Has anyone had success via Terraform? I have the following configuration, but Athena complains of a missing metadata location (or will just spin forever) when I…
3
votes
1 answer

Avoid shuffling when inserting into sorted Iceberg table

I have an Iceberg table created with CREATE TABLE catalog.db.table (a int, b int) USING iceberg. Then I apply a sort order to it: ALTER TABLE catalog.db.table WRITE ORDERED BY (a, b). After invoking the last command, SHOW TBLPROPERTIES…
Ixanezis
  • 1,631
  • 13
  • 20
3
votes
0 answers

Problem merging multiple streams into the same table on Apache Iceberg

I have multiple Spark streaming jobs writing to different fields of the same table. The Iceberg documentation says the following: Iceberg supports multiple concurrent writes using optimistic concurrency. But an error message appears when trying to…
Alan Miranda
  • 125
  • 1
  • 8
3
votes
0 answers

Spark Iceberg - Merge Into Issue - Caused by: org.apache.spark.sql.AnalysisException: unresolved operator 'ReplaceIcebergData RelationV2

I am trying to upsert records to Iceberg using Spark's MERGE INTO feature. I am using Spark 3.3.0 with Iceberg 0.14.0. Merge Into - USING [db_name.]source_table [] [AS source_alias] ON [ WHEN MATCHED [ AND…
RakeshV
  • 444
  • 3
  • 11
3
votes
1 answer

Why doesn't Iceberg rewriteDataFiles rewrite the files to one file?

I have an Iceberg table with 2 Parquet files storing 4 rows in S3. I tried the following command: val tables = new HadoopTables(conf); val table = tables.load("s3://iceberg-tests-storage/data/db/test5"); …
eweiss
  • 51
  • 7
3
votes
1 answer

How to choose partition keys for Apache Iceberg tables

I have a number of Hive warehouses. The data resides in Parquet files in Amazon S3. Some of the tables contain terabytes of data. Currently in Hive, most tables are partitioned by a combination of month and year, both of which are stored mainly as strings.…
E. Erfan
  • 1,239
  • 19
  • 37
3
votes
2 answers

How to add partitioning to an existing Iceberg table

How do I add partitioning to an existing Iceberg table that is not partitioned? The table is already loaded with data. The table was created with: import org.apache.iceberg.hive.HiveCatalog import org.apache.iceberg.catalog._ import…
domisj
  • 31
  • 1
  • 2
2
votes
0 answers

Performant writes to Apache Iceberg

I've spent 2+ weeks trying to achieve performant record writes from pandas (or ideally polars if possible) in a Python environment to our Apache Iceberg deployment (with Hive metastore), either directly or via Trino query engine based…
Paul
  • 756
  • 1
  • 8
  • 22
2
votes
2 answers

PySpark read Iceberg table on S3 via Hive metastore

I'm trying to interact with Iceberg tables stored on S3 via a deployed Hive metastore service. The purpose is to be able to push and pull large amounts of data stored as an Iceberg data lake (on S3). Couple of days further, documentation, google,…
Paul
  • 756
  • 1
  • 8
  • 22
2
votes
1 answer

Is there a way to remove files belonging to a partition without physically deleting them in Iceberg?

There is add_files() to add files from a Hive table to Iceberg, but I cannot find a way to reverse that operation other than dropping and recreating the table. CALL spark_catalog.system.add_files( table => 'db.tbl', source_table =>…
Dyno Fu
  • 8,753
  • 4
  • 39
  • 64
2
votes
1 answer

Iceberg on Kubernetes, REST container problem

I'm trying to run Iceberg on Kubernetes. Here are the files that I'm using: apiVersion: apps/v1 kind: Deployment metadata: annotations: kompose.cmd: kompose convert kompose.version: 1.27.0 (b0ed6a2c9) creationTimestamp: null labels: …
2
votes
0 answers

How to List Iceberg Tables in a Catalog

I'm trying to list all tables in my Iceberg-enabled catalog. Falling back to Spark SQL works: spark.sql(s"USE ${catalogName}.${databaseName}") val tables = spark.sql("SHOW TABLES") Is it possible to accomplish the same with meta classes…
zachd1_618
  • 4,210
  • 6
  • 34
  • 47
2
votes
0 answers

Apache Iceberg on Redshift Spectrum, is it possible?

I have seen at https://aws.amazon.com/about-aws/whats-new/2020/09/amazon-redshift-spectrum-adds-support-for-querying-open-source-apache-hudi-and-delta-lake/ that Redshift Spectrum has support for Hudi and Delta. We're using Iceberg right now as a…
2
votes
1 answer

How do I specify equality versus position deletes when using merge-on-read?

The Iceberg documentation discusses using merge-on-read when deleting data. The documentation also refers to doing position deletes versus equality deletes. It seems straightforward to specify that I want merge-on-read in the table properties. I've…