Highest Voted 'iceberg' Questions

5

votes

2 answers

Missing Hive dependency issues with Apache Iceberg

I'm trying to use Apache Iceberg to write data to a specified location (S3/local). The configuration I used is below. SBT: libraryDependencies += "org.apache.spark" %% "spark-sql" % "3.2.1" % "provided", libraryDependencies +=…

asked Sep 01 '22 at 10:47

Leroy Mikenzi

792
6
22
46

4

votes

0 answers

How to deploy Iceberg tables to AWS through Terraform

I'm trying to determine the best way to deploy some Iceberg tables into our AWS environment. Has anyone had success via Terraform? I have the following configuration, but Athena complains of a missing metadata location (or just spins forever) when I…

amazon-web-services terraform terraform-provider-aws iceberg

asked Feb 27 '23 at 14:45

DataEnginerd

51
3

3

votes

1 answer

Avoid shuffling when inserting into sorted iceberg table

I have an Iceberg table created with: CREATE TABLE catalog.db.table (a int, b int) USING iceberg. Then I apply a sort order to it: ALTER TABLE catalog.db.table WRITE ORDERED BY (a, b). After invoking the last command, SHOW TBLPROPERTIES…

apache-spark iceberg

asked Dec 29 '22 at 13:06

Ixanezis

1,631
13
20

3

votes

0 answers

Problem merging multiple streams into the same table on Apache Iceberg

I have multiple Spark streaming jobs writing to different fields of the same table. The Iceberg documentation says the following: Iceberg supports multiple concurrent writes using optimistic concurrency. But an error message appears when trying to…

apache-spark spark-streaming iceberg

asked Nov 16 '22 at 02:36

Alan Miranda

125
1
8

3

votes

0 answers

Spark Iceberg - Merge Into Issue - Caused by: org.apache.spark.sql.AnalysisException: unresolved operator 'ReplaceIcebergData RelationV2

I am trying to upsert records into Iceberg using Spark's MERGE INTO feature. I am using Spark 3.3.0 with Iceberg 0.14.0. Merge Into - USING [db_name.]source_table [] [AS source_alias] ON [ WHEN MATCHED [ AND…

apache-spark apache-spark-sql iceberg

asked Aug 20 '22 at 06:02

RakeshV

444
3
11

3

votes

1 answer

Why doesn't Iceberg rewriteDataFiles rewrite the files into one file?

I have an Iceberg table with 2 Parquet files storing 4 rows in S3. I tried the following command: val tables = new HadoopTables(conf); val table = tables.load("s3://iceberg-tests-storage/data/db/test5"); …

apache-spark iceberg

asked May 24 '22 at 11:26

eweiss

51
7

3

votes

1 answer

How to choose partition keys for Apache Iceberg tables

I have a number of Hive warehouses. The data resides in Parquet files in Amazon S3. Some of the tables contain terabytes of data. Currently in Hive most tables are partitioned by a combination of month and year, both of which are stored mainly as strings.…

hive iceberg

asked Dec 03 '21 at 08:50

E. Erfan

1,239
19
37

3

votes

2 answers

How to add partitioning to an existing Iceberg table

How to add partitioning to an existing Iceberg table that is not partitioned? The table is already loaded with data. The table was created with: import org.apache.iceberg.hive.HiveCatalog import org.apache.iceberg.catalog._ import…

scala apache-spark apache-spark-sql iceberg

asked Mar 11 '20 at 11:23

domisj

31
1
2

2

votes

0 answers

Performant writes to Apache Iceberg

I've spent 2+ weeks trying to achieve performant record writes from pandas (or ideally Polars, if possible) in a Python environment to our Apache Iceberg deployment (with a Hive metastore), either directly or via the Trino query engine based…

bigdata trino iceberg apache-iceberg

asked Jul 06 '23 at 14:34

Paul

756
1
8
22

2

votes

2 answers

PySpark read Iceberg table, via hive metastore onto S3

I'm trying to interact with Iceberg tables stored on S3 via a deployed Hive metadata store service. The purpose is to be able to push and pull large amounts of data stored as an Iceberg data lake (on S3). A couple of days later, documentation, Google,…

pyspark hive iceberg

asked Apr 14 '23 at 11:57

Paul

756
1
8
22

2

votes

1 answer

Is there a way to remove files belonging to a partition without physically deleting them in Iceberg?

There is add_files() to add files from a Hive table to Iceberg, but I cannot find a way to reverse that operation other than dropping the table and recreating it. CALL spark_catalog.system.add_files( table => 'db.tbl', source_table =>…

apache-spark iceberg apache-iceberg

asked Mar 16 '23 at 21:53

Dyno Fu

8,753
4
39
64

2

votes

1 answer

Iceberg on Kubernetes, REST container problem

I'm trying to run Iceberg on Kubernetes. Here are the files that I'm using: apiVersion: apps/v1 kind: Deployment metadata: annotations: kompose.cmd: kompose convert kompose.version: 1.27.0 (b0ed6a2c9) creationTimestamp: null labels: …

kubernetes deployment iceberg

asked Jan 04 '23 at 13:03

Gerson Scheffer Scheffer-ACA

21
1

2

votes

0 answers

How to List Iceberg Tables in a Catalog

I'm trying to list all tables in my Iceberg-enabled catalog. Falling back to Spark SQL works: spark.sql(s"USE ${catalogName}.${databaseName}"); val tables = spark.sql("SHOW TABLES"). Is it possible to accomplish the same with meta classes…

aws-glue iceberg

asked Dec 23 '22 at 22:15

zachd1_618

4,210
6
34
47

2

votes

0 answers

Apache Iceberg on Redshift Spectrum, is it possible?

I have seen here https://aws.amazon.com/about-aws/whats-new/2020/09/amazon-redshift-spectrum-adds-support-for-querying-open-source-apache-hudi-and-delta-lake/ that Redshift Spectrum has support for Hudi and Delta. We're using Iceberg right now as a…

amazon-web-services amazon-redshift amazon-redshift-spectrum iceberg apache-iceberg

asked Nov 10 '22 at 08:45

Mateus Leão

83
4

2

votes

1 answer

How do I specify equality versus position deletes when using merge-on-read?

The Iceberg documentation discusses using merge-on-read when deleting data. The documentation also refers to doing position deletes versus equality deletes. It seems straightforward to specify that I want merge-on-read in the table properties. I've…

apache-spark iceberg

asked Nov 04 '22 at 14:13

Peter Connolly

21
1
