
I am using Spark Structured Streaming (3.1.1) to read data from Kafka and write it to S3 with HUDI (0.8.0) as the storage layer, partitioning the data by date. This part works without problems.

I want to query that data with Trino (355). As a prerequisite, I've already placed hudi-presto-bundle-0.8.0.jar in /data/trino/hive/

I created a table with the following schema:

CREATE TABLE table_new (
  columns, dt
) WITH (
  partitioned_by = ARRAY['dt'], 
  external_location = 's3a://bucket/location/',
  format = 'parquet'
);

Even after calling the procedure below, Trino is unable to discover any partitions:

CALL system.sync_partition_metadata('schema', 'table_new', 'ALL')

My assessment is that I cannot create a HUDI table in Trino because I am not passing the right values in the WITH options. I also cannot find a CREATE TABLE example in the HUDI documentation.

I would really appreciate it if anyone could give me an example, or point me in the right direction in case I've missed anything.

Really appreciate the help


Small update: I tried adding

connector = 'hudi'

but this throws the error:

Catalog 'hive' does not support table property 'connector'
gunj_desai
  • Do you get any output when running sync_partition_metadata? You should verify you are pointing to a catalog either in the session or in your URL string. – Brian Olsen Jan 05 '22 at 05:30
  • @BrianOlsen No output at all when I call sync_partition_metadata. Also, when logging into trino-cli I do pass the parameter `--catalog hive` – gunj_desai Jan 05 '22 at 10:00

2 Answers


Have you tried the setup described in the references below?

References: https://hudi.apache.org/docs/next/querying_data/#trino and https://hudi.apache.org/docs/query_engine_setup/#PrestoDB

  • Yes, I did actually. The documentation primarily revolves around querying data and not how to create a table, hence I am looking for an example if possible – gunj_desai Dec 23 '21 at 22:29

As of this writing, you can only query HUDI tables with Trino/Presto. Creating tables is not supported, and neither is inserting or updating data.

Check out the Writing Data document, which mentions that Spark and Flink are the only engines with write support. Ref: https://hudi.apache.org/docs/writing_data
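
Since the table has to be created from the writer side, one option is to let the Spark job register it in the Hive metastore through Hudi's Hive sync, so that Trino's hive catalog can see it. The following is only a minimal sketch under assumptions, not a verified setup for this cluster: the helper name `hudi_hive_sync_options` and the JDBC URL are hypothetical, and the partition extractor class must match how the `dt` partition paths are actually laid out on S3.

```python
def hudi_hive_sync_options(database, table, partition_field, jdbc_url):
    """Build Hudi writer options that enable Hive metastore sync.

    Hypothetical helper: merge the returned dict into the options the
    Spark writer already uses (record key, precombine field, etc.).
    """
    return {
        "hoodie.table.name": table,
        "hoodie.datasource.write.partitionpath.field": partition_field,
        # Ask Hudi to create/update the table definition in the Hive metastore.
        "hoodie.datasource.hive_sync.enable": "true",
        "hoodie.datasource.hive_sync.database": database,
        "hoodie.datasource.hive_sync.table": table,
        "hoodie.datasource.hive_sync.partition_fields": partition_field,
        # Assumption: this extractor fits the partition path layout in use.
        "hoodie.datasource.hive_sync.partition_extractor_class":
            "org.apache.hudi.hive.MultiPartKeysValueExtractor",
        # Assumption: HiveServer2 is reachable at this (hypothetical) URL.
        "hoodie.datasource.hive_sync.jdbcurl": jdbc_url,
    }


opts = hudi_hive_sync_options(
    "schema", "table_new", "dt", "jdbc:hive2://hiveserver:10000"
)

# In the streaming job the options would be passed to the Hudi sink, e.g.:
# df.writeStream.format("hudi").options(**opts) \
#     .option("checkpointLocation", "<checkpoint path>") \
#     .start("s3a://bucket/location/")
```

With the table registered this way, there is no CREATE TABLE statement to run in Trino; the table should appear under the synced schema in the hive catalog, provided that catalog points at the same metastore.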

Xuan Huy