I have an ETL process that uses an Athena source. I cannot figure out how to create a data frame when the source does not contain any data yet. I was using the GlueContext:
trans_ddf = glueContext.create_dynamic_frame.from_catalog(
    database=my_db, table_name=my_table, transformation_ctx="trans_ddf")
This fails when there is no data in the source database, because the schema cannot be inferred.
I also tried using the sql method on the Spark session:
has_rows_df = spark.sql("select cast(count(*) as boolean) as hasRows from my_table limit 1")
has_rows = has_rows_df.collect()[0].hasRows
This also fails because it can't infer the schema.
How can I create a data frame so I can determine if the source has any data?
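One workaround I am considering, though I am not sure it is the right approach, is to wrap the load in a guard and treat a failure as "no data". This is only a minimal sketch: `source_has_rows` and `load_frame` are hypothetical names of mine, not part of Glue or Spark, and it assumes that any exception raised while loading or counting means the source is empty, which could also hide unrelated errors.

```python
from typing import Any, Callable


def source_has_rows(load_frame: Callable[[], Any]) -> bool:
    """Return True if the frame produced by load_frame has at least one row.

    load_frame is a hypothetical zero-argument callable returning an object
    with a count() method (both DynamicFrame and Spark DataFrame have one).
    Assumption: an exception during load or count means "no data yet".
    """
    try:
        frame = load_frame()
        return frame.count() > 0
    except Exception:
        # Deliberately broad for this sketch; it should be narrowed to the
        # actual schema-inference error once its type is known.
        return False


# Intended use inside the Glue job (not executed here):
# has_rows = source_has_rows(
#     lambda: glueContext.create_dynamic_frame.from_catalog(
#         database=my_db, table_name=my_table, transformation_ctx="trans_ddf"))
```

The loader is passed as a callable so that the exception is raised inside the guard rather than before it is entered.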