How to determine if source is empty?

Question

I have an ETL process that uses an Athena source. I cannot figure out how to create a DataFrame when there is no data in the source yet. I was using the GlueContext:

trans_ddf = glueContext.create_dynamic_frame.from_catalog(
        database=my_db, table_name=my_table, transformation_ctx="trans_ddf")

This fails when there is no data in the source, because Glue cannot infer the schema.

I also tried using the sql function on the spark session:

has_rows_df = spark.sql("select cast(count(*) as boolean) as hasRows from my_table limit 1")
has_rows = has_rows_df.collect()[0].hasRows

This also fails because Spark cannot infer the schema.

How can I create a DataFrame so that I can determine whether the source has any data?

When you say there is no data in the source, do you mean no data in the S3 location, or no data because of bookmarking on your job? — Prabhakar Reddy, Jul 31 '19 at 01:52
Can you add the error message too? Also, just to understand: do you only want to know whether `has_rows_df` has any records after the Spark SQL query runs? — DataWrangler, Jul 31 '19 at 05:56
@Prabhakar, by "source is empty" I mean there are no files in S3 and no partition folders; the source folder is empty. — Greg McGuffey, Jul 31 '19 at 14:40
@Joby, I mean that executing the query returns one row whose value is true or false. — Greg McGuffey, Jul 31 '19 at 14:40
@GregMcGuffey OK. Can you try using boto3 to issue a get_partitions API call? It returns the table's partitions, and if there are none you can safely assume there is no data. If the table is not partitioned, get the table location from the output of get_table and make one more boto3 API call to verify whether any data is present there at all. Let me know after trying this. — Prabhakar Reddy, Aug 01 '19 at 00:59
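
A minimal sketch of the metadata check suggested in the comment above, assuming boto3 Glue and S3 clients are passed in (for example `boto3.client("glue")` and `boto3.client("s3")`) and that the catalog table's location is an `s3://bucket/prefix` URI. The helper names `source_has_data` and `_s3_prefix_has_objects` are hypothetical. Ignoring keys that end in "/" is also an assumption, meant to skip zero-byte "folder" placeholder objects that the S3 console can create.

```python
from urllib.parse import urlparse


def _s3_prefix_has_objects(s3, location):
    """Return True if any non-placeholder object exists under an s3:// location."""
    parsed = urlparse(location)  # e.g. s3://bucket/path/to/table/
    bucket = parsed.netloc
    prefix = parsed.path.lstrip("/")
    if prefix and not prefix.endswith("/"):
        prefix += "/"  # avoid matching sibling prefixes like "table_other/"
    kwargs = {"Bucket": bucket, "Prefix": prefix}
    while True:
        resp = s3.list_objects_v2(**kwargs)
        for obj in resp.get("Contents", []):
            # Assumption: keys ending in "/" are empty "folder" markers, not data.
            if not obj["Key"].endswith("/"):
                return True
        if not resp.get("IsTruncated"):
            return False
        kwargs["ContinuationToken"] = resp["NextContinuationToken"]


def source_has_data(glue, s3, database, table_name):
    """Hypothetical helper: check the Glue catalog (and S3) before reading."""
    table = glue.get_table(DatabaseName=database, Name=table_name)["Table"]
    if table.get("PartitionKeys"):
        # Partitioned table: any registered partition is taken to mean data exists.
        kwargs = {"DatabaseName": database, "TableName": table_name}
        while True:
            resp = glue.get_partitions(**kwargs)
            if resp.get("Partitions"):
                return True
            token = resp.get("NextToken")
            if not token:
                return False
            kwargs["NextToken"] = token
    # Unpartitioned table: look for objects at the table's S3 location.
    location = table["StorageDescriptor"]["Location"]
    return _s3_prefix_has_objects(s3, location)
```

In a Glue job this could be called as `source_has_data(boto3.client("glue"), boto3.client("s3"), my_db, my_table)`, and `create_dynamic_frame.from_catalog` would only be invoked when it returns True. Note that in this sketch a registered partition whose folder holds no files still counts as data.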

score 0 · Answer 1 · answered Nov 07 '19 at 15:02

has_rows_df.head(1).isEmpty

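should do the job robustly. See "How to check if spark dataframe is empty?" This snippet is Scala; in PySpark, `head(1)` returns a list, so the equivalent check is `len(has_rows_df.head(1)) == 0`.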

answered by Nicus

Unfortunately, the OP can't even create the DataFrame, so they never reach a point where this would be usable. – Davos Nov 22 '19 at 05:18
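
For a PySpark job, the emptiness check can be wrapped in a small helper. This is a sketch only: the name `dataframe_is_empty` is hypothetical, and it assumes a DataFrame has already been created successfully.

```python
def dataframe_is_empty(df):
    """Return True if the DataFrame has no rows.

    In PySpark, DataFrame.head(n) returns a Python list of at most n Row
    objects, so checking len(df.head(1)) == 0 only asks Spark for one row
    instead of counting every row the way df.count() == 0 would.
    """
    return len(df.head(1)) == 0
```

On PySpark 3.3 and later, `DataFrame.isEmpty()` performs the same check directly.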