
Sorry to bother everyone. We need to write the schema of a Spark DataFrame to a separate file on S3. It should look something like this:

import software.amazon.awssdk.core.sync.RequestBody
import software.amazon.awssdk.services.s3.model.PutObjectRequest

// df is the DataFrame after our other operations.
val schema = df.schema.toDDL  // schema serialized as a DDL string

val putObjectRequest = PutObjectRequest
      .builder()
      .bucket(bucket)
      .key(key)
      .build()

// s3.s3Client is our wrapper around an AWS SDK v2 S3Client.
s3.s3Client.putObject(putObjectRequest, RequestBody.fromString(schema))
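
For completeness, s3.s3Client above is just our own thin wrapper; a minimal sketch of how such a client could be built with AWS SDK v2 (the region and credentials provider here are placeholders, not our actual setup):

import software.amazon.awssdk.auth.credentials.DefaultCredentialsProvider
import software.amazon.awssdk.regions.Region
import software.amazon.awssdk.services.s3.S3Client

// Hypothetical stand-in for the s3 wrapper used above.
object s3 {
  val s3Client: S3Client = S3Client
    .builder()
    .region(Region.US_EAST_1)
    .credentialsProvider(DefaultCredentialsProvider.create())
    .build()
}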

Related: Store Schema of Read File Into csv file in spark scala

Questions:

  1. Our DataFrames mostly come from reading files with Spark's native APIs. Do we need to cache() the DataFrame before reading its schema property? From here, reading some stats requires calling cache() on the DataFrame first. We would rather avoid that and let Spark decide everything on its own. (See the sketch after this list for roughly what we do today.)
  2. If I do the above, it will only dump the schema once, not once per partition. Am I right? What I actually want is a way to pass this information to the driver and have the driver dump it only once, but I don't know which way is best. An accumulator seems like overkill.
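
For reference on question 1, this is roughly how the DataFrame is produced before the schema dump (a minimal sketch; the app name, input path, and format are placeholders, not our actual job):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("schema-dump").getOrCreate()

// Hypothetical source; our real jobs read various formats with Spark's native readers.
val df = spark.read.parquet("s3://example-bucket/input/")

// Question 1: do we need df.cache() here before touching df.schema,
// or is the schema available as driver-side metadata without materializing the data?
val ddl = df.schema.toDDL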

Thanks.
