Pyspark: Read csv file with multiple sheets

Question

The .csv file I am using will have multiple sheets (Dynamic sheet names).

I have to create dataFrames for all the sheets

The syntax I am using:

df = (
    self.spark.read
    .option("sheetName", None)
    .option('header', 'true')
    .csv(file_path)
)

sheet_names = df.keys()
print(sheet_names)

Error:

'DataFrame' object has no attribute 'keys'

Does this answer your question? [Reading Excel (.xlsx) file in pyspark](https://stackoverflow.com/questions/59854917/reading-excel-xlsx-file-in-pyspark) — notNull, Apr 04 '23 at 13:08
Possibly relevant: https://stackoverflow.com/questions/29615196/is-csv-with-multi-tabs-sheet-possible — Sarah Messer, Apr 04 '23 at 13:08
@notNull I don't know the sheet names. If I can hardcode then no prob — Adrita Sharma, Apr 04 '23 at 13:10
@SarahMesser I need to use apache spark. The answer is in c#. I can solve it in any other languages, c#, python etc. I need to use `pyspark` — Adrita Sharma, Apr 04 '23 at 13:11
@AdritaSharma The question & answers I linked are only nominally about C#. (There's no code in either the Q or A.) My point with the link is that "multisheet CSV" seems to be a mislabelling of multisheet Excel files, and that you may have better luck with the processing if you explicitly convert a multisheet XLSX to multiple CSVs before trying to ingest the data into PySpark. Alternately look into PySpark direct handling of XLSX, if that's indeed your original format. — Sarah Messer, Apr 04 '23 at 13:18
@SarahMesser I already have csv, I can do the same using pandas dataframe, trying to do the same using pyspark — Adrita Sharma, Apr 04 '23 at 13:22
@AdritaSharma A CSV has no sheet. It's just a plain text file where the delimiter between columns is supposed to be a comma. — Itération 122442, Apr 04 '23 at 14:14

score 1 · Answer 1 · answered Apr 04 '23 at 14:19

You are reading a CSV file, which is a plain text file, so first of all, trying to get Excel sheet names from it does not make sense.

Second, reading the CSV file returns a Spark DataFrame. This DataFrame, as you can see in the documentation, has no method named "keys".
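
If the data really is CSV, one possible workaround is to keep each "sheet" as its own CSV file and build a dict of DataFrames yourself. The following is only a minimal sketch under that assumption: the helper names `find_sheet_csvs` and `load_sheet_frames` and the directory `exported_sheets` are hypothetical, and the file name (without extension) is assumed to stand in for the sheet name. Because the result is an ordinary Python dict, `.keys()` works on it, and the column names of an individual DataFrame are available through `df.columns`.

```python
from pathlib import Path


def find_sheet_csvs(directory):
    """Map each CSV file's stem (the assumed sheet name) to its path."""
    return {p.stem: str(p) for p in sorted(Path(directory).glob("*.csv"))}


def load_sheet_frames(spark, directory):
    """Read every CSV in `directory` into its own Spark DataFrame.

    `spark` is an existing SparkSession. The result is a plain dict,
    so .keys() works on it (unlike on a DataFrame).
    """
    return {
        name: spark.read.csv(path, header=True, inferSchema=True)
        for name, path in find_sheet_csvs(directory).items()
    }


# Usage (assumes a running SparkSession named `spark` and a
# hypothetical directory "exported_sheets"):
#   frames = load_sheet_frames(spark, "exported_sheets")
#   print(list(frames.keys()))           # names taken from the file names
#   for name, df in frames.items():
#       print(name, df.columns)          # column names of each DataFrame
```

Passing the session in as an argument keeps the helpers independent of how the session was created, so they should fit the `self.spark` setup from the question.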
