I am trying to read data from a CSV file and then transpose it. Spark doesn't seem to have a function for that, so (for now) I read the file into a Pandas dataframe, call transpose(), and then convert/ingest the result into a Spark dataframe.
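A minimal pandas-only sketch of the read-and-transpose step described above. The CSV text is a hypothetical excerpt shaped like the output below (a `name` column followed by one column per reporting period). The real file, its path, and the `read_and_transpose` helper name are assumptions for illustration.

```python
import io

import pandas as pd

# Hypothetical excerpt standing in for the real CSV file: one row per line item,
# one column per reporting period, first column named "name".
CSV_TEXT = """name,ttm,12/31/2022,12/31/2021
TotalRevenue,"94,950,000,000","94,950,000,000","89,113,000,000"
NetInterestIncome,"52,462,000,000","52,462,000,000","42,934,000,000"
"""


def read_and_transpose(source) -> pd.DataFrame:
    # Read every cell as text so the comma-grouped numbers stay intact.
    pdf = pd.read_csv(source, dtype=str)
    # After transpose(), the original header ("name", "ttm", dates) becomes the
    # row index and the original row numbers (0, 1, ...) become the column labels.
    return pdf.transpose()


pdft = read_and_transpose(io.StringIO(CSV_TEXT))
print(pdft)
```

In this sketch the labels `name`, `ttm`, `12/31/2022`, ... end up in `pdft.index` rather than in a regular column, which matches the layout of the printed pdft shown below.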
After transpose(), the data in Pandas (held in pdft) looks like this:
0 ... 76
name TotalRevenue ... TaxEffectOfUnusualItems
ttm 94,950,000,000 ... 0
12/31/2022 94,950,000,000 ... 0
12/31/2021 89,113,000,000 ... 0
12/31/2020 85,528,000,000 ... 0
12/31/2019 91,244,000,000 ... 0
12/31/2018 91,247,000,000 ... 0
12/31/2017 87,352,000,000 ... 0
Then, after df = session.createDataFrame(pdft) and df.show(5), I get this:
+--------------+-------------------+------------------+--------------------+
| 0| 1| 2| 3|
+--------------+-------------------+------------------+--------------------+
| TotalRevenue|\tNetInterestIncome|\t\tInterestIncome|\t\t\tInterestInc...|
|94,950,000,000| 52,462,000,000| 72,565,000,000| 37,919,000,000|
|94,950,000,000| 52,462,000,000| 52,462,000,000| 37,919,000,000|
|89,113,000,000| 42,934,000,000| 47,672,000,000| 29,282,000,000|
|85,528,000,000| 43,360,000,000| 51,585,000,000| 34,029,000,000|
+--------------+-------------------+------------------+--------------------+
only showing top 5 rows
The first column of the transposed data (name, ttm, 12/31/2022, ...) is missing from the Spark dataframe. How can I get it back? Thanks!
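A possible direction, assuming the missing column is the pandas index: Spark's createDataFrame() builds its columns from the pandas columns only and does not carry the index over, so the row labels produced by transpose() are dropped. The sketch below promotes the index to an ordinary column with reset_index() before handing the frame to Spark. The CSV excerpt, the `prepare_for_spark` helper, and the column name `period` are hypothetical; the Spark call is left as a comment so the snippet runs with pandas alone.

```python
import io

import pandas as pd

# Hypothetical excerpt standing in for the real CSV file.
CSV_TEXT = """name,ttm,12/31/2022,12/31/2021
TotalRevenue,"94,950,000,000","94,950,000,000","89,113,000,000"
NetInterestIncome,"52,462,000,000","52,462,000,000","42,934,000,000"
"""


def prepare_for_spark(pdft: pd.DataFrame, index_name: str = "period") -> pd.DataFrame:
    # Name the index, then move it into a regular column so Spark keeps it.
    out = pdft.rename_axis(index_name).reset_index()
    # Use string column labels ("0", "1", ...) so every Spark column name is a str.
    out.columns = [str(c) for c in out.columns]
    return out


pdft = pd.read_csv(io.StringIO(CSV_TEXT), dtype=str).transpose()
prepared = prepare_for_spark(pdft)
print(prepared)

# With a SparkSession named `session` (as in the question):
# df = session.createDataFrame(prepared)
# df.show(5)
```

The first column of `prepared` then holds name, ttm, 12/31/2022, ..., so it should appear as an ordinary column in the Spark dataframe instead of being discarded with the index.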