I have saved the table below to AWS S3 using PySpark, partitioned by the column "channel_name", with the following code:
df.write.option("header", True) \
    .partitionBy("channel_name") \
    .mode("append") \
    .parquet("s3://path")
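As far as I understand, partitionBy encodes the column values in the directory names (channel_name=velocity/, channel_name=Temp/, ...), and Spark itself infers the column back when reading the top-level path. A minimal sketch of what I mean (assuming a SparkSession can simply be created or reused):

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Read the top-level path back with Spark; the "channel_name" column
# should be inferred from the channel_name=<value>/ directories.
df_back = spark.read.parquet("s3://path")
df_back.printSchema()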
start_timestamp | channel_name | value |
---|---|---|
2020-11-02 08:51:50 | velocity | 1 |
2020-11-02 09:14:29 | Temp | 0 |
2020-11-02 09:18:32 | velocity | 0 |
2020-11-02 09:32:42 | velocity | 4 |
2020-11-03 13:06:03 | Temp | 2 |
2020-11-03 13:10:01 | Temp | 1 |
2020-11-03 13:54:38 | Temp | 5 |
2020-11-03 14:46:25 | velocity | 5 |
2020-11-03 14:57:31 | Kilometer | 6 |
2020-11-03 15:07:07 | Kilometer | 7 |
Now I want to read the same data back with Python, but the partition column "channel_name" is missing from the result. Below is the code I tried with awswrangler:
import awswrangler as wr
df = wr.s3.read_parquet(path="s3://shreyasbigdata/Prod_test_item_id=V214944/")
The output looks like this, but I want the "channel_name" column as well:
start_timestamp | value |
---|---|
2020-11-02 08:51:50 | 1 |
2020-11-02 09:14:29 | 0 |
2020-11-02 09:18:32 | 0 |
2020-11-02 09:32:42 | 4 |
2020-11-03 13:06:03 | 2 |
2020-11-03 13:10:01 | 1 |
2020-11-03 13:54:38 | 5 |
2020-11-03 14:46:25 | 5 |
2020-11-03 14:57:31 | 6 |
2020-11-03 15:07:07 | 7 |
I have tried other libraries as well, but none of them return the partition column. It would be great if you could help me read all of the columns, including the partitioned one.
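From what I can gather from the awswrangler documentation, read_parquet may need to be told that the prefix is a partitioned dataset; my understanding (not verified) is that the dataset flag is what adds the partition columns back to the DataFrame. A sketch of what I have in mind:

import awswrangler as wr

# Assumption: dataset=True makes awswrangler treat the prefix as a
# partitioned dataset and append the partition column(s), such as
# "channel_name", to the resulting DataFrame.
df = wr.s3.read_parquet(
    path="s3://shreyasbigdata/Prod_test_item_id=V214944/",
    dataset=True,
)
print(df.columns)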