
I read through the AWS documentation for Pandas, and it seems like you should be able to write a dataframe to a table in Athena. When I run the code below, I get an error that says 'ValueError: need more than 2 values to unpack'.

import awswrangler as wr
import pandas as pd

df = pd.DataFrame({
"id": [1, 2, 3],
"value": ["foo", "boo", "zoo"],
"name": ["teddy", "timmy", "tommy"]
})
print(df)

wr.s3.to_parquet(
    df=df,
    path='s3://client/awsdatacatalog/',
    dataset=True,
    mode="overwrite",
    database="client_rs",
    table='newtable'
)
wr.s3.read_parquet(path, dataset=True)

Any idea what could be wrong here? This seems pretty straightforward, but nevertheless, it's not working.

https://readthedocs.org/projects/aws-data-wrangler/downloads/pdf/latest/

ASH
  • What is `wr`? It's not declared anywhere unless I'm missing something – Umar.H Sep 16 '22 at 20:52
  • I missed the top two lines when I did the copy/paste before. I just updated my question accordingly. – ASH Sep 16 '22 at 21:07
  • What is in `path`? Also, try this https://stackoverflow.com/a/48809552/643500 – Sully Sep 16 '22 at 21:25
  • The path is the name of the S3 environment that I'm working in. Also, I want to go from dataframe to a table in Athena. I'm not going from Athena to dataframe. – ASH Sep 16 '22 at 21:52

1 Answer


I tried the exact same code as yours and didn't find any issues. Also, with this script you are not writing directly to Athena; instead you are writing to the Glue catalog, from which Athena can read the table information and retrieve the data from S3. To me this looks like a version problem; can you use the versions below, which should fix it for you?

Python version used:

prabhakar@MacBook-Pro Downloads % python3 --version
Python 3.9.6
prabhakar@MacBook-Pro Downloads %

awswrangler version used:

prabhakar@MacBook-Pro Downloads % pip3 freeze | grep awswrangler
awswrangler==2.16.1
prabhakar@MacBook-Pro Downloads % 
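If you want to double-check the version from inside Python rather than from the shell, here is a quick sketch (this assumes awswrangler exposes __version__, which recent releases do):

import awswrangler as wr

# Print the installed awswrangler version; the run shown below used 2.16.1.
print(wr.__version__)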

And the script I used, which is exactly the same as yours apart from the table and database names:

import awswrangler as wr
import pandas as pd

df = pd.DataFrame({
"id": [1, 2, 3],
"value": ["foo", "boo", "zoo"],
"name": ["teddy", "timmy", "tommy"]
})
print(df)

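# Write the dataframe to S3 as a Parquet dataset and register the table in the Glue catalog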
wr.s3.to_parquet(
    df=df,
    path='s3://testing/sf_73750110/',
    dataset=True,
    mode="overwrite",
    database="sampledb",
    table='newtable'
)
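# Read the dataset straight back from S3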
df = wr.s3.read_parquet("s3://testing/sf_73750110/", dataset=True)


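# Query the same table through Athena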
df = wr.athena.read_sql_query("SELECT * FROM newtable", database="sampledb")
print(df.head())

Below is the output generated, where I was able to write the Parquet table to the Glue catalog/S3 and read it back using Athena:

[screenshot: Athena query results for newtable]

Prabhakar Reddy
  • Thanks for sharing. If mode="overwrite", does this delete any/all data/files in the path? The AWS documentation says overwrite...deletes everything in the target directory and then adds new files. That sounds extremely dangerous!!! – ASH Sep 19 '22 at 00:25
  • If you don't want to overwrite and just want to append, you can leave this property out (see the sketch below). To know more, refer to the allowed options for mode here: https://aws-sdk-pandas.readthedocs.io/en/stable/stubs/awswrangler.s3.to_parquet.html#awswrangler-s3-to-parquet – Prabhakar Reddy Sep 19 '22 at 02:41
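For what it's worth, here is a minimal sketch of the append behaviour discussed above (the bucket path, database and table names are just the ones from my example, not anything you need to reuse):

import awswrangler as wr
import pandas as pd

# One extra row to add to the existing dataset.
extra = pd.DataFrame({"id": [4], "value": ["goo"], "name": ["terry"]})

# mode="append" adds new Parquet files to the dataset without deleting
# the files already under the path, unlike mode="overwrite".
wr.s3.to_parquet(
    df=extra,
    path="s3://testing/sf_73750110/",
    dataset=True,
    mode="append",
    database="sampledb",
    table="newtable",
)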