
I read through the AWS documentation for Pandas, and it seems like you should be able to write a dataframe to a table in Athena. When I run the code below, I get an error that says 'ValueError: need more than 2 values to unpack'.

import awswrangler as wr
import pandas as pd

df = pd.DataFrame({
"id": [1, 2, 3],
"value": ["foo", "boo", "zoo"],
"name": ["teddy", "timmy", "tommy"]
})
print(df)

wr.s3.to_parquet(
    df=df,
    path='s3://client/awsdatacatalog/',
    dataset=True,
    mode="overwrite",
    database="client_rs",
    table='newtable'
)
wr.s3.read_parquet(path, dataset=True)

Any idea what could be wrong here? This seems pretty straightforward, but nevertheless, it's not working.

https://readthedocs.org/projects/aws-data-wrangler/downloads/pdf/latest/

ASH
  • What is `wr`? It's not declared anywhere unless I'm missing something – Umar.H Sep 16 '22 at 20:52
  • I missed the top two lines when I did the copy/paste before. I just updated my question accordingly. – ASH Sep 16 '22 at 21:07
  • What is in `path`? Also, try this https://stackoverflow.com/a/48809552/643500 – Sully Sep 16 '22 at 21:25
  • The path is the name of the S3 environment that I'm working in. Also, I want to go from dataframe to a table in Athena. I'm not going from Athena to dataframe. – ASH Sep 16 '22 at 21:52

1 Answer


I tried the exact same code as yours and didn't find any issues. Also, with this script you are not writing directly to Athena; instead you are writing to the Glue catalog, from which Athena can read the table information and retrieve the data from S3. To me this looks like a version problem; can you use the versions below, which should fix it for you?

Python version used:

prabhakar@MacBook-Pro Downloads % python3 --version
Python 3.9.6
prabhakar@MacBook-Pro Downloads %

awswrangler version used:

prabhakar@MacBook-Pro Downloads % pip3 freeze | grep awswrangler
awswrangler==2.16.1
prabhakar@MacBook-Pro Downloads % 
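If you want to double-check the version from inside Python rather than from the shell, here is a quick sketch (this assumes awswrangler exposes __version__, which recent releases do):

import awswrangler as wr

# Print the installed awswrangler version; the run shown below used 2.16.1.
print(wr.__version__)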

And the script I used, which is exactly the same as yours apart from the table and database names:

import awswrangler as wr
import pandas as pd

df = pd.DataFrame({
"id": [1, 2, 3],
"value": ["foo", "boo", "zoo"],
"name": ["teddy", "timmy", "tommy"]
})
print(df)

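# Write the dataframe to S3 as a Parquet dataset and register the table in the Glue catalog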
wr.s3.to_parquet(
    df=df,
    path='s3://testing/sf_73750110/',
    dataset=True,
    mode="overwrite",
    database="sampledb",
    table='newtable'
)
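# Read the dataset straight back from S3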
df = wr.s3.read_parquet("s3://testing/sf_73750110/", dataset=True)


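# Query the same table through Athena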
df = wr.athena.read_sql_query("SELECT * FROM newtable", database="sampledb")
print(df.head())

Below is the output generated, where I was able to write the Parquet table to the Glue catalog/S3 and read it back using Athena:

[screenshot: Athena query results for newtable]

Prabhakar Reddy
  • Thanks for sharing. If mode="overwrite", does this delete any/all data/files in the path? The AWS documentation says overwrite...deletes everything in the target directory and then adds new files. That sounds extremely dangerous!!! – ASH Sep 19 '22 at 00:25
  • If you don't want to overwrite and just want to append, you can leave this property out (see the sketch below). To know more, refer to the allowed options for mode here: https://aws-sdk-pandas.readthedocs.io/en/stable/stubs/awswrangler.s3.to_parquet.html#awswrangler-s3-to-parquet – Prabhakar Reddy Sep 19 '22 at 02:41
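For what it's worth, here is a minimal sketch of the append behaviour discussed above (the bucket path, database and table names are just the ones from my example, not anything you need to reuse):

import awswrangler as wr
import pandas as pd

# One extra row to add to the existing dataset.
extra = pd.DataFrame({"id": [4], "value": ["goo"], "name": ["terry"]})

# mode="append" adds new Parquet files to the dataset without deleting
# the files already under the path, unlike mode="overwrite".
wr.s3.to_parquet(
    df=extra,
    path="s3://testing/sf_73750110/",
    dataset=True,
    mode="append",
    database="sampledb",
    table="newtable",
)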