Questions tagged [pyathena]

50 questions
15
votes
2 answers

StartQueryExecution operation: Unable to verify/create output bucket

I am trying to execute a query on Athena using Python. Sample code: client = boto3.client( 'athena', region_name=region, aws_access_key_id=AWS_ACCESS_KEY_ID, aws_secret_access_key=AWS_SECRET_ACCESS_KEY ) …
NHD
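
For readers landing on the question above: the `StartQueryExecution` call needs a result location in S3, and this error is commonly worth checking against that setting. The sketch below is not taken from the question; it only assembles the keyword arguments that `boto3`'s Athena client accepts for `start_query_execution`. The helper name `build_start_query_kwargs`, the bucket `example-bucket`, and the database `default` are hypothetical, and the sketch assumes the bucket already exists and is accessible to the credentials in use.

```python
def build_start_query_kwargs(query, database, output_location):
    """Assemble keyword arguments for Athena's StartQueryExecution call.

    Hypothetical helper for illustration; it does not contact AWS.
    """
    # Athena writes query results to S3, so the location must be an S3 URI.
    if not output_location.startswith("s3://"):
        raise ValueError("output_location must be an s3:// URI")
    return {
        "QueryString": query,
        "QueryExecutionContext": {"Database": database},
        "ResultConfiguration": {"OutputLocation": output_location},
    }


kwargs = build_start_query_kwargs(
    "SELECT 1",
    "default",                       # assumed database name
    "s3://example-bucket/results/",  # assumed, pre-existing bucket
)
# With a configured client, the call would then be:
#   client = boto3.client("athena", region_name=region)
#   client.start_query_execution(**kwargs)
```

Keeping the argument assembly separate from the network call makes it easy to inspect the output location before anything is sent to AWS.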
7
votes
1 answer

Pyathena is super slow compared to querying from Athena

I run a query from the AWS Athena console and it takes 10 s. The same query run from SageMaker using PyAthena takes 155 s. Is PyAthena slowing it down, or is the data transfer from Athena to SageMaker that time-consuming? What could I do to speed this up?
5
votes
1 answer

running aws athena query via pyathena

This query works fine in Athena's front end: SELECT * FROM analysisdata."iris" limit 10; I am using this Python code to run the above query via pyathena: from pyathena import connect cursor = connect(aws_access_key_id='AKI.DELETED.2Q', …
cs0815
4
votes
0 answers

Change the file format used by to_sql method

This works as expected and creates a new table. But the data is stored in a format that only Spark can read. How do I store the data in CSV format? from pyathena.pandas.util import to_sql to_sql( mrdf, "mrdf_table3", conn, "s3://" +…
shantanuo
4
votes
1 answer

AWS Athena PyAthena AccessDeniedException

I am new to AWS. I have a user account and two roles, one for prod and one for test. Usually I log into my account and switch to the prod role to run some simple SELECT queries. Now I want to use Athena locally in Python with PyAthena. I have tried the…
amaliar
3
votes
0 answers

using pyathena and sqlalchemy to connect to a database with work_group

I'm new to using pyathena and also SQLAlchemy (or DBAPI in general). We are using pyathena and SQLAlchemy to query the data in our S3 bucket, but we need to connect with our work_group rather than aws_access_id or secret key. The pyathena page in…
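
As a rough illustration of the setup this question describes, the sketch below builds a SQLAlchemy connection URL for PyAthena's `awsathena+rest` dialect with the credential fields left empty and a `work_group` query parameter. It assumes that an empty user and password let boto3's default credential chain supply the credentials, and that `work_group` is accepted as a URL query parameter. The helper name `athena_sqlalchemy_url` and all example values are hypothetical, and the string is only constructed here, not used to connect.

```python
from urllib.parse import quote_plus


def athena_sqlalchemy_url(region, schema, s3_staging_dir, work_group):
    """Build a PyAthena SQLAlchemy URL without embedded access keys.

    Hypothetical helper; the empty ":@" leaves credentials to boto3's
    default chain (assumption stated in the lead-in).
    """
    return (
        f"awsathena+rest://:@athena.{region}.amazonaws.com:443/{schema}"
        f"?s3_staging_dir={quote_plus(s3_staging_dir)}"
        f"&work_group={quote_plus(work_group)}"
    )


url = athena_sqlalchemy_url(
    region="us-east-1",                             # assumed region
    schema="default",                               # assumed schema
    s3_staging_dir="s3://example-bucket/staging/",  # assumed bucket
    work_group="primary",                           # assumed workgroup
)
# With SQLAlchemy and PyAthena installed, the URL would be passed to
# sqlalchemy.create_engine(url).
```

URL-encoding the staging directory matters because the `s3://` URI contains characters that are not valid unescaped inside a query string.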
3
votes
1 answer

In R, Error for No Boto3 to connect Athena even though Boto3 Installed

I am trying to connect to Athena from R. After setting up 'RAthena' and the connection, I got this error: Error: Boto3 is not detected please install boto3 using either: `pip install boto3` in terminal or `install_boto()`. Alternatively…
Randy
3
votes
2 answers

RuntimeError: Unable to start JVM because of Deprecated: convertStrings

I run an automated Python job on an EMR cluster that updates Amazon Athena tables. It was running well until a few days ago (on Python 2.7 and 3.7). Here is the script: from pyathenajdbc import connect import yaml config =…
Inna
3
votes
1 answer

Why pyathena doesn't work on longer running queries while Athena runs them?

I have a query that runs on Athena (directly) in 43 seconds, scanning 90 GB of data. I then use pyathena to run the same query (in a Jupyter notebook on EMR) and it just doesn't finish running (and never returns any results). I have tested it…
Reyhaneh
2
votes
1 answer

Pyathena to_sql creates empty tables

I am trying to write a DataFrame into Athena, but the created table is always empty. I use Python 3.8 on Windows 11. I have used pyathena to write DataFrames to Athena before, and problems never occurred until now. from pyathena import connect from…
2
votes
1 answer

Athena preserve order

Is there a way to preserve the order of results on a query from Athena? Assume the data in the S3 bucket or data lake is partitioned and stored in Parquet files. Every time I query something, the order is different. I am not sure how Athena works,…
user1179317
2
votes
1 answer

Which one is faster for querying Athena: pyathena or boto3?

Which one is faster for querying AWS Athena schemas from a Python script, pyathena or boto3? Currently I am using pyathena to query Athena schemas, but it's quite slow. I know boto3 is another option, but before starting I need some experts…
L Lawliet
2
votes
1 answer

AWS athena query result file fetching from s3 bucket

Currently I am working on AWS Athena. We have a webpage that displays the query results. The data stored in the S3 bucket is ingested as part of the data lake, AWS Glue. From our webpage, multiple requests/queries will be sent to the AWS…
2
votes
1 answer

Pyathena Schema does not exist

I need to process some data from a certain flow that I have in a specific folder in an S3 bucket. I want to do this in Python. After searching for a while I found the library PyAthena, which is exactly what I was looking for! I installed version 1.8.0…
Benz
2
votes
0 answers

Pyathena cursor returns 'No result set'

I'm trying to create an Athena table and then run some SELECT statements. I've moved the connection into a lambda function: cursor = lambda: connect(s3_staging_dir=STG_DIR).cursor() and then I'm doing some DDL that creates an external Athena table and…
mi.mo