Questions tagged [aws-data-wrangler]

AWS Data Wrangler (now the AWS SDK for pandas) offers abstracted functions for common ETL tasks such as loading and unloading data from data lakes, data warehouses, and databases. It integrates with Athena, Glue, Redshift, Timestream, QuickSight, Chime, CloudWatch Logs, DynamoDB, EMR, Secrets Manager, PostgreSQL, MySQL, SQL Server, and S3 (Parquet, CSV, JSON, and Excel).

Project: awswrangler · PyPI

69 questions
6 votes · 2 answers

Difference between awswrangler and boto3?

I have used boto3 to connect to AWS services through Python code. Recently I came across the awswrangler library, which has similar functionality to boto3. What is the difference between the two? Can you explain with an example in which scenario we should…
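A minimal sketch of the distinction, assuming a hypothetical bucket and key: boto3 exposes the raw AWS APIs, while awswrangler builds pandas-aware helpers on top of boto3.

```python
import boto3
import awswrangler as wr

# boto3: low-level client calls — you receive raw bytes and parse them yourself.
s3 = boto3.client("s3")
obj = s3.get_object(Bucket="my-bucket", Key="data/file.parquet")  # hypothetical names
raw_bytes = obj["Body"].read()

# awswrangler: high-level helper — one call returns a ready-to-use DataFrame.
df = wr.s3.read_parquet(path="s3://my-bucket/data/file.parquet")
```

In short: reach for boto3 when you need an AWS API awswrangler does not wrap, and for awswrangler when the goal is moving tabular data in or out of pandas.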
6 votes · 1 answer

Error reading data from Athena using AWS Wrangler

I am using Python 3 and trying to read data from AWS Athena using the awswrangler package. Below is the code: import boto3 import awswrangler as wr import pandas as pd df_dynamic = wr.athena.read_sql_query("select * from test", database="tst") Error: …
user3292373 · 483 reputation · 3 gold · 8 silver · 25 bronze
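A hedged sketch of a working call, assuming the region, database, and table names are placeholders; a common fix is passing an explicit boto3 session so credentials and region resolve correctly:

```python
import boto3
import awswrangler as wr

session = boto3.Session(region_name="us-east-1")  # assumed region

df = wr.athena.read_sql_query(
    sql="select * from test",
    database="tst",
    ctas_approach=False,   # plain query; avoids needing CTAS/Glue write permissions
    boto3_session=session,
)
print(df.head())
```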
4 votes · 3 answers

How can I bulk upload JSON records to AWS OpenSearch index using a python client library?

I have a sufficiently large dataset that I would like to bulk index the JSON objects in AWS OpenSearch. I cannot see how to achieve this using any of: boto3, awswrangler, opensearch-py, elasticsearch, elasticsearch-py. Is there a way to do this…
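One route that does work is the bulk helper in opensearch-py; a minimal sketch, assuming a hypothetical domain endpoint and index name (authentication omitted). Recent awswrangler releases also ship a wr.opensearch module with similar helpers.

```python
from opensearchpy import OpenSearch, helpers

client = OpenSearch(
    hosts=[{"host": "my-domain.us-east-1.es.amazonaws.com", "port": 443}],  # hypothetical
    use_ssl=True,
)

docs = [{"title": "a"}, {"title": "b"}]  # your JSON records

# helpers.bulk batches the documents into _bulk API requests.
actions = ({"_index": "my-index", "_source": d} for d in docs)
helpers.bulk(client, actions)
```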
3 votes · 1 answer

Querying Athena using awsdatawrangler

I am trying to query my Athena database using: import awswrangler as wr df = wr.athena.read_sql_query(sql="""SELECT * FROM tablename limit 10;""", database="databasename") …
Devarshi Goswami · 1,035 reputation · 4 gold · 11 silver · 26 bronze
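A cleaned-up version of that call, with the trailing semicolon dropped (Athena via awswrangler does not need it) and the placeholder names kept from the question:

```python
import awswrangler as wr

df = wr.athena.read_sql_query(
    sql="SELECT * FROM tablename LIMIT 10",
    database="databasename",
)
```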
2 votes · 1 answer

Amazon SageMaker Studio Data Wrangler Athena query failing for large datasets

Trying to query a large dataset from Athena using AWS Data Wrangler. The query fails for large datasets. This is for setting up a Data Wrangler pipeline using the UI in SageMaker Studio, trying to add an Athena source. Some observations: small Athena queries…
2 votes · 1 answer

awswrangler redshift to_sql upserting specific columns

Suppose we have a table with a row like the one below. I want to update just the col_id and slug columns with new values. This is the line of code I use to update this row: df = pd.DataFrame([[datas.get("collection_id"), datas.get("slug")]], columns=["col_id",…
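A hedged sketch of a column-scoped upsert, assuming a hypothetical Glue connection, table, and key column: mode="upsert" deletes the rows matching primary_keys and re-inserts the DataFrame, while use_column_names=True maps only the columns present in df. Note that columns absent from df come back NULL in the re-inserted rows, so a true partial-column update may need a staging table plus an UPDATE.

```python
import pandas as pd
import awswrangler as wr

df = pd.DataFrame({"col_id": [42], "slug": ["new-slug"]})

con = wr.redshift.connect("my-glue-connection")  # hypothetical connection name
wr.redshift.to_sql(
    df=df,
    con=con,
    table="my_table",        # hypothetical
    schema="public",
    mode="upsert",
    primary_keys=["col_id"],
    use_column_names=True,
)
con.close()
```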
2 votes · 1 answer

How do I use awswrangler to read only the first few N rows of a parquet file stored in S3?

I am trying to use awswrangler to read into a pandas dataframe an arbitrarily-large parquet file stored in S3, but limiting my query to the first N rows due to the file's size (and my poor bandwidth). I cannot see how to do it, or whether it is even…
jtlz2 · 7,700 reputation · 9 gold · 64 silver · 114 bronze
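There is no LIMIT pushdown in wr.s3.read_parquet, but chunked reading gets close; a minimal sketch, assuming a hypothetical path (at least one Parquet row group is still downloaded):

```python
import awswrangler as wr

# chunked=N yields DataFrames of up to N rows each; take only the first batch.
chunks = wr.s3.read_parquet(path="s3://my-bucket/big.parquet", chunked=100)
df_first_100 = next(iter(chunks))
```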
2 votes · 1 answer

Is there any way to capture the input file name of multiple parquet files read in with a wildcard in pandas/awswrangler?

This is the exact python analogue of the following Spark question: Is there any way to capture the input file name of multiple parquet files read in with a wildcard in Spark? I am reading in a wildcard list of parquet files using (variously) pandas…
jtlz2 · 7,700 reputation · 9 gold · 64 silver · 114 bronze
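awswrangler has no built-in equivalent of Spark's input_file_name(), but listing first and reading per file recovers the path; a sketch, assuming a hypothetical prefix:

```python
import pandas as pd
import awswrangler as wr

paths = wr.s3.list_objects("s3://my-bucket/table/", suffix=".parquet")

frames = []
for p in paths:
    part = wr.s3.read_parquet(path=p)
    part["source_file"] = p  # tag each row with its originating object
    frames.append(part)

df = pd.concat(frames, ignore_index=True)
```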
2 votes · 2 answers

How to read all parquet files from S3 using awswrangler in Python

I need to read all parquet files with the .parquet extension. s3_path = "s3://buckte/table/files.parquet" df = wr.s3.read_parquet(path=[s3_path]) but I still get an error: Error occurred (404) when calling the HeadObject
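The 404 typically means the literal key does not exist; pointing read_parquet at the prefix and filtering by suffix avoids the HeadObject call on a non-existent file. A sketch using the bucket name from the question:

```python
import awswrangler as wr

# Read every object under the prefix whose key ends in .parquet.
df = wr.s3.read_parquet(path="s3://buckte/table/", path_suffix=".parquet")
```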
2 votes · 1 answer

Adding tags to S3 objects using awswrangler?

I'm using awswrangler to write parquets to my S3 and I usually add tags on all my objects for access and cost control, but I didn't find a way to do that directly with awswrangler. I'm currently using the code below to test: import awswrangler as…
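One hedged approach: s3_additional_kwargs is forwarded to the underlying S3 upload calls, and the S3 API accepts tags as a URL-encoded Tagging string. A sketch with a hypothetical path and tag set:

```python
import pandas as pd
import awswrangler as wr

df = pd.DataFrame({"a": [1, 2]})

wr.s3.to_parquet(
    df=df,
    path="s3://my-bucket/tagged/",  # hypothetical
    dataset=True,
    s3_additional_kwargs={"Tagging": "team=data&env=prod"},
)
```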
2 votes · 1 answer

How to catch exceptions.NoFilesFound error from awswrangler in Python 3

Here is my code to read the parquet files stored in an S3 bucket path. When it finds the parquet files in the path, it works, but gives exceptions.NoFilesFound when it cannot find any file. import boto3 import awswrangler as wr …
Rafiq · 1,380 reputation · 4 gold · 16 silver · 31 bronze
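The exception is importable from awswrangler.exceptions, so a plain try/except works; a minimal sketch with a hypothetical path:

```python
import awswrangler as wr
from awswrangler import exceptions

try:
    df = wr.s3.read_parquet(path="s3://my-bucket/maybe-empty/")
except exceptions.NoFilesFound:
    df = None  # or build an empty DataFrame / log and skip
```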
2 votes · 1 answer

awswrangler.s3.read_parquet ignores partition_filter argument

The partition_filter argument in wr.s3.read_parquet() is failing to filter a partitioned parquet dataset on S3. Here's a reproducible example (might require a correctly configured boto3_session argument): Dataset setup: import pandas as pd import…
geotheory · 22,624 reputation · 29 gold · 119 silver · 196 bronze
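The usual culprit: partition_filter is documented as ignored unless dataset=True, and partition values arrive as strings. A sketch with hypothetical partition names:

```python
import awswrangler as wr

df = wr.s3.read_parquet(
    path="s3://my-bucket/table/",
    dataset=True,  # required for partition_filter to take effect
    partition_filter=lambda d: d["year"] == "2021",  # values are strings
)
```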
2 votes · 3 answers

How to get the Python package `awswrangler` to accept a custom `endpoint_url`

I'm attempting to use the Python package awswrangler to access a non-AWS S3 service. The AWS Data Wrangler docs state that you need to create a boto3.Session() object. The problem is that boto3.client() supports setting the endpoint_url, but…
David Parks · 30,789 reputation · 47 gold · 185 silver · 328 bronze
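Recent awswrangler versions expose per-service endpoint overrides on a global config object, which a boto3.Session() alone does not carry; a hedged sketch with a hypothetical MinIO endpoint:

```python
import awswrangler as wr

wr.config.s3_endpoint_url = "https://minio.example.com:9000"  # hypothetical

df = wr.s3.read_parquet(path="s3://my-bucket/data/")
```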
2 votes · 1 answer

AWS Lambda Chalice Layers Segmentation Fault

I am deploying a Python 3.7 Lambda function via Chalice. Because the code, with its environment requirements, is larger than the 50 MB limit, I am using the "automatic_layer" feature of Chalice to generate the layer with the requirements, which is…
1 vote · 1 answer

Is there a way to specify the Parquet version using AWS Data Wrangler?

We are writing Parquet files which seem to default to version 1, which Teradata NOS rejects with "Native Object Store user error: Unsupported file version". How can we specify with AWS Data Wrangler / SDK for Pandas…
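A hedged sketch, assuming a recent awswrangler / SDK for Pandas release where pyarrow_additional_kwargs is forwarded to pyarrow.parquet.write_table; its "version" key pins the Parquet format version (pick the value your reader supports):

```python
import pandas as pd
import awswrangler as wr

df = pd.DataFrame({"a": [1]})

wr.s3.to_parquet(
    df=df,
    path="s3://my-bucket/out/",  # hypothetical
    dataset=True,
    pyarrow_additional_kwargs={"version": "1.0"},  # or "2.4" / "2.6"
)
```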