Questions tagged [aws-data-wrangler]

AWS Data Wrangler (now the AWS SDK for pandas) offers abstracted functions for common ETL tasks such as loading and unloading data from data lakes, data warehouses, and databases. It integrates with Athena, Glue, Redshift, Timestream, QuickSight, Chime, CloudWatch Logs, DynamoDB, EMR, Secrets Manager, PostgreSQL, MySQL, SQL Server, and S3 (Parquet, CSV, JSON, and Excel).

Project: awswrangler · PyPI

69 questions
6 votes · 2 answers

Difference between awswrangler and boto3?

I have used boto3 to connect to AWS services through Python code. Recently I came across the awswrangler library, which has similar functionality to boto3. What is the difference between the two? Can you explain with an example in which scenario we should…
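A minimal sketch of the distinction, assuming a hypothetical bucket and key: boto3 exposes the raw AWS APIs, while awswrangler builds pandas-aware helpers on top of boto3.

```python
import boto3
import awswrangler as wr

# boto3: low-level client calls — you receive raw bytes and parse them yourself.
s3 = boto3.client("s3")
obj = s3.get_object(Bucket="my-bucket", Key="data/file.parquet")  # hypothetical names
raw_bytes = obj["Body"].read()

# awswrangler: high-level helper — one call returns a ready-to-use DataFrame.
df = wr.s3.read_parquet(path="s3://my-bucket/data/file.parquet")
```

In short: reach for boto3 when you need an AWS API awswrangler does not wrap, and for awswrangler when the goal is moving tabular data in or out of pandas.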
6 votes · 1 answer

Error reading data from Athena using AWS Wrangler

I am using Python 3 and trying to read data from AWS Athena using the awswrangler package. Below is the code: import boto3 import awswrangler as wr import pandas as pd df_dynamic = wr.athena.read_sql_query("select * from test", database="tst") Error: …
user3292373 · 483 reputation · 3 gold · 8 silver · 25 bronze
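A hedged sketch of a working call, assuming the region, database, and table names are placeholders; a common fix is passing an explicit boto3 session so credentials and region resolve correctly:

```python
import boto3
import awswrangler as wr

session = boto3.Session(region_name="us-east-1")  # assumed region

df = wr.athena.read_sql_query(
    sql="select * from test",
    database="tst",
    ctas_approach=False,   # plain query; avoids needing CTAS/Glue write permissions
    boto3_session=session,
)
print(df.head())
```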
4 votes · 3 answers

How can I bulk upload JSON records to AWS OpenSearch index using a python client library?

I have a sufficiently large dataset that I would like to bulk index the JSON objects in AWS OpenSearch. I cannot see how to achieve this using any of: boto3, awswrangler, opensearch-py, elasticsearch, elasticsearch-py. Is there a way to do this…
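One route that does work is the bulk helper in opensearch-py; a minimal sketch, assuming a hypothetical domain endpoint and index name (authentication omitted). Recent awswrangler releases also ship a wr.opensearch module with similar helpers.

```python
from opensearchpy import OpenSearch, helpers

client = OpenSearch(
    hosts=[{"host": "my-domain.us-east-1.es.amazonaws.com", "port": 443}],  # hypothetical
    use_ssl=True,
)

docs = [{"title": "a"}, {"title": "b"}]  # your JSON records

# helpers.bulk batches the documents into _bulk API requests.
actions = ({"_index": "my-index", "_source": d} for d in docs)
helpers.bulk(client, actions)
```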
3 votes · 1 answer

Querying Athena using awsdatawrangler

I am trying to query my Athena database using: import awswrangler as wr df = wr.athena.read_sql_query(sql="""SELECT * FROM tablename limit 10;""", database="databasename") …
Devarshi Goswami · 1,035 reputation · 4 gold · 11 silver · 26 bronze
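A cleaned-up version of that call, with the trailing semicolon dropped (Athena via awswrangler does not need it) and the placeholder names kept from the question:

```python
import awswrangler as wr

df = wr.athena.read_sql_query(
    sql="SELECT * FROM tablename LIMIT 10",
    database="databasename",
)
```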
2 votes · 1 answer

Amazon SageMaker Studio Data Wrangler Athena query failing for large datasets

Trying to query a large dataset from Athena using AWS Data Wrangler. The query fails for large datasets. This is for setting up a Data Wrangler pipeline using the UI in SageMaker Studio, trying to add an Athena source. Some observations: small Athena queries…
2 votes · 1 answer

awswrangler redshift to_sql upserting specific columns

Suppose we have a table with a row like the one below. I want to update just the col_id and slug columns with new values. This is the line of code I use to update this row: df = pd.DataFrame([[datas.get("collection_id"), datas.get("slug")]], columns=["col_id",…
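A hedged sketch of a column-scoped upsert, assuming a hypothetical Glue connection, table, and key column: mode="upsert" deletes the rows matching primary_keys and re-inserts the DataFrame, while use_column_names=True maps only the columns present in df. Note that columns absent from df come back NULL in the re-inserted rows, so a true partial-column update may need a staging table plus an UPDATE.

```python
import pandas as pd
import awswrangler as wr

df = pd.DataFrame({"col_id": [42], "slug": ["new-slug"]})

con = wr.redshift.connect("my-glue-connection")  # hypothetical connection name
wr.redshift.to_sql(
    df=df,
    con=con,
    table="my_table",        # hypothetical
    schema="public",
    mode="upsert",
    primary_keys=["col_id"],
    use_column_names=True,
)
con.close()
```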
2 votes · 1 answer

How do I use awswrangler to read only the first few N rows of a parquet file stored in S3?

I am trying to use awswrangler to read into a pandas dataframe an arbitrarily-large parquet file stored in S3, but limiting my query to the first N rows due to the file's size (and my poor bandwidth). I cannot see how to do it, or whether it is even…
jtlz2 · 7,700 reputation · 9 gold · 64 silver · 114 bronze
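There is no LIMIT pushdown in wr.s3.read_parquet, but chunked reading gets close; a minimal sketch, assuming a hypothetical path (at least one Parquet row group is still downloaded):

```python
import awswrangler as wr

# chunked=N yields DataFrames of up to N rows each; take only the first batch.
chunks = wr.s3.read_parquet(path="s3://my-bucket/big.parquet", chunked=100)
df_first_100 = next(iter(chunks))
```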
2 votes · 1 answer

Is there any way to capture the input file name of multiple parquet files read in with a wildcard in pandas/awswrangler?

This is the exact python analogue of the following Spark question: Is there any way to capture the input file name of multiple parquet files read in with a wildcard in Spark? I am reading in a wildcard list of parquet files using (variously) pandas…
jtlz2 · 7,700 reputation · 9 gold · 64 silver · 114 bronze
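awswrangler has no built-in equivalent of Spark's input_file_name(), but listing first and reading per file recovers the path; a sketch, assuming a hypothetical prefix:

```python
import pandas as pd
import awswrangler as wr

paths = wr.s3.list_objects("s3://my-bucket/table/", suffix=".parquet")

frames = []
for p in paths:
    part = wr.s3.read_parquet(path=p)
    part["source_file"] = p  # tag each row with its originating object
    frames.append(part)

df = pd.concat(frames, ignore_index=True)
```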
2 votes · 2 answers

How to read all parquet files from S3 using awswrangler in Python

I need to read all parquet files with the .parquet extension. s3_path = "s3://buckte/table/files.parquet" df = wr.s3.read_parquet(path=[s3_path]) but I still get an error: Error occurred (404) when calling the HeadObject
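The 404 typically means the literal key does not exist; pointing read_parquet at the prefix and filtering by suffix avoids the HeadObject call on a non-existent file. A sketch using the bucket name from the question:

```python
import awswrangler as wr

# Read every object under the prefix whose key ends in .parquet.
df = wr.s3.read_parquet(path="s3://buckte/table/", path_suffix=".parquet")
```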
2 votes · 1 answer

Adding tags to S3 objects using awswrangler?

I'm using awswrangler to write parquets to my S3 and I usually add tags on all my objects for access and cost control, but I didn't find a way to do that directly with awswrangler. I'm currently using the code below to test: import awswrangler as…
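One hedged approach: s3_additional_kwargs is forwarded to the underlying S3 upload calls, and the S3 API accepts tags as a URL-encoded Tagging string. A sketch with a hypothetical path and tag set:

```python
import pandas as pd
import awswrangler as wr

df = pd.DataFrame({"a": [1, 2]})

wr.s3.to_parquet(
    df=df,
    path="s3://my-bucket/tagged/",  # hypothetical
    dataset=True,
    s3_additional_kwargs={"Tagging": "team=data&env=prod"},
)
```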
2 votes · 1 answer

How to catch exceptions.NoFilesFound error from awswrangler in Python 3

Here is my code to read the parquet files stored in an S3 bucket path. When it finds the parquet files in the path, it works, but gives exceptions.NoFilesFound when it cannot find any file. import boto3 import awswrangler as wr …
Rafiq · 1,380 reputation · 4 gold · 16 silver · 31 bronze
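The exception is importable from awswrangler.exceptions, so a plain try/except works; a minimal sketch with a hypothetical path:

```python
import awswrangler as wr
from awswrangler import exceptions

try:
    df = wr.s3.read_parquet(path="s3://my-bucket/maybe-empty/")
except exceptions.NoFilesFound:
    df = None  # or build an empty DataFrame / log and skip
```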
2 votes · 1 answer

awswrangler.s3.read_parquet ignores partition_filter argument

The partition_filter argument in wr.s3.read_parquet() is failing to filter a partitioned parquet dataset on S3. Here's a reproducible example (might require a correctly configured boto3_session argument): Dataset setup: import pandas as pd import…
geotheory · 22,624 reputation · 29 gold · 119 silver · 196 bronze
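The usual culprit: partition_filter is documented as ignored unless dataset=True, and partition values arrive as strings. A sketch with hypothetical partition names:

```python
import awswrangler as wr

df = wr.s3.read_parquet(
    path="s3://my-bucket/table/",
    dataset=True,  # required for partition_filter to take effect
    partition_filter=lambda d: d["year"] == "2021",  # values are strings
)
```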
2 votes · 3 answers

How to get the Python package `awswrangler` to accept a custom `endpoint_url`

I'm attempting to use the Python package awswrangler to access a non-AWS S3 service. The AWS Data Wrangler docs state that you need to create a boto3.Session() object. The problem is that boto3.client() supports setting the endpoint_url, but…
David Parks · 30,789 reputation · 47 gold · 185 silver · 328 bronze
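Recent awswrangler versions expose per-service endpoint overrides on a global config object, which a boto3.Session() alone does not carry; a hedged sketch with a hypothetical MinIO endpoint:

```python
import awswrangler as wr

wr.config.s3_endpoint_url = "https://minio.example.com:9000"  # hypothetical

df = wr.s3.read_parquet(path="s3://my-bucket/data/")
```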
2 votes · 1 answer

AWS Lambda Chalice Layers Segmentation Fault

I am deploying a Python 3.7 Lambda function via Chalice. Because the code, with its environment requirements, is larger than the 50 MB limit, I am using the "automatic_layer" feature of Chalice to generate the layer with the requirements, which is…
1 vote · 1 answer

Is there a way to specify the Parquet version using AWS Data Wrangler?

We are writing Parquet files which seem to default to version 1, which Teradata NOS rejects with "Native Object Store user error: Unsupported file version". How can we specify with AWS Data Wrangler / SDK for Pandas…
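A hedged sketch, assuming a recent awswrangler / SDK for Pandas release where pyarrow_additional_kwargs is forwarded to pyarrow.parquet.write_table; its "version" key pins the Parquet format version (pick the value your reader supports):

```python
import pandas as pd
import awswrangler as wr

df = pd.DataFrame({"a": [1]})

wr.s3.to_parquet(
    df=df,
    path="s3://my-bucket/out/",  # hypothetical
    dataset=True,
    pyarrow_additional_kwargs={"version": "1.0"},  # or "2.4" / "2.6"
)
```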