Questions tagged [aws-glue3.0]

22 questions
3
votes
1 answer

AWS Glue - Writing File Takes A Very Long Time

Hi, I have an ETL job in AWS Glue that takes a very long time to write. It reads data from S3, performs a few transformations (not all are listed below, but the transformations do not seem to be the issue), and then finally writes the data frame…
Qwaz
  • 199
  • 9
2
votes
0 answers

Glue secret manager integration: secretId is not provided

I am running a Glue PySpark script from my local machine using the Glue ETL library. When creating a dataframe from the Glue catalog, dyf_user_book_reading_stat = glueContext.create_dynamic_frame.from_catalog( database="xxx-db", …
1
vote
3 answers

Pyspark - Glue 3.0 issue, upgrading of Spark 3.0 : reading dates before 1582-10-15 or timestamps before 1900-01-01T00:00:00Z

After upgrading to Glue 3.0, I got the following error when handling RDD objects: An error occurred while calling o926.javaToPython. You may get a different result due to the upgrading of Spark 3.0: reading dates before 1582-10-15 or timestamps…
Smaillns
  • 2,540
  • 1
  • 28
  • 40
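The error quoted above refers to Spark 3's calendar rebase handling for old dates. A minimal sketch of the settings commonly used to address it follows, assuming the data is stored as Parquet (the excerpt does not say which format is involved). The helper name `parquet_rebase_conf` is hypothetical; the two configuration keys are the Spark 3.x SQL options for Parquet date/timestamp rebasing. Whether `CORRECTED` or `LEGACY` is appropriate depends on which Spark version wrote the files.

```python
# Hypothetical helper: builds the Spark SQL settings that control how
# dates before 1582-10-15 are rebased when reading/writing Parquet.
VALID_MODES = {"EXCEPTION", "CORRECTED", "LEGACY"}


def parquet_rebase_conf(mode="CORRECTED"):
    """Return a dict of Spark conf keys for the given rebase mode."""
    mode = mode.upper()
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {sorted(VALID_MODES)}")
    return {
        "spark.sql.legacy.parquet.datetimeRebaseModeInRead": mode,
        "spark.sql.legacy.parquet.datetimeRebaseModeInWrite": mode,
    }


if __name__ == "__main__":
    # In a Glue script, each pair could be applied to an existing session:
    #     for key, value in parquet_rebase_conf().items():
    #         spark.conf.set(key, value)
    for key, value in parquet_rebase_conf().items():
        print(f"{key}={value}")
```

`CORRECTED` reads the stored values as-is, while `LEGACY` rebases them from the hybrid Julian/Gregorian calendar used by Spark 2.x.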
1
vote
1 answer

Cast Issue with AWS Glue 3.0 - Pyspark

I'm using Glue 3.0:
data = [("Java", "6241499.16943521594684385382059800664452")]
rdd = spark.sparkContext.parallelize(data)
df = rdd.toDF()
df.show()
df.select(f.col("_2").cast("decimal(15,2)")).show()
I get the following…
Smaillns
  • 2,540
  • 1
  • 28
  • 40
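As a point of comparison for the cast above, here is a plain-Python sketch (not Spark) of what rounding that string to two decimal places yields, and a check that the result fits in 15 digits of precision. The function names are hypothetical, and the half-up rounding mode is an assumption about the intended behaviour, not a statement about what Glue 3.0 returns.

```python
from decimal import Decimal, ROUND_HALF_UP


def to_two_places(text):
    """Parse a numeric string exactly and round it to 2 decimal places."""
    return Decimal(text).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


def fits_precision(value, precision=15):
    """True if the value's total digit count fits the given precision."""
    return len(value.as_tuple().digits) <= precision


if __name__ == "__main__":
    rounded = to_two_places("6241499.16943521594684385382059800664452")
    print(rounded)                  # 6241499.17
    print(fits_precision(rounded))  # True
```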
1
vote
2 answers

AWS Glue NoClassDefFoundError on job.init()

I am trying to debug AWS Glue scripts locally using the Glue ETL library. I have installed aws-glue-libs and spark-3.1.1-amzn-0-bin-3.2.1-amzn-3.tgz. When I run job.init(), I get the following error trace: py4j.protocol.Py4JJavaError: An error occurred while…
sheetal_158
  • 7,391
  • 6
  • 27
  • 44
0
votes
0 answers

AWS Glue Job Monthly Report

I have an AWS Glue job that generates a monthly report. I am using Parquet files in S3 as my source and validating the query in Athena. The current issue is that when my AWS Glue job runs the following day, it aggregates only 1 day. adv_amt…
c0ng111
  • 31
  • 3
0
votes
0 answers

AWS Glue Studio Notebook using Terraform

Does anyone know how to create an AWS Glue Studio notebook using Terraform? I have tried to find the exact resource name and details for creating that service, but I am unable to create it using Terraform. If anyone has an idea on this, please help. I…
0
votes
0 answers

How do I change how AWS Glue Jobs format partitions in S3?

I'm running Glue Jobs for a bunch of related tables with a ts (timestamp) partition. By default, each Glue job writes the output files in S3 using this folder structure (for a given table and…
Alvaro Mendez
  • 134
  • 2
  • 13
0
votes
1 answer

How to build and test AWS Glue ETL Spark code locally in VS Code?

I am new to AWS Glue and I have been assigned to create an AWS Glue ETL job. We have only an AWS Prod environment in our project. I want to know how to set up my VS Code IDE so that I can build and test my Glue code. I have seen a solution with docker…
0
votes
1 answer

AWS --extra-py-files throwing ModuleNotFoundError: No module named 'pg8000'

I am trying to use pg8000 in my Glue script. The following are the params in the Glue job: --extra-py-files s3://mybucket/pg8000libs.zip //NOTE: my zip contains __init__.py Some insight into the code: import sys import os from awsglue.transforms import…
noobie-php
  • 6,817
  • 15
  • 54
  • 101
0
votes
0 answers

An error occurred while calling .pyWriteDynamicFrame. YEAR

I am encountering the problem stated in the title in my AWS Glue jobs (error attached). Different jobs return different error codes, such as: An error occurred while calling o176.pyWriteDynamicFrame. YEAR and another returns An…
aya
  • 1
  • 1
0
votes
0 answers

Does AWS X-Ray support Glue tracing?

I am trying to trace a Python Glue job. The Glue job is called from a Step Function. Step Functions is natively integrated with X-Ray, and I am able to see the trace of the Step Function. The trace contains the call to Glue, but the segment is not…
Jeremy Fisher
  • 2,510
  • 7
  • 30
  • 59
0
votes
1 answer

How to pass S3 object names obtained from Lambda events as parameters to an AWS Glue workflow

I have an S3 bucket that triggers a Lambda function on the put event type. The Lambda function then triggers the Glue workflow. In the Glue workflow, I have created one Glue job that converts XLSX files to CSV. This is the…
Srinivas
  • 51
  • 4
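One way the flow described above could be wired up is sketched below, with several assumptions: the function names, the `source_bucket` / `source_keys` property names, and the workflow name are all hypothetical, and the Glue client is passed in so the logic can be exercised without AWS. In a real Lambda, `glue_client` would be `boto3.client("glue")`. The sketch relies on the `RunProperties` argument of `start_workflow_run`, whose values must be strings, so the key list is serialized as JSON.

```python
import json
from urllib.parse import unquote_plus


def extract_s3_objects(event):
    """Return (bucket, key) pairs from an S3 event notification payload."""
    objects = []
    for record in event.get("Records", []):
        s3 = record["s3"]
        # Object keys arrive URL-encoded in S3 event notifications.
        key = unquote_plus(s3["object"]["key"])
        objects.append((s3["bucket"]["name"], key))
    return objects


def start_workflow_for_event(glue_client, workflow_name, event):
    """Start the workflow, passing the uploaded object names as run properties."""
    objects = extract_s3_objects(event)
    if not objects:
        raise ValueError("event contains no S3 records")
    response = glue_client.start_workflow_run(
        Name=workflow_name,
        RunProperties={
            "source_bucket": objects[0][0],  # assumes one bucket per event
            "source_keys": json.dumps([key for _, key in objects]),
        },
    )
    return response["RunId"]
```

Inside the Glue job, the properties could then be read back with `get_workflow_run_properties`, using the workflow name and run ID that Glue passes to jobs started by a workflow.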
0
votes
0 answers

How to prevent the AWS Glue crawler from creating a duplicate schema on my table

I have a workflow that creates incremental parquet files daily as events get generated in our system. Every time it runs it adds a new parquet file in an s3 partition for the day like so S3 |->date=1 | |-> XXX.parquet // {name: "alice",…
Ryan W Kan
  • 380
  • 3
  • 16
0
votes
1 answer

Argument "--python-modules-installer-option" not working in pythonshell Glue Jobs

I am trying to have a setup similar to that of this article: https://aws.amazon.com/blogs/big-data/simplify-and-optimize-python-package-management-for-aws-glue-pyspark-jobs-with-aws-codeartifact/ I would like to install some packages using a custom…