Questions tagged [aws-emr-studio]
11 questions
4
votes
1 answer
unable to read s3 files from within aws emr studio notebooks or consoles
We have an EMR Studio that has an S3 default bucket set, i.e. s3://OurBucketName/Subdirectory/work, and within which we've created a Workspace that is attached to an EC2 cluster running emr-6.10.0 with the following apps installed:
Hadoop…

dragonscience
- 41
- 3
2
votes
1 answer
Orchestration of jobs using AWS Step functions using EMR Serverless
Amazon recently launched EMR Serverless, and I want to repurpose my existing data pipeline orchestration, which uses AWS Step Functions. There are steps that create an EMR cluster, run some Lambda functions, submit Spark jobs (mostly Scala jobs using…

smishra
- 3,122
- 29
- 31
1
vote
1 answer
How to automate jupyter notebook execution on aws?
I have been given a task where I need to automate Jupyter notebook execution on AWS. I'm totally new to the AWS environment, so I don't have any idea how to do it efficiently. The things I need to do are the following:
Need REST API(s) to start and stop…

user22
- 112
- 1
- 9
1
vote
1 answer
When I save a PySpark DataFrame with saveAsTable in AWS EMR Studio, where does it get saved?
I can save a DataFrame using df.write.saveAsTable('tableName') and read the resulting table with spark.table('tableName'), but I'm not sure where the table is actually saved.

Tom
- 11
- 1
1
vote
1 answer
How to create a notebook in EMR Studio using boto3?
I am going through the boto3 documentation here: https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/emr.html#EMR.Client.create_studio
but I cannot see any sort of create/delete notebook operation for EMR Studio. Only create/delete…

Randomize
- 8,651
- 18
- 78
- 133
0
votes
0 answers
Simple UDF apply function from the doc is failing with Spark 3.3
This simple code from the latest doc does not work on the EMR Studio Spark cluster (current version: 3.3.1-amzn-0)
df = spark.createDataFrame(
[(1, 1.0), (1, 2.0), (2, 3.0), (2, 5.0), (2, 10.0)],
("id", "v"))
def subtract_mean(pdf:…

mountrix
- 1,126
- 15
- 32
0
votes
0 answers
Databricks format in Pyspark to write in Redshift
I am migrating data from Postgres to Redshift using the jdbc format, but when I use the jdbc format for Redshift, some options, such as escape, are not available.
So I thought of using the com.databricks.spark.redshift format to write with PySpark.…

vish anand
- 111
- 1
- 4
0
votes
0 answers
Referencing other notebooks in AWS EMR
I am new to AWS EMR and am trying to configure it to run code that was developed on my local machine.
I am basically referencing notebooks from within a master notebook; this setup works locally but not on AWS EMR.
I am trying to execute this line
…

Shanawaz Khan
- 11
- 2
0
votes
1 answer
How to read postgres DB tables through EMR jupyter lab notebook from amazon workspace
I'm trying to read a table from a Postgres database, but I'm facing the error below.
Note: I am not able to reference external files from my local machine, since it is a private workspace.
JDBC :…

Sabarish Mahalingam
- 15
- 2
0
votes
0 answers
AWS EMR 6.9 with spark 3.0 and JupyterEnterpriseGateway fails with bootstrapping errors
I am struggling to bring up an EMR cluster with Spark 3.x. I am using the custom/advanced options since I also need JupyterEnterpriseGateway; however, bootstrapping fails with unknown errors.
Using one of the options available in the preselected packages works but…

Mayukh
- 117
- 1
- 4
0
votes
0 answers
Installing Packages onto EMR
I have been scouring the internet for documentation and solutions to an issue that I have been encountering on EMR, but so far no luck! I have been trying to download some packages onto my EMR workspaces, but it throws out the…

thundercat
- 45
- 6