Questions tagged [emr-serverless]

Amazon EMR Serverless is a new option in Amazon EMR that makes it easy and cost-effective for data engineers and analysts to run petabyte-scale data analytics in the cloud.

With EMR Serverless, you can run applications built using open-source frameworks such as Apache Spark, Hive, and Presto without having to configure, manage, optimize, or secure clusters. EMR Serverless automatically provisions and scales the compute and memory resources required by your applications, and you only pay for the resources that the applications use.

32 questions
11
votes
2 answers

AWS Glue vs EMR Serverless

Recently, AWS announced Amazon EMR Serverless (Preview) https://aws.amazon.com/blogs/big-data/announcing-amazon-emr-serverless-preview-run-big-data-applications-without-managing-servers/ - new very promising service. From my understanding - AWS…
alexanoid
  • 24,051
  • 54
  • 210
  • 410
3
votes
0 answers

AWS EMR Serverless Spark properties delimiter

I'm trying to run a Spark job using EMR Serverless, but I cannot pass the list of jars and archives to the job. The Spark properties section does not seem to allow a comma-delimited list. AWS documentation page clearly…
Philip K. Adetiloye
  • 3,102
  • 4
  • 37
  • 63
2
votes
1 answer

How to run a Python project (package) on AWS EMR serverless?

I have a Python project with several modules, classes, and dependencies files (a requirements.txt file). I want to pack it into one file with all the dependencies and give the file path to AWS EMR serverless, which will run it. The problem is that I…
nirkov
  • 697
  • 10
  • 25
1
vote
2 answers

botocore.exceptions.NoRegionError: You must specify a region for EmrServerlessCreateApplicationOperator

I am trying to create an emr-serverless application through the EmrServerlessCreateApplicationOperator, but I keep getting the error botocore.exceptions.NoRegionError: You must specify a region. I am passing the region like below: create_app =…
1
vote
1 answer

EMR serverless- Pass jars in console

I'm new to EMR Serverless and want to know how to pass jars and packages to a Spark application, as in, for example: spark-submit --deploy-mode client --jars…
1
vote
1 answer

AWS EMR serverless - how to submit pyspark jobs (using console) with multiple files?

Hi, I am new to EMR Serverless and trying to learn. I have a PySpark project which I want to run using EMR Serverless. I tried using the console, but it does not let me provide a folder location as input. I can submit only one file, and when I try that -…
Corey A
  • 11
  • 1
1
vote
1 answer

How to pass EMR Serverless PySpark entryPointArguments as variable

I have an EMR Serverless PySpark job that I launch from a step function. I am trying to pass arguments to SparkSubmit through entryPointArguments as variables set at the beginning of the step function, i.e. today_date, source,…
1
vote
0 answers

AWS EMR serverless connect to jdbc SQL Server

I have been connecting to SQL Server using EMR Serverless App v-6.8.0 for Spark. I tested the code on my local machine as well as on EC2, but when I ran it on this serverless cluster I got an error. Note: My VPC Security Group has enabled…
1
vote
1 answer

Virtualenv in aws emr-serverless

I'm trying to run some jobs from the AWS CLI using a virtual environment in which I installed some libraries. I followed this guide; the same is here. But when I run the job I get this error: Job execution failed, please check complete logs in configured…
solopiu
  • 718
  • 1
  • 9
  • 28
1
vote
1 answer

regexp extract pyspark sql: ParseException Literals of type 'R' are currently not supported

I'm using Pyspark SQL with regexp_extract in this way: df = spark.createDataFrame([['id_20_30', 10], ['id_40_50', 30]], ['id', 'age']) df.createOrReplaceTempView("table") sql_statement=""" select regexp_extract(id, r'(\d+)', 1) as id from…
1
vote
1 answer

EMR serverless cannot connect to s3 in another region

I have an EMR Serverless app that cannot connect to an S3 bucket in another region. Is there a workaround for that? Maybe a parameter to set in Job parameters or Spark parameters when submitting a new job. The error is this: ExitCode: 1. Last few…
solopiu
  • 718
  • 1
  • 9
  • 28
0
votes
0 answers

How to configure EMR Serverless to log spark applications correctly to stdout and stderr

I am currently running Scala Spark applications on EMR serverless and all of the logs are getting output to stderr and logged at info level. Looking at this page it seems like this is the default for…
Darragh.McL
  • 125
  • 1
  • 10
0
votes
1 answer

EmrServerlessCreateApplicationOperator networkConfiguration with multiple subnetIds

If I pass more than one subnet Id to EmrServerlessCreateApplicationOperator via the networkConfiguration attribute, I receive an error. If I use a single subnet Id the operator works fine. This is the network configuration and also shown is an…
singleton
  • 161
  • 4
0
votes
1 answer

Executors do not seem to be created or scaled up for Spark application on AWS EMR Serverless

I would appreciate your help with my problem. I'm running a Spark application on AWS EMR Serverless with the EMR 6.11 release. I'm using Spark 3.3.2 with Java 17, configured with maximum resources of 200 vCPU and 1600 GB memory. My application is…
0
votes
0 answers

Fixing ApplicationID in aws EMR serverless or any aws resource via terraform

Currently the EMR Serverless applicationID changes every time there is a configuration change, so our dashboards need to be regularly updated. Is there a way to fix the applicationID, or any other way in which we can get a historical view of the resource that…