I am experimenting with PySpark using the following code (saved as trial.py):
from pyspark.sql import SparkSession
spark = SparkSession.builder.appName("Scoring System").getOrCreate()
df = spark.read.csv('output.csv')
df.show()
After I ran python trial.py on the command line, the job has been sitting for about 5 to 10 minutes with no progress. This is the output so far:
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
2019-05-05 22:58:31 WARN Utils:66 - Service 'SparkUI' could not bind on port 4040. Attempting port 4041.
2019-05-05 22:58:32 WARN Client:66 - Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.
[Stage 0:> (0 + 0) / 1]2019-05-05 23:00:08 WARN YarnScheduler:66 - Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
2019-05-05 23:00:23 WARN YarnScheduler:66 - Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
2019-05-05 23:00:38 WARN YarnScheduler:66 - Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
2019-05-05 23:00:53 WARN YarnScheduler:66 - Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
[Stage 0:> (0 + 0) / 1]2019-05-05 23:01:08 WARN YarnScheduler:66 - Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
2019-05-05 23:01:23 WARN YarnScheduler:66 - Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
2019-05-05 23:01:38 WARN YarnScheduler:66 - Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
My hunch is that my worker node does not have enough resources. Is that the cause, or am I missing something?
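One thing I could try in order to test this is to request much smaller executors explicitly when submitting to YARN. Below is a minimal sketch of how I would build that spark-submit command. The resource values are arbitrary guesses of mine, not recommendations, and the helper name build_submit_command is hypothetical; only the --master and --conf flags and the spark.executor.* property names come from Spark itself.

```python
# Hypothetical, deliberately small resource requests. The values are guesses
# chosen only to check whether YARN can allocate anything at all.
small_executor_conf = {
    "spark.executor.instances": "1",
    "spark.executor.cores": "1",
    "spark.executor.memory": "512m",
}


def build_submit_command(script, conf, master="yarn"):
    """Return a spark-submit argument list with one --conf flag per setting."""
    args = ["spark-submit", "--master", master]
    for key, value in conf.items():
        args += ["--conf", f"{key}={value}"]
    args.append(script)
    return args


command = build_submit_command("trial.py", small_executor_conf)
# Print the command so it can be copied into a shell.
print(" ".join(command))
```

If the job still reports that it has not accepted any resources with a request this small, I would assume the issue is with node registration or the YARN queue rather than with the size of the request.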