What is the issue with this code in PySpark?
raw_data = ["James,Smith,36636,M,3000",
"Michael,Rose,40288,M,4000",
"Robert,Williams,42114,M,4000",
"Maria,Anne,39192,F,4000",
"Jen,Mary,899,F,-1"
]
The code below throws an error: unresolved reference 'm'
dataRDD = spark.sparkContext.parallelize(raw_data)
mappedRDD = dataRDD.map(lambda m: \
arr=m.split(",") \
(arr[0],arr[1]))
print(mappedRDD.collect())
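I suspect the problem is that a Python lambda body must be a single expression, so the assignment arr = m.split(",") is not valid inside it. As a sketch (assuming the same dataRDD as above, and using a hypothetical helper name first_two), the multi-statement version could be written as a regular function instead:

def first_two(record):
    arr = record.split(",")    # assignment statements are allowed in a def
    return (arr[0], arr[1])    # keep only the first two fields

mappedRDD = dataRDD.map(first_two)
print(mappedRDD.collect())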
I also rewrote the same logic as a single-expression lambda, as shown below, and it works:
dataRDD = spark.sparkContext.parallelize(raw_data)
mappedRDD = dataRDD.map(lambda m: (m.split(",")[0],m.split(",")[1]))
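This version works because the whole lambda body is one expression, although it calls split(",") twice per record. A variant that splits only once, by slicing the split result, might look like this (a sketch against the same dataRDD):

mappedRDD = dataRDD.map(lambda m: tuple(m.split(",")[:2]))  # split once, keep the first two fields
print(mappedRDD.collect())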