I need to read a file line by line, split each line into words, and perform operations on the words.
How do I do that?
Here is the code I wrote:
from pyspark.sql import SparkSession

logFile = "/home/hadoop/spark-2.3.1-bin-hadoop2.7/README.md"  # Should be some file on your system
spark = SparkSession.builder.appName("SimpleApp1").getOrCreate()
logData = spark.read.text(logFile).cache()
logData.printSchema()
logDataLines = logData.collect()
# The line variable below seems to be of type Row. How do I perform
# similar operations on a Row, or how do I convert a Row to a string?
for line in logDataLines:
    words = line.select(explode(split(line, "\s+")))
    for word in words:
        print(word)
    print("----------------------------------")
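For context, the word-splitting I am after can be sketched in plain Python. This is an assumption about the fix, not tested against my Spark setup: since `spark.read.text` produces rows with a single `value` column, each collected `Row` should expose its text as `line.value`, which could then be split with the standard `re` module (`words_of` here is a hypothetical helper name, not part of any API):

```python
import re

def words_of(text):
    """Split a line into words on runs of whitespace."""
    return [w for w in re.split(r"\s+", text.strip()) if w]

# Each Row collected from spark.read.text() has a single "value" column,
# so with the collected Rows above one would call words_of(line.value).
for line in ["Apache Spark   is fast", "and  general-purpose"]:
    for word in words_of(line):
        print(word)
    print("----------------------------------")
```

Is this the right way to get at the string inside a `Row`, or is there a more idiomatic DataFrame-level approach?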