I am new to Spark programming with Python (PySpark).
I have a text file in which the messages need to be split at }{.
That is, the file contains messages written back to back, like {...}{...}...
I want to split these into:
{...}
{...}
{...}
A few messages also contain an inner message, like this:
{...
{...}
...
}
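To make the desired result concrete, here is a minimal plain-Python sketch of the splitting itself, independent of Spark. It tracks brace depth so that an inner {...} stays inside its outer message. The function name split_messages and the sample string are hypothetical, and the sketch assumes that braces never appear inside quoted string values.

```python
def split_messages(text):
    """Split back-to-back {...}{...} messages at top-level boundaries."""
    messages = []
    depth = 0      # current brace nesting level
    start = None   # index where the current top-level message began
    for i, ch in enumerate(text):
        if ch == "{":
            if depth == 0:
                start = i
            depth += 1
        elif ch == "}" and depth > 0:
            depth -= 1
            if depth == 0:
                # Closed a top-level message; inner messages stay inside it.
                messages.append(text[start:i + 1])
                start = None
    return messages


# Hypothetical sample: three messages, the second with an inner message.
sample = '{"a": 1}{"b": {"c": 2}}{"d": 3}'
for message in split_messages(sample):
    print(message)
```

A plain split on '}{' would lose the braces at each boundary and could also cut in the wrong place when two inner messages sit next to each other, which is why this sketch counts depth instead.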
CODE:
I am trying the code below, and I get the following error:
Traceback (most recent call last):
File "PythonBLEDataParser_work2.py", line 49, in <module>
for line in words:
TypeError: 'PipelinedRDD' object is not iterable
(Note: I removed a couple of commented-out lines, so line 49 refers to words = contentRDD.map(lambda x: x.split('}{')) in the code below.)
from pyspark.sql import SparkSession
# importing re for removing spaces and other characters
import re

if __name__ == "__main__":
    spark = SparkSession\
        .builder\
        .appName("PythonBLEDataParser")\
        .getOrCreate()
    contentRDD = spark.sparkContext.textFile("BLE_data_Sample.txt")
    #nonempty_lines = contentRDD.filter(lambda x: len(x) > 0)
    #print (nonempty_lines.collect())
    words = contentRDD.map(lambda x: x.split('}{'))
    # this loop is where the TypeError is raised
    for line in words:
        print (line)
    spark.stop()
I also tried .map(lambda elem: list(elem)), as suggested in the question "pyspark: 'PipelinedRDD' object is not iterable", but it didn't help.
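For reference, the error comes from looping over the RDD directly: map() only returns another RDD, which is not a Python iterable, so the data has to be brought back to the driver first (for example with collect()). The following is a hedged sketch rather than a definitive fix. The helper names split_line and print_messages are hypothetical, the regex split needs Python 3.7 or later, and it assumes each line holds whole messages with no nested }{ boundaries.

```python
import re


def split_line(line):
    """Split one line at every '}{' boundary, keeping both braces."""
    # Zero-width split between '}' and '{' (supported by re.split since Python 3.7).
    return re.split(r"(?<=\})(?=\{)", line)


def print_messages(path):
    """Hypothetical driver: read the file, split each line, print each message."""
    # Imported here so split_line can be tried without a Spark installation.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("PythonBLEDataParser").getOrCreate()
    try:
        lines = spark.sparkContext.textFile(path)
        # flatMap yields one RDD element per message instead of one list per line.
        messages = lines.flatMap(split_line)
        # collect() returns a plain Python list on the driver, which can be looped over.
        for message in messages.collect():
            print(message)
    finally:
        spark.stop()
```

Calling print_messages("BLE_data_Sample.txt") would run the whole pipeline. Note that collect() pulls every message into driver memory, so for a large file toLocalIterator() or saveAsTextFile() may be a better fit. Messages that span several lines would need a different reader, since textFile() produces one element per line.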