
I am new to Spark coding with Python (PySpark). I have a txt file in which messages need to be split at }{ . That is, the message stream looks like {...}{...}... and I want to split it into

{...}
{...}
{...}

A few also have an inner message, like

{...
    {...}
...
}
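(As an aside: a plain str.split('}{') drops the braces themselves. A zero-width regex split between a closing and an opening brace keeps them. A minimal sketch in plain Python, with a made-up sample string; note that nested messages like the one above would still need real brace counting, since this split also fires between adjacent inner messages:)

```python
import re

# Made-up sample: three concatenated messages on one line
raw = '{"a": 1}{"b": 2}{"c": 3}'

# str.split removes the '}{' delimiter, so the braces are lost:
print(raw.split('}{'))
# ['{"a": 1', '"b": 2', '"c": 3}']

# Zero-width split between '}' and '{' keeps both braces:
messages = re.split(r'(?<=\})(?=\{)', raw)
print(messages)
# ['{"a": 1}', '{"b": 2}', '{"c": 3}']
```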

CODE:
I am trying with the below code and I am getting an error saying

Traceback (most recent call last):
  File "PythonBLEDataParser_work2.py", line 49, in <module>
    for line in words:
TypeError: 'PipelinedRDD' object is not iterable

(note: I have removed a couple of commented lines, hence line number 49 refers to words = contentRDD.map(lambda x: x.split('}{')))

    from pyspark.sql import SparkSession
    # importing re for removing space and others
    import re

    if __name__ == "__main__":

        spark = SparkSession\
                .builder\
                .appName("PythonBLEDataParser")\
                .getOrCreate()

        contentRDD = spark.sparkContext.textFile("BLE_data_Sample.txt")

        #nonempty_lines = contentRDD.filter(lambda x: len(x) > 0)
        #print (nonempty_lines.collect())
        words = contentRDD.map(lambda x: x.split('}{'))
        for line in words:
            print (line)

        spark.stop()

I tried .map( lambda elem: list(elem)) mentioned in pyspark: 'PipelinedRDD' object is not iterable but didn't help.


1 Answer

After:

words = contentRDD.map(lambda x: x.split('}{'))

you have only applied a transformation to contentRDD, not an action. Transformations are lazy, so the result is still a PipelinedRDD object, which cannot be iterated over on the driver.

If you want to collect the results into the driver, you need an action such as collect:

words = contentRDD.map(lambda x: x.split('}{')).collect()
for line in words:
    print (line)
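One further note: because map is applied to each line, the collected words is a list of lists (one list of fragments per input line); PySpark's flatMap would instead yield a single flat sequence of fragments. A quick sketch of the difference using plain-Python equivalents of the two RDD operations (the sample lines are made up, no Spark needed to try it):

```python
# Plain-Python equivalents of RDD.map / RDD.flatMap over split results.
lines = ['{"a": 1}{"b": 2}', '{"c": 3}']

# map: one list of fragments per line (what the question's code collects)
mapped = [line.split('}{') for line in lines]
print(mapped)
# [['{"a": 1', '"b": 2}'], ['{"c": 3}']]

# flatMap: one flat sequence of fragments across all lines
flat = [frag for line in lines for frag in line.split('}{')]
print(flat)
# ['{"a": 1', '"b": 2}', '{"c": 3}']
```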