
The following code creates an RDD whose elements are lists and then flattens those lists into individual items with flatMap. It feels clumsy, though: do I really have to write the function listToItem, and do I have to write the function printStr? How can this code be improved?

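from pyspark import SparkContext

def listToItem(inputList):
    # Return the list unchanged; flatMap then emits its items one by one
    return inputList

def printStr(tm):
    print(tm)

if __name__ == "__main__":
    sc = SparkContext(appName="Test Spark")
    rdd1 = sc.parallelize([[1,2,3],['a','b','c']])
    # foreach is an action and returns None, so there is nothing to assign
    rdd1.flatMap(listToItem).foreach(printStr)
    sc.stop()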
Jack

1 Answer


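For the first one, just use an identity function (credit: Is there a builtin identity function in python?). It has to take and return a single element; the variadic `lambda *args: args` would wrap each list in a tuple, so flatMap would hand the lists back unflattened:

>>> rdd1.flatMap(lambda x: x)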
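To see why the one-argument identity is the right choice, here is a small plain-Python sketch that mimics flatMap locally, with no Spark involved. The helper `flat_map` is hypothetical: it assumes flatMap's documented behavior of applying the function to every element and concatenating the iterables it returns.

```python
from itertools import chain

# Hypothetical local stand-in for RDD.flatMap: apply f to every element
# and concatenate the iterables f returns (one level of flattening).
def flat_map(f, data):
    return list(chain.from_iterable(f(x) for x in data))

data = [[1, 2, 3], ['a', 'b', 'c']]

identity = lambda x: x          # returns each inner list unchanged
pack_args = lambda *args: args  # wraps its arguments in a tuple: ([1, 2, 3],)

print(flat_map(identity, data))   # [1, 2, 3, 'a', 'b', 'c']
print(flat_map(pack_args, data))  # [[1, 2, 3], ['a', 'b', 'c']]
```

The tuple-packing version only "flattens" the tuple it created, so the original lists survive intact.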

For the second one, import the print function (this is only needed in Python 2; in Python 3 `print` is already a function):

>>> from __future__ import print_function

and

>>> rdd1.flatMap(lambda x: x).foreach(print)

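but remember that this is of little use outside local mode: foreach runs on the executors, so the output goes to the executors' stdout rather than to the driver. To print on the driver, collect() the (small) result first (https://stackoverflow.com/a/25296061/6022341).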
