
I'm trying to call a function through Spark's map function. I did what is shown in the Spark tutorial page (https://spark.apache.org/docs/1.2.0/programming-guide.html), but the function myFunc never gets called. At least that's what I think. I don't know if I'm doing something wrong or missing something. Here is the code:

from pyspark import SparkContext
if __name__ == "__main__":
    def myFunc(s):
        print("@@")
        words = s.split("\n")
        print("##")
        return len(words)


    sc = SparkContext("local","test")
    sc.textFile("C:\\TestLogs\\sample.log").map(myFunc)
    print("**")

Output:

**

In fact, this is the same example from the spark doc except for the file location.

kavya
  • I don't know Python, but you defined your function with a parameter (s). When you call it, you are not passing any argument. Are you sure that's ok? – facundop Oct 26 '16 at 11:15
  • @kaks I don't know Spark, but map over something _empty_ won't call a function even once. Can you confirm your file actually provide any data? – Łukasz Rogalski Oct 26 '16 at 11:17
  • @facundop : Yes it has a parameter. But in the spark docs, in map, `myFunc` doesn't take a parameter even though function is defined with `s`. @ŁukaszRogalski : Yes the sample.log file has 10 log lines. – kavya Oct 26 '16 at 12:18

2 Answers


It seems you have only applied the transformation map(myFunc) without calling an action.

All transformations in Spark are lazy, in that they do not compute their results right away. Instead, they just remember the transformations applied to some base dataset (e.g. a file). The transformations are only computed when an action requires a result to be returned to the driver program.

Try map(myFunc).saveAsTextFile("folder/here.txt"), or some other action you would like to use, to force the computation.
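As an analogy for the lazy behavior described above, Python 3's built-in map is also lazy: the function only runs when the result is actually consumed, just like a Spark transformation only runs when an action requires it. This is a plain-Python sketch, not Spark code:

```python
# Track when the mapped function actually executes.
calls = []

def tag(x):
    calls.append(x)  # side effect so we can observe each call
    return x * 2

lazy = map(tag, [1, 2, 3])  # "transformation": nothing runs yet
assert calls == []           # tag() has not been called

result = list(lazy)          # "action": consuming forces evaluation
assert calls == [1, 2, 3]
assert result == [2, 4, 6]
```

In the same way, sc.textFile(...).map(myFunc) alone builds a plan but runs nothing, which is why the print statements inside myFunc never appear.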

VladoDemcak

It seems that your code lacks a Spark "action" (e.g. "collect"), which is required in order to execute the transformations (e.g. "map").

Try the following:

from pyspark import SparkContext
if __name__ == "__main__":
    def myFunc(s):
        print("@@")
        words = s.split("\n")
        print("##")
        return len(words)


    sc = SparkContext("local","test")
    myrdd = sc.textFile("C:\\TestLogs\\sample.log")
    result =  myrdd.map(myFunc).collect()
    print("the result is")
    print(result)
    print("**")

Also consider updating the path to use the "file:\\" prefix (https://stackoverflow.com/a/27301040/5088142):

    myrdd = sc.textFile("file:\\C:\\TestLogs\\sample.log")
Yaron