
I'm running a pyspark job on spark (single node, stand-alone) and trying to save the output in a text file in the local file system.

input = sc.textFile(inputfilepath)
words = input.flatMap(lambda x: x.split())
wordCount = words.countByValue()

wordCount.saveAsTextFile("file:///home/username/output.txt")

I get an error saying

AttributeError: 'collections.defaultdict' object has no attribute 'saveAsTextFile'

Basically, whatever method I call on the 'wordCount' object, for example collect() or map(), returns the same error. The code works with no problem when the output goes to the terminal (with a for loop), but I can't figure out what is missing to send the output to a file.

piterd

1 Answer


The countByValue() method you're calling returns a dictionary of word counts. This is just a standard Python dictionary (a collections.defaultdict), not an RDD, so it doesn't have any Spark methods such as saveAsTextFile available on it.

You can use your favorite method to save the dictionary locally.
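For example, a minimal sketch using plain file I/O (the dict literal below just stands in for countByValue()'s result, and the output path is arbitrary):

```python
# Hypothetical stand-in for the result of words.countByValue()
word_count = {"spark": 3, "hello": 2, "world": 1}

# Write one "word<TAB>count" line per entry to a local file
with open("/tmp/output.txt", "w") as f:
    for word, count in sorted(word_count.items()):
        f.write(f"{word}\t{count}\n")
```

Since the counts already live in the driver's memory at this point, any ordinary Python serialization (json, csv, pickle) works just as well.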

Kyle Heuton
  • Beat me to it. @Snoozer is 100% correct. countByValue doesn't create a new RDD, it's a local dictionary. – Joe Widen Feb 19 '16 at 19:34
  • Thanks... I changed it to `map(lambda x: (str(x),1)).reduceByKey(add)` with `from operator import add` – piterd Feb 19 '16 at 20:20
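The pipeline piterd settles on keeps the data as a pair RDD, which does support saveAsTextFile. As a rough illustration, here is what map/reduceByKey compute, emulated in plain Python (Spark itself not required; the sample lines are hypothetical):

```python
from functools import reduce
from itertools import groupby
from operator import add

# Hypothetical sample standing in for the RDD's contents
lines = ["hello world", "hello spark"]

# flatMap(split) then map(lambda x: (x, 1)): one (word, 1) pair per word
pairs = [(word, 1) for line in lines for word in line.split()]

# reduceByKey(add): group pairs by word, then sum the 1s per group
counts = {
    word: reduce(add, (count for _, count in group))
    for word, group in groupby(sorted(pairs), key=lambda p: p[0])
}
```

In Spark, the equivalent `words.map(lambda x: (x, 1)).reduceByKey(add)` yields an RDD of (word, count) pairs, so calling saveAsTextFile on it works as expected.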