
I am just starting to learn Spark, so please bear with me if this is too obvious.
I installed Spark and I can run it in a terminal (via "./bin/pyspark").
But I failed when I tried the following word-count example:

import os

# Write a small sample file to count words in
path = os.path.join("sample-text.txt")
with open(path, "w") as testFile:
    _ = testFile.write("Hello world Hello")

# sc is the SparkContext provided by the pyspark shell
file = sc.textFile(path)
counts = file.flatMap(lambda line: line.split(" ")) \
             .map(lambda word: (word, 1)) \
             .reduceByKey(lambda a, b: a + b)

path2 = os.path.join("word-count.txt")
counts.saveAsTextFile(path2)

Everything ran without errors, but when I tried to open the output word-count.txt, my system said the document could not be opened.
What am I doing wrong?


1 Answer


I was trying to open the output word-count.txt file

saveAsTextFile creates a directory named word-count.txt, not a single file.

$ ls word-count.txt
_SUCCESS   part-00000 part-00001 part-00002
$ cat word-count.txt/part-00000
(u'world', 1)
$ cat word-count.txt/part-00001
(u'Hello', 1)
(u'hello', 1)
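
That directory can be read straight back into Spark: textFile accepts a directory (or a glob pattern) and loads every part-* file inside it. A minimal sketch, run in the same pyspark shell (sc is the shell's SparkContext); note that saveAsTextFile stored the stringified tuples, so you get strings back:

saved = sc.textFile("word-count.txt")   # reads all part-* files in the directory
print(saved.collect())                  # e.g. ["(u'world', 1)", "(u'Hello', 1)", ...]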

Your code works. Any remaining problem creating or opening that directory is an OS permission issue, not a Spark one.

Related (Scala, but the same idea): how to make saveAsTextFile NOT split output into multiple file?
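
If you want everything in one part file, one common option (a sketch only; the output path word-count-single below is just an example name) is to shrink the RDD to a single partition before saving. It still produces a directory, but with only one part file inside:

counts.coalesce(1).saveAsTextFile("word-count-single")
# produces word-count-single/part-00000 containing all of the counts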
