I'm very new to spark (and programming), and so if you can help me understand the difference between these 2 outputs that would be great.
map()
>>> data = ['1', '2', '3', '4', '5', 'one', 'two']
>>> distData = sc.parallelize(data)
>>> maping = distData.map(lambda x: x.split())
>>> maping.collect()
[['1'], ['2'], ['3'], ['4'], ['5'], ['one'], ['two']]
>>> for i in maping.take(100): print(i)
...
['1']
['2']
['3']
['4']
['5']
['one']
['two']
FlatMap()
>>> maping = distData.flatMap(lambda x: x.split())
>>> maping.collect()
['1', '2', '3', '4', '5', 'one', 'two']
>>> for i in maping.take(100): print(i)
...
1
2
3
4
5
one
two