I'm using Jupyter on Ubuntu.
I'm having the following problem. This is my code:
from pyspark import SparkContext
sc = SparkContext.getOrCreate()
# Each line holds ";"-separated fields; row[1] is the region, row[2] the year
ut = sc.textFile("hdfs://localhost:54310/hduser/firstnames")
rows = ut.map(lambda line: line.split(";"))
# Keep only the years 2000-2004 (string comparison works for 4-digit years)
res = rows.filter(lambda row: row[2] >= "2000" and row[2] <= "2004")
# Key: the set {region, year}; value: the count as an int
res = res.map(lambda row: ({row[1], row[2]}, int(row[3])))
This gives the following output:
[({'2001', 'Brussel'}, 9),
({'2001', 'Brussel'}, 104),
({'2001', 'Vlaanderen'}, 16),
({'2002', 'Brussel'}, 12), ...]
I need the counts summed per key, so the output should look like this (9 + 104 = 113 for {'2001', 'Brussel'}):
[({'2001', 'Brussel'}, 113),
({'2001', 'Vlaanderen'}, 16),
({'2002', 'Brussel'}, 12)]
I've tried a couple of things with reduceByKey and have read a lot of questions about it, but couldn't figure it out; one attempt is sketched below. Thanks in advance.
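For reference, this is roughly what one of those attempts looked like (a sketch, not working code for me):

# Try to merge the counts per key by summing them
summed = res.reduceByKey(lambda a, b: a + b)
summed.collect()

This did not give me the merged output shown above.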