I'm trying to write a custom function that takes an RDD, lower-cases each record, splits it into characters, and then turns each character into a key-value pair whose key is the character and whose value is always 1. I've already written two other custom functions that do the lower-casing and the character splitting: to_lower() and to_characters(), respectively.
I've tried a few different things, but so far I've only managed to get the entire list of characters as the key, instead of one pair per character.
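The "whole list as the key" symptom matches the difference between `map` and `flatMap`: `map` produces exactly one output per input record, while `flatMap` lets each record produce several outputs that are concatenated. The sketch below illustrates that difference in plain Python so it runs without Spark; `to_lower()` and `to_characters()` are hypothetical stand-ins for the poster's own helpers (assumed to lower-case a string and split it into a list of characters).

```python
from itertools import chain

# Hypothetical stand-ins for the poster's to_lower() and to_characters().
def to_lower(record):
    return record.lower()

def to_characters(record):
    return list(record)

records = ["Hi", "Ok"]
char_lists = [to_characters(to_lower(r)) for r in records]

# map-style: one output per input, so pairing yields (whole list, 1).
mapped = [(chars, 1) for chars in char_lists]

# flatMap-style: each input yields several outputs that are concatenated,
# so every character ends up in its own (char, 1) pair.
flat_mapped = list(
    chain.from_iterable([(char, 1) for char in chars] for chars in char_lists)
)

print(mapped)       # [(['h', 'i'], 1), (['o', 'k'], 1)]
print(flat_mapped)  # [('h', 1), ('i', 1), ('o', 1), ('k', 1)]
```

In PySpark the same contrast would be `characterRDD.map(lambda x: (x, 1))` versus `characterRDD.flatMap(lambda x: [(char, 1) for char in x])`.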
```python
# Attempt 1
def rdd_to_character_value_pairs(rdd):
    lowerRDD = rdd.map(lambda x: to_lower(x))
    characterRDD = lowerRDD.map(lambda x: to_characters(x))
    pairedRDD = characterRDD.map(lambda x: ([char for char in characterRDD], 1))
    return pairedRDD
```
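One possible minimal change to Attempt 1, offered as a sketch rather than a definitive fix: the lambda ignores its argument `x` and instead references `characterRDD`, but Spark does not support using one RDD inside another RDD's transformation. Iterating over `x` (the current record's character list) and switching the last step to `flatMap` gives one `(char, 1)` pair per character. The helper bodies are hypothetical stand-ins, as before.

```python
# Hypothetical stand-ins for the poster's own helpers.
def to_lower(record):
    return record.lower()

def to_characters(record):
    return list(record)

def rdd_to_character_value_pairs(rdd):
    lowerRDD = rdd.map(lambda x: to_lower(x))
    characterRDD = lowerRDD.map(lambda x: to_characters(x))
    # Iterate over x (this record's characters), not over characterRDD.
    # flatMap emits each (char, 1) tuple as its own RDD element, so the
    # result is not one list per record.
    pairedRDD = characterRDD.flatMap(lambda x: [(char, 1) for char in x])
    return pairedRDD
```

The function only calls `map` and `flatMap` on its argument, so the pairs stay distributed; nothing is pulled to the driver until an action such as `collect()` runs.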
```python
# Attempt 2
def rdd_to_character_value_pairs(rdd):
    lowerRDD = rdd.map(lambda x: to_lower(x))
    characterRDD = lowerRDD.map(lambda x: to_characters(x))
    for i in characterRDD.collect():
        return ([char for char in characterRDD], 1)
        # have also tried: return (i, 1)
```
I understand that you can't iterate over an RDD directly, but I haven't been able to get any of the workarounds to work either.
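If the goal is specifically to loop on the driver, as Attempt 2 does, the loop has to keep going instead of returning on the first record. A minimal sketch under that assumption follows; `character_value_pairs_via_collect` is a hypothetical name, and the helpers are stand-ins. Because `collect()` copies every record to the driver, this only suits small data and gives up Spark's parallelism.

```python
# Hypothetical stand-ins for the poster's own helpers.
def to_lower(record):
    return record.lower()

def to_characters(record):
    return list(record)

def character_value_pairs_via_collect(rdd):
    """Build (char, 1) pairs on the driver; returns a plain Python list."""
    pairs = []
    # collect() returns a local list holding each record's character list.
    for chars in rdd.map(to_lower).map(to_characters).collect():
        # Appending instead of returning inside the loop keeps processing
        # past the first record.
        for char in chars:
            pairs.append((char, 1))
    return pairs
```

To get an RDD back, the list can be passed to `sc.parallelize(pairs)`, where `sc` is assumed to be the active `SparkContext`.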