I'm trying to convert dictionary values (from json.loads()) to ints with map(). I know I can do this with a loop, but I'm trying to do it functionally so I can implement it in Spark. For example:
import pyspark as ps
import json
# Uses all 4 cores on your machine
sc = ps.SparkContext('local[4]')
file_rdd = sc.textFile('data/cookie_data.txt')
kv_rdd_json = file_rdd.map(lambda x: json.loads(x))
kv_rdd2 = kv_rdd_json.map(lambda x: map(int, x.get)) # here's the issue: x.get is a method object, not an iterable of values
kv_rdd2.collect()
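To make the target concrete, here is a sketch of the output I'm after (using a dict comprehension rather than map(), and assuming it's the values, not the keys, that hold the numeric strings):
kv_rdd2 = kv_rdd_json.map(lambda d: {k: int(v) for k, v in d.items()})
kv_rdd2.collect()
# e.g. [{'Jane': 2}, {'Jane': 1}, {'Pete': 20}, ...]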
I have another way to do it with a function, but I'm curious: how can I do it with .map in pyspark (in Python 2, with bonus points for Python 3)?
Per the comments: example data (plaintext):
{"Jane": "2"}
{"Jane": "1"}
{"Pete": "20"}
{"Tyler": "3"}
{"Duncan": "4"}
{"Yuki": "5"}
{"Duncan": "6"}
{"Duncan": "4"}
{"Duncan": "5"}
An example of how to convert dict values to int (from Python: How to convert a list of dictionaries' values into int/float from string?):
for key in mydict.keys():
mydict[key] = int(mydict[key])
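A map()-based equivalent of that loop might look like this (a sketch; wrapping the pairs in dict() also covers Python 3, where map() returns a lazy iterator instead of a list):
import json
record = json.loads('{"Jane": "2"}')
converted = dict(map(lambda kv: (kv[0], int(kv[1])), record.items()))
# converted == {'Jane': 2} in both Python 2 and Python 3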
The .get usage is meant along the lines of this question: Sort a Python dictionary by value.
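That is, passing the bound method itself as a key function, roughly like this sketch (mydict here is hypothetical example data):
mydict = {"Jane": 2, "Pete": 20, "Tyler": 3}
sorted(mydict, key=mydict.get)  # keys ordered by their values: ['Jane', 'Tyler', 'Pete']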