
I'm trying to convert dictionary values (from json.loads()) to ints with map(). I know I can do this with loops, but I'm trying to do it functionally so I can implement it in Spark. For example:

import pyspark as ps
import json

# Run Spark locally with 4 worker threads (e.g. one per core on a 4-core machine)
sc = ps.SparkContext('local[4]')

file_rdd = sc.textFile('data/cookie_data.txt')
kv_rdd_json = file_rdd.map(lambda x: json.loads(x))
kv_rdd2 = kv_rdd_json.map(lambda x: map(int, x.get)) # here's the issue
kv_rdd2.collect()
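A minimal sketch of one possible fix, assuming each line of `cookie_data.txt` holds a single JSON object like the sample data: give `.map` a function that receives the whole parsed dict and returns a new dict with `int` values. To keep the sketch runnable without a Spark installation, the built-in `map` stands in for the RDD; `values_to_int` and `sample_lines` are hypothetical names. In PySpark the same function would be passed as `kv_rdd_json.map(values_to_int)`.

```python
import json


def values_to_int(d):
    # Build a new dict with the same keys and int-converted values
    # (dict comprehensions need Python 2.7 or later)
    return {k: int(v) for k, v in d.items()}


# Hypothetical stand-in for the lines sc.textFile() would yield
sample_lines = ['{"Jane": "2"}', '{"Pete": "20"}', '{"Duncan": "4"}']

# Built-in map mirrors rdd.map(json.loads).map(values_to_int)
parsed = map(json.loads, sample_lines)
converted = list(map(values_to_int, parsed))
print(converted)
```

Because `values_to_int` takes one record and returns one record, it has exactly the shape `rdd.map` expects, and it runs unchanged on the Spark workers.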

I have another way to do it with a function, but I'm curious: how can I do it with .map in PySpark (and Python 2, bonus for Python 3)?

Per the comments, here is some example data (plain text):

{"Jane": "2"}
{"Jane": "1"}
{"Pete": "20"}
{"Tyler": "3"}
{"Duncan": "4"}
{"Yuki": "5"}
{"Duncan": "6"}
{"Duncan": "4"}
{"Duncan": "5"}

An example of converting dict values to int, from "Python: How to convert a list of dictionaries' values into int/float from string?":

for key in mydict.keys():
    mydict[key] = int(mydict[key])
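The loop above mutates `mydict` in place. A functional equivalent, sketched here with a hypothetical sample dict, builds a new dict instead, which is the kind of side-effect-free function Spark's `.map` works best with (dict comprehensions need Python 2.7 or later):

```python
mydict = {"Jane": "2", "Pete": "20"}

# Same conversion as the loop, but returns a new dict instead of mutating
converted = {key: int(value) for key, value in mydict.items()}

# mydict still holds the original string values
print(converted)
print(mydict)
```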

My use of .get is similar to the one in "Sort a Python dictionary by value".
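In that question, `.get` works because `sorted` calls the bound method `d.get` once per key. `map(int, x.get)` fails for a different reason: `map` needs an iterable as its second argument, and a bound method is not iterable. A small sketch with a hypothetical dict `d`:

```python
d = {"Jane": 2, "Pete": 20, "Tyler": 3}

# key=d.get: sorted calls d.get(k) for each key k and sorts by the result
by_value = sorted(d, key=d.get)
print(by_value)  # ['Jane', 'Tyler', 'Pete']

# map(int, d.get) raises TypeError because a method is not iterable
try:
    list(map(int, d.get))
    map_failed = False
except TypeError:
    map_failed = True
print(map_failed)  # True
```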

Asked by wordsforthewise (edited by Community)
  • Please provide the content of your dict – Moinuddin Quadri Oct 19 '16 at 20:01
  • Can you provide a `cookie_data.txt` sample? Or even better, the `ps.SparkContext()` output – felipsmartins Oct 19 '16 at 20:01
  • [map](https://docs.python.org/2/library/functions.html#map) takes at least two arguments, a function and iterables to which to apply the function, and it returns a list. `int` is a function, but what is `x.get`? – CAB Oct 19 '16 at 20:03
  • You can still write your own function and pass that into `map`. That function just needs to accept a dictionary parameter – OneCricketeer Oct 19 '16 at 20:03
  • If you only want the keys of the dictionary as integers, then you are looking for `map(int, x.keys())` – OneCricketeer Oct 19 '16 at 20:05
  • You say you know how to do it with a loop. Perhaps if you showed the loop, we could better understand what you want and provide an alternate method. – Waylan Oct 19 '16 at 20:06
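The suggestion `map(int, x.keys())` only applies when the keys themselves are numeric strings, which is not the case for the sample data above. There is also a Python 2/3 difference behind the "returns a list" remark: in Python 3, `map` returns a lazy iterator, so wrap it in `list()` when you need an actual list. A sketch with a hypothetical dict `numeric_keys`:

```python
numeric_keys = {"1": "a", "2": "b"}

# In Python 3, map() returns a lazy iterator; list() forces it into a list
int_keys = list(map(int, numeric_keys.keys()))

# To keep the values as well, rebuild the dict with converted keys
int_keyed = {int(k): v for k, v in numeric_keys.items()}
print(int_keyed[1])  # a
```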

1 Answer

dict(zip(mydict, map(int, mydict.values())))
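Applied to a dict built from the sample names, the `zip` version pairs each key with its converted value. It relies on Python's documented guarantee that iterating a dict and its `.values()` with no modification in between yields items in corresponding order. A quick check, using a hypothetical `mydict`:

```python
mydict = {"Jane": "2", "Pete": "20", "Tyler": "3"}

# Keys come from iterating mydict; values from mydict.values(), in matching order
result = dict(zip(mydict, map(int, mydict.values())))
print(result["Pete"] + 1)  # 21
```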

Or with a lambda:

dict(map(lambda x: (x[0], int(x[1])), mydict.items()))
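The lambda indexes into each `(key, value)` tuple as `x[0]` and `x[1]`. That form runs on both Python 2 and Python 3, whereas the Python 2 shorthand `lambda (k, v): (k, int(v))` is a syntax error in Python 3 because tuple parameter unpacking was removed (PEP 3113). A plain-Python sketch over a hypothetical list of records, where the outer `map` plays the role of `rdd.map`:

```python
records = [{"Duncan": "6"}, {"Yuki": "5"}]

# Outer map stands in for rdd.map; inner map converts each (key, value) pair.
# x[0] / x[1] indexing works on both Python 2 and Python 3.
converted = list(
    map(lambda d: dict(map(lambda x: (x[0], int(x[1])), d.items())), records)
)
print(converted)  # [{'Duncan': 6}, {'Yuki': 5}]
```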
Answered by Greg Tronel