import numpy as np
import pandas as pd

column = np.array([5505, 5505, 5505, 34565, 34565, 65539, 65539])
column = pd.Series(column)
myDict = column.groupby(by=column).groups

I am creating a dictionary from a pandas Series using `groupby(by=...).groups`, which has the form:

>>> myDict
{5505: Int64Index([0, 1, 2], dtype='int64'), 65539: Int64Index([5, 6], dtype='int64'), 34565: Int64Index([3, 4], dtype='int64')}

I have a numpy array, e.g.

myArray = np.array([34565, 34565, 5505,65539])

and I want to replace each of the array's elements with the corresponding dictionary value (an array of indices), flattened into a single array. I have tried several solutions that I found (e.g. here and here), but those examples use dictionaries with a single scalar value per key, and I always get the error "setting an array element with a sequence". How can I get around this problem?

My intended output is

np.array([3, 4, 3, 4, 0, 1, 2, 5, 6])
Tony
  • Could you add a minimal complete/reproducible sample dictionary? – Divakar May 28 '17 at 20:39
  • Thanks, I will try. I am not sure how to create a small initial pandas column from which to generate the dictionary (because groupby only works on a pandas object, not an array) – Tony May 28 '17 at 20:46
  • @Divakar I have reworked this into a reproducible example now.. – Tony May 28 '17 at 20:54

1 Answer


One approach based on np.searchsorted -

# Extract dict info
k = list(myDict.keys())
v = list(myDict.values())

# Use argsort of k to find search sorted indices from myArray in keys
# Index into the values of dict based on those indices for output
sidx = np.argsort(k)
idx = sidx[np.searchsorted(k,myArray,sorter=sidx)]
out_arr = np.concatenate([v[i] for i in idx])
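
To see why indexing `sidx` with the `np.searchsorted` result recovers positions in the unsorted key list, here is a small sketch. The key order below is an assumption chosen to mimic a dict that returns its keys unsorted. It also assumes every element of `myArray` is present among the keys, since `np.searchsorted` returns an insertion point rather than raising an error for a missing value.

```python
import numpy as np

# Keys in an arbitrary, unsorted order (assumed for illustration)
k = np.array([5505, 65539, 34565])
myArray = np.array([34565, 34565, 5505, 65539])

# sidx[j] is the position in k of the j-th smallest key
sidx = np.argsort(k)

# pos[i] is where myArray[i] sits in the *sorted* view of k
pos = np.searchsorted(k, myArray, sorter=sidx)

# Map sorted positions back to positions in the original k
idx = sidx[pos]

print(pos.tolist())     # [1, 1, 0, 2]
print(idx.tolist())     # [2, 2, 0, 1]
print(k[idx].tolist())  # [34565, 34565, 5505, 65539]
```

Because `k[idx]` reproduces `myArray`, `idx` can be used to pick the matching entries out of the list of dict values.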

Sample input, output -

In [369]: myDict
Out[369]: 
{5505: Int64Index([0, 1, 2], dtype='int64'),
 34565: Int64Index([3, 4], dtype='int64'),
 65539: Int64Index([5, 6], dtype='int64')}

In [370]: myArray
Out[370]: array([34565, 34565,  5505, 65539])

In [371]: out_arr
Out[371]: array([3, 4, 3, 4, 0, 1, 2, 5, 6])
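
For comparison, a plain dictionary lookup inside the same kind of list comprehension should give the same result without the sorting step. This is only a sketch: the dict below is a hand-written stand-in for `myDict`, with NumPy arrays in place of the `Int64Index` values, and it assumes every element of `myArray` is a key (a missing key would raise `KeyError`).

```python
import numpy as np

# Hand-written stand-in for column.groupby(by=column).groups
myDict = {
    5505: np.array([0, 1, 2]),
    34565: np.array([3, 4]),
    65539: np.array([5, 6]),
}
myArray = np.array([34565, 34565, 5505, 65539])

# Look up each element directly, then flatten the pieces into one array
out_arr = np.concatenate([myDict[key] for key in myArray.tolist()])

print(out_arr.tolist())  # [3, 4, 3, 4, 0, 1, 2, 5, 6]
```

The Python-level loop is still there; the lookup just replaces the `argsort`/`searchsorted` index computation.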
Divakar
  • Thanks! I think it needs to be `k = list(myDict.keys())` or otherwise I am getting the error `'dict_keys' object has no attribute 'searchsorted'`. Same for `v` of course .. – Tony May 28 '17 at 21:12
  • @Tony I am on Python 2.7. That should be a Python 3.x requirement. Edited. Thanks! – Divakar May 28 '17 at 21:13
  • So, @Divakar, the only way to achieve this is with the list comprehension right? There's no way to avoid the loop there from what I gather ? – Tony May 28 '17 at 22:32
  • On a separate note, do you perhaps have a good resource to suggest about numpy, vectorised operations and all these indexing, slicing and sorting possibilities that you are expert on? I really cannot tune my mind away from loops. – Tony May 28 '17 at 23:47
  • @Tony Nope, can't vectorize that loop. On the vectorization resources, well, all I have learnt is through practicing on SO, following the [`ufuncs`](https://docs.scipy.org/doc/numpy/reference/ufuncs.html) docs and generally being persistent about avoiding loops. – Divakar May 29 '17 at 02:48
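
On the question of avoiding the loop: the loop over the dict values may be hard to remove, but if the original column is still at hand, one possible workaround is to skip the dict and build the index ranges with `np.repeat` and `np.arange`. The following is a sketch under that assumption, not the method from the answer; it also assumes every element of `myArray` occurs in `column`, and the helper names (`order`, `first`, `lens`, `begin`, `out_start`) are made up for illustration.

```python
import numpy as np

column = np.array([5505, 5505, 5505, 34565, 34565, 65539, 65539])
myArray = np.array([34565, 34565, 5505, 65539])

# Stable sort keeps original row order within each group
order = np.argsort(column, kind="stable")

# For each unique value: where its run starts in the sorted column, and its length
uniq, first, counts = np.unique(column[order], return_index=True, return_counts=True)

# Group number of each queried value (assumes all values are present)
g = np.searchsorted(uniq, myArray)
lens = counts[g]                    # how many indices each query contributes
begin = first[g]                    # start of each query's run in sorted order
out_start = np.cumsum(lens) - lens  # start of each query's slot in the output

# Build all ranges begin[i] .. begin[i] + lens[i] - 1 in one shot
ranges = np.repeat(begin - out_start, lens) + np.arange(lens.sum())
out_arr = order[ranges]

print(out_arr.tolist())  # [3, 4, 3, 4, 0, 1, 2, 5, 6]
```

Whether this is faster than the list comprehension depends on the number of groups and queries, so it is worth timing on real data.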