
Let's say I have the following df:

import pandas as pd

data = {'Location': [1, 1, 1, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 4, 4, 4]}

df = pd.DataFrame(data=data)
df

    Location
0   1
1   1
2   1
3   2
4   2
5   2
6   2
7   2
8   2
9   3
10  3
11  3
12  3
13  3
14  3
15  4
16  4
17  4

In addition, I have the following dict:

Unlock={
1:"A",
2:"B",
3:"C",
4:"D",
5:"E",
6:"F",
7:"G",
8:"H",
9:"I",
10:"J"
}

I'd like to create another column that randomly selects a value from the 'Unlock' dict, restricted to the keys that are less than or equal to the row's Location (key <= Location). For example, for Location 2 some rows should get 'A' and some should get 'B'.

I've tried the following, but with no luck (I'm getting an error):

df['Name']=np.select(df['Location']<=Unlock,np.random.choice(Unlock,size=len(df))

Thanks in advance for your help!

Naor

2 Answers


You can convert your dictionary values to a list, then for each row randomly select from only the first Location elements of that list.

With Python 3.7 and later, dicts preserve insertion order. For older versions, see below.

import numpy as np

lst = list(Unlock.values())

df['Name'] = df['Location'].transform(lambda loc: np.random.choice(lst[:loc]))

Example output:

    Location Name
0          1    A
1          1    A
2          1    A
3          2    B
4          2    B
5          2    B
6          2    B
7          2    A
8          2    B
9          3    B
10         3    B
11         3    C
12         3    C
13         3    C
14         3    B
15         4    A
16         4    C
17         4    D
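A self-contained version of this approach, handy for running it outside a notebook. Two assumptions go beyond the answer: it uses a seeded np.random.default_rng Generator (seed 42 chosen arbitrarily) instead of the global np.random state, and Series.apply, which calls the lambda once per element, in place of transform. The draws will therefore differ from the example output above.

```python
import numpy as np
import pandas as pd

# Same data as in the question
df = pd.DataFrame({'Location': [1, 1, 1, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 4, 4, 4]})
Unlock = {1: "A", 2: "B", 3: "C", 4: "D", 5: "E",
          6: "F", 7: "G", 8: "H", 9: "I", 10: "J"}

# Assumption: a seeded Generator for reproducible draws (seed 42 is arbitrary)
rng = np.random.default_rng(42)

# Relies on Python >= 3.7 insertion order: keys were inserted as 1..10
lst = list(Unlock.values())

# For each row, pick uniformly among the first `loc` values (keys 1..loc)
df['Name'] = df['Location'].apply(lambda loc: rng.choice(lst[:loc]))
print(df)
```

Every row with Location 1 can only get 'A', and a row with Location n can only get one of the first n letters.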

If you are using an older version of Python, build the list of dictionary values sorted by key instead:

lst = [value for key, value in sorted(Unlock.items())]
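Sorting by key also helps when the keys were not inserted in ascending order, which insertion order alone would not fix. A small sketch with a hypothetical dict, shuffled, whose keys are deliberately out of order:

```python
# Hypothetical example: the same kind of mapping, written out of key order
shuffled = {3: "C", 1: "A", 4: "D", 2: "B"}

# Insertion order (Python >= 3.7) follows how the dict literal was written
by_insertion = list(shuffled.values())

# Sorted by key: correct regardless of insertion order or Python version
by_key = [value for key, value in sorted(shuffled.items())]

print(by_insertion)  # ['C', 'A', 'D', 'B']
print(by_key)        # ['A', 'B', 'C', 'D']
```

For the question's Unlock dict, whose keys are written as 1..10, both versions give the same list on Python 3.7+.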
Vladimir Fokow
  • You should mention that dictionaries do not always preserve insertion order, so with some specific python versions your approach might fail – Riccardo Bucco Aug 31 '22 at 11:56

For a vectorized approach, multiply each Location by a random value in (0, 1], take the ceiling, then map the result through your dictionary.

This gives each row an integer drawn with equal probability from 1 to its Location (inclusive):

import numpy as np
df['random'] = (np.ceil(df['Location'].mul(1-np.random.random(size=len(df))))
                  .astype(int).map(Unlock)
               )

Output (reproducible by calling np.random.seed(0) first):

    Location random
0          1      A
1          1      A
2          1      A
3          2      B
4          2      A
5          2      B
6          2      A
7          2      B
8          2      B
9          3      B
10         3      C
11         3      B
12         3      B
13         3      C
14         3      A
15         4      A
16         4      A
17         4      D
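Why the draw is equiprobable: for U uniform on (0, 1] and a positive integer L, ceil(L*U) equals k exactly when (k-1)/L < U <= k/L, an interval of length 1/L, so each k in 1..L has probability 1/L. A minimal empirical check of that claim, assuming a seeded np.random.default_rng Generator (seed 0 is arbitrary and does not reproduce the np.random.seed(0) output above) and a hypothetical sample of 100,000 rows that all have Location 4:

```python
import numpy as np
import pandas as pd

Unlock = {1: "A", 2: "B", 3: "C", 4: "D", 5: "E",
          6: "F", 7: "G", 8: "H", 9: "I", 10: "J"}

# Assumption: seeded Generator so the check is reproducible
rng = np.random.default_rng(0)

# Hypothetical large sample: every row has Location 4
locs = pd.Series(np.full(100_000, 4))

# 1 - rng.random(...) lies in (0, 1]; scaling by Location and taking the
# ceiling yields an integer in 1..Location, each with probability 1/Location
keys = np.ceil(locs.mul(1 - rng.random(size=len(locs)))).astype(int)
names = keys.map(Unlock)

# Empirical frequencies of 'A'..'D' should each be close to 1/4
freq = names.value_counts(normalize=True).sort_index()
print(freq)
```

Because the whole column is computed with array operations, this avoids calling a Python function once per row.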
mozway