
I have this dataframe:

id1 id2
2341 qw123
2321 -
- de121
2341 qd111

I want to add a third column, id3, whose values are picked at random from this list of ids:

['11231', '123141', '234512']

What I find difficult is assigning the same random id from the list to every row that has the same id1.

For example, the output should look like this:

id1 id2 id3
2341 qw123 11231
2321 - 123141
- de121 234512
2341 qd111 11231

Any solution is appreciated!

Y.C.T
  • `df['id3'] = np.random.choice(yourlist,len(df))` – anky May 21 '21 at 08:13
  • @anky thanks for the reply, but what I need is to choose the same random id when id1 is the same; please look at the description again – Y.C.T May 21 '21 at 08:27
  • see how groupby works and create a function with the same logic to call with apply – anky May 21 '21 at 08:41

1 Answer


You can build a dict that maps each unique id1 value to a random pick from the list. Then use .map() to look up each row's id1 in that dict and assign the result to the new column id3, as follows:

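import numpy as np

num_list = ['11231', '123141', '234512']
id1_unique = df['id1'].unique()

# One random pick (with replacement) per distinct id1 value
m_dict = dict(zip(id1_unique, np.random.choice(num_list, len(id1_unique))))

# Rows that share an id1 look up the same entry, so they get the same id3
df['id3'] = df['id1'].map(m_dict)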
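
For anyone who wants to try the mapping approach end to end, here is a minimal self-contained sketch. It rebuilds the sample frame from the question and uses a seeded generator; the seed and the final sanity check are assumptions added for illustration and are not part of the original code.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question; '-' marks a missing value.
df = pd.DataFrame({
    "id1": ["2341", "2321", "-", "2341"],
    "id2": ["qw123", "-", "de121", "qd111"],
})
num_list = ["11231", "123141", "234512"]

# Seeded generator (an assumption, used only to make runs repeatable).
rng = np.random.default_rng(0)

# One random pick per distinct id1 value, then look it up row by row.
id1_unique = df["id1"].unique()
m_dict = dict(zip(id1_unique, rng.choice(num_list, len(id1_unique))))
df["id3"] = df["id1"].map(m_dict)

# Sanity check: every distinct id1 ends up with exactly one id3.
ids_per_group = df.groupby("id1")["id3"].nunique()
print(df)
```

Note that the picks are made with replacement, so two different id1 values can still receive the same id3.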
SeaBean
  • Thanks, this is working, but I also need to add a random id when the `id1` column is empty. Currently, with your code, every such row gets the same random id regardless of its `id2` value. I tried to get the unique values from both columns and map them using `applymap`, but it's not working properly. Do you have an idea how this code can be extended for this case? – Y.C.T May 21 '21 at 09:28
  • @Y.C.T Your list of 3 elements is too small, so you will sometimes get the same value for different ids. Try a bigger list for better randomness. Since you have to assign the same number to the same id, the algorithm has to rely on the id in some sense. Hence, even using groupby(), you can't get the numbers when you group by an empty item. – SeaBean May 21 '21 at 09:56
  • @Y.C.T For the case that depends on both columns, I can't fully understand what you want to do without some sample data. I would suggest you post another question with some samples of what you want to do with 2 columns and an empty id. It's better to break multiple problems into separate questions; otherwise there is a chance people will close your question as not focused. – SeaBean May 21 '21 at 10:00
  • @Y.C.T Anyway, please accept the solution for this question if it works for you based on the initial scope of your question. I will look into your new question if I can. – SeaBean May 21 '21 at 10:01
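
The two issues raised in the comments (different ids colliding, and all rows with an empty id1 sharing one id) could be handled along the following lines. This is only a sketch under stated assumptions: it treats '-' as "no id1", gives each such row its own id, and draws without replacement via `replace=False`, which requires the list to contain at least as many ids as are needed. It is one possible reading of the follow-up, not the answerer's method.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "id1": ["2341", "2321", "-", "2341"],
    "id2": ["qw123", "-", "de121", "qd111"],
})
num_list = ["11231", "123141", "234512"]
rng = np.random.default_rng()

# Assumption: '-' means "no id1", so such rows should not share an id.
missing = df["id1"].eq("-")
known_ids = df.loc[~missing, "id1"].unique()

# Draw without replacement so no id is handed out twice. This raises
# ValueError if num_list is shorter than the number of ids needed.
n_needed = len(known_ids) + int(missing.sum())
drawn = rng.choice(num_list, size=n_needed, replace=False)

# Same id1 -> same id3; rows without an id1 are left as NaN by map() ...
df["id3"] = df["id1"].map(dict(zip(known_ids, drawn[:len(known_ids)])))
# ... and each of them then receives its own leftover id.
df.loc[missing, "id3"] = drawn[len(known_ids):]
print(df)
```

With the sample data this needs exactly three ids (two distinct id1 values plus one row without an id1), so the three-element list is just large enough; a longer list would be needed for more rows.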