2

Contemplating a few options to understand which one would be the best for the below scenario:

I have a dataframe in pandas that looks something like

CODE NAME COL1 COL2 COL3
IN   INDIA  x    x    z
US   USA    r    s    s
IND  INDIA  f   d     d
RU   RUSSIA g    f    d
USA  USA    d    s    s
IN   INDIA  d    d    a

I would like to create a new column, say POSSIBLE_CODES such that it is a list of all the CODES the corresponding NAME column has see. So my target df should look like:

CODE NAME COL1 COL2 COL3 POSSIBLE_CODES
IN   INDIA  x    x    z  [IN, IND]
US   USA    r    s    s  [US, USA]
IND  INDIA  f   d     d  [IN, IND]
RU   RUSSIA g    f    d  [RU]
USA  USA    d    s    s  [US, USA]
IN   INDIA  d    d    a  [IN, IND]
user3483203
  • 50,081
  • 9
  • 65
  • 94
asimo
  • 2,340
  • 11
  • 29
  • @rahlf23: As you can see from the dataframe, the rows where we have INDIA in the NAME columns have either IN or IND in the CODE columns. Hence the list requiring IN and IND for those rows. Hope that answers ? – asimo Oct 01 '18 at 14:54
  • 1
    Possible duplicate of [Pandas groupby - set of different values](https://stackoverflow.com/questions/47542980/pandas-groupby-set-of-different-values) – Georgy Oct 01 '18 at 14:56

1 Answers1

3

You can create a new Series, where the index is your NAME column, and your values are the possible CODE for each NAME:

s = df.groupby('NAME').CODE.unique().rename('POSSIBLE')

Now using pd.Series.map:

df.assign(POSSIBLE=df.NAME.map(s))

  CODE    NAME COL1 COL2 COL3   POSSIBLE
0   IN   INDIA    x    x    z  [IN, IND]
1   US     USA    r    s    s  [US, USA]
2  IND   INDIA    f    d    d  [IN, IND]
3   RU  RUSSIA    g    f    d       [RU]
4  USA     USA    d    s    s  [US, USA]
5   IN   INDIA    d    d    a  [IN, IND]
user3483203
  • 50,081
  • 9
  • 65
  • 94