I have a dataframe to which I would like to add another column, 'SOC', and set its value based on whether the row's id exists in a list (trisomy).

unmapped_rnd5

    id      label                                           subclass_of
8   C531834 toxocara canis infection (canine roundworms)    DX90000
17  C535364 chromosome 1, duplication 1p21 p32              D014314
18  C535365 chromosome 2, trisomy 2p13 p21                  D014314
19  C535366 chromosome 2, trisomy 2pter p24                 D014314


unmapped_rnd5["SOC"] = ""     
trisomy = ['C535364','C535365','C535366']

for i, row in unmapped_rnd5.iterrows():
    if row['id'] in trisomy:
        unmapped_rnd5.iloc[i,'SOC'] = 'Congenital, familial and genetic disorders'
    else:
        unmapped_rnd5.iloc[i,'SOC'] = ''
        pass

However, I get the following error:

IndexError: only integers, slices (`:`), ellipsis (`...`), numpy.newaxis (`None`) and integer or boolean arrays are valid indices

The expected output is:

unmapped_rnd5
    id      label                                           subclass_of    SOC
8   C531834 toxocara canis infection (canine roundworms)    DX90000        
17  C535364 chromosome 1, duplication 1p21 p32              D014314        Congenital, familial and genetic disorders
18  C535365 chromosome 2, trisomy 2p13 p21                  D014314        Congenital, familial and genetic disorders
19  C535366 chromosome 2, trisomy 2pter p24                 D014314        Congenital, familial and genetic disorders
rshar
    You don't need the for loop; instead, try `unmapped_rnd5['SOC'] = np.where(unmapped_rnd5['id'].isin(trisomy), "Congenital, familial and genetic disorders", "")` – Redox Feb 22 '23 at 09:51
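For reference, here is a minimal sketch of the vectorized approach from the comment above, using a frame rebuilt by hand from the sample rows in the question. The variable names `soc_label` and `looped` are introduced here for illustration only. It also shows a loop-based variant that swaps `.iloc` for `.loc`: `.iloc` indexes purely by integer position, so passing the string 'SOC' to it is most likely what triggers the error, while `.loc` accepts the index labels (8, 17, ...) that `iterrows()` yields together with the column name.

```python
import numpy as np
import pandas as pd

# Sample frame rebuilt from the question; the index labels are kept
# non-contiguous (8, 17, 18, 19) to match the original display.
unmapped_rnd5 = pd.DataFrame(
    {
        "id": ["C531834", "C535364", "C535365", "C535366"],
        "label": [
            "toxocara canis infection (canine roundworms)",
            "chromosome 1, duplication 1p21 p32",
            "chromosome 2, trisomy 2p13 p21",
            "chromosome 2, trisomy 2pter p24",
        ],
        "subclass_of": ["DX90000", "D014314", "D014314", "D014314"],
    },
    index=[8, 17, 18, 19],
)
trisomy = ["C535364", "C535365", "C535366"]
soc_label = "Congenital, familial and genetic disorders"  # illustrative name

# Vectorized: isin() builds a boolean mask, np.where() picks a value per row.
unmapped_rnd5["SOC"] = np.where(unmapped_rnd5["id"].isin(trisomy), soc_label, "")

# Loop-based alternative on a copy: .loc is label-based, so the index label
# from iterrows() and the column name 'SOC' can be used together.
looped = unmapped_rnd5.drop(columns="SOC")
looped["SOC"] = ""
for i, row in looped.iterrows():
    if row["id"] in trisomy:
        looped.loc[i, "SOC"] = soc_label

print(unmapped_rnd5[["id", "SOC"]])
```

Both variants should fill 'SOC' for the three ids in the list and leave it empty for C531834. The vectorized form is usually preferable on larger frames because it avoids a Python-level loop over the rows.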
