I want to replace the values in the column 'GeneID' using my dictionary.

The result contains only NaNs. Does anyone know why this happens?

 df
Out[107]: 
        Region     GeneID  DistanceValue
0           BG   79677107            0.0
       ...        ...            ...
1097355  CERus       1415            NaN
[1097360 rows x 3 columns]

I replace the column using final_dictionary1:

df["GeneID"] = df["GeneID"].map(final_dictionary1)

where final_dictionary1 looks like:

...
 '52856': 'Mtg2',
 '19886': 'Ros1',
 '16008': 'Igfbp2',
 '14747': 'Cmklr1',
 '13401': 'Dmwd',
 '12545': 'Cdc7',
 '28113': 'Tinf2',
 '71833': 'Dcaf7',
 ...}

Is it because the numbers are not strings? How can I see what is stored in df.GeneID, whether it is a number or a string? Could this cause the problem, or is there another reason it isn't replaced properly?

Anja

1 Answer

One possible problem is trailing whitespace, which you can remove with str.strip(). Another is that the keys of the dictionary do not match the values in the GeneID column, so map creates NaNs:

df["GeneID"] = df["GeneID"].str.strip().map(final_dictionary1)

Or, if some values may have no match and should be left unchanged, use replace instead of map:

df["GeneID"] = df["GeneID"].str.strip().replace(final_dictionary1)
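
The question also asks how to check what GeneID stores. A minimal sketch of that check follows, assuming the column holds integers while the dictionary keys are strings (the toy df and the shortened final_dictionary1 below are hypothetical stand-ins, not the real data). Note that .str.strip() only works on string values, so an integer column would need astype(str) first:

```python
import pandas as pd

# Hypothetical toy data standing in for the question's df and final_dictionary1.
df = pd.DataFrame({"GeneID": [52856, 19886, 1415]})
final_dictionary1 = {"52856": "Mtg2", "19886": "Ros1"}

# Inspect how GeneID is stored (an integer dtype in this toy example).
print(df["GeneID"].dtype)

# Integer values never equal string keys, so every lookup misses and gives NaN.
all_missing = df["GeneID"].map(final_dictionary1)

# Cast to str and strip stray whitespace so values and keys are comparable.
keys = df["GeneID"].astype(str).str.strip()

mapped = keys.map(final_dictionary1)        # unmatched IDs become NaN
replaced = keys.replace(final_dictionary1)  # unmatched IDs are left unchanged
```

If the column turns out to be strings already, the astype(str) call is harmless and the strip alone may be enough.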
jezrael