I need some help processing data inside a pandas DataFrame. I want to convert values in the form of "col1" into the form of "col2". What should I do?

import pandas as pd
a = {'col1' : ['0.11K','1011K','0.12M','0','0.3','0.02'],'col2':['110','1011000','120000','0','300000','20000']}

df = pd.DataFrame(a , columns=['col1','col2'])

df['col'] = (df['col'].replace(r'[KM]+$', '', regex=True).astype(float) * \
df['col'].str.extract(r'[\d\.]+([KM]+)', expand=False).fillna(1).replace(['K','M'],[10**3,10**6]).astype(int))
Mayank Porwal
Preamm

1 Answer

If a value with no unit should be treated as M, use .fillna(10**6) instead of .fillna(1), and process column col1 instead of col:

df['col'] = (df['col1'].replace(r'[KM]+$', '', regex=True).astype(float) * 
             df['col1'].str.extract(r'[\d\.]+([KM]+)', expand=False)
                       .fillna(10**6)
                       .replace(['K','M'],[10**3,10**6]).astype(int))

print (df)
    col1     col2        col
0  0.11K      110      110.0
1  1011K  1011000  1011000.0
2  0.12M   120000   120000.0
3      0        0        0.0
4    0.3   300000   300000.0
5   0.02    20000    20000.0

Your solution, from "Convert the string 2.90K to 2900 or 5.2M to 5200000 in pandas dataframe", applied to col1 leaves values without a unit unscaled:

df['col'] = (df['col1'].replace(r'[KM]+$', '', regex=True).astype(float) * 
             df['col1'].str.extract(r'[\d\.]+([KM]+)', expand=False)
                       .fillna(1)
                       .replace(['K','M'],[10**3,10**6]).astype(int))

print (df)
    col1     col2         col
0  0.11K      110      110.00
1  1011K  1011000  1011000.00
2  0.12M   120000   120000.00
3      0        0        0.00
4    0.3   300000        0.30
5   0.02    20000        0.02
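
The two variants above differ only in the multiplier used for values without a unit, so that choice can be made a parameter. The following is a minimal sketch, not part of the original solution: the helper name expand_suffix, its default argument, and the column names as_is and as_millions are hypothetical, and it assumes every value is a plain number optionally followed by a single K or M.

```python
import pandas as pd


def expand_suffix(values: pd.Series, default: float = 1.0) -> pd.Series:
    """Convert strings such as '0.11K' or '0.12M' to floats.

    `default` is the multiplier applied to values that carry no K/M suffix
    (hypothetical parameter, introduced for this sketch).
    """
    multipliers = {"K": 10**3, "M": 10**6}
    # Split each string into its numeric part and an optional unit letter.
    parts = values.str.extract(r"^(?P<number>[\d.]+)(?P<unit>[KM])?$")
    # Look up the multiplier per row; rows without a unit get `default`.
    factor = parts["unit"].map(multipliers).fillna(default)
    return parts["number"].astype(float) * factor


df = pd.DataFrame({"col1": ["0.11K", "1011K", "0.12M", "0", "0.3", "0.02"]})
df["as_is"] = expand_suffix(df["col1"])                       # no unit: keep the value
df["as_millions"] = expand_suffix(df["col1"], default=10**6)  # no unit: treat as M
print(df)
```

Strings that do not match the pattern at all come out as NaN rather than raising an error, so it may be worth checking the result with isna() before using it.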
jezrael