
I have a df called 'XLK':

       Market Cap   PE  
AAN     3.25B      23.6 
AAPL    819.30B    18.44    
ACFN    6.18M      2.1  
ACIW    2.63B      103.15   

I only want to keep the rows where Market Cap is greater than 100 million, so the expected output is:

       Market Cap   PE  
AAN     3.25B      23.6 
AAPL    819.30B    18.44    
ACIW    2.63B      103.15   

I've tried replacing the suffix letters with the appropriate number of zeros, with no success:

XLK['Market Cap'].replace('M','000000')
XLK.drop[XLK_quote['Market Cap'] < '100M'].index
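For reference, here is a small sketch of why these attempts don't behave as hoped. It rebuilds the question's sample data by hand. Without regex=True, Series.replace only replaces cells whose entire value equals 'M', so nothing changes. With regex=True the decimal point survives, so '6.18M' becomes '6.18000000', which is 6.18 rather than 6.18 million. Comparing against the string '100M' is lexicographic, not numeric. Separately, drop is a method, so it needs parentheses rather than square brackets.

```python
import pandas as pd

# Sample data from the question, rebuilt by hand
XLK = pd.DataFrame(
    {"Market Cap": ["3.25B", "819.30B", "6.18M", "2.63B"],
     "PE": [23.6, 18.44, 2.1, 103.15]},
    index=["AAN", "AAPL", "ACFN", "ACIW"],
)

# Without regex=True, replace() only matches whole cell values, so nothing changes
unchanged = XLK["Market Cap"].replace("M", "000000")
print(unchanged.tolist())  # ['3.25B', '819.30B', '6.18M', '2.63B']

# With regex=True the substring is replaced, but the decimal point stays:
# '6.18M' -> '6.18000000', which parses as 6.18, not 6,180,000
naive = XLK["Market Cap"].replace("M", "000000", regex=True)
print(naive.tolist())  # ['3.25B', '819.30B', '6.18000000', '2.63B']

# String comparison is character by character, not numeric:
# '1' < '2', so "10B" sorts before "2M" even though 10 billion > 2 million
print("10B" < "2M")  # True
```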
cs95
thomas.mac

2 Answers


Use replace with regex=True and replacement strings that emulate scientific notation (for example, '3.25B' becomes '3.25E9'). Then convert with astype(float) or pd.to_numeric.

df[df.Market_Cap.replace(dict(B='E9', M='E6'), regex=True).astype(float) >= 100E6]

     Market_Cap      PE
AAN       3.25B   23.60
AAPL    819.30B   18.44
ACIW      2.63B  103.15

Equivalently, using pd.to_numeric:

dct = dict(B='E9', M='E6')
num = pd.to_numeric(df.Market_Cap.replace(dct, regex=True), errors='coerce')
df[num >= 100E6]
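A self-contained sketch of the same approach, rebuilding the question's data by hand. It uses df['Market Cap'] because the question's column name contains a space, so attribute access like df.Market_Cap only works if the column has been renamed. It prints the intermediate E-notation strings. It also adds one hypothetical row ("EXTRA") holding 'N/A' to show the practical difference between the two conversions. astype(float) would raise a ValueError on that value, whereas errors='coerce' turns it into NaN, and NaN >= 100E6 is False, so the row is filtered out.

```python
import pandas as pd

# Sample data from the question, plus one hypothetical row ("EXTRA")
# whose value cannot be parsed, to show what errors="coerce" does
df = pd.DataFrame(
    {"Market Cap": ["3.25B", "819.30B", "6.18M", "2.63B", "N/A"],
     "PE": [23.6, 18.44, 2.1, 103.15, None]},
    index=["AAN", "AAPL", "ACFN", "ACIW", "EXTRA"],
)

# Substring replacement turns the suffixes into E-notation
sci = df["Market Cap"].replace({"B": "E9", "M": "E6"}, regex=True)
print(sci.tolist())  # ['3.25E9', '819.30E9', '6.18E6', '2.63E9', 'N/A']

# 'N/A' is not numeric, so coerce turns it into NaN instead of raising
num = pd.to_numeric(sci, errors="coerce")

# NaN >= 100E6 evaluates to False, so the EXTRA row is dropped as well
result = df[num >= 100e6]
print(result)
```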
piRSquared

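Alternatively, define a mapping from suffix letter to multiplier, split each value into its numeric part (.str[:-1]) and its suffix letter (.str[-1]), and look up the multiplier with Series.map: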

In [723]: mapping
Out[723]: {'B': 1000000000, 'K': 1000, 'M': 1000000}

In [724]: df[df['Market Cap'].str[:-1].astype(float) * df['Market Cap'].str[-1].map(mapping) > 100e6]
Out[724]: 
     Market Cap      PE
AAN       3.25B   23.60
AAPL    819.30B   18.44
ACIW      2.63B  103.15
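The .str[:-1] / .str[-1] split assumes every value ends in exactly one suffix letter. A plain number such as '950' would lose its last digit, and its last character '0' would map to NaN. A possible variant, sketched below under that assumption, uses Series.str.extract with a regex to separate the number from an optional K/M/B suffix and treats a missing suffix as a multiplier of 1. The "EXTRA" row with value '950' is hypothetical and only there to exercise the no-suffix case.

```python
import pandas as pd

mapping = {"K": 1_000, "M": 1_000_000, "B": 1_000_000_000}

# Question's values plus one hypothetical row without a suffix
df = pd.DataFrame(
    {"Market Cap": ["3.25B", "819.30B", "6.18M", "2.63B", "950"]},
    index=["AAN", "AAPL", "ACFN", "ACIW", "EXTRA"],
)

# Column 0: numeric part; column 1: suffix letter (empty if absent)
parts = df["Market Cap"].str.extract(r"^([\d.]+)([KMB]?)$")
number = parts[0].astype(float)

# Unknown or missing suffix maps to NaN; treat it as a multiplier of 1
multiplier = parts[1].map(mapping).fillna(1)
cap = number * multiplier

result = df[cap > 100e6]
print(result)
```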
cs95