
I'm looking for a way to loop through a pandas DataFrame and convert values that are currently stored as object (string) dtype, for example '1.15m', to numbers such as 1150000, and also change the column dtype to integer.

This is what I have so far, but it doesn't seem to pick up the 'm' in the values:

```python
int_cols = ['Avg. Likes', 'Posts', 'New Post Avg. Likes','Total Likes' ]

for c in int_cols:
    if 'm' in db[c]:
        db[c] = db[c].apply(lambda x: float(x.strip('m'))*1000000)
        db[c] = db[c].astype('int')
    elif 'k' in db[c]:
        db[c] = db[c].apply(lambda x: float(x.strip('k'))*1000)
        db[c] = db[c].astype('int')
    elif 'b' in db[c]:
        db[c] = db[c].apply(lambda x: float(x.strip('b'))*1000000000)
        db[c] = db[c].astype('int')
    else:
        continue
```
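One likely reason the check never fires: for a pandas Series, `in` tests the index labels, not the values, so `'m' in db[c]` is `False` even when every value ends in 'm'. A small sketch using an illustrative Series built from the 'Avg. Likes' sample values:

```python
import pandas as pd

# Illustrative column built from the 'Avg. Likes' sample values
s = pd.Series(['8.7m', '8.2m', '6.7m'])

# `in` on a Series looks at the index labels (0, 1, 2), not the values
print('m' in s)   # False
print(0 in s)     # True, because 0 is an index label

# A substring test has to be applied to each element, e.g. via the .str accessor
print(s.str.contains('m').all())   # True
```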

Edit: adding sample data (output of `db.head(3)`):

|Rank | Channel Info | Influence Score  | Followers | Avg. Likes | Posts  |60-Day Eng Rate  | New Post Avg. Likes | Total Likes  | Country Or Region|
|:---:|:------------:|:----------------:|:---------:|:----------:|:------:|:---------------:|:-------------------:|:------------:|:----------------:|                  
|1    | cristiano    |92                |485200000.0|8.7m        | 3.4k   |0.013            |6.3m                 |29.1b         |Spain             |
|2    | kyliejenner  |91                |370700000.0|8.2m        | 7.0k   |0.014            |5.0m                 |57.4b         |United States     |
|3    | leomessi     |90                |363900000.0|6.7m        | 915    |0.010            |3.5m                 |6.1b          |NaN               |
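For a reproducible example, the three rows above can be rebuilt as a DataFrame. This is a sketch under one assumption: the suffixed columns (and the '915' in Posts) are stored as strings, which matches the object dtype described in the question but is not confirmed by the table itself.

```python
import numpy as np
import pandas as pd

# The three rows shown by db.head(3); suffixed values are assumed to be strings
db = pd.DataFrame({
    'Rank': [1, 2, 3],
    'Channel Info': ['cristiano', 'kyliejenner', 'leomessi'],
    'Influence Score': [92, 91, 90],
    'Followers': [485200000.0, 370700000.0, 363900000.0],
    'Avg. Likes': ['8.7m', '8.2m', '6.7m'],
    'Posts': ['3.4k', '7.0k', '915'],
    '60-Day Eng Rate': [0.013, 0.014, 0.010],
    'New Post Avg. Likes': ['6.3m', '5.0m', '3.5m'],
    'Total Likes': ['29.1b', '57.4b', '6.1b'],
    'Country Or Region': ['Spain', 'United States', np.nan],
})

print(db.dtypes)
```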
  • Try putting the `if 'm' in ...` test inside the lambda, so it tests each object, not the Series. – sj95126 Oct 22 '22 at 14:00
  • Your question needs a minimal reproducible example consisting of sample input, expected output, actual output, and only the relevant code necessary to reproduce the problem. See [How to make good reproducible pandas examples](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) for best practices related to Pandas questions. – itprorh66 Oct 22 '22 at 14:03
  • You should write `'m' in list(db[c]):`. – msamsami Oct 22 '22 at 14:06
  • Pls post sample of your data – gtomer Oct 22 '22 at 14:29
  • **DO NOT** post images of code, links to code, data, error messages, etc. - copy or type the text into the question. – itprorh66 Oct 22 '22 at 15:53
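A minimal sketch of sj95126's suggestion, moving the suffix test inside the per-element function so each value is checked on its own. `to_number` and `MULTIPLIERS` are hypothetical names introduced here, and the DataFrame is a two-column subset using values from the question and the sample data.

```python
import pandas as pd

# Suffix multipliers (same values as in the question's loop)
MULTIPLIERS = {'k': 1_000, 'm': 1_000_000, 'b': 1_000_000_000}

def to_number(value):
    """Hypothetical helper: convert one value such as '1.15m', '3.4k' or '915' to a float.

    Missing values are returned unchanged.
    """
    if pd.isna(value):
        return value
    text = str(value).strip().lower()
    # Test the suffix of this one element, not of the whole Series
    if text and text[-1] in MULTIPLIERS:
        return float(text[:-1]) * MULTIPLIERS[text[-1]]
    return float(text)  # no suffix: plain number

# Two of the question's columns, using values from the question and sample data
db = pd.DataFrame({
    'Avg. Likes': ['8.7m', '8.2m', '1.15m'],
    'Posts': ['3.4k', '7.0k', '915'],
})
int_cols = ['Avg. Likes', 'Posts']

for c in int_cols:
    # Round before casting so a float product that lands just below a whole
    # number is not truncated; 'Int64' is pandas' nullable integer dtype
    db[c] = db[c].apply(to_number).round().astype('Int64')

print(db)
```

The same reasoning explains why `'m' in list(db[c])` does not help either: list membership compares whole elements, so it is only `True` if some cell is exactly `'m'`.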

1 Answer


Here is one way to do it:

```python
import pandas as pd

int_cols = ['Avg. Likes', 'Posts', 'New Post Avg. Likes', 'Total Likes']

# create a mapping of the suffixes to multipliers
m = {'m': 1000000.0, 'k': 1000.0, 'b': 1000000000.0}

# remove the digits and '.', map the remaining suffix to its multiplier
# (a value with no suffix maps to NaN, which is filled with 1.0),
# then multiply by the numeric part left after removing the suffix;
# missing values stay NaN
df[int_cols] = df[int_cols].apply(
    lambda x: x.replace(r'[\d.]', '', regex=True).map(m).fillna(1.0)
               .mul(x.replace(r'[mbk]', '', regex=True).astype(float))
)
df
```
```
     Avg. Likes  Posts  New Post Avg. Likes  Total Likes
0    8.7         3.4    6.3                  29.1
1    8.2         7.0    5.0                  57.4
2    6.7         915.0  3.5                  6.1
3    6.1         1.9    1.7                  11.4
4    1.8         6.8    932.0                12.6
...  ...         ...    ...                  ...
195  680.6       4.6    305.7                3.1
196  2.2         1.4    2.1                  3.0
197  227.8       4.2    103.2                955.9
198  193.3       865.0  82.6                 167.2
199  382.5       3.8    128.2                1.5
```
Naveed
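A vectorized alternative sketch, assuming every value is either a plain number or a number followed by a single k/m/b suffix (values with commas, signs or exponents are treated as missing). `parse_suffixed` is a hypothetical helper, and the `None` in the sample frame is made up to show missing-value handling. It returns pandas' nullable `Int64` dtype so the result is an integer column, as the question asks, even when some values are missing.

```python
import pandas as pd

# Suffix multipliers, as in the answer above
m = {'k': 1e3, 'm': 1e6, 'b': 1e9}

def parse_suffixed(col: pd.Series) -> pd.Series:
    """Hypothetical helper: convert values like '8.7m', '3.4k', '915' to nullable integers."""
    # Split each value into its numeric part and an optional k/m/b suffix;
    # anything that does not match (e.g. NaN/None or '1,234') becomes NaN
    parts = col.astype(str).str.strip().str.lower().str.extract(r'^([\d.]+)([kmb]?)$')
    number = pd.to_numeric(parts[0], errors='coerce').astype(float)
    # An empty suffix maps to NaN, so fill it with 1.0 to keep plain numbers
    multiplier = parts[1].map(m).astype(float).fillna(1.0)
    # Round before casting; 'Int64' (capital I) allows missing values
    return (number * multiplier).round().astype('Int64')

df = pd.DataFrame({
    'Avg. Likes': ['8.7m', '8.2m', '6.7m'],
    'Posts': ['3.4k', '7.0k', '915'],
    'New Post Avg. Likes': ['6.3m', '5.0m', None],  # None added for illustration
    'Total Likes': ['29.1b', '57.4b', '6.1b'],
})
int_cols = ['Avg. Likes', 'Posts', 'New Post Avg. Likes', 'Total Likes']

# Assign column by column so each column takes the new Int64 dtype
for c in int_cols:
    df[c] = parse_suffixed(df[c])
print(df)
```

Anchoring the regex with `^...$` means unexpected formats surface as missing values instead of being silently mangled, which makes bad rows easy to find with `isna()`.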
  • Hi Naveed, thank you for your answer. I apologize, as my previous sample data didn't show this, but some of the data are not shown in "m", "k" or "b". Using this code returns those as "NaN". Is there a way to change that so it leaves those unchanged? – LeveragedDev Oct 22 '22 at 20:26
  • @LeveragedDev, solution updated. null result from replace is filled with 1.0. Hope it helps – Naveed Oct 22 '22 at 21:42
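The effect of the updated answer's `fillna(1.0)` on the multiplier can be seen on a small illustrative Series with one unsuffixed value. This sketch reuses the answer's mapping and regex steps and adds a final rounding and `int64` cast, which is an addition here to match the question's integer requirement.

```python
import pandas as pd

# Multipliers from the answer above
m = {'m': 1000000.0, 'k': 1000.0, 'b': 1000000000.0}
posts = pd.Series(['3.4k', '7.0k', '915'])   # '915' has no suffix

# Removing digits and '.' leaves 'k', 'k', '' ; the empty string has no entry in m
raw_multiplier = posts.replace(r'[\d.]', '', regex=True).map(m)
print(raw_multiplier.tolist())   # [1000.0, 1000.0, nan]

# Filling that NaN with 1.0 leaves unsuffixed numbers unchanged
multiplier = raw_multiplier.fillna(1.0)
number = posts.replace(r'[mbk]', '', regex=True).astype(float)

# Round before casting to int, since the question asks for integers
result = number.mul(multiplier).round().astype('int64')
print(result.tolist())   # [3400, 7000, 915]
```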