I have the following DataFrame:

import pandas as pd

df = pd.DataFrame({
    'col_1': [0, 1, 2, 3],
    'col_2': [4, 5, 6, 7],
    'col_3': [14, 15, 16, 19]
})

I am trying to convert the numbers to strings and then combine each row into one string.

I can achieve this by using:

df.apply(lambda x: ''.join(x.astype(str)), axis=1)

Out[209]: 
0    0414
1    1515
2    2616
3    3719
dtype: object  # notice here dtype is object

This is where my question comes in.

I then tried using `sum`:

df.astype(str).sum(1)
Out[211]: 
0     414.0
1    1515.0
2    2616.0
3    3719.0
dtype: float64

Notice that here the dtype becomes float, not object.


Here is more information:

df.astype(str).applymap(type)
Out[221]: 
           col_1          col_2          col_3
0  <class 'str'>  <class 'str'>  <class 'str'>
1  <class 'str'>  <class 'str'>  <class 'str'>
2  <class 'str'>  <class 'str'>  <class 'str'>
3  <class 'str'>  <class 'str'>  <class 'str'>

Why does `sum` have this weird behavior? Is there any way to stop it from converting the strings back to float?

Thanks for your help :-)

BENY

2 Answers

If you want to use `sum`, you can try it this way:

df.astype(str).apply(lambda x: x.sum(),1)

Output:

0    0414
1    1515
2    2616
3    3719
dtype: object
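
If the goal is only to get one string per row, a version-independent sketch is to do the joining in plain Python and build the Series yourself, so that no pandas reduction (and no dtype inference on its result) is involved. The name `joined` and the explicit `dtype=object` are illustrative choices, not something taken from the question:

```python
import pandas as pd

df = pd.DataFrame({
    'col_1': [0, 1, 2, 3],
    'col_2': [4, 5, 6, 7],
    'col_3': [14, 15, 16, 19],
})

# Join each row's values in plain Python; pandas only stores the result.
joined = pd.Series(
    [''.join(map(str, row)) for row in df.to_numpy()],
    index=df.index,
    dtype=object,  # assumption: plain object-dtype strings are what is wanted
)

print(joined.tolist())  # ['0414', '1515', '2616', '3719']
```

This trades a little speed for predictability: the values are ordinary Python strings from the start, so leading zeros such as the one in `0414` are kept.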
Scott Boston
  • Great, this works too, but I still do not know why my approach did not work. It confuses me a lot. – BENY Oct 15 '17 at 04:02
  • Yeah, there is some casting done in the DataFrame method which appears to be different in the Series method. – Scott Boston Oct 15 '17 at 04:03
  • Basically, when using only `sum`, it does paste the string `number`s together and then converts the result back to numeric. I tried to find a way to block this automatic process but failed. – BENY Oct 15 '17 at 04:05
  • Even df.col_1.astype(str).sum() works as you expected. – Scott Boston Oct 15 '17 at 04:06
  • It must be something passed when doing the pd.DataFrame.sum method versus doing a pd.Series.sum. – Scott Boston Oct 15 '17 at 04:07
  • Yes, I tried it and it works as we expected. PS: Reading through the pandas documentation, I cannot find where it mentions that `DataFrame` `sum` has this behavior. – BENY Oct 15 '17 at 04:12
  • @Wen maybe my answer helps – Bharath M Shetty Oct 15 '17 at 04:57
  • @Bharathshetty Case solved, kindly check the duplicate question. It seems there is no way for us to block it. I hate that `try`. – BENY Oct 15 '17 at 05:49
  • As I said, it will convert the result to numbers if it contains only numeric values. I hope my answer says it in brief; the other answer is very in depth. – Bharath M Shetty Oct 15 '17 at 05:52

`sum` didn't work because the result is returned as a Series, and since every value in it looks like a number, it was converted to the corresponding float dtype. When applying standard functions such as `sum`, the result stays object only if it contains mixed data that cannot all be converted.

For example, when you do

df = pd.DataFrame({
    'col_1': [0, 1, 2, 3],
    'col_2': [4, 5, 6, 7],
    'col_3': [14, 15, 16, 'b']
})

df.astype(str).sum(1)

Output:

  
0    0414
1    1515
2    2616
3     37b
dtype: object

One alternative to `sum` is `cumsum`, which preserves the dtype; take the last column of the result (shown here on the original `df`), i.e.

s = df.astype(str).cumsum(1).iloc[:,-1]

Output:

0    0414
1    1515
2    2616
3    3719
Name: col_3, dtype: object
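
The last column of the cumulative sum is just the string columns added from left to right, so the same result can be sketched with elementwise `+` on the columns. This is an illustration under the assumption that every column can be cast with `astype(str)`; the names `str_cols` and `combined` are hypothetical:

```python
from functools import reduce
import operator

import pandas as pd

df = pd.DataFrame({
    'col_1': [0, 1, 2, 3],
    'col_2': [4, 5, 6, 7],
    'col_3': [14, 15, 16, 19],
})

# Cast each column to strings, then concatenate them pairwise, left to right.
str_cols = [df[name].astype(str) for name in df.columns]
combined = reduce(operator.add, str_cols)

print(combined.tolist())  # ['0414', '1515', '2616', '3719']
```

Because this is elementwise addition between Series rather than a row-wise reduction, the values stay strings.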

Hope it helps

Bharath M Shetty
  • It helped me learn a new way with `cumsum`, great, but why does a pandas DataFrame have this weird behavior? – BENY Oct 15 '17 at 05:02
  • As I said, a Series is returned after aggregation. When that Series consists entirely of numbers, its dtype is converted to float. It may be a bug; I need to go through the docs for more info. – Bharath M Shetty Oct 15 '17 at 05:03
  • Kindly try `df.col_1.astype(str).sum()` – BENY Oct 15 '17 at 05:04
  • Here a Series is not returned, just a string, so there is no typecasting. Maybe whenever a function returns a Series, pandas tries to typecast it; if the typecast raises no errors the typecast Series is returned, otherwise the original Series. Hope that clarifies your doubt. – Bharath M Shetty Oct 15 '17 at 05:04
  • I am reading through the pandas help file and cannot find a word about this procedure. Do you have any idea how to block this weird process? – BENY Oct 15 '17 at 05:12
  • To be honest, I have been searching for a way to preserve the dtype the whole time. I will surely update the solution once I succeed in finding one. – Bharath M Shetty Oct 15 '17 at 05:12
  • I just want to find a way to stop this weird behavior; that `try` makes the mission impossible. – BENY Oct 15 '17 at 05:55
  • For speed we can cheat by adding some non-numeric string to the end of the DataFrame and selecting everything except the last entry, but it is a bit of hard work. – Bharath M Shetty Oct 15 '17 at 05:58
  • I just do not understand why they added the try/except at the end. – BENY Oct 15 '17 at 05:59
  • Maybe because pandas usually takes data from external text sources in real life, and the case of adding numbers as strings is rare, so they might have added the `try` and `except`. It is better that we post a feature request. – Bharath M Shetty Oct 15 '17 at 06:01
  • Sure , will discuss in the link answer with PiR and accept answer holder there. – BENY Oct 15 '17 at 06:04
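
The `try`/`except` coercion discussed in these comments can be imitated with a small stand-alone sketch. The function `coerce_like_described` is hypothetical and is not pandas' actual source; it only reproduces the behavior the commenters describe, where a result is cast to float when that succeeds and returned unchanged otherwise:

```python
import pandas as pd


def coerce_like_described(result):
    """Hypothetical helper: cast to float if possible, else return as is."""
    try:
        return result.astype(float)
    except (ValueError, TypeError):
        # At least one value is not numeric, so keep the original strings.
        return result


all_digits = pd.Series(['0414', '1515'], dtype=object)
mixed = pd.Series(['0414', '37b'], dtype=object)

print(coerce_like_described(all_digits).tolist())  # [414.0, 1515.0]
print(coerce_like_described(mixed).tolist())       # ['0414', '37b']
```

Under this model the leading zero of `0414` is lost as soon as every value happens to parse as a number, which matches the `414.0` seen in the question, while a single non-numeric value such as `37b` keeps the whole result as strings.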