3

Suppose I have this dataframe df:

column1      column2                                            column3
amsterdam    school yeah right backtic escapes sport swimming   2016
rotterdam    nope yeah                                          2012
thehague     i now i can fly no you cannot swimming rope        2010
amsterdam    sport cycling in the winter makes me               2019

How do I count the characters (excluding whitespace) in each row of column2 and store the result in a new column4, like this:

column1      column2                                            column3    column4
amsterdam    school yeah right backtic escapes sport swimming   2016       70
rotterdam    nope yeah                                          2012       8
thehague     i now i can fly no you cannot swimming rope        2010       65
amsterdam    sport cycling in the winter makes me               2019       55

I tried this code, but it gives me the total number of characters across all rows of column2 instead of one count per row:

df['column4'] = sum(list(map(lambda x : sum(len(y) for y in x.split()), df['column2'])))

So currently my df looks like this:

column1      column2                                            column3    column4
amsterdam    school yeah right backtic escapes sport swimming   2016       250
rotterdam    nope yeah                                          2012       250
thehague     i now i can fly no you cannot swimming rope        2010       250
amsterdam    sport cycling in the winter makes me               2019       250

Does anybody have an idea?

  • You might want to change the expected output, as it is misleading; it doesn't seem correct. – anky Jan 24 '20 at 07:10

3 Answers

3

Use a custom lambda function with Series.apply, keeping the per-row part of your solution and dropping the outer sum:

df['column4'] = df['column2'].apply(lambda x: sum(len(y) for y in x.split()))
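For context on why the original attempt put one repeated number in column4: the outer sum(...) collapses the per-row counts into a single scalar, and pandas broadcasts a scalar to every row on assignment. A minimal sketch on the question's data (the names per_row and total are illustrative only, not from the question):

```python
import pandas as pd

df = pd.DataFrame({
    'column1': ['amsterdam', 'rotterdam', 'thehague', 'amsterdam'],
    'column2': ['school yeah right backtic escapes sport swimming',
                'nope yeah',
                'i now i can fly no you cannot swimming rope',
                'sport cycling in the winter makes me'],
    'column3': [2016, 2012, 2010, 2019],
})

# One count per row: split on whitespace and add up the word lengths.
per_row = [sum(len(word) for word in text.split()) for text in df['column2']]

# Wrapping the list in another sum() reduces it to a single number,
# which pandas would broadcast to every row if assigned to a column.
total = sum(per_row)

# Assigning the list (or using Series.apply) keeps one value per row.
df['column4'] = per_row
print(df['column4'].tolist())
print(total)
```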

Or get the length of each string with Series.str.len and subtract the number of spaces counted by Series.str.count:

df['column4'] = df['column2'].str.len().sub(df['column2'].str.count(' '))
# the same logic rewritten as a custom function
#df['column4'] = df['column2'].map(lambda x: len(x) - x.count(' '))
print(df)
     column1                                           column2  column3  \
0  amsterdam  school yeah right backtic escapes sport swimming     2016   
1  rotterdam                                         nope yeah     2012   
2   thehague       i now i can fly no you cannot swimming rope     2010   
3  amsterdam              sport cycling in the winter makes me     2019   

   column4  
0       42  
1        8  
2       34  
3       30  
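
One caveat with the second approach: str.count(' ') counts only the space character, so tabs or newlines would still be included in the result. If that matters for your data, a possible variant counts every whitespace character with the regex \s instead. This is only a sketch, and the sample strings below are made up for illustration:

```python
import pandas as pd

# Hypothetical sample containing a tab and a newline as well as spaces.
s = pd.Series(['nope yeah', 'nope\tyeah', 'nope \n yeah'])

# Subtracting only ' ' leaves tabs and newlines in the count.
spaces_only = s.str.len().sub(s.str.count(' '))

# Subtracting every whitespace character matched by \s.
all_whitespace = s.str.len().sub(s.str.count(r'\s'))

print(spaces_only.tolist())
print(all_whitespace.tolist())
```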
jezrael
1

Hi, this works for me:

import pandas as pd
df=pd.DataFrame({'col1':['Stack Overflow','The Guy']})
df['Count Of Chars']=df['col1'].str.replace(" ","").apply(len)
df

Output

             col1  Count Of Chars
0  Stack Overflow              13
1         The Guy               6
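
If the column can contain missing values, .apply(len) raises a TypeError on them, because NaN has no length. A hedged variant of the same idea uses .str.len(), which passes missing values through. The NaN row below is a made-up example, not part of the original data:

```python
import numpy as np
import pandas as pd

# Hypothetical data with a missing value in the text column.
df = pd.DataFrame({'col1': ['Stack Overflow', np.nan]})

# Remove literal spaces, then take the string length.
# .str.len() propagates the missing value instead of raising.
df['Count Of Chars'] = df['col1'].str.replace(' ', '', regex=False).str.len()
print(df)
```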
The Guy
1

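You can use the method Series.str.count with a regular expression pattern. Note that \w matches only letters, digits and underscores, so punctuation is not counted:

df['column2'].str.count(pat=r'\w')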

Output:

0    42
1     8
2    34
3    30
Name: column2, dtype: int64
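
On the question's data, which contains only letters and spaces, \w and "not whitespace" give the same counts. They differ once punctuation appears. A small sketch comparing \w with \S (any non-whitespace character), using made-up strings:

```python
import pandas as pd

# Hypothetical strings containing punctuation, digits and an underscore.
s = pd.Series(["no, you can't!", 'snake_case 42'])

# \w counts letters, digits and underscores only.
word_chars = s.str.count(r'\w')

# \S counts every character that is not whitespace.
non_whitespace = s.str.count(r'\S')

print(word_chars.tolist())
print(non_whitespace.tolist())
```

If the goal is strictly "everything except whitespace", \S is probably the closer match.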

Mykola Zotko