id   string   
0    31672;0           
1    31965;0
2    0;78464
3      51462
4    31931;0

Hi, I have this table. I would like to split the string column on ';' and store the number of resulting pieces in a new column. The final table should look like this:

id   string   word_count
0    31672;0    2
1    31965;0    2
2    0;78464    2
3      51462    1
4    31931;0    2

It would be nice if someone knows how to do this with Python.

al1991
  • Are you looking for `df['string'].str.count(';') + 1`? – cs95 Dec 25 '17 at 17:06
  • Hi, thanks for the response, but that is not what I'm looking for. That code will write "1" to the "word_count" column if the "string" column value is an empty string :) – al1991 Dec 25 '17 at 17:11

1 Answer

Option 1
The basic solution using str.split + str.len -

df['word_count'] = df['string'].str.split(';').str.len()
df

     string  word_count
id                     
0   31672;0           2
1   31965;0           2
2   0;78464           2
3     51462           1
4   31931;0           2
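
For anyone who wants to reproduce this, here is a minimal self-contained sketch. The DataFrame construction is an assumption, reconstructed from the table in the question with id as the index:

```python
import pandas as pd

# Assumed reconstruction of the question's table, with `id` as the index.
df = pd.DataFrame(
    {"string": ["31672;0", "31965;0", "0;78464", "51462", "31931;0"]},
    index=pd.Index(range(5), name="id"),
)

# Split each value on ';' into a list, then take the length of each list.
df["word_count"] = df["string"].str.split(";").str.len()
print(df)
```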

Option 2
The clever (more efficient, since it builds no intermediate lists) solution with str.count -

df['word_count'] = df['string'].str.count(';') + 1
df

     string  word_count
id                     
0   31672;0           2
1   31965;0           2
2   0;78464           2
3     51462           1
4   31931;0           2

Caveat - this would ascribe a word count of 1 even for an empty string. Option 1 does too, since splitting '' on ';' yields [''], so empty strings need explicit handling if they should count as 0.
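
If empty strings should count as 0, one option is to mask them after counting. A minimal sketch, assuming a small made-up Series (s, via_split, via_count and word_count are illustrative names, not from the original post):

```python
import pandas as pd

# Made-up sample: two separators-or-not cases plus an empty string.
s = pd.Series(["31672;0", "51462", ""], name="string")

# Both approaches report 1 for the empty string:
# ''.split(';') gives [''], and ''.count(';') + 1 is 1.
via_split = s.str.split(";").str.len()
via_count = s.str.count(";") + 1

# Keep the count where the string is non-empty, otherwise use 0.
word_count = via_count.where(s != "", 0)
print(word_count)
```

Missing values (NaN) are a separate case: both string methods propagate them rather than counting them.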


If you want each word occupying a new column, there's a quick and simple way using tolist, loading the splits into a new dataframe, and concatenating the new dataframe with the original using concat -

v = pd.DataFrame(df['string'].str.split(';').tolist())\
        .rename(columns=lambda x: x + 1)\
        .add_prefix('string_')

pd.concat([df, v], axis=1)

     string  word_count string_1 string_2
id                                       
0   31672;0           2    31672        0
1   31965;0           2    31965        0
2   0;78464           2        0    78464
3     51462           1    51462     None
4   31931;0           2    31931        0
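
As an alternative sketch (not the approach used above), str.split accepts expand=True, which returns the pieces as a DataFrame that keeps the original index. That matters when id is not simply 0..n-1, because a frame built from tolist() gets a fresh default index. The DataFrame construction and the name out are assumptions for illustration:

```python
import pandas as pd

# Assumed reconstruction of the question's table, with `id` as the index.
df = pd.DataFrame(
    {"string": ["31672;0", "31965;0", "0;78464", "51462", "31931;0"]},
    index=pd.Index(range(5), name="id"),
)

# expand=True yields one column per piece, labelled 0, 1, ...
v = df["string"].str.split(";", expand=True)

# Relabel the columns as string_1, string_2, ...
v = v.rename(columns=lambda x: x + 1).add_prefix("string_")

# join aligns on the shared index; rows with fewer pieces get a missing value.
out = df.join(v)
print(out)
```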
cs95