0

I have a problem counting the number of items in a pandas string Series when a row contains no string.

I'm able to count the number of words when there are one or more items per row. But if the row has no value (it's an empty string when running df['mytext'].str.split(',')), I also get one.

These answers do not work for me: both Answer 1 and Answer 2 give one for an empty string.

How can I handle this in a pandas one-liner? Thanks in advance.

Taking the example from the first answer:

import pandas as pd

df = pd.DataFrame(['one apple', '', 'box of oranges', 'pile of fruits outside', 'one banana', 'fruits'])
df.columns = ['fruits']

The accepted answer was:

count = df['fruits'].str.split().apply(len).value_counts()
count.index = count.index.astype(str) + ' words:'
count.sort_index(inplace=True)
count

Which gives

Out[13]:  
0 words:    1
1 words:    1
2 words:    2
3 words:    1
4 words:    1
Name: fruits, dtype: int64

I want a zero for the second string, but every solution I tried gave me a one.

Mikey
  • No, it does not. Please explain how this happens. Provide a [mcve]. – cs95 Oct 21 '18 at 19:27
  • Done, the example above gives a small overview. – Mikey Oct 21 '18 at 19:36
  • I'm thinking you misunderstood what was being output. The output is telling you the number of rows with N words. (1 row has 0 words, 1 row has 1 word, 2 rows have 2 words, 1 row has 3 words, and so on). Do you want the number of words per row instead? – cs95 Oct 21 '18 at 19:40
  • Yes, but Martyna already had a solution to my problem. Thanks anyway :) – Mikey Oct 21 '18 at 19:58
  • I'm sure they did. But that isn't a great solution, unfortunately. – cs95 Oct 21 '18 at 20:35
  • Can you suggest a better solution? I'm open to any improvements. – Mikey Oct 22 '18 at 07:35

3 Answers

1

Use str.split and count the elements with str.len:

df['wordcount'] = df.fruits.str.split().str.len()
print(df)
                   fruits  wordcount
0               one apple          2
1                                  0
2          box of oranges          3
3  pile of fruits outside          4
4              one banana          2
5                  fruits          1

Replace ' ' with ',' for your actual data.

cs95
  • This gives me ones where I have empty strings. Maybe I gave the wrong example from the other thread. My dataframe looks like df = pd.DataFrame(['banana', 'apple', '', 'another,tropical,fruit']): strings in each row, separated by commas, and sometimes empty strings. – Mikey Oct 22 '18 at 12:17
  • @Mikey how would it give 1? I just showed you an example with my data where it shows 0 in the row with index 1. Can you please explain clearly why that does not work? I also told you to change the delimiter from space to comma for your problem. – cs95 Oct 22 '18 at 15:48
  • I did change the delimiter. Could you please try this example: test = pd.DataFrame(['banana', 'apple', '', 'another,tropical,fruit']); test.columns = ['text']; test.text.str.split(',').str.len(). It gives me a one for the third entry, where I want a zero. – Mikey Oct 23 '18 at 12:09
0

When you call split() without arguments, an empty string returns an empty list. When you call split(','), however, an empty string returns a list containing one empty string. This is why the solution from the example does not work in your case.
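
The difference can be checked in plain Python, without pandas. This is only a small illustration; the variable names are made up for the example.

```python
# Plain Python: str.split with and without an explicit separator.
empty = ""

no_sep = empty.split()       # whitespace mode: empty fields are dropped
with_sep = empty.split(",")  # explicit separator: one empty field remains

print(no_sep, len(no_sep))      # [] 0
print(with_sep, len(with_sep))  # [''] 1
```

pandas' str.split applies the same rule to every row, which is where the unwanted 1 comes from.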

You can try something like the following. First, split the string by comma, which I assume from your example is your case. Then, if the split returns a list containing only an empty string, the function returns 0; otherwise it returns the length of the list of words.

pd.Series(['mytext', '']).str.split(',').apply(lambda x: 0 if x==[''] else len(x))
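
If you would rather avoid apply with a lambda, one possible alternative is to count the commas directly and then zero out the empty rows. This is a sketch, not the only way to do it. It assumes the column holds only strings (no NaN) and reuses the test frame from the comments.

```python
import pandas as pd

# Example data taken from the comments: comma-separated items, one row empty.
test = pd.DataFrame({"text": ["banana", "apple", "", "another,tropical,fruit"]})

# Number of items = number of commas + 1 ...
counts = test["text"].str.count(",") + 1
# ... except for empty strings, which should count as 0 items.
counts = counts.where(test["text"] != "", 0)

print(counts.tolist())  # [1, 1, 0, 3]
```

Note that str.count treats its argument as a regular expression, so a separator such as '.' or '|' would need escaping.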

Martyna
0

In your question, you're referring to str.split(','), but the examples are for str.split(). The function behaves differently depending on whether you pass a separator argument.
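
A small side-by-side comparison may make the difference concrete. The Series below is invented for illustration only.

```python
import pandas as pd

# Made-up rows: space-separated, empty, comma-separated, and a doubled comma.
s = pd.Series(["a b", "", "a,b", "a,,b"])

# No argument: split on runs of whitespace and drop empty pieces.
by_whitespace = s.str.split().str.len()
# Explicit separator: every field is kept, including empty ones.
by_comma = s.str.split(",").str.len()

print(by_whitespace.tolist())  # [2, 0, 1, 1]
print(by_comma.tolist())       # [1, 1, 2, 3]
```

With an explicit separator, the empty string yields one (empty) field, and 'a,,b' yields three.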

Which are you actually trying to do?

  • My strings are separated by a comma. There is no space between the words. That's why I referred to my split version. – Mikey Oct 21 '18 at 19:53