
I have a DataFrame column in which each row is a list of strings. I want to count the individual elements across the whole column, not the rows, which is what value_counts() in pandas gives me. I want to apply Counter() from the collections module, but it seems to work only on a flat list. My column in the DataFrame looks like this:

[['FollowFriday', 'Awesome'],
 ['Covid_19', 'corona', 'Notagain'],
 ['Awesome'],
 ['FollowFriday', 'Awesome'],
 [],
 ['corona', 'Notagain'],
....]

I want to get the counts, such as

[('FollowFriday', 2),
 ('Awesome', 3),
 ('corona', 2),
 ('Covid_19', 1),
 ('Notagain', 2),
 .....]

The basic command that I am using is:

from collections import Counter
Counter(df['column'])

OR

from collections import Counter
Counter(" ".join(df['column']).split()).most_common() 

Any help would be greatly appreciated!

Abir
  • Use `value_counts` to count values, combine with `head(N)` to get the top N. You should improve the question with clear input if you don't manage – mozway Mar 16 '22 at 11:04
  • Thank you for your answer, but like I explained, value_counts() counts the rows, so I get count = 1 for each of the rows. But, I want to count the different values inside each of the rows and also across all the rows. Is there a way, I could do that with value_counts() ? – Abir Mar 16 '22 at 12:51
  • Please provide a reproducible input and output, as real DataFrame constructor, not lists. Or was it just an analogy to pandas and you want to do it with pure python? – mozway Mar 16 '22 at 12:52
  • It was an analogy. I have a Dataframe column, which as I understand is a series. And, I need to find out the count of all the text (words) in that column, where the text(words) are repeated in an unordered fashion across the rows as in my example – Abir Mar 16 '22 at 13:04
  • I got the above list by using: col_list = df['column'].values.tolist(), thinking that it will solve my problem but it just made a list of the already present lists. – Abir Mar 16 '22 at 13:07
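For reference, a minimal sketch of why the two attempts in the question fail. It assumes a plain list of lists named `rows` (a hypothetical stand-in for `df['column']`, which yields the same inner lists when iterated). `Counter` needs hashable items and `str.join` needs strings, and a list is neither.

```python
from collections import Counter

# Hypothetical stand-in for df['column']: each element is itself a list.
rows = [["FollowFriday", "Awesome"], ["Awesome"], []]

# Attempt 1: Counter uses each element as a dictionary key,
# and lists cannot be dictionary keys.
try:
    Counter(rows)
except TypeError as exc:
    first_error = str(exc)  # message says the list is unhashable

# Attempt 2: str.join only accepts strings, not lists of strings.
try:
    " ".join(rows)
except TypeError as exc:
    second_error = str(exc)  # message says a str was expected

print(first_error)
print(second_error)
```

In both cases the nested lists have to be flattened into individual strings before counting.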

1 Answer


IIUC, your comparison to pandas was only to explain your goal and you want to work with lists?

You can use:

l = [['FollowFriday', 'Awesome'],
     ['Covid_19', 'corona', 'Notagain'],
     ['Awesome'],
     ['FollowFriday', 'Awesome'],
     [],
     ['corona', 'Notagain'],
    ]

from collections import Counter
from itertools import chain

out = Counter(chain.from_iterable(l))

or if you have a Series of lists, use explode:

out = Counter(df['column'].explode())
# OR
out = df['column'].explode().value_counts()

output:

Counter({'FollowFriday': 2,
         'Awesome': 3,
         'Covid_19': 1,
         'corona': 2,
         'Notagain': 2})
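
As a self-contained sketch of the pandas route, the example below rebuilds the sample data as a DataFrame. The names `df` and `column` come from the question; `N = 2` is an arbitrary value chosen for illustration. One detail worth checking on your own data: `explode()` turns an empty list into NaN. `value_counts()` drops NaN by default, but `Counter` counts it as a key, so the sketch calls `dropna()` first.

```python
from collections import Counter

import pandas as pd

# Same data as the list `l` above, wrapped in a DataFrame.
df = pd.DataFrame({"column": [
    ["FollowFriday", "Awesome"],
    ["Covid_19", "corona", "Notagain"],
    ["Awesome"],
    ["FollowFriday", "Awesome"],
    [],
    ["corona", "Notagain"],
]})

# One row per list element; the empty list becomes a single NaN row.
exploded = df["column"].explode()

# value_counts() ignores NaN by default and sorts by count, descending.
counts = exploded.value_counts()

# Top N entries; N is an arbitrary value for this illustration.
N = 2
top_n = counts.head(N)

# Counter does not skip NaN, so drop it explicitly before counting.
counter = Counter(exploded.dropna())

print(counts)
print(top_n)
print(counter)
```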
mozway
  • Thank you so much for taking the time to answer. out = Counter(df['column'].explode()) # OR out = df['column'].explode().value_counts() works. – Abir Mar 16 '22 at 14:05
  • You might have a "sublist" that is a float? – mozway Mar 16 '22 at 14:07
  • How can I use .head(N) with the above code? You mentioned it in your earlier comment, but when I use it with value_counts, i.e., df['column'].explode().value_counts().head(N), it says that : name N is not defined. – Abir Mar 16 '22 at 14:08
  • well, define N: `N = 5` – mozway Mar 16 '22 at 14:10
  • I am running into another problem now. When trying to save the output of the counter, I am getting the error " 'str' object has no attribute 'keys' ". Can you please help me fix it? I tried saving it with: import csv csv_columns = ['Counts'] csv_file = "file.csv" try: with open(csv_file, 'w') as csvfile: writer = csv.DictWriter(csvfile, fieldnames=csv_columns) writer.writeheader() for data in output: writer.writerow(data) except IOError: print("I/O error") – Abir Mar 16 '22 at 17:08
  • It's quite difficult to follow code in comments, you should start a new question ;) – mozway Mar 16 '22 at 18:51
  • Have there been any updates to the module? I tried the code that you suggested again today, and it gave me a different answer. The only change that I had in my data is that now, I converted all the values to lowercase. df['column'].explode().value_counts() does not separate out the values anymore. Any ideas, why this might be happening? – Abir Mar 22 '22 at 13:25
  • @Abir no, but python is case sensitive. `ABC` and `abc` won't be counted as the same element ;) – mozway Mar 22 '22 at 13:27
  • I know, that is why I changed it to lowercase as there were some observations that were the same but in lowercase. So, I wanted to add after converting it all to lowercase. – Abir Mar 22 '22 at 13:37
  • Sorry, but as I said last time, debugging without data is quite difficult. You should open a new question with all the details – mozway Mar 22 '22 at 13:38
  • explode.value_counts() works like a charm! Thanks :) – Prithvi Shetty Apr 27 '23 at 07:34
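
Two of the follow-up comments (lowercasing and saving to CSV) can be sketched with the standard library alone. This is a guess at what went wrong, since no data was posted: iterating over a `Counter` yields only its keys (plain strings), which would explain the `'str' object has no attribute 'keys'` error from `csv.DictWriter.writerow`. The sketch lowercases each word after flattening, so the lists are never converted to strings as a whole. The names `rows` and `write_counts` and the `word`/`count` header are hypothetical, and an in-memory buffer stands in for the real file.

```python
import csv
import io
from collections import Counter
from itertools import chain

# Hypothetical sample with mixed case, standing in for the real column.
rows = [["FollowFriday", "Awesome"], ["followfriday"], [], ["Corona", "corona"]]

# Flatten first, then lowercase each word individually.
counts = Counter(word.lower() for word in chain.from_iterable(rows))


def write_counts(counts, fileobj):
    """Write a header plus one (word, count) row per entry, most common first."""
    writer = csv.writer(fileobj)
    writer.writerow(["word", "count"])
    # most_common() yields (key, count) tuples; looping over the Counter
    # itself would yield only the keys.
    writer.writerows(counts.most_common())


# In-memory stand-in for: open("file.csv", "w", newline="")
buffer = io.StringIO()
write_counts(counts, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

With a real file, `write_counts(counts, csvfile)` inside a `with open("file.csv", "w", newline="") as csvfile:` block should behave the same way.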