In Python, how to count how many times certain words appear without specifying the word

Question

Let's say I have the following text file. Each color name is an account name, and I want to know how many persons are under each account. The account name is the first word after "Color: ", and it ends at the first "/" or "-". There are 3 accounts in the file I shared: red, blue, and black. So red/test/base, red-img-tests, red-zero-tests, and red-replication-tests are all part of the account "red". Finally, I have to report how many persons there are under each account, so here it's red: 4.

---------------------------------
Color: red/test/base
  person: latest
---------------------------------
Color: red-img-tests
  person: latest
---------------------------------
Color: red-zero-tests
  person: latest
---------------------------------
Color: red-replication-tests
  person: latest
---------------------------------
Color: blue
  person: latest
---------------------------------
Color: black/red-config-img
  person: 7e778bb
  person: 82307b2
  person: 8731770
  person: 7777aae
  person: 081178e
  person: c01ba8a
  person: 881b1ad
  person: d2fb1d7
---------------------------------
Color: black/pasta
  person: latest
---------------------------------
Color: black/base-img
  person: 0271332
  person: 70da077
  person: 3700c07
  person: c2f70ff
  person: 0210138
  person: 083af8d

  person: latest
---------------------------------
Color: black/food-pasta-8.0
  person: latest

My expected output is:

    red: 4
    blue: 1
    black: 17

I have thousands of lines, so as you can see, I can't really specify words like 'red' or 'blue'. It has to somehow read each of them and see if they are the same as the following line.

For now I am doing the following to get the account names out.

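import re
for line in f.readlines():  # f is the opened text file; readlines() returns a list of lines
    # raw string avoids invalid escape sequences; "/", "-" and ":" need no backslash
    acc_name = re.split(r'; |, |/|-|:', line)[1].strip()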

@ParagJain updated. can you check plz. – pandaflieszeppelin Sep 26 '19 at 14:09

Pitto · Accepted Answer · 2019-09-27T08:08:44.390

3

I have a solution using Counter for you:

import collections

data = """
---------------------------------
Color: red/test/base
  person: latest
---------------------------------
Color: red-img-tests
  person: latest
---------------------------------
Color: red-zero-tests
  person: latest
---------------------------------
Color: red-replication-tests
  person: latest
---------------------------------
Color: blue
  person: latest
---------------------------------
Color: black/red-config-img
  person: 7e778bb
  person: 82307b2
  person: 8731770
  person: 7777aae
  person: 081178e
  person: c01ba8a
  person: 881b1ad
  person: d2fb1d7
---------------------------------
Color: black/pasta
  person: latest
---------------------------------
Color: black/base-img
  person: 0271332
  person: 70da077
  person: 3700c07
  person: c2f70ff
  person: 0210138
  person: 083af8d
  """

print(data)
colors = ["black", "red", "blue"]
final_count = []
for line in data.split("\n"):
    for color in colors:
        if color in line:  # substring match anywhere in the line
            final_count.append(color)
            # break  # Uncomment this break to count at most one color per line
final_count = collections.Counter(final_count)
print(final_count)

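Output (after the printed data; "red" reaches 5 because the line "Color: black/red-config-img" also contains it)

Counter({'red': 5, 'black': 3, 'blue': 1})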

From the official Python documentation for the collections module:

This module implements specialized container datatypes providing alternatives to Python’s general purpose built-in containers, dict, list, set, and tuple.
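
If the names can't be listed in advance, one possible approach is to take the first word after "Color: " (up to the first "/" or "-") as the account and count the "person:" lines that follow it. This is a minimal sketch, assuming the file layout shown in the question; the function name count_persons and the shortened sample string are illustrative only:

```python
import re
from collections import Counter


def count_persons(lines):
    """Count 'person:' lines per account.

    The account is assumed to be the first word after 'Color:',
    ending at the first '/' or '-'.
    """
    counts = Counter()
    account = None  # account taken from the most recent 'Color:' line
    for line in lines:
        line = line.strip()
        if line.startswith("Color:"):
            name = line[len("Color:"):].strip()
            account = re.split(r"[/-]", name, maxsplit=1)[0]
        elif line.startswith("person:") and account is not None:
            counts[account] += 1
    return counts


# Shortened sample in the same layout as the question's file
sample = """\
---------------------------------
Color: red/test/base
  person: latest
---------------------------------
Color: red-img-tests
  person: latest
---------------------------------
Color: blue
  person: latest
---------------------------------
Color: black/red-config-img
  person: 7e778bb
  person: 82307b2
"""

result = count_persons(sample.splitlines())
for account, n in result.items():
    print(f"{account}: {n}")
# red: 2
# blue: 1
# black: 2
```

Because the function only iterates over lines, it should work equally on a list from readlines() or on an open file object. Separator lines and blank lines are skipped, since they start with neither "Color:" nor "person:".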

edited Sep 27 '19 at 08:08

answered Sep 26 '19 at 07:33

Pitto


OP may have to `print(collections.Counter(s.split()))` assuming `s` is the multi-line string posted above. – shahkalpesh Sep 26 '19 at 07:36
@shahkalpesh doing that i get something like Counter({'apple': 1}) Counter({'apple': 1}) Counter({'apple': 1}) Counter({'red': 1}) Counter({'red': 1}) Counter({'green': 1}) Counter({'green': 1}) Counter({'green': 1}) Counter({'green': 1}) Counter({'green': 1}) Counter({'green': 1}) Counter({'black': 1}) – pandaflieszeppelin Sep 26 '19 at 07:54
Please share your input data, @pandaflieszeppelin – Pitto Sep 26 '19 at 08:39
@Pitto can you check i updated – pandaflieszeppelin Sep 26 '19 at 14:09
I've updated the code, @pandaflieszeppelin – Pitto Sep 26 '19 at 14:19
@Pitto it just gives 1s... sort of like how i replied to shahkalpesh – pandaflieszeppelin Sep 26 '19 at 14:29
I've updated the code, re-tested it and added the output. – Pitto Sep 26 '19 at 14:49
@Pitto my data is a list though. the txt file read as lines which is a list. the split doesnt work with lists – pandaflieszeppelin Sep 26 '19 at 21:04
Changed my answer a bit, please check it @pandaflieszeppelin – Pitto Sep 27 '19 at 08:09

Basavaraju US · Answer 2 · 2019-09-26T07:39:13.543

0

count = {}

example = "apple apple apple apple red red green green green green green black"

for i in example.split():
    if i not in count:
        count[i] = 1
    elif i in count:
        count[i] += 1


print(count)

edited Sep 26 '19 at 07:39

answered Sep 26 '19 at 07:33

Basavaraju US


@pandaflieszeppelin updated the answer, please check it – Basavaraju US Sep 26 '19 at 07:39
While the solution may be correct now, there may be quite a few optimizations possible. For example, using `defaultdict(int)` and not checking for dict-membership in the `elif` part. The best option is to use `Counter`, as other answers have already pointed out. – rdas Sep 26 '19 at 07:44
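
The `defaultdict(int)` variant mentioned in the comment might look like this. It is a minimal sketch reusing the example string from the answer; missing keys default to 0, so no membership check is needed:

```python
from collections import defaultdict

example = "apple apple apple apple red red green green green green green black"

count = defaultdict(int)  # int() returns 0, so unseen words start at zero
for word in example.split():
    count[word] += 1

print(dict(count))
# {'apple': 4, 'red': 2, 'green': 5, 'black': 1}
```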

score 0 · Answer 3 · answered Sep 26 '19 at 07:36

You can use Counter() from the built-in collections module; see the Python 3.x documentation for Counter().

from collections import Counter
data = "apple apple apple apple red red green green green green green black"
d = Counter(data.split())

print(d)

Counter is a dict subclass. Dictionary keys are unique, so each distinct word is stored once as a key, and its value is the number of times it occurred.
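
A short sketch of that dictionary-like behaviour, reusing the same example string (the lookups shown are only illustrations):

```python
from collections import Counter

data = "apple apple apple apple red red green green green green green black"
d = Counter(data.split())

print(d["green"])           # 5
print(d["purple"])          # 0, a missing key counts as zero instead of raising KeyError
print(d.most_common(2))     # [('green', 5), ('apple', 4)]
print(isinstance(d, dict))  # True
```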
