
I have a JSON file that has multiple objects with a text field:

{
"messages": 
[
    {"timestamp": "123456789", "timestampIso": "2019-06-26 09:51:00", "agentId": "2001-100001", "skillId": "2001-20000", "agentText": "That customer was great"},
    {"timestamp": "123456789", "timestampIso": "2019-06-26 09:55:00", "agentId": "2001-100001", "skillId": "2001-20001", "agentText": "That customer was stupid\nI hope they don't phone back"},
    {"timestamp": "123456789", "timestampIso": "2019-06-26 09:57:00", "agentId": "2001-100001", "skillId": "2001-20002", "agentText": "Line number 3"},
    {"timestamp": "123456789", "timestampIso": "2019-06-26 09:59:00", "agentId": "2001-100001", "skillId": "2001-20003", "agentText": ""}
]
}

I'm only interested in the 'agentText' field.

I need to extract every word from the agentText field and count how many times each word occurs.

My Python code:

import json

with open('20190626-101200-text-messages.json') as f:
    data = json.load(f)

for message in data['messages']:
    # Treat newlines and carriage returns as word separators
    splittext = message['agentText'].strip().replace('\n', ' ').replace('\r', ' ')
    if len(splittext) > 0:
        splittext2 = splittext.split(' ')
        print(splittext2)

gives me this:

['That', 'customer', 'was', 'great']
['That', 'customer', 'was', 'stupid', 'I', 'hope', 'they', "don't", 'phone', 'back']
['Line', 'number', '3']

How can I add each word to an array along with its count? Something like:

That 2
customer 2
was 2
great 1
..

and so on?

dragonfury2

2 Answers

data = '''{"messages":
[
    {"timestamp": "123456789", "timestampIso": "2019-06-26 09:51:00", "agentId": "2001-100001", "skillId": "2001-20000", "agentText": "That customer was great"},
    {"timestamp": "123456789", "timestampIso": "2019-06-26 09:55:00", "agentId": "2001-100001", "skillId": "2001-20001", "agentText": "That customer was stupid I hope they don't phone back"},
    {"timestamp": "123456789", "timestampIso": "2019-06-26 09:57:00", "agentId": "2001-100001", "skillId": "2001-20002", "agentText": "Line number 3"},
    {"timestamp": "123456789", "timestampIso": "2019-06-26 09:59:00", "agentId": "2001-100001", "skillId": "2001-20003", "agentText": ""}
]
}
'''

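import json
from collections import Counter
from pprint import pprint

def words(data):
    # Yield every whitespace-separated word from each message's agentText
    for m in data['messages']:
        yield from m['agentText'].split()

# Counter tallies how many times each yielded word occurs
c = Counter(words(json.loads(data)))
pprint(c.most_common())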

Prints:

[('That', 2),
 ('customer', 2),
 ('was', 2),
 ('great', 1),
 ('stupid', 1),
 ('I', 1),
 ('hope', 1),
 ('they', 1),
 ("don't", 1),
 ('phone', 1),
 ('back', 1),
 ('Line', 1),
 ('number', 1),
 ('3', 1)]
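
To get the plain "word count" layout the question asks for, one option is to loop over most_common() and format each pair. This is a minimal sketch, assuming the same message structure; the messages list and the format_counts helper are illustrative names, not part of the code above.

```python
from collections import Counter

# Illustrative data: only the 'agentText' field matters here.
messages = [
    {"agentText": "That customer was great"},
    {"agentText": "That customer was stupid\nI hope they don't phone back"},
    {"agentText": "Line number 3"},
    {"agentText": ""},
]

def format_counts(messages):
    """Return one 'word count' line per distinct word, most frequent first."""
    counts = Counter(
        word for m in messages for word in m["agentText"].split()
    )
    return [f"{word} {count}" for word, count in counts.most_common()]

lines = format_counts(messages)
print("\n".join(lines))
```

Words with equal counts stay in the order they were first seen, so the first lines are That 2, customer 2, was 2, great 1.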
Andrej Kesely
  • It doesn't appear to like: c = Counter(words(json.loads(data))) pprint(c.most_common()). It's coming out with a red underline for pprint – dragonfury2 Jun 27 '19 at 15:05
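
A red underline on pprint is usually an editor warning that the name is undefined. That would happen if the "from pprint import pprint" line was left out, or if the two statements were pasted onto a single line. This is a guess from the comment, not a confirmed diagnosis. A sketch with the import in place and each statement on its own line; sample is illustrative data:

```python
import json
from collections import Counter
from pprint import pprint  # without this import, `pprint` is an undefined name

# Illustrative JSON string, shortened to the field that matters.
sample = '{"messages": [{"agentText": "That customer was great"}, {"agentText": "That customer was stupid"}]}'

def words(data):
    # Yield every whitespace-separated word from each message
    for m in data['messages']:
        yield from m['agentText'].split()

# The two statements must be on separate lines
c = Counter(words(json.loads(sample)))
pprint(c.most_common())
```

The built-in print(c.most_common()) would also work if pprint is not wanted; it only changes the layout of the output.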

Check this out:

data = {
    "messages": 
        [
            {"timestamp": "123456789", "timestampIso": "2019-06-26 09:51:00", "agentId": "2001-100001", "skillId": "2001-20000", "agentText": "That customer was great"},
            {"timestamp": "123456789", "timestampIso": "2019-06-26 09:55:00", "agentId": "2001-100001", "skillId": "2001-20001", "agentText": "That customer was stupid\nI hope they don't phone back"},
            {"timestamp": "123456789", "timestampIso": "2019-06-26 09:57:00", "agentId": "2001-100001", "skillId": "2001-20002", "agentText": "Line number 3"},
            {"timestamp": "123456789", "timestampIso": "2019-06-26 09:59:00", "agentId": "2001-100001", "skillId": "2001-20003", "agentText": ""}
        ]
}

# Collect the list of words from each non-empty message
var = []

for row in data['messages']:
    new_row = row['agentText'].split()
    if new_row:
        var.append(new_row)

# Tally how many times each word occurs
temp = dict()

for e in var:
    for j in e:
        temp[j] = temp.get(j, 0) + 1

for key, value in temp.items():
    print(f'{key}: {value}')
Bob White
  • This sort of works; however, my 3 message lines are not separated by commas (,). They are just one line after the other in the for loop. How do I append one line to another and separate them by commas? – dragonfury2 Jun 27 '19 at 15:17
  • data.split() will help you separate by commas – Bob White Jun 27 '19 at 15:34
  • Look above; I have made changes to my response, using split to separate by commas – Bob White Jun 27 '19 at 15:42
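
If the goal in the comments is to collect the words from every message into one list, rather than one list per message, list.extend does that, and ', '.join turns the result into a comma-separated string. This reads the comment in one particular way, which is an assumption; messages, all_words and joined are illustrative names.

```python
# Illustrative data: only the 'agentText' field matters here.
messages = [
    {"agentText": "That customer was great"},
    {"agentText": "That customer was stupid\nI hope they don't phone back"},
    {"agentText": "Line number 3"},
    {"agentText": ""},
]

all_words = []
for row in messages:
    # extend() adds the individual words; append() would add the whole list as one item
    all_words.extend(row["agentText"].split())

# One string with the words separated by commas
joined = ", ".join(all_words)

print(all_words)
print(joined)
```

Note that split() with no arguments splits on any whitespace, including the \n inside the second message, so empty messages contribute nothing to the list.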