0

I have the following data:

[[4, 'ABC'], [4, 'BCD'], [3, 'CDE'], [3, 'ABC'], [3, 'DEF'], [3, 'BCD'], [3, 'BCD'], [3, 'BCD']]

I need the following output:

[ABC, 2, 7]
[BCD, 4, 13]
[CDE, 1, 3]
[DEF, 1, 3]

I need to count how many times each word (position [1]) occurs and sum the numbers at position [0] for that word. Each row of the result should have the form:

[Word, freq, sum of weight] 

I checked "Finding frequencies of pair items in a list of pairs" and "Finding frequency distribution of a list of numbers in python", but they did not solve my problem.

I tried this, but without success:

res = [[4, 'ABC'], [4, 'BCD'], [3, 'CDE'], [3, 'ABC'], [3, 'DEF'], [3, 'BCD'], [3, 'BCD'], [3, 'BCD']]
d = {}
for freq, label in res:
    if label not in d:
        d[label] = {}
    inner_dict = d[label]
    if freq not in inner_dict:
        inner_dict[freq] = 0
    inner_dict[freq] += freq

print(inner_dict)
Dr. Abrar

5 Answers

5

With pandas:

import pandas
data = [[4, 'ABC'], [4, 'BCD'], [3, 'CDE'], [3, 'ABC'], [3, 'DEF'], [3, 'BCD'], [3, 'BCD'], [3, 'BCD']]
df = pandas.DataFrame(data, columns=['count', 'word'])
result = df.groupby('word')['count'].agg((len, sum))

Result:

       len sum
word
ABC      2   7
BCD      4  13
CDE      1   3
DEF      1   3

To sort the result, use sort_values:

result.sort_values(['sum', 'len'])

Result:

      len  sum
word
CDE     1    3
DEF     1    3
ABC     2    7
BCD     4   13
chthonicdaemon
3

Try this:

data = [[4, 'ABC'], [4, 'BCD'], [3, 'CDE'], [3, 'ABC'], [3, 'DEF'], [3, 'BCD'], [3, 'BCD'], [3, 'BCD']]

result = {}
for weight, value in data:
    if value not in result:
        result[value] = [1, weight]
    else:
        result[value][0] += 1
        result[value][1] += weight

print(result)

Result:

{'ABC': [2, 7], 'BCD': [4, 13], 'CDE': [1, 3], 'DEF': [1, 3]}
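
The question asks for rows of the form [Word, freq, sum of weight] rather than a dict. A minimal sketch of one way to flatten the dict into that shape, repeating the same accumulation loop so that it runs on its own (the names `rows`, `word`, `freq` and `total` are illustrative, and sorting by word is an assumption made only to get a stable order):

```python
data = [[4, 'ABC'], [4, 'BCD'], [3, 'CDE'], [3, 'ABC'], [3, 'DEF'], [3, 'BCD'], [3, 'BCD'], [3, 'BCD']]

# Same accumulation as in the answer: word -> [frequency, total weight]
result = {}
for weight, word in data:
    if word not in result:
        result[word] = [1, weight]
    else:
        result[word][0] += 1
        result[word][1] += weight

# Flatten into [word, freq, total] rows, sorted by word for a stable order
rows = [[word, freq, total] for word, (freq, total) in sorted(result.items())]
print(rows)
# [['ABC', 2, 7], ['BCD', 4, 13], ['CDE', 1, 3], ['DEF', 1, 3]]
```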
Antwane
1

You can simply use defaultdict and a list comprehension:

from collections import defaultdict

a = [[4, 'ABC'], [4, 'BCD'], [3, 'CDE'], [3, 'ABC'], [3, 'DEF'], [3, 'BCD'], [3, 'BCD'], [3, 'BCD']]

d = defaultdict(lambda: 0)   # word -> frequency
d2 = defaultdict(lambda: 0)  # word -> sum of weights
for i in a:
    d[i[1]] += 1
for i in a:
    d2[i[1]] += i[0]

res = [[i, d[i], d2[i]] for i in d.keys()]

Output:

[['CDE', 1, 3], ['DEF', 1, 3], ['BCD', 4, 13], ['ABC', 2, 7]]

EDIT: As pointed out by @chthonicdaemon, a simpler way to initialize a defaultdict is to pass int, which makes missing keys start at 0, or str if you need empty strings.
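
A minimal sketch of the `defaultdict(int)` variant described in the edit, assuming the same input list. The names `freq` and `total` are illustrative, and the two loops are merged into one pass:

```python
from collections import defaultdict

a = [[4, 'ABC'], [4, 'BCD'], [3, 'CDE'], [3, 'ABC'], [3, 'DEF'], [3, 'BCD'], [3, 'BCD'], [3, 'BCD']]

freq = defaultdict(int)   # int() returns 0, so unseen words start at 0
total = defaultdict(int)
for weight, word in a:
    freq[word] += 1
    total[word] += weight

# One [word, frequency, sum of weights] row per distinct word
res = [[word, freq[word], total[word]] for word in freq]
print(res)
```

On Python 3.7 and later, where dicts keep insertion order, the rows come out in order of first appearance: ABC, BCD, CDE, DEF.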

Tbaki
  • The recommended way is to use `defaultdict(int)` instead of `defaultdict(lambda: 0)`. – chthonicdaemon Jun 16 '17 at 08:17
  • @chthonicdaemon are you sure about that? I'm on Python 3.x and I get `TypeError: first argument must be callable or None` when I try. – Tbaki Jun 16 '17 at 08:21
  • I think you didn't type `defaultdict(int)` but instead `defaultdict(0)`? `int` counts as a callable. – chthonicdaemon Jun 16 '17 at 08:23
  • You can see they use it this way in [the examples](https://docs.python.org/3.6/library/collections.html#defaultdict-examples) in the documentation. – chthonicdaemon Jun 16 '17 at 08:25
  • Oh, that's what you meant, sorry! Do you know why int is better than lambda? I always used lambdas because they're more generic. @chthonicdaemon – Tbaki Jun 16 '17 at 08:25
  • @chthonicdaemon the documentation says just after: "A faster and more flexible way to create constant functions is to use a lambda function which can supply any constant value (not just zero)". – Tbaki Jun 16 '17 at 08:26
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/146839/discussion-between-chthonicdaemon-and-tbaki). – chthonicdaemon Jun 16 '17 at 08:27
0

Here is a functional approach:

import itertools

l = [[4, 'ABC'], [4, 'BCD'], [3, 'CDE'], [3, 'ABC'], [3, 'DEF'], [3, 'BCD'], [3, 'BCD'], [3, 'BCD']]
# groupby only merges adjacent items, so sort by the word first
data = itertools.groupby(sorted(l, key=lambda x: x[1]), key=lambda x: x[1])
result = [(k, len(w), sum(w)) for k, w in ((k, [x[0] for x in g]) for k, g in data)]
print(result)

Result:

[('ABC', 2, 7), ('BCD', 4, 13), ('CDE', 1, 3), ('DEF', 1, 3)]
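
Note that itertools.groupby only merges consecutive items with equal keys, so the input has to be sorted by word before grouping. A minimal sketch of the difference, where `summarize` is a hypothetical helper introduced only for this illustration:

```python
from itertools import groupby

pairs = [[4, 'ABC'], [4, 'BCD'], [3, 'CDE'], [3, 'ABC'], [3, 'DEF'], [3, 'BCD'], [3, 'BCD'], [3, 'BCD']]

def summarize(items):
    """Return one (word, count, total) tuple per run of consecutive equal words."""
    out = []
    for word, group in groupby(items, key=lambda p: p[1]):
        weights = [w for w, _ in group]
        out.append((word, len(weights), sum(weights)))
    return out

# Without sorting, 'ABC' and 'BCD' are split into separate runs
unsorted_result = summarize(pairs)
# Sorting by word first puts equal words next to each other
sorted_result = summarize(sorted(pairs, key=lambda p: p[1]))

print(unsorted_result)
# [('ABC', 1, 4), ('BCD', 1, 4), ('CDE', 1, 3), ('ABC', 1, 3), ('DEF', 1, 3), ('BCD', 3, 9)]
print(sorted_result)
# [('ABC', 2, 7), ('BCD', 4, 13), ('CDE', 1, 3), ('DEF', 1, 3)]
```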
Netwave
  • Can you provide working code and an output next time? It will help give clarity to your answer. – Tbaki Jun 16 '17 at 08:11
0

Use your_dictionary.setdefault(key, []).append(value) when a key has multiple values and you want to collect them in a list.

a = [[4, 'ABC'], [4, 'BCD'], [3, 'CDE'], [3, 'ABC'], [3, 'DEF'], [3, 'BCD'], [3, 'BCD'], [3, 'BCD']]
my_dict = {}

for item in a:
    key,value=item[1],item[0]
    my_dict.setdefault(key,[]).append(value)
print(my_dict)

my_list = []

for k,v in my_dict.items():
    my_list.append([k,len(v),sum(v)])

print(my_list)

Output:

{'BCD': [4, 3, 3, 3], 'DEF': [3], 'CDE': [3], 'ABC': [4, 3]}
[['BCD', 4, 13], ['DEF', 1, 3], ['CDE', 1, 3], ['ABC', 2, 7]]
void