3

I have a data list structure looking like this:

[('a', 1),('a', 2),('b', 0),('b', 1),('c', 0)]

I'm trying to sum the second values of the tuples whose first item is the same (and remove the duplicates).

End result should be:

[('a', 3),('b', 1),('c', 0)]

My approach is to create a second, empty list and check whether the first element of each item already exists in it; if not, append the item. Otherwise, loop through the second list and add the [1] value of the current item from the first list to the [1] value of the matching item in the second list. I am unable to get this concept working. If anyone has a more efficient solution, I am also open to suggestions.

secondList = []
for item in firstList:
    if (secondList.count(item[0]]):
      secondList.append(item)
    else:
      for item_j in secondList:
        if (item_j[0] == item[0]):
          item_j[1] = item_j[1]+item[1]
Asocia

5 Answers

5

You can use itertools.groupby. First group the tuples by the 0th index, then for each group sum the values at the 1st index:

from itertools import groupby
from operator import itemgetter
data = [("a", 1),("a", 2),("b", 0),("b", 1),("c", 0)]

result = [(k, sum(item[1] for item in g)) for k, g in groupby(data, key=itemgetter(0))]
print(result)

Output:

[('a', 3), ('b', 1), ('c', 0)]

P.S.: Note that this won't work as you expect if your list isn't already sorted on the 0th index, because groupby only groups consecutive items that share a key. As the documentation says:

Generally, the iterable needs to already be sorted on the same key function.
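To make the caveat concrete, here is a minimal sketch of what happens with unsorted input and how sorting on the same key first fixes it. The list `unsorted_data` and the names `naive`, `ordered` and `grouped` are made up for illustration and are not part of the question.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical input: same pairs as in the question, but shuffled.
unsorted_data = [("b", 0), ("a", 1), ("c", 0), ("a", 2), ("b", 1)]

# Without sorting, groupby starts a new group every time the key changes,
# so "a" and "b" each show up twice.
naive = [(k, sum(v for _, v in g))
         for k, g in groupby(unsorted_data, key=itemgetter(0))]
print(naive)    # [('b', 0), ('a', 1), ('c', 0), ('a', 2), ('b', 1)]

# Sorting on the same key first places equal keys next to each other.
ordered = sorted(unsorted_data, key=itemgetter(0))
grouped = [(k, sum(v for _, v in g))
           for k, g in groupby(ordered, key=itemgetter(0))]
print(grouped)  # [('a', 3), ('b', 1), ('c', 0)]
```

The sort costs O(n log n), so for large unsorted inputs a dictionary-based pass may be the cheaper option.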

Asocia
2

You can use a dictionary to get the desired result without importing any extra module:

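lst = [('a', 1), ('a', 2), ('b', 0), ('b', 1), ('c', 0)]

# Accumulate the sum of the second items for each first item.
totals = {}
for first, second in lst:
    if first not in totals:
        totals[first] = 0
    totals[first] += second

# Convert the dictionary back to a list of (key, sum) tuples.
secondList = []
for key, total in totals.items():
    secondList.append((key, total))

print(secondList)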
ksohan
1

This seems like a pretty good case for a dictionary. With a list, you have to search through it to find the item you're referring to, which is O(n) per lookup. With a dictionary, a lookup is O(1) on average.

tuple_dict = {}
for item in firstList:
  key,value = item
  if key in tuple_dict:
    tuple_dict[key]+=value
  else:
    tuple_dict[key]=value

Then you can convert it back to a list of tuples if you want:

tuple_list = []
for key,value in tuple_dict.items():
  tuple_list.append((key,value))
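For comparison, the list-only idea from the question can be made to work, and it shows where the O(n) lookup comes from. This is only a sketch: `merge_with_list` is a hypothetical helper name, and it stores mutable `[key, total]` lists internally because tuples cannot be updated in place, which is one reason the original attempt fails.

```python
def merge_with_list(pairs):
    """Sum the second items per first item using only lists (illustrative helper)."""
    merged = []  # mutable [key, total] entries; tuples cannot be modified in place
    for key, value in pairs:
        for entry in merged:       # linear scan of the result list: O(n) per item
            if entry[0] == key:
                entry[1] += value
                break
        else:                      # loop finished without break: key not seen yet
            merged.append([key, value])
    return [tuple(entry) for entry in merged]


first_list = [('a', 1), ('a', 2), ('b', 0), ('b', 1), ('c', 0)]
print(merge_with_list(first_list))  # [('a', 3), ('b', 1), ('c', 0)]
```

Because every item triggers a scan of `merged`, the whole pass is O(n·k) for k distinct keys, whereas the dictionary version stays close to O(n).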
Ted Brownlow
1

The existing answers are good; here's yet another way you could do it, using a defaultdict:

from collections import defaultdict

def sum_tuples(tuples):
    result = defaultdict(int)
    for i in tuples:
        result[i[0]] += i[1]
    return [(k, result[k]) for k in result.keys()]
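A quick usage sketch, with the function repeated in slightly condensed form so the snippet runs on its own. The shuffled `unsorted_data` list is an assumed example; it shows that this approach does not need sorted input, and that on Python 3.7+ the keys come back in first-seen order.

```python
from collections import defaultdict


def sum_tuples(tuples):
    result = defaultdict(int)  # missing keys start at int() == 0
    for key, value in tuples:
        result[key] += value
    return list(result.items())


# Hypothetical input: the question's pairs in shuffled order.
unsorted_data = [("b", 0), ("a", 1), ("c", 0), ("a", 2), ("b", 1)]
print(sum_tuples(unsorted_data))  # [('b', 1), ('a', 3), ('c', 0)]
```

If alphabetical order matters, wrapping the result in `sorted(...)` is one way to get it.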
crumb
1

import pandas as pd

data = [("a", 1),("a", 2),("b", 0),("b", 1),("c", 0)]

df = pd.DataFrame(data, columns=['c1', 'c2'])

x = tuple(df.groupby('c1').sum().to_dict()['c2'].items())

print(x)

NANDHA KUMAR