Here is a pure Python solution that reduces each string to its set of unique characters, chains those sets together, and then counts the result (using Divakar's example list):
>>> from collections import Counter
>>> li = ['er', 'IS', 'you', 'Is', 'is', 'er', 'IS']
>>> Counter(e for sl in map(list, map(set, li)) for e in sl)
Counter({'I': 3, 'e': 2, 's': 2, 'S': 2, 'r': 2, 'o': 1, 'i': 1, 'u': 1, 'y': 1})
If you want upper and lower case to be counted as the same letter:
>>> Counter(e for sl in map(list, map(set, [s.lower() for s in li])) for e in sl)
Counter({'i': 4, 's': 4, 'e': 2, 'r': 2, 'o': 1, 'u': 1, 'y': 1})
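The two one-liners above can be folded into a single helper. This is only a sketch: the function name `count_chars_once_per_string` and the `ignore_case` parameter are made up for illustration, and `itertools.chain.from_iterable` stands in for the nested generator (the intermediate `map(list, ...)` is not needed, since a set can be iterated directly).

```python
from collections import Counter
from itertools import chain


def count_chars_once_per_string(strings, ignore_case=False):
    """Count, for each character, how many strings contain it."""
    if ignore_case:
        # Fold case first so 'I' and 'i' land in the same bucket.
        strings = (s.lower() for s in strings)
    # set(s) drops repeats within one string; chain joins the sets lazily.
    return Counter(chain.from_iterable(map(set, strings)))


li = ['er', 'IS', 'you', 'Is', 'is', 'er', 'IS']
print(count_chars_once_per_string(li))
print(count_chars_once_per_string(li, ignore_case=True))
```

Because each string is converted to a set before counting, a string such as 'aab' contributes 1 to 'a', not 2.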
Now let's time that:
from __future__ import print_function
from collections import Counter
import numpy as np
import pandas as pd
def dawg(li):
    # Pure Python: set() keeps each character once per string, Counter tallies them.
    return Counter(e for sl in map(list, map(set, li)) for e in sl)

def nump(a):
    # NumPy/pandas version used for comparison: view the strings as single
    # characters, drop the empty padding entries, then count with np.unique.
    chars = np.asarray(a).view('S1')
    valid_chars = chars[chars != '']
    unqchars, count = np.unique(valid_chars, return_counts=True)
    return pd.DataFrame({'char': unqchars, 'count': count})

if __name__ == '__main__':
    import timeit
    li = ['er', 'IS', 'you', 'Is', 'is', 'er', 'IS']
    for f in (dawg, nump):
        print(" ", f.__name__, timeit.timeit("f(li)", setup="from __main__ import f, li", number=100))
Results (total seconds for 100 calls each):
dawg 0.00134205818176
nump 0.0347728729248
The pure Python solution is about 26 times faster than the NumPy/pandas version on this small seven-string list.
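The timing above covers only a seven-string list, so it says little about larger inputs. A rough sketch for re-running the measurement at several sizes follows. It assumes Python 3, times only the pure Python function (other candidates could be timed the same way), and `make_words` is a hypothetical helper that generates random test data. No results are claimed here.

```python
import random
import string
import timeit
from collections import Counter


def dawg(li):
    # Same idea as above: each character counts once per string.
    return Counter(e for s in li for e in set(s))


def make_words(n, max_len=3, seed=0):
    # Hypothetical helper: n random ASCII words of 1..max_len letters.
    rng = random.Random(seed)
    return [
        ''.join(rng.choice(string.ascii_letters)
                for _ in range(rng.randint(1, max_len)))
        for _ in range(n)
    ]


for n in (10, 1000, 10000):
    words = make_words(n)
    # Total seconds for 10 calls on a list of n words.
    elapsed = timeit.timeit(lambda: dawg(words), number=10)
    print(n, elapsed)
```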