
I'm trying to count the unique values in the following list without using the collections module:

('TOILET','TOILETS','AIR CONDITIONING','AIR-CONDITIONINGS','AIR-CONDITIONING')

The output I require is:

('TOILET':2,'AIR CONDITIONiNGS':3)

My current code is:

number = {}  # maps each exact string to its count
for i in Data:  # Data is the tuple above
    if i in number:
        number[i] += 1
    else:
        number[i] = 1
print(number)

Is it possible to get the output?

wwii
Gaming
  • Assuming that `number` is a dictionary prior to the loop that should be fine... The output you expect isn't valid syntax... what isn't working/what are you getting instead? (Also - your `tuple` example isn't valid syntax either - and somehow your `i` has become lowercase in the expected results...) – Jon Clements Oct 14 '17 at 15:16
  • With my current code the result is {'TOILET': 1, 'TOILETS': 1, 'AIR CONDITIONING': 1, 'AIR-CONDITIONINGS': 1, 'AIR-CONDITIONING': 1} – Gaming Oct 14 '17 at 15:19
  • Which is to be expected - TOILET and TOILETS aren't the same string, and nor are AIR CONDITIONING, AIR-CONDITIONINGS and AIR-CONDITIONING... Your issue isn't with counting the frequency of the data - you need to standardise your data somehow first... – Jon Clements Oct 14 '17 at 15:21
  • @Gaming. Then it's not unique elements that you are trying to count. You have to explain in excruciating detail what it means for two items to be the same in that case. – Mad Physicist Oct 14 '17 at 15:23
  • Oh yes, standardize the data. Is there any way to deal with this on big data? – Gaming Oct 14 '17 at 15:27
  • Maybe use string similarity as explored in [this SO Q&A](https://stackoverflow.com/q/17388213/2823755) - you will need to determine *how similar* they must be to be the same. But it might get messy comparing all the combinations. – wwii Oct 14 '17 at 15:49
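Following the suggestion in the comments to standardise the data first, here is a minimal sketch. The `normalize` helper and its rules (upper-casing, turning hyphens and other non-word characters into spaces, dropping one trailing `S`) are hypothetical and were chosen only to fit this sample; real data would need rules of its own.

```python
import re

def normalize(label):
    """Hypothetical normalization rules, chosen only to fit this sample data."""
    # Collapse runs of hyphens and other non-word characters into one space.
    cleaned = re.sub(r"\W+", " ", label.upper()).strip()
    # Naively drop one trailing plural 'S' (this would wrongly turn 'GLASS' into 'GLAS').
    if cleaned.endswith("S"):
        cleaned = cleaned[:-1]
    return cleaned

data = ('TOILET', 'TOILETS', 'AIR CONDITIONING',
        'AIR-CONDITIONINGS', 'AIR-CONDITIONING')

counts = {}
for item in data:
    key = normalize(item)
    counts[key] = counts.get(key, 0) + 1  # plain dict counting, no collections

print(counts)  # {'TOILET': 2, 'AIR CONDITIONING': 3}
```

Once every variant maps to the same key, the ordinary counting loop from the question works unchanged; the hard part is deciding the normalization rules.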

5 Answers

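original = ('TOILETS', 'TOILETS', 'AIR CONDITIONING', 
            'AIR-CONDITIONINGS', 'AIR-CONDITIONING')
a_set = set(original)  # distinct values, duplicates removed
# Map each distinct value to the number of times it occurs in `original`.
result_dict = {element: original.count(element) for element in a_set}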

First, making a set from the original list (or tuple) gives you all of its values, without duplicates.

Then you build a dictionary whose keys come from that set and whose values are the number of times each key occurs in the original list (or tuple), using the count() method.
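To see what this approach actually counts, here is a sketch that runs the same set-plus-`count()` idea on the question's original tuple rather than the edited one above. Because `count()` compares by exact string equality, every spelling variant gets its own key with a count of 1, which is what the asker reported in the comments.

```python
# The question's original tuple: 'TOILET' and 'TOILETS' are different strings.
original = ('TOILET', 'TOILETS', 'AIR CONDITIONING',
            'AIR-CONDITIONINGS', 'AIR-CONDITIONING')

# One full scan of `original` per distinct value.
result_dict = {element: original.count(element) for element in set(original)}
print(result_dict)  # every key maps to 1 (key order depends on the set)
```

Note that `count()` rescans the whole tuple for each distinct value, so this does roughly n * k comparisons for n items and k distinct values, whereas the single-pass dictionary loop from the question walks the data once.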

MarianD

You can try this:

import re

data = ('TOILETS','TOILETS','AIR CONDITIONING','AIR-CONDITIONINGS','AIR-CONDITIONING')
# Replace runs of non-word characters (e.g. '-') with a single space.
new_data = [re.sub(r"\W+", ' ', i) for i in data]
print(new_data)
final_data = {}
for i in new_data:
    # Existing keys that this item starts with (e.g. 'AIR CONDITIONING' for 'AIR CONDITIONINGS').
    s = [b for b in final_data if i.startswith(b)]
    if s:
        key = s[0]
        final_data[key] += 1
    else:
        final_data[i] = 1

print(final_data)

Output:

{'TOILETS': 2, 'AIR CONDITIONING': 3}
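The prefix matching above depends on input order: an item is merged only if an earlier key is a prefix of it. A minimal sketch that wraps the same loop in a function (`count_by_prefix` is a hypothetical name) makes this visible with the question's `TOILET`/`TOILETS` pair:

```python
import re

def count_by_prefix(data):
    """The answer's prefix-matching loop wrapped in a function (hypothetical name)."""
    # Replace runs of non-word characters (e.g. '-') with a single space.
    cleaned = [re.sub(r"\W+", " ", item) for item in data]
    counts = {}
    for item in cleaned:
        # Reuse the first existing key that this item starts with, if any.
        matches = [key for key in counts if item.startswith(key)]
        if matches:
            counts[matches[0]] += 1
        else:
            counts[item] = 1
    return counts

# The shorter spelling has to come first for the two to be merged.
print(count_by_prefix(('TOILET', 'TOILETS')))  # {'TOILET': 2}
print(count_by_prefix(('TOILETS', 'TOILET')))  # {'TOILETS': 1, 'TOILET': 1}
```

Sorting the input first would put shorter prefixes ahead of their longer variants, but only for variants that differ at the end of the string.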
Ajax1234

I don't believe Python lists have an easy built-in way to do what you are asking. They do, however, have a count method that tells you how many times a specific element appears in a list. Example:

some_list = ['a', 'a', 'b', 'c']
some_list.count('a')  #=> 2

Usually the way to get what you want is to build a counting dictionary using the dict.get(key, default) method, which returns default when the key is not present yet:

some_list = ['a', 'a', 'b', 'c']
counts = {}
for el in some_list:
    counts[el] = counts.get(el, 0) + 1  # start at 0 for unseen elements
print(counts)  #=> {'a': 2, 'b': 1, 'c': 1}
Luke
a = ['TOILETS', 'TOILETS', 'AIR CONDITIONING', 'AIR-CONDITIONINGS', 'AIR-CONDITIONING']
b = {}

for i in a:
    b.setdefault(i, 0)  # create the key with a count of 0 if it is missing
    b[i] += 1

You can use this code, but as Jon Clements pointed out, TOILET and TOILETS aren't the same string, so you need to standardise them first.

Josh Karpel
Nick.Tao

Using difflib.get_close_matches to help determine uniqueness

import difflib
a = ('TOILET','TOILETS','AIR CONDITIONING','AIR-CONDITIONINGS','AIR-CONDITIONING')
d = {}
for word in a:
    # Closest existing key with a similarity ratio of at least 0.6, if any.
    similar = difflib.get_close_matches(word, d.keys(), cutoff=0.6, n=1)
    if similar:
        d[similar[0]] += 1
    else:
        d[word] = 1

The actual keys in the dictionary will depend on the order of the words in the list.
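A sketch that wraps the loop above in a function (the name `count_similar` is hypothetical) and feeds it the same words forwards and backwards shows this order dependence: the counts agree, but whichever spelling is seen first becomes the key.

```python
import difflib

def count_similar(words, cutoff=0.6):
    """The answer's get_close_matches loop wrapped in a function (hypothetical name)."""
    counts = {}
    for word in words:
        # Closest existing key scoring at least `cutoff`, if any.
        similar = difflib.get_close_matches(word, counts.keys(), n=1, cutoff=cutoff)
        if similar:
            counts[similar[0]] += 1
        else:
            counts[word] = 1
    return counts

a = ('TOILET', 'TOILETS', 'AIR CONDITIONING', 'AIR-CONDITIONINGS', 'AIR-CONDITIONING')
print(count_similar(a))            # {'TOILET': 2, 'AIR CONDITIONING': 3}
print(count_similar(reversed(a)))  # {'AIR-CONDITIONING': 3, 'TOILETS': 2}
```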

difflib.get_close_matches uses difflib.SequenceMatcher to calculate the closeness (ratio) of the word against all possibilities, even if an early possibility is already close, and then returns the best matches ordered by ratio. This has the advantage of finding the closest key whose ratio is at least the cutoff. But as the dictionary grows, each search takes longer, because every word is compared against every existing key.
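To get a feel for what a cutoff of 0.6 means, the sketch below computes `SequenceMatcher.ratio()` for a few pairs from the data. The ratio is 2*M/T, where M is the number of matched characters and T the combined length of both strings: `TOILET`/`TOILETS` gives 2*6/13 ≈ 0.92 and `AIR CONDITIONING`/`AIR-CONDITIONING` gives 2*15/32 = 0.9375, while `TOILET` against `AIR CONDITIONING` can never exceed 2*6/22 ≈ 0.55, because the shorter word has only 6 characters to match.

```python
from difflib import SequenceMatcher

pairs = [
    ('TOILET', 'TOILETS'),
    ('AIR CONDITIONING', 'AIR-CONDITIONING'),
    ('TOILET', 'AIR CONDITIONING'),
]

# ratio() is 2*M/T: M matched characters, T total characters in both strings.
ratios = {pair: SequenceMatcher(None, *pair).ratio() for pair in pairs}
for (first, second), ratio in ratios.items():
    print(f"{first!r} vs {second!r}: {ratio:.3f}")
```

With these values, any cutoff between roughly 0.55 and 0.9 separates the two groups in this sample; larger or messier data would need the cutoff tuned by hand.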

If needed, you might be able to optimize a little by sorting the list first so that similar words appear in sequence, and then stopping at the first key that is close enough instead of scoring every key, choosing an appropriately large cutoff:

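import difflib, collections
z = collections.OrderedDict()
a = sorted(a)  # `a` is the tuple from the previous snippet
cutoff = 0.6
for word in a:
    # Stop at the first key (in insertion order) that is similar enough.
    for key in z.keys():
        if difflib.SequenceMatcher(None, word, key).ratio() > cutoff:
            z[key] += 1
            break
    else:
        # for/else: reached only when no existing key matched.
        z[word] = 1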

Results:

>>> d
{'TOILET': 2, 'AIR CONDITIONING': 3}
>>> z
OrderedDict([('AIR CONDITIONING', 3), ('TOILET', 2)])
>>> 
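Since sorting already places similar words next to each other, one possible variation (a sketch, not tested on real data) compares each word only with the most recently added key instead of looping over all keys. The function name `count_sorted_neighbours` is hypothetical, and a plain dict is used because it keeps insertion order in Python 3.7+.

```python
from difflib import SequenceMatcher

def count_sorted_neighbours(words, cutoff=0.6):
    """Hypothetical variant: compare each word only with the most recent key."""
    counts = {}  # plain dicts keep insertion order in Python 3.7+
    last_key = None
    for word in sorted(words):
        # Sorting puts variants that share a prefix next to each other.
        if last_key is not None and SequenceMatcher(None, word, last_key).ratio() > cutoff:
            counts[last_key] += 1
        else:
            counts[word] = 1
            last_key = word
    return counts

words = ('TOILET', 'TOILETS', 'AIR CONDITIONING', 'AIR-CONDITIONINGS', 'AIR-CONDITIONING')
print(count_sorted_neighbours(words))  # {'AIR CONDITIONING': 3, 'TOILET': 2}
```

This does one comparison per word rather than one per existing key, but it can only merge variants that sort next to each other; two spellings that differ in their first characters would end up counted separately.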

I imagine there are Python packages that do this sort of thing and may be better optimized.

wwii