
Need help. I have a Pandas DataFrame like:

Shown ID                                       Bought ID
59,60,61,62,60,63,64,65,66,61,67,68,67         67,60,63
63,64,63,64,63,65,66                           0
87,63,84,63,86                                 86

I need to find the number of occurrences of each number of each "Shown ID" row in the whole "Shown ID" column.

So the expected result for "Shown ID" column is:

    [[('59', 1), ('60', 2), ('61', 2), ('62', 1), ('63', 6),
      ('64', 3), ('65', 2), ('66', 2), ('67', 2), ('68', 1)],
     [('63', 6), ('64', 3), ('65', 2), ('66', 2)],
     [('87', 1), ('63', 6), ('84', 1), ('86', 1)]]

How to do that?

Then I need to create a list of lists with the values of each "Shown ID" row sorted by those counts (one list per inner list of the result above).

So summary result must be:

[['63', '64', '60', '61', '65', '66', '67', '68', '59', '62'],
 ['63', '64', '65', '66'],
 ['63', '87', '84', '86']]

How can I do that? If numbers have the same frequency of occurrences, they should keep their original order of appearance in the row (the earlier a number appears in the row, the higher its priority).

Alex Savin

2 Answers

2

This is how you can get what you are looking for:

import pandas as pd
from collections import Counter


data = [{'c_id': [59, 60, 61, 62, 60, 63, 64, 65, 66, 61, 67, 68, 67]},
        {'c_id': [63, 64, 63, 64, 63, 65, 66]},
        {'c_id': [87, 63, 84, 63, 86]}]

df = pd.DataFrame.from_dict(data)

# Counter counts within a single row; most_common() returns (value, count)
# pairs sorted by descending count, so keeping only the keys gives the row order.
df['c_id'].apply(lambda row: [key for key, count in Counter(row).most_common()])

output:

0    [67, 60, 61, 64, 65, 66, 68, 59, 62, 63]
1                            [63, 64, 65, 66]
2                            [63, 84, 86, 87]

Values which have the same count might come in any order.
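
If you need those ties broken deterministically by first appearance within the row, a minimal sketch of one way to do it (not part of the original answer; this still counts per row, assuming Python 3):

from collections import Counter

def row_order(values):
    # Count within the row, then sort the unique values by descending count,
    # breaking ties by the position of their first appearance in the row.
    counts = Counter(values)
    first_seen = {}
    for i, v in enumerate(values):
        first_seen.setdefault(v, i)
    return sorted(counts, key=lambda v: (-counts[v], first_seen[v]))

df['c_id'].apply(row_order)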

If you want a column-level counter instead, you can do it like this:

import operator

# Build one Counter over every value in the whole column.
all_cids = []
for index, row in df.iterrows():
    all_cids.extend(row['c_id'])

counter_obj = Counter(all_cids)

def get_ordered_values(values):
    # Collect each row's unique values (in order of first appearance)
    # together with their column-wide counts.
    new_values = []
    covered_values = set()
    for val in values:
        if val in covered_values:
            continue
        covered_values.add(val)
        new_values.append((val, counter_obj[val]))
    # list.sort() is stable, so values with equal counts keep their original order.
    new_values.sort(key=operator.itemgetter(1), reverse=True)
    return [key for key, val in new_values]

df['c_id'].apply(get_ordered_values)

output:

0    [63, 64, 60, 61, 65, 66, 67, 59, 62, 68]
1                            [63, 64, 65, 66]
2                            [63, 84, 86, 87]
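
A shorter, equivalent sketch (not from the original answer; it assumes the same df with list-valued 'c_id' and Python 3.7+, where dicts preserve insertion order):

from collections import Counter
from itertools import chain

# One-pass, column-wide counter over all rows.
global_counts = Counter(chain.from_iterable(df['c_id']))

def order_row(values):
    seen = list(dict.fromkeys(values))                    # unique values, first-appearance order
    return sorted(seen, key=lambda v: -global_counts[v])  # stable sort keeps tie order

df['c_id'].apply(order_row)

Relying on sorted() being stable keeps equal-count values in their original order of appearance, which is the tie-breaking rule the question asks for.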
Vikash Singh
  • Thanks, but the correct result is: ['63', '64', '60', '61', '65', '66', '67', '68', '59', '62'], ['63', '64', '65', '66'], ['63', '87', '84', '86']. The order is different in your version – Alex Savin Jan 04 '17 at 13:27
  • @AlexSavin do you want it sorted in reverse?? Please check your data first: the first row has 63 with a count of 1. Do you want to keep a global count on the entire column and sort by that? – Vikash Singh Jan 04 '17 at 13:28
  • You're right: in the first row 63 has a count of 1, but across all rows of the "Shown ID" column 63 occurs 6 times, so it is the most frequently occurring number in the column. – Alex Savin Jan 04 '17 at 13:34
  • @AlexSavin So you need a global count across the column. We can do that. Give me some time. – Vikash Singh Jan 04 '17 at 13:34
  • @AlexSavin added another example for a column-level counter. Please check and let me know if this is what you want. Thanks – Vikash Singh Jan 04 '17 at 14:05
  • Your output is [63, 64, 65, 66, 67, 60, 61, 68, 59, 62], but the correct output must be ['63', '64', '60', '61', '65', '66', '67', '68', '59', '62']. First, elements are sorted in descending order of occurrences; if some elements have the same number of occurrences, they shouldn't be re-sorted and should follow each other in their original order. – Alex Savin Jan 04 '17 at 14:14
  • @AlexSavin I have made the change you were expecting. Please check. – Vikash Singh Jan 04 '17 at 14:28
  • Correct output!) But there is a problem. Your code works correctly, but my data is a bit different. I import data from a .txt file into a dataframe with `data = pd.read_csv('file.txt', sep = ';', names = ['Shown_ID', 'Bought_ID'])`. The separator divides the data into two columns - Shown_ID and Bought_ID. In the file every row looks like `59,60,61,62,60,63,64,65,66,61,67,68,67;67,60,63`, and the rows aren't numbers but strings, so the type of data['Shown'] is 'pandas.core.series.Series'. I don't know how to convert it to integers. – Alex Savin Jan 04 '17 at 14:58
  • This should solve that problem http://stackoverflow.com/a/23112008/3027854 – Vikash Singh Jan 04 '17 at 15:10
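
For the data-loading issue in the last comment, a minimal sketch (not from the original thread; the file name and column names are taken from the comment) that splits the comma-separated strings and converts them to integers:

import pandas as pd

data = pd.read_csv('file.txt', sep=';', names=['Shown_ID', 'Bought_ID'])

# Each cell is a string like '59,60,61'; split on ',' and convert to int.
data['Shown_ID'] = data['Shown_ID'].apply(lambda s: [int(x) for x in str(s).split(',')])

The same apply can be used for 'Bought_ID' if that column also needs to become lists of integers.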
1

If I understand it correctly, you want to find the number of occurrences, not the list of indexes where specific data is found. I can imagine several ways of doing this:

  1. Count the data.

If your data is not a multidimensional list, then you can simply use the count method of the list object.

# in Python 3, range() must be wrapped in list() so the pieces can be concatenated
someList = list(range(3)) + list(range(2)) + list(range(1))

sortedElements = sorted(set(someList))  # loses duplicates; elements must be hashable

for x in sortedElements:
    # list.count(element) works in both Python 2.7 and Python 3
    print(x, someList.count(x))  # prints each element and its number of occurrences
  2. Returning the indexes of duplicates:

    # someList and sortedElements same as before
    for x in sortedElements:
        # enumerate() yields every index at which x occurs, not just the first one
        lIndexes = [i for i, elem in enumerate(someList) if elem == x]
        print(lIndexes)
    
  3. Multidimensional list:

As I see it, you must either first dump everything into one flat list, or do steps 1 or 2 on each child list of the multidimensional list, depending on your needs (see the sketch below).
Of course there are several ways to traverse a multidimensional list: you can map, filter, reduce, pass the child lists as *arguments, etc. (there are too many ways for me to count; you can find most of the methods on this website), but the method you choose is very tightly connected to your example. Without further explanation I would not dare to advise you, since it could do more damage than good.
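
A minimal sketch of the per-child-list variant (not part of the original answer; the sample data just mirrors the shape of the rows in the question):

# Apply the counting from step 1 to every child list of a multidimensional list.
multiList = [[59, 60, 61, 62, 60, 63], [63, 64, 63, 64], [87, 63, 84]]

for child in multiList:
    for x in sorted(set(child)):
        print(x, child.count(x))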

Hope this helps.

Danilo