Aggregation for unique values in Python

Question

I have a csv file. I want to make an aggregation operation on this. My aim is printing the daily number of unique measurements in descending order based on dates.

.csv file looks like this:

['endDate,weight\n',
 ' 2020-06-12  00:00:00+03:00 , 91.5,91.9,91.9,91.9,92.55,92.55,92.55,92.55,92.1,92.1,93.3,93.3 \n',
 ' 2020-06-13  00:00:00+03:00 , 91.6,91.6,92.85,92.85,92.85,92.85,92.3,92.3,92.1,92.1,94.1,94.1 \n',
 ' 2020-06-14  00:00:00+03:00 , 91.5,91.5,91.65,91.65,91.5,91.5,92.9,92.9 \n',
 ' 2020-06-15  00:00:00+03:00 , 91.85,91.85,91.6,91.6,91.85,91.85,92.55,92.55,92.4,92.4,93.7,93.7,93.35,93.35 \n',
 ' 2020-06-16  00:00:00+03:00 , 91.6,91.6,91.3,91.3,92.75,92.75,92.15,92.15,93.15,93.15,92.9,92.9 \n',
 ' 2020-06-17  00:00:00+03:00 , 91.05,91.05,91.85,91.85,92.4,92.4,92.4,92.4,94.0,94.0,93.7,93.7,93.05,93.05,93.05,93.05 \n',
 ' 2020-06-18  00:00:00+03:00 , 91.55,91.55,91.45,91.45,91.25,91.25,91.65,92.2,91.95 \n',
 ' 2020-06-19  00:00:00+03:00 , 91.3,91.6,92.45,92.05,91.8,93.1,92.7,93.5,93.15 \n',
 ' 2020-06-20  00:00:00+03:00 , 90.8,90.8,90.6,90.6,90.6,90.6,92.15,92.15,92.05,92.05,91.4,91.4 \n',
 ' 2020-06-21  00:00:00+03:00 ,\n']

Expecting result is:

enter image description here

import re
import collections

with open("weights.csv") as myFile:
    formattedData = dict()
    for line in myFile:
        try:
            date , numbers = line.split(' , ')
            numbers = numbers.replace("\n","") 
            numbers = numbers.split(',')
            formattedData[date] = len(list(set(numbers)))
        except:
            date = line
            formattedData[date]=0

formattedData

After splitting data, my data looks like this:

{'endDate,weight\n': 0,
 ' 2020-06-12  00:00:00+03:00 , 91.5,91.9,91.9,91.9,92.55,92.55,92.55,92.55,92.1,92.1,93.3,93.3 \n': 0,
 ' 2020-06-13  00:00:00+03:00 , 91.6,91.6,92.85,92.85,92.85,92.85,92.3,92.3,92.1,92.1,94.1,94.1 \n': 0,
 ' 2020-06-14  00:00:00+03:00 , 91.5,91.5,91.65,91.65,91.5,91.5,92.9,92.9 \n': 0,
 ' 2020-06-15  00:00:00+03:00 , 91.85,91.85,91.6,91.6,91.85,91.85,92.55,92.55,92.4,92.4,93.7,93.7,93.35,93.35 \n': 0,
 ' 2020-06-16  00:00:00+03:00 , 91.6,91.6,91.3,91.3,92.75,92.75,92.15,92.15,93.15,93.15,92.9,92.9 \n': 0,
 ' 2020-06-17  00:00:00+03:00 , 91.05,91.05,91.85,91.85,92.4,92.4,92.4,92.4,94.0,94.0,93.7,93.7,93.05,93.05,93.05,93.05 \n': 0,
 ' 2020-06-18  00:00:00+03:00 , 91.55,91.55,91.45,91.45,91.25,91.25,91.65,92.2,91.95 \n': 0,
 ' 2020-06-19  00:00:00+03:00 , 91.3,91.6,92.45,92.05,91.8,93.1,92.7,93.5,93.15 \n': 0,
 ' 2020-06-20  00:00:00+03:00 , 90.8,90.8,90.6,90.6,90.6,90.6,92.15,92.15,92.05,92.05,91.4,91.4 \n': 0,
 ' 2020-06-21  00:00:00+03:00 ,\n': 0}

c=Counter(formattedData)

I cannot reproduce this with your source list of strings as input. [PythonFiddle](http://pythonfiddle.com/). Do you have the same as a result after reading the file? — astentx, Dec 14 '20 at 14:04

Shiva · Accepted Answer · 2020-12-13T15:46:14.817

0

You can use defaultdict.
Replace formattedData = dict() with

formattedData = defaultdict(int)

Replace formattedData[date] = len(list(set(numbers))) with

formattedData[date] += len(set(numbers))

Finally, create a new dictionary with keys sorted in descending order of count:

descending = {dt: count for dt, count in sorted(formattedData.items(), key=lambda item: item[1], reverse=True)}
print(descending)

edited Dec 13 '20 at 15:46

answered Dec 13 '20 at 12:02

Shiva

2,627
21
33

Thank you, actually I want to count each unique value for each date. – user14815110 Dec 13 '20 at 12:25
Doesn't the above code produce the result you mentioned in "Expecting result" image? – Shiva Dec 13 '20 at 15:46

Aggregation for unique values in Python

1 Answers1