How to calculate the mode for a field in a CSV file?

Question

I have this text file:

Category;currency;sellerRating;Duration;endDay;ClosePrice;OpenPrice;Competitive?
Music/Movie/Game;US;3249;5;Mon;0,01;0,01;No
Music/Movie/Game;US;3249;5;Mon;0,01;0,01;No
Music/Movie/Game;US;3249;5;Mon;0,01;0,01;No
Music/Movie/Game;US;3249;5;Mon;0,01;0,01;No
Music/Movie/Game;US;3249;5;Mon;0,01;0,01;No
Music/Movie/Game;US;3249;5;Mon;0,01;0,01;No
Music/Movie/Game;US;3249;5;Mon;0,01;0,01;No
Automotive;US;3115;7;Tue;0,01;0,01;No
Automotive;US;3115;7;Tue;0,01;0,01;No
Automotive;US;3115;7;Tue;0,01;0,01;Yes

I want to calculate the mode for each category. For example, I want to calculate the mode of sellerRating. This is what I have so far (I also needed to calculate the averages, and I managed to do that):

import csv
import locale
import statistics
from pprint import pprint, pformat

locale.setlocale(locale.LC_ALL, 'Dutch_Netherlands.1252')

avg_names = 'sellerRating', 'Duration', 'ClosePrice', 'OpenPrice'
averages = {avg_name: 0 for avg_name in avg_names}

num_values = 0
with open('bijlage.txt', newline='') as bestand:
    csvreader = csv.DictReader(bestand, delimiter=';')
    for row in csvreader:
        num_values += 1
        for avg_name in avg_names:
            averages[avg_name] += locale.atof(row[avg_name])

for avg_name, total in averages.items():
    averages[avg_name] = total / num_values

print('raw results:')
pprint(averages)

print()
print('Averages:')
for avg_name in avg_names:
    rounded = locale.format_string('%.2f', round(averages[avg_name], 2),
                                   grouping=True)
    print('  {:<13} {:>10}'.format(avg_name, rounded))

I tried to do:

from statistics import mode
mode(averages)

But that does not work, and I am stuck now. I am a Python beginner, so if you answer my problem, could you explain why that is the answer so I can learn?
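
The call `mode(averages)` hands `statistics.mode` a dictionary, and iterating over a dict yields its keys (the four field names), not the data. By the time `averages` exists, the individual values have already been summed away. A minimal sketch of the alternative keeps every value of a field in a list and passes that list to `mode`. It assumes the sample rows above are read from an in-memory string (`io.StringIO`) instead of `bijlage.txt`, and that swapping the decimal comma for a dot (the hypothetical helper `to_float`) is an acceptable stand-in for `locale.atof`:

```python
import csv
import io
import statistics

# Sample rows from the question; io.StringIO stands in for open('bijlage.txt').
DATA = """Category;currency;sellerRating;Duration;endDay;ClosePrice;OpenPrice;Competitive?
Music/Movie/Game;US;3249;5;Mon;0,01;0,01;No
Music/Movie/Game;US;3249;5;Mon;0,01;0,01;No
Music/Movie/Game;US;3249;5;Mon;0,01;0,01;No
Music/Movie/Game;US;3249;5;Mon;0,01;0,01;No
Music/Movie/Game;US;3249;5;Mon;0,01;0,01;No
Music/Movie/Game;US;3249;5;Mon;0,01;0,01;No
Music/Movie/Game;US;3249;5;Mon;0,01;0,01;No
Automotive;US;3115;7;Tue;0,01;0,01;No
Automotive;US;3115;7;Tue;0,01;0,01;No
Automotive;US;3115;7;Tue;0,01;0,01;Yes
"""


def to_float(text):
    # Assumption: replacing the decimal comma is enough; the original uses locale.atof.
    return float(text.replace(',', '.'))


field_names = ('sellerRating', 'Duration', 'ClosePrice', 'OpenPrice')
values = {name: [] for name in field_names}  # one list of raw values per field

reader = csv.DictReader(io.StringIO(DATA), delimiter=';')
for row in reader:
    for name in field_names:
        values[name].append(to_float(row[name]))

# The mode is taken over the raw values, not over the averages dict.
modes = {name: statistics.mode(column) for name, column in values.items()}
print(modes)
# {'sellerRating': 3249.0, 'Duration': 5.0, 'ClosePrice': 0.01, 'OpenPrice': 0.01}
```

The same lists can also feed the average calculation, so the file only has to be read once.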

"but that does not work" - what happens? Does the import fail - if so, are you using Python 3.4 or later? Do you get a syntax error? Or the wrong result? There are some other ideas on [this old question](https://stackoverflow.com/q/10797819) too, or you could even write code to process the list and find the mode yourself? — Rup, Jan 03 '19 at 10:58
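
Finding the mode yourself is a short exercise with `collections.Counter`, which counts how often each value occurs. A sketch (the function name `modes_of` is only illustrative) that returns every value tied for the highest count, so a tie never raises an error:

```python
from collections import Counter


def modes_of(values):
    """Return all values that occur most often (several if there is a tie)."""
    counts = Counter(values)
    if not counts:
        return []  # no data, no mode
    highest = max(counts.values())
    # Keep every value whose count equals the highest count, in first-seen order.
    return [value for value, count in counts.items() if count == highest]


ratings = [3249] * 7 + [3115] * 3            # sellerRating column from the sample
print(modes_of(ratings))                     # [3249]
print(modes_of(['Mon', 'Tue', 'Mon', 'Tue']))  # ['Mon', 'Tue']
```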

score 1 · Answer 1 · answered Jan 03 '19 at 11:07

Pandas is quite a nice library for this.
pip install pandas

import pandas as pd
df = pd.read_csv('bijlage.csv', delimiter=';', decimal=',')  # 'bijlage.txt' in your case
sellerRating_median = df['sellerRating'].median()
print('Seller rating median: {}'.format(sellerRating_median))

Besides median(), there is also mean() to calculate the average.
You can also use mode() to calculate the mode of the column, but it returns a Series (containing every value tied for most common), so use mode()[0] to get the first one.
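
A sketch of the pandas route for the mode specifically, assuming the sample rows are loaded from an in-memory string instead of `bijlage.txt`. `Series.mode()` returns a Series of all tied values, so `.iloc[0]` picks the first; combined with `groupby` it gives one mode per `Category`:

```python
import io

import pandas as pd

# Sample rows from the question; io.StringIO stands in for the file name.
DATA = """Category;currency;sellerRating;Duration;endDay;ClosePrice;OpenPrice;Competitive?
Music/Movie/Game;US;3249;5;Mon;0,01;0,01;No
Music/Movie/Game;US;3249;5;Mon;0,01;0,01;No
Music/Movie/Game;US;3249;5;Mon;0,01;0,01;No
Music/Movie/Game;US;3249;5;Mon;0,01;0,01;No
Music/Movie/Game;US;3249;5;Mon;0,01;0,01;No
Music/Movie/Game;US;3249;5;Mon;0,01;0,01;No
Music/Movie/Game;US;3249;5;Mon;0,01;0,01;No
Automotive;US;3115;7;Tue;0,01;0,01;No
Automotive;US;3115;7;Tue;0,01;0,01;No
Automotive;US;3115;7;Tue;0,01;0,01;Yes
"""

df = pd.read_csv(io.StringIO(DATA), sep=';', decimal=',')

# Mode of a whole column: mode() gives a Series, .iloc[0] takes its first entry.
seller_rating_mode = df['sellerRating'].mode().iloc[0]

# One mode per category: apply the same idea to each group.
modes_per_category = df.groupby('Category')['sellerRating'].agg(
    lambda s: s.mode().iloc[0]
)

print(seller_rating_mode)
print(modes_per_category)
```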

martineau · Answer 2 · 2019-01-07T19:07:11.150

You could do it like this while you're computing the averages. It uses a defaultdict to store the data for computing the mode of each of the categories. That's useful here because it allows creation of a dictionary of lists without knowing in advance what the keys are going to be or how many of them there will be. Plus, used as defaultdict(list), it automatically initializes each value to an empty list the first time its key is accessed.
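
The `defaultdict(list)` behaviour described above, in isolation: accessing a missing key creates an empty list for it, so `append` works on the first encounter of each category. The category/rating pairs are taken from the sample data:

```python
from collections import defaultdict

rows = [
    ('Music/Movie/Game', 3249),
    ('Automotive', 3115),
    ('Music/Movie/Game', 3249),
    ('Automotive', 3115),
]

seller_ratings = defaultdict(list)  # missing keys start out as []
for category, rating in rows:
    seller_ratings[category].append(rating)  # no KeyError on first access

print(dict(seller_ratings))
# {'Music/Movie/Game': [3249, 3249], 'Automotive': [3115, 3115]}
```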

You ought to be using the statistics module to compute the averages too, rather than computing them yourself, but I didn't change that since it's not the topic of your question.
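
A sketch of what that could look like: keep the raw values per field (the mode calculation needs them anyway) and let `statistics.mean` do the summing and dividing. The lists here stand in for the sample's `sellerRating` and `ClosePrice` columns, already converted to numbers:

```python
import statistics

columns = {
    'sellerRating': [3249] * 7 + [3115] * 3,
    'ClosePrice': [0.01] * 10,
}

# statistics.mean replaces the manual running total and the division by num_values.
averages = {name: statistics.mean(values) for name, values in columns.items()}
print(averages)
```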

import locale
import csv
from collections import defaultdict
import statistics

locale.setlocale(locale.LC_ALL, 'Dutch_Netherlands.1252')

avg_names = 'sellerRating', 'Duration', 'ClosePrice', 'OpenPrice'
averages = {avg_name: 0 for avg_name in avg_names}

seller_ratings = defaultdict(list)
durations = defaultdict(list)

num_values = 0
with open('bijlage.txt', newline='') as bestand:
    csvreader = csv.DictReader(bestand, delimiter=';')
    for row in csvreader:
        num_values += 1
        for avg_name in avg_names:
            averages[avg_name] += locale.atof(row[avg_name])

        # Add row values to corresponding category for fields of interest.
        seller_ratings[row['Category']].append(locale.atof(row['sellerRating']))
        durations[row['Category']].append(locale.atof(row['Duration']))

# Compute average of each field of interest.
for avg_name, total in averages.items():
    averages[avg_name] = total / num_values

print('Averages:')
for avg_name in avg_names:
    rounded = locale.format_string('%.2f', round(averages[avg_name], 2), grouping=True)
    print('  {:<13} {:>10}'.format(avg_name, rounded))


# Calculate modes for seller ratings.
seller_rating_modes = {}
for category, values in seller_ratings.items():
    try:
        seller_rating_modes[category] = statistics.mode(values)
    except statistics.StatisticsError:
        # Python 3.7 and earlier raise this on ties; 3.8+ returns the first mode instead.
        seller_rating_modes[category] = None  # No unique mode.

print()
print('Seller Rating Modes:')
for category, mode in seller_rating_modes.items():
    if mode is None:
        print('  {:<16} {:>10}'.format(category, 'No unique mode'))
    else:
        rounded = locale.format_string('%.2f', round(mode, 2), grouping=True)
        print('  {:<16} {:>10}'.format(category, rounded))


# Calculate modes for duration.
duration_modes = {}
for category, values in durations.items():
    try:
        duration_modes[category] = statistics.mode(values)
    except statistics.StatisticsError:
        duration_modes[category] = None  # No unique mode.

print()
print('Duration Modes:')
for category, mode in duration_modes.items():
    if mode is None:
        print('  {:<16} {:>10}'.format(category, 'No unique mode'))
    else:
        rounded = locale.format_string('%.2f', round(mode, 2), grouping=True)
        print('  {:<16} {:>10}'.format(category, rounded))

Traceback (most recent call last):
  File "C:\Users\Gebruiker\Documents\Python Shell\B1A08\PI4\PI 4 Deel 2 CSV.py", line 96, in <module>
    modes[category] = statistics.mode(values)
  File "C:\Users\Gebruiker\AppData\Local\Programs\Python\Python37-32\lib\statistics.py", line 506, in mode
    'no unique mode; found %d equally common values' % len(table)
statistics.StatisticsError: no unique mode; found 3 equally common values

Why do I get this error and how do I solve it? — , Jan 03 '19 at 13:26
Josse: You're getting it because there's `no unique mode; found 3 equally common values`, as the error message indicates. One way to "solve" it is to use a `try`/`except` to "catch" that error and set the mode value to something that can be checked for later in the `for` loop that prints out all the mode values at the end. You could also just skip the `category` by replacing `modes[category] = None` with `pass`. — martineau, Jan 03 '19 at 17:09
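
The traceback shows Python 3.7. On Python 3.8 or later, `statistics.multimode` sidesteps the error entirely: it returns a list of every value tied for most common (in first-seen order) instead of raising, and `statistics.mode` itself also stops raising on ties and returns the first mode it meets. A sketch with three equally common values, like the situation in the traceback (the numbers are illustrative):

```python
import statistics

values = [5.0, 7.0, 3.0, 5.0, 7.0, 3.0]  # three equally common values

all_modes = statistics.multimode(values)  # requires Python 3.8+
print(all_modes)  # [5.0, 7.0, 3.0]

# Choose a tie policy explicitly, e.g. report all of them or just the smallest.
smallest_mode = min(all_modes) if all_modes else None
print(smallest_mode)  # 3.0
```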
Is it possible to get the modes of other fields as well, for example Duration or OpenPrice? If so, how do I do this? — , Jan 07 '19 at 11:21
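
One way to get per-category modes for any field, such as Duration or OpenPrice, without copying the block for each one is to nest the `defaultdict`: first by field name, then by category. A sketch that assumes the sample rows are read from an in-memory string instead of `bijlage.txt` and that swapping the decimal comma for a dot is an acceptable stand-in for `locale.atof`; `mode_fields` is an illustrative name:

```python
import csv
import io
import statistics
from collections import defaultdict

# Sample rows from the question; io.StringIO stands in for open('bijlage.txt').
DATA = """Category;currency;sellerRating;Duration;endDay;ClosePrice;OpenPrice;Competitive?
Music/Movie/Game;US;3249;5;Mon;0,01;0,01;No
Music/Movie/Game;US;3249;5;Mon;0,01;0,01;No
Music/Movie/Game;US;3249;5;Mon;0,01;0,01;No
Music/Movie/Game;US;3249;5;Mon;0,01;0,01;No
Music/Movie/Game;US;3249;5;Mon;0,01;0,01;No
Music/Movie/Game;US;3249;5;Mon;0,01;0,01;No
Music/Movie/Game;US;3249;5;Mon;0,01;0,01;No
Automotive;US;3115;7;Tue;0,01;0,01;No
Automotive;US;3115;7;Tue;0,01;0,01;No
Automotive;US;3115;7;Tue;0,01;0,01;Yes
"""

mode_fields = ('Duration', 'OpenPrice')  # any numeric columns of interest

# data[field][category] -> list of that field's values within that category
data = defaultdict(lambda: defaultdict(list))

for row in csv.DictReader(io.StringIO(DATA), delimiter=';'):
    for field in mode_fields:
        # Assumption: swapping the decimal comma for a dot replaces locale.atof.
        value = float(row[field].replace(',', '.'))
        data[field][row['Category']].append(value)

# modes[field][category] -> the mode, or None when Python < 3.8 reports a tie
modes = {}
for field in mode_fields:
    modes[field] = {}
    for category, values in data[field].items():
        try:
            modes[field][category] = statistics.mode(values)
        except statistics.StatisticsError:
            modes[field][category] = None

for field, per_category in modes.items():
    print(field)
    for category, mode in per_category.items():
        shown = 'No unique mode' if mode is None else mode
        print('  {:<16} {}'.format(category, shown))
```

Adding another column, such as `ClosePrice`, only means adding its name to `mode_fields`.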


