Finding the mode of the n smallest rows by count in a CSV

Question

I have a csv file that contains a table shown by the example below:

| TimeStampHour    | Day | Hour | Count |
|------------------|-----|------|-------|
| 28/01/2019 00:00 | Mon | 0    | 198   |
| 28/01/2019 01:00 | Mon | 1    | 513   |
| 28/01/2019 02:00 | Mon | 2    | 110   |
| 28/01/2019 03:00 | Mon | 3    | 50    |
| 28/01/2019 04:00 | Mon | 4    | 6     |
| 28/01/2019 05:00 | Mon | 5    | 6     |
| 28/01/2019 06:00 | Mon | 6    | 1503  |
| 28/01/2019 07:00 | Mon | 7    | 108   |
| 28/01/2019 08:00 | Mon | 8    | 211   |
| 28/01/2019 09:00 | Mon | 9    | 169   |
...

There are further values that go on for several months and contain the other days of the week all the way till Sunday.

What I am trying to achieve:

I want to first find the 25 smallest rows by the count and then output those rows. I have done the following to achieve this in python:

import pandas as pd
import numpy as np

df = pd.read_csv('mydata.csv')

least25 = df.nsmallest(25, "Count",keep='all')
print(least25)

OUTPUT:

TimeStampHour  Day  Hour  Count
124   02/02/2019 07:00  Sat     7      3
142   03/02/2019 04:00  Sun     4      3
143   03/02/2019 06:00  Sun     6      3
144   03/02/2019 07:00  Sun     7      3
165   04/02/2019 04:00  Mon     4      3
...                ...  ...   ...    ...
1940  20/04/2019 04:00  Sat     4      6
1965  21/04/2019 05:00  Sun     5      6
2131  28/04/2019 04:00  Sun     4      6
3138  09/06/2019 03:00  Sun     3      6
4144  21/07/2019 03:00  Sun     3      6

This I am happy with.

My next step is to perform a mode calculation on these results so that I can output the rows where the modes exist in the same output format as above (showing rows and colums).

My desired output would look like this:

| Day | Hour | Count(Min) | Occurance |
|-----|------|------------|-----------|
| Wed | 6    | 3          |     10    |
| Wed | 7    | 3          |      8    |
| Wed | 8    | 3          |      7    |
| Wed | 9    | 3          |      9    |
| Wed | 10   | 3          |     11    |
| Wed | 11   | 3          |      7    |
| Wed | 12   | 3          |      5    |
| Fri | 7    | 3          |      1    |
| Fri | 8    | 3          |      6    |
| Fri | 9    | 3          |      7    |

In other words: Print the rows with the most common miniumum (the modal minimum) the most frequent occurring minimum count. In order to try and achieve this is did the following:

modeOfmin = least25['Count'].mode()
print(modeOfMin) 

OUTPUT:
0    6
dtype: int64

I would like to learn how to achieve my desired output but I do not know how to go about this.

Would someone be able to explain this?

jezrael · Accepted Answer · 2019-12-03T12:09:11.683

3

Use boolean indexing with convert one element Series to scalar by Series.iat:

modeOfmin = least25['Count'].mode()
df = least25[least25['Count'] == modeOfmin.iat[0]]

print (df)
        TimeStampHour  Day  Hour  Count
124  02/02/2019 07:00  Sat     7      3
142  03/02/2019 04:00  Sun     4      3
143  03/02/2019 06:00  Sun     6      3
144  03/02/2019 07:00  Sun     7      3
165  04/02/2019 04:00  Mon     4      3

edited Dec 03 '19 at 12:09

answered Dec 03 '19 at 12:04

jezrael

822,522
95
1,334
1,252

Great, thanks for this solution! Would you also mind explaining how you would print the row which showed the day and hour with the minimum count and the most occurrences. Say count is used as "number of sales". Pseudo: print Day, Hour, occurance WHERE COUNT IS MIN – fore-caster Dec 03 '19 at 12:27
@fore-caster - I think if there is column `Occurance` in solution above again filter by `df1 = df[df['Occurance'] == df['Occurance'].min()]` – jezrael Dec 03 '19 at 12:30
Thanks. Can python add an additional column for occurance and then work out those occurrences for the dataframe? – fore-caster Dec 03 '19 at 12:35
@fore-caster - What is `Occurance` column? How is count? – jezrael Dec 03 '19 at 12:37
So count would be a column like "No Of Sales" and the occurance would be the number of times a row occurred with the same value. Ie: Friday, 6pm 3 sales if this occurred 10 times in a year then there would be 10 rows and so the occurance would be 10. Does that make more sense? – fore-caster Dec 03 '19 at 12:56
@fore-caster - I think you need `df = df.groupby(['Day','Hour','Count']).size().reset_index(name='Occurance')`, check [this](https://stackoverflow.com/a/32801170) for more info – jezrael Dec 03 '19 at 13:30

Neo · Answer 2 · 2019-12-03T13:01:37.963

0

Use print(least25[least25['Count']==least25['Count'].mode().iat[0]])

edited Dec 03 '19 at 13:01

answered Dec 03 '19 at 12:02

Neo

627
3
7

Thanks for suggesting a solution. However, I got the following exception message: ValueError: Can only compare identically-labeled Series objects – fore-caster Dec 03 '19 at 12:29
@for you are right, error comes because `.mode()` returns a series and not one numeric value. I edited my answer to work properly. – Neo Dec 03 '19 at 13:03
@jezrael my original (incorrect) answer was posted ~2 minutes before yours. I just added .iat[0] to make it work correctly. So, no copy here. – Neo Dec 03 '19 at 14:11
@jezrael please check the timestamp of my original (incorrect) answer and the edit history to confirm this is no copy. Moreover, your solution (posted after mine) implements boolean indexing, which I used in my original answer. Of course I know you are huge in pandas and I am learning every day from you, I just don't understand why you attack my answer and call it copy. – Neo Dec 03 '19 at 14:15
@NeofytosBoufidis - ok, maybe I wrong commnt it, I try explain better. – jezrael Dec 03 '19 at 14:16
@NeofytosBoufidis - So in this situtation if I get comment something wrong in my answer and there is already posted more like 5 minutes another answer with correct solution, I have 2 possibilities - remove answer or create answer which is different like already posted. So I first check if exist some another answers (which cannot use, because duplicate them) and then try to find some another working solution. I think this is corect way, if think I am wrong let me know – jezrael Dec 03 '19 at 14:19
@jezrael I think I see what you mean. I don't know if I need to delete my answer, if there is such rule please offer a link. I believe it is no huge problem because I have already acknowledged the wrong part of my answer in previous comment. Your answer is already accepted and up voted. Have a nice day – Neo Dec 03 '19 at 14:27
@NeofytosBoufidis - I dont force you delete/do something, only explain how I feel it is correct approach. ;) – jezrael Dec 03 '19 at 14:30
Happy coding and nice day from Slovakia ;) – jezrael Dec 03 '19 at 14:30

Finding the mode of the n smallest rows by count in a CSV

2 Answers2