find and select the most frequent data of column in pandas DataFrame

Question

I have a dataframe with the following column:

file['DirViento']

Fecha
2011-01-01    ENE
2011-01-02    ENE
2011-01-03    ENE
2011-01-04    NNE 
2011-01-05    ENE
2011-01-06    ENE
2011-01-07    ENE
2011-01-08    ENE
2011-01-09    NNE
2011-01-10    ENE
2011-01-11    ENE
2011-01-12    ENE
2011-01-13    ESE
2011-01-14    ENE
2011-01-15    ENE
... 
2011-12-17    ENE
2011-12-18    ENE
2011-12-19    ENE
2011-12-20    ENE
2011-12-21    ENE
2011-12-22    ENE
2011-12-23    ENE
2011-12-24    ENE
2011-12-25    ENE
2011-12-26    ESE
2011-12-27    ENE
2011-12-28     NE
2011-12-29    ENE
2011-12-30    NNE
2011-12-31    ENE
Name: DirViento, Length: 290, dtype: object

The column has daily records of wind direction for each month of the year. I'm trying to get the dominant direction for each month. To accomplish this, select the data most often repeated during the month:

file['DirViento'].groupby(lambda x: x.month).value_counts()


1   ENE    23
    NNE     6
    E       1
    ESE     1
2   ENE    21
    NNO     3
    NNE     2
    NE      1
3   ENE    21
    OSO     1
    ESE     1
    SSE     1
4   ENE    21
    NNE     2
    ESE     1
    NNO     1
6   ENE    15
    ESE     2
    SSE     2
    ONO     1
    E       1
7   ENE    22
    ONO     1
    OSO     1
    NE      1
    NNE     1
    NNO     1
8   ENE    23
    NNE     5
    NE      1
    ONO     1
    ESE     1
9   ENE    17
    NNE     7
    ONO     2
    NE      1
    E       1
    ESE     1
    NNO     1
10  ENE    16
    NNE     2
    ESE     2
    NNO     2
    ONO     1
    NE      1
    E       1
11  ENE    13
    NNE     2
    ESE     2
    ONO     1
12  ENE    26
    NNE     3
    NE      1
    ESE     1
Length: 54, dtype: int64

When running the following line of code

wind_moda=file['DirViento'].groupby(lambda x: x.month).agg(lambda x: stats.mode(x)[0][0])

Should get something like this

     1  ENE    
     2  ENE    
     3  ENE  
     4  ENE
     6  ENE
     7  ENE    
     8  ENE    
     9  ENE
    10  ENE  
    11  ENE
    12  ENE

But I get the following:

 1          E  
 2        ENE  
 3        ENE  
 4        ENE  
 6          E  
 7        ENE  
 8        ENE  
 9          E  
 10         E  
 11       ENE  
 12       ENE

Why in 4 of the 12 months is not taking into account the most frequent data?

Am I doing something wrong ?

Any idea to get the most common data each month?

What do you want to happen if more than one direction has the same (maximal) count in a given month? — DSM, Jan 13 '14 at 03:47

score 12 · Answer 1 · edited Jun 06 '16 at 19:05

12

Pandas 0.15.2 has a DataFrame.mode() method. It might be of use to someone looking for this as I was.

Here are the docs.

Edit: For the Value:

DataFrame.mode()[0]

edited Jun 06 '16 at 19:05

Merlin

24,552
41
131
206

answered Mar 17 '15 at 14:12

mvbentes

1,022
12
24

score 6 · Answer 2 · edited May 23 '17 at 12:26

This is not as straightforward as it could be (should be).

As you probably know, the statistics jargon for the most common value is the "mode." Numpy does not have a built-in function for this, but scipy does. Import it like so:

from scipy.stats.mstats import mode

It does more than simply return the most common value, as you can read about in the docs, so it's convenient to define a function that uses mode to just get the most common value.

f = lambda x: mode(x, axis=None)[0]

And now, instead of value_counts(), use apply(f). Here is an example:

In [20]: DataFrame([1,1,2,2,2,3], index=[1,1,1,2,2,2]).groupby(level=0).apply(f)
Out[20]: 
1    1.0
2    2.0
dtype: object

Update: Scipy's mode does not work with strings. For your string data, you'll need to define a more general mode function. This answer should do the trick.

score 4 · Answer 3 · edited Sep 27 '18 at 08:26

4

For whole dataframe, you can use:
```
dataframe.mode()
```
For specific column:
```
dataframe.mode()['Column'][0]
```

Second case is more useful in imputing the values.

edited Sep 27 '18 at 08:26

Pang

9,564
146
81
122

answered Sep 27 '18 at 08:08

Hrushikesh

121
1
1
3

find and select the most frequent data of column in pandas DataFrame

3 Answers3

Linked