1

Here is my question.
I have a DataFrame df that contains two columns named date and wd.
wd is the wind direction, which ranges from 0 to 360 degrees.
So df represents the wind direction at some location over a certain time frame.

I want to classify those wind directions into 16 classes like this:
http://7xrn7f.com1.z0.glb.clouddn.com/16-3-8/30080798.jpg

The ranges are shown here:

http://7xrn7f.com1.z0.glb.clouddn.com/16-3-8/8398960.jpg

This is what I have so far:

wd_stat = []
for wd in df.wd:
    # NNE: 11.25-33.75
    if 11.25 <= wd < 33.75:
        wd_stat.append("NNE")
    # NE: 33.75-56.25
    elif 33.75 <= wd < 56.25:
        wd_stat.append("NE")
    # ENE: 56.25-78.75
    elif 56.25 <= wd < 78.75:
        wd_stat.append("ENE")
    # E: 78.75-101.25
    elif 78.75 <= wd < 101.25:
        wd_stat.append("E")
    # ESE: 101.25-123.75
    elif 101.25 <= wd < 123.75:
        wd_stat.append("ESE")
    # ..... not done yet ......

My method is inflexible and dumb.
Can anyone give some advice on how to handle a classification problem like this (mapping numeric ranges to labels) efficiently?

Han Zhengzu

2 Answers

8

A nice way to do this kind of thing is to use numpy.digitize(). It takes an array of values and an array of bin edges, and returns, for each value, the index of the bin it falls into. Use these indices to look up a matching array of strings to get what you want:

import numpy as np
import pandas as pd

df = pd.DataFrame({"wd": pd.Series([20.1,50,8.4,359,243,123])})

directions = np.array('N NNE NE ENE E ESE SE SSE S SSW SW WSW W WNW NW NNW N'.split())
bins = np.arange(11.25, 372, 22.5)
df['wd_stat'] = directions[np.digitize(df['wd'], bins)]
print(df)

      wd wd_stat
0   20.1     NNE
1   50.0      NE
2    8.4       N
3  359.0       N
4  243.0     WSW
5  123.0     ESE
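
To make the indexing step more concrete, here is a small sketch of what np.digitize returns on its own for the bins above, before the lookup into directions. The sample values are just for illustration:

```python
import numpy as np

# Same bin edges as above: 11.25, 33.75, ..., 371.25 (17 edges)
bins = np.arange(11.25, 372, 22.5)
values = np.array([20.1, 50, 8.4, 359])

# For each value, the number of edges that are <= the value
idx = np.digitize(values, bins)
print(idx)  # [ 1  2  0 16]
```

Index 0 (below the first edge) and index 16 (at or above 348.75) both map to 'N', which is why 'N' appears at both ends of the directions array.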
Rob
  • 1
    It's indeed a very elegant solution! – MaxU - stand with Ukraine Mar 08 '16 at 11:59
  • Another question follows here. In your method, the [0~11.25] range is replaced by [360~371.25]. So I was thinking of adding 360 to those values before digitizing them. Using `df.iloc[df.loc[ 0<= df['wd'] < 11.25].index]["wd"]+360` doesn't achieve that. How can I add a constant to specific rows based on certain classes? – Han Zhengzu Mar 08 '16 at 14:08
  • 1
    If you look closely, you will see that I have defined the 'N' direction twice in the directions array. This corresponds to the two bins [-np.inf, 11.25] and [348.75, 371.25]. Come to think of it, the top limit is even unnecessary, and you might as well define bins = np.arange(11.25, 360, 22.5). Bottom line is: there is no need to alter the values before digitizing. – Rob Mar 08 '16 at 14:57
  • Thanks! When I use `np.digitize` directly, it doesn't work because there are some NaN values in my real data. So I added another direction, 'NAN'. After digitizing, I use `df = df.replace("NAN", np.nan)` to solve it. – Han Zhengzu Mar 09 '16 at 01:11
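
Regarding the NaN issue raised in the comments, one possible alternative to the extra 'NAN' label is to digitize only the non-missing values. This is just a sketch under a few assumptions: the sample data and the variable names valid and labels are made up for illustration, and it uses the shorter bins from Rob's comment. As far as I know, np.digitize places NaN past the last edge, so it is safer not to pass NaN in at all.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with one missing value
df = pd.DataFrame({"wd": [20.1, 50, np.nan, 359, 243, 8.4]})

directions = np.array('N NNE NE ENE E ESE SE SSE S SSW SW WSW W WNW NW NNW N'.split())
bins = np.arange(11.25, 360, 22.5)  # 16 edges: 11.25, 33.75, ..., 348.75

wd = df['wd'].to_numpy(dtype=float)
valid = ~np.isnan(wd)  # True where a direction was actually measured

# Start with all-missing labels, then fill in only the valid rows
labels = np.full(len(wd), np.nan, dtype=object)
labels[valid] = directions[np.digitize(wd[valid], bins)]
df['wd_stat'] = labels
print(df)
```

Rows with a missing wd keep NaN in wd_stat, so no replace step is needed afterwards.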
2

You can use loc:

import pandas as pd

df = pd.DataFrame({"wd": pd.Series([20.1,50,8.4 ])})
print(df)
     wd
0  20.1
1  50.0
2   8.4

print((df.wd >= 11.25) & (df.wd < 33.75))
0     True
1    False
2    False
Name: wd, dtype: bool

df.loc[(df.wd >= 11.25 ) & (df.wd < 33.75 ), 'new'] = 'NNE'
df.loc[(df.wd >= 33.75 ) & (df.wd < 56.25 ), 'new'] = 'NE'
print(df)
     wd  new
0  20.1  NNE
1  50.0   NE
2   8.4  NaN
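
The two loc lines above cover only two of the 16 sectors. A rough sketch of how the same idea might be extended to all of them with a loop, assuming every sector is 22.5 degrees wide starting at 11.25 and that 'N' wraps around 0/360 (the names list and the loop variables are my own, not part of the original code):

```python
import pandas as pd

df = pd.DataFrame({"wd": [20.1, 50, 8.4, 359]})

# 'N' wraps around: [348.75, 360] together with [0, 11.25)
df.loc[(df.wd >= 348.75) | (df.wd < 11.25), 'new'] = 'N'

# The remaining 15 sectors, in clockwise order starting at 11.25 degrees
names = 'NNE NE ENE E ESE SE SSE S SSW SW WSW W WNW NW NNW'.split()
for i, name in enumerate(names):
    lower = 11.25 + 22.5 * i
    upper = lower + 22.5
    df.loc[(df.wd >= lower) & (df.wd < upper), 'new'] = name

print(df)
```

Rows where wd is NaN match none of the masks, so they stay NaN in the new column.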
jezrael