Suppose I have a DataFrame in which one of the columns (call it 'power') holds integer values from 1 to 10000. I would like to produce a NumPy array that indicates, for each row of the DataFrame, whether the value in the 'power' column is greater than 9000.
I could do something like this:
import numpy as np

def categorize(frame):
    return np.array(frame['power'] > 9000)
This gives me a boolean array of True and False values. However, suppose I want the array to contain 1 and -1 instead of True and False. How can I accomplish this without iterating over each row of the frame?
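To make the current behaviour concrete, here is a small self-contained illustration. The three-row frame is made up purely for the example; only the 'power' column and the 9000 threshold come from the description above.

```python
import numpy as np
import pandas as pd

def categorize(frame):
    # Elementwise comparison on the column, converted to a NumPy array
    return np.array(frame['power'] > 9000)

# Made-up sample data, for illustration only
frame = pd.DataFrame({'power': [42, 9500, 9000]})

result = categorize(frame)
print(result)        # boolean array: False, True, False
print(result.dtype)  # bool
```

For this frame, the output I am after would be -1, 1, -1 rather than False, True, False.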
For background, the application is preparing data for binary classification via machine learning with scikit-learn.
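One vectorized possibility, offered as a sketch rather than a definitive answer, is `np.where`, which selects between two values elementwise based on a condition. The function name `categorize_signed`, the `threshold` parameter, and the sample frame are hypothetical names introduced only for this illustration.

```python
import numpy as np
import pandas as pd

def categorize_signed(frame, threshold=9000):
    # Hypothetical helper: 1 where 'power' exceeds the threshold, -1 elsewhere.
    # np.where evaluates the whole column at once, so no Python-level row loop is needed.
    return np.where(frame['power'] > threshold, 1, -1)

# Made-up sample data, for illustration only
frame = pd.DataFrame({'power': [1, 9000, 9001, 10000]})

labels = categorize_signed(frame)
print(labels)
```

The result is a plain integer NumPy array, which should be usable as a label vector for a scikit-learn classifier, assuming the estimator accepts -1/1 class labels.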