I have a DataFrame that consists of two columns:

  1. "Time Spent on website"
  2. "Dollars spent on the website"

I want to run a classification analysis on this dataset, and I only care whether a user made a purchase or not. So I want to go through the "Dollars spent on the website" column and set the value to 1 if the user spent more than $0.00 and to 0 if the user spent nothing.

What is the proper way to do this with a pandas dataframe?

anc1revv

2 Answers

# Start with 0 for every row, then set 1 where dollars_spent is positive.
df['purchase'] = 0
df.loc[df['dollars_spent'] > 0, 'purchase'] = 1

or

# Apply an element-wise function that maps positive amounts to 1, everything else to 0.
df['purchase'] = df['dollars_spent'].apply(lambda x: 1 if x > 0 else 0)
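
For context, here is a minimal sketch of how both variants behave. The sample values and the `time_spent` column are made up for illustration; `dollars_spent` follows the naming used in the answer. A third option that is not part of the answer, casting the boolean mask with `astype(int)`, is included for comparison.

```python
import pandas as pd

# Hypothetical sample data; only the dollars_spent name comes from the answer.
df = pd.DataFrame({
    "time_spent": [12.5, 3.0, 40.2, 7.1],
    "dollars_spent": [0.0, 19.99, 5.00, 0.0],
})

# Variant 1: default to 0, then overwrite the rows selected by the boolean mask.
df["purchase"] = 0
df.loc[df["dollars_spent"] > 0, "purchase"] = 1

# Variant 2: call a Python function once per element.
purchase_apply = df["dollars_spent"].apply(lambda x: 1 if x > 0 else 0)

# Variant 3 (not in the answer): cast the boolean mask straight to integers.
purchase_cast = (df["dollars_spent"] > 0).astype(int)

print(df["purchase"].tolist())  # [0, 1, 1, 0]
```

All three produce the same column here. The mask-based variants are vectorized, while `apply` runs the lambda row by row, so it tends to be the slowest of the three on large frames.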
SO44
    While this code may answer the question, providing additional context regarding why and/or how this code answers the question improves its long-term value. Code-only answers are discouraged. – Ajean Aug 10 '16 at 15:55

You can also use NumPy's where:

import numpy as np
df['Purchase'] = np.where(df['Dollars spent on the website'] > 0, 1, 0)

For each row, `np.where` returns 1 where the condition is True and 0 otherwise.
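
A self-contained sketch of the same call, using the column names from the question. The sample values are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with the question's column names.
df = pd.DataFrame({
    "Time Spent on website": [5.0, 31.2, 2.4, 18.9],
    "Dollars spent on the website": [0.0, 25.5, 0.0, 3.2],
})

# np.where(condition, value_if_true, value_if_false) is evaluated element-wise.
df["Purchase"] = np.where(df["Dollars spent on the website"] > 0, 1, 0)

print(df["Purchase"].tolist())  # [0, 1, 0, 1]
```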

rachwa
  • This is pretty clean. Wondering if there's a way to pass a `list` to the `where` clause to be able to write the filter dynamically. – lowercase00 Jul 26 '22 at 14:58
  • Hey, thanks for your feedback! I'm not sure I understand you correctly. If you have multiple conditions, try `np.where((df['x'] > 0) & (df['y'] < 10), 1, 0)`; each comparison needs its own parentheses because `&` binds more tightly than `>` and `<`. However, if you have multiple if-else branches, take a look at *NumPy's* [`select`](https://numpy.org/doc/stable/reference/generated/numpy.select.html). An example can be found [here](https://stackoverflow.com/a/73011194/18145256). Let me know if you have any questions. – rachwa Jul 26 '22 at 20:30
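
The ideas in this comment thread can be sketched as follows. The `x` and `y` columns come from the comment; the sample values, the label strings, and the output column names are hypothetical. Reading the question about passing a `list` as "combine a list of boolean masks" is an assumption, and `np.logical_and.reduce` is one possible way to do that.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the column names x and y come from the comment.
df = pd.DataFrame({"x": [-1, 2, 5, 8], "y": [3, 12, 4, 9]})

# Two conditions combined with &; each comparison is parenthesized.
df["both"] = np.where((df["x"] > 0) & (df["y"] < 10), 1, 0)

# A list of masks built dynamically, then AND-ed together in one call.
masks = [df["x"] > 0, df["y"] < 10]
df["dynamic"] = np.where(np.logical_and.reduce(masks), 1, 0)

# np.select for if / elif / else logic: the first matching condition wins.
conditions = [df["x"] <= 0, df["y"] < 10]
choices = ["non_positive_x", "small_y"]
df["label"] = np.select(conditions, choices, default="other")

print(df["both"].tolist())   # [0, 0, 1, 1]
print(df["label"].tolist())  # ['non_positive_x', 'other', 'small_y', 'small_y']
```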