
Still looking for an answer, help is very much appreciated!

I am working on a dataset containing different types of data about cars. I want to add a value to the horsepower_score column if the value of the horsepower column is greater than x but less than y.

I want to add the value:

1 if horsepower is less than or equal to 50

2 if horsepower is greater than 50 but less than or equal to 100

3 if horsepower is greater than 100 but less than or equal to 200

4 if horsepower is greater than 200 but less than or equal to 250

5 if horsepower is greater than 250 but less than or equal to 300

I have tried several different pieces of code, but I can't make any of them work.

import pandas as pd
data=[[150,0],[275,0],[30,0],[90,0],[220,0]]
df=pd.DataFrame(data,columns=['horsepower','horsepower_score'])

if df['horsepower']<=50:
    df['horsepower_score']=1

if df['horsepower']>50 & <=100:
    df['horsepower_score']=2

if df['horsepower']>100 & <=200:
    df['horsepower_score']=3

if df['horsepower']>200 & <=250:
    df['horsepower_score']=4

if df['horsepower']>250 & <=300:
    df['horsepower_score']=5

So my desired result would look like this:

horsepower  horsepower_score
       150                 3
       275                 5
        30                 1
        90                 2
       220                 4

Update: I'm still looking for a solution. The only answer that worked partially is the one from @Andreas, but that suggestion keeps adding the number to the column each time the code is run, and I simply want to assign the number once.

lindstroem
  • What exactly is your question? – mkrieger1 May 19 '21 at 20:56
  • Did you mean: `if 100 < df['horsepower'] < 200:`? – quamrana May 19 '21 at 20:56
  • How do I add a value to a new column when the value in some column is greater than 100 and less than 200? – lindstroem May 19 '21 at 20:57
  • @quamrana, I have tried your suggestion, but I get the following error: ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all(). – lindstroem May 19 '21 at 20:59
  • Can you post a working example including initialization of a sample dataframe? – tdelaney May 19 '21 at 21:02
  • There is `between`: `df['horsepower'].between(100, 200, inclusive=False)`, but that won't solve the error you're going to get for the truth value of a Series being ambiguous... – ALollz May 19 '21 at 21:07
  • @ALollz, very nice! That should be the answer. – Andreas May 19 '21 at 21:08
  • @ALollz My goal is to create a second column called "horsepower_score" that ranges from 1 to 5. So if a car has less than 200 and more than 100 horsepower, I want to add "3" to the column "horsepower_score". Can I adjust your suggested code so that this will happen? – lindstroem May 19 '21 at 21:16
  • There are multiple ways to solve this problem, but the burden is on the OP to post a working example so that we can demonstrate that the answer works. – tdelaney May 19 '21 at 21:17
  • Something like `df["horsepower_score"] = df["horsepower"] // 100 + 1` could do the trick, depending on exactly where you want the score demarcations to be. – tdelaney May 19 '21 at 21:21
  • @tdelaney Your suggestion above solves the issue to a limited degree, but thanks. I have added what I assume is meant by a working example; maybe that will make it easier to answer :D – lindstroem May 19 '21 at 21:35
  • Welcome to Stack Overflow! Please include your __expected__ output for the __provided__ data. See [MRE - Minimal, Reproducible, Example](https://stackoverflow.com/help/minimal-reproducible-example), and [How to make good reproducible pandas examples](https://stackoverflow.com/q/20109391/15497888). – Henry Ecker May 19 '21 at 22:06
  • Okay. What about scores 1, 2, and 4? – Henry Ecker May 19 '21 at 22:16
  • Yes, I actually want those too; I just thought it would be easier to explain and understand if 3 and 5 were given as examples. – lindstroem May 19 '21 at 22:19
  • Actually, this problem is probably easier solved all together. – Henry Ecker May 19 '21 at 22:21
  • Okay, thanks. I have now added the entire problem. – lindstroem May 19 '21 at 22:40
  • Okay, last question: you have score 1 strictly less than 50 and score 2 strictly greater than 50, which means 50 belongs to no category. Is this your intention? This is also true for 100, 200, and 250. – Henry Ecker May 19 '21 at 22:43
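The two one-liners suggested in the comments can be checked against the sample data from the question. A minimal sketch, assuming pandas 1.3 or later, where `Series.between` takes `inclusive` as a string such as `"right"` instead of the boolean used in the comment above:

```python
import pandas as pd

data = [[150, 0], [275, 0], [30, 0], [90, 0], [220, 0]]
df = pd.DataFrame(data, columns=['horsepower', 'horsepower_score'])

# Floor division: one score step per 100 hp.
floor_scores = df['horsepower'] // 100 + 1
print(floor_scores.tolist())  # [2, 3, 1, 1, 3]  (desired: [3, 5, 1, 2, 4])

# between() returns a boolean mask; inclusive="right" means 100 < hp <= 200.
in_band_3 = df['horsepower'].between(100, 200, inclusive="right")
print(in_band_3.tolist())  # [True, False, False, False, False]
```

Floor division moves in fixed 100 hp steps, so it cannot follow the uneven 50/100/200/250/300 bands. `between` only selects the rows; the score still has to be assigned to those rows separately.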

3 Answers


In Python you can chain these conditions, which makes them pretty nice to read.

if 100 < df['horsepower'] < 200:
    df['horsepower_score']=3

Or you can use the more traditional way:

if df['horsepower'] > 100 & df['horsepower'] < 200:
    df['horsepower_score']=3
schilli
  • This is a dataframe, so `df['horsepower_score'] = 3` assigns 3 to the entire column. I don't think `100 < df['horsepower'] < 200` will even work with a dataframe. – tdelaney May 19 '21 at 21:15
  • Gives the error message: "ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all()." – lindstroem May 26 '21 at 15:16
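As the comments point out, `if` needs a single True/False, but comparing a column returns a whole Series of booleans, hence the `ValueError`. The chained form `100 < s < 200` expands to `(100 < s) and (s < 200)`, and that `and` hits the same error. A minimal sketch of the element-wise equivalent, using the question's sample data: write the two comparisons separately, join them with `&` (each in parentheses, since `&` binds tighter than `<` and `>`), and assign only the matching rows through `.loc`:

```python
import pandas as pd

data = [[150, 0], [275, 0], [30, 0], [90, 0], [220, 0]]
df = pd.DataFrame(data, columns=['horsepower', 'horsepower_score'])

# Element-wise comparisons; the parentheses around each one are required.
mask = (df['horsepower'] > 100) & (df['horsepower'] < 200)

# Only rows where mask is True get the new value; other rows keep theirs.
df.loc[mask, 'horsepower_score'] = 3
print(df['horsepower_score'].tolist())  # [3, 0, 0, 0, 0]
```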

You have to break it into two separate comparisons. Those comparisons form a boolean index you can use to slice df.

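# Parentheses are needed: & binds tighter than > and <
ix = (df['horsepower'] > 100) & (df['horsepower'] < 200)
# Assign 3 only to the rows selected by the boolean index
df.loc[ix, 'horsepower_score'] = 3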
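The same boolean-index approach extends to all five bands in the question. A minimal sketch, assuming the question's bounds (lower bound exclusive, upper bound inclusive); the `bands` list is an illustrative helper, not part of the answer, and rows above 300 hp keep whatever value they already had:

```python
import pandas as pd

data = [[150, 0], [275, 0], [30, 0], [90, 0], [220, 0]]
df = pd.DataFrame(data, columns=['horsepower', 'horsepower_score'])

hp = df['horsepower']
# (lower, upper, score): the score applies when lower < hp <= upper.
# The first band has no lower limit, so -inf is used for it.
bands = [
    (float('-inf'), 50, 1),
    (50, 100, 2),
    (100, 200, 3),
    (200, 250, 4),
    (250, 300, 5),
]
for lower, upper, score in bands:
    ix = (hp > lower) & (hp <= upper)
    df.loc[ix, 'horsepower_score'] = score

print(df)
```

Because each row is written with `=`, running the loop again gives the same scores rather than adding to them.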
James

You wrote

I want to add a value to horsepower_score

Therefore I assume there are already values in horsepower_score, and you want to add 3 to the existing number. If that is the case, you can use this:

df['horsepower_score'] += (df['horsepower'].ge(100) & df['horsepower'].le(200)) * 3
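Because `+=` adds to what is already in the column, running this line a second time raises the scores again, which matches the behaviour described in the question's update. If the score should simply be set, one option (swapped in here, not part of the original answer) is `numpy.select`: for each row it takes the choice belonging to the first condition that is True, and the resulting array can be assigned with plain `=`. A minimal sketch, assuming the question's bounds and a hypothetical default of 0 for values above 300 hp:

```python
import numpy as np
import pandas as pd

data = [[150, 0], [275, 0], [30, 0], [90, 0], [220, 0]]
df = pd.DataFrame(data, columns=['horsepower', 'horsepower_score'])

hp = df['horsepower']
# Conditions are checked in order; the first True one wins for each row.
conditions = [hp <= 50, hp <= 100, hp <= 200, hp <= 250, hp <= 300]
choices = [1, 2, 3, 4, 5]

# Plain assignment replaces the column, so running it twice changes nothing.
for _ in range(2):
    df['horsepower_score'] = np.select(conditions, choices, default=0)

print(df['horsepower_score'].tolist())  # [3, 5, 1, 2, 4]
```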
Andreas
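Another option, not taken from any of the answers here, is `pandas.cut`, which sorts each value into a bin and can label the bins directly. A minimal sketch, assuming the question's five bands: `pd.cut` uses right-closed intervals by default, such as (50, 100], which matches "greater than 50 but less than or equal to 100". The result is a categorical column, and any value above 300 hp falls outside every bin and becomes NaN.

```python
import pandas as pd

data = [[150, 0], [275, 0], [30, 0], [90, 0], [220, 0]]
df = pd.DataFrame(data, columns=['horsepower', 'horsepower_score'])

# Bin edges; -inf makes the first bin "<= 50" with no lower limit.
bins = [float('-inf'), 50, 100, 200, 250, 300]
labels = [1, 2, 3, 4, 5]

# right=True (the default) gives intervals (-inf, 50], (50, 100], ...
df['horsepower_score'] = pd.cut(df['horsepower'], bins=bins, labels=labels)
print(df)
```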