I have a data frame and need to add a column whose values depend on the values of another column. I'm getting a syntax error; what's the right method?

data['Severity']=[data.Attack duration>25]== "High"
data['Severity']=[data.Attack duration<=25 and data.Attack duration>25 ]== "Medium"

Severity is a new column, and its values should be set based on the values in Attack duration.

Attack duration is of int type, and Severity needs to be assigned str values (High, Medium, etc.).

I'm getting this error:

File "", line 1
data['Severity']=[data.Attack duration>25]== "High"
^
SyntaxError: invalid syntax

1 Answer

You're getting a syntax error because of how you are referencing the column. To reference a column with a space in its name, you must index it like a dict key, i.e.

df['Attack duration'] 

Your solution:

data['Severity']=[data.Attack duration>25]== "High"

Problem 1: You are accessing the column incorrectly. With . (attribute) access the column name cannot contain spaces: Python reads data.Attack as one expression and duration as a separate name right after it, which is not valid syntax.

Problem 2: You are trying to set the value to "High" with a comparison operator. "==" checks equality; it does not assign anything. Moreover, comparing a list with a string means nothing in this context: it simply evaluates to False.
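To make Problem 2 concrete, here is a small sketch of what the expression evaluates to once the column reference is fixed. The sample values are made up for illustration and are not from the question.

```python
import pandas as pd

# Hypothetical sample data; the values are made up for illustration.
data = pd.DataFrame({"Attack duration": [10, 30, 25]})

mask = data["Attack duration"] > 25   # boolean Series, one entry per row
result = [mask] == "High"             # compares a list with a str, not the rows
print(result)                         # False

data["Severity"] = result             # the scalar False is broadcast to every row
print(data["Severity"].tolist())      # [False, False, False]
```

So even without the syntax error, the column would end up filled with False rather than "High".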

Something to watch out for: chained indexing. I won't explain it here because the pandas documentation does a great job. Please refer to this link to understand it better and save yourself from future headaches: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#indexing-view-versus-copy

I would also recommend you read through the entire slicing and indexing guide for pandas and try out those operations.

Answer: Same as @AshishMJ but broken down a bit

indexes = df['Attack duration'] > 25  # boolean mask: True for rows where attack duration > 25
df.loc[indexes, 'Severity'] = "High"  # set 'Severity' to "High" for the rows selected by the mask
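For completeness, a minimal runnable sketch that extends the same .loc approach to more than one label. The sample data is made up, and the "Medium" range is an assumption: the condition in the question (<=25 and >25) can never be true, so the cut-off of 10 below is only an illustrative guess.

```python
import pandas as pd

# Hypothetical sample data; the values are made up for illustration.
df = pd.DataFrame({"Attack duration": [5, 18, 25, 26, 40]})

# Assumed thresholds: the 10 cut-off is an illustrative guess,
# not something stated in the question.
high = df["Attack duration"] > 25
medium = (df["Attack duration"] > 10) & (df["Attack duration"] <= 25)

df["Severity"] = "Low"                  # default label for every row
df.loc[medium, "Severity"] = "Medium"   # overwrite rows in the assumed medium range
df.loc[high, "Severity"] = "High"       # overwrite rows above 25

print(df)
```

Note that the two conditions are combined with & and each one is wrapped in parentheses. Using Python's and between two Series, as in the question, raises a ValueError because the truth value of a Series is ambiguous.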