
This question is exactly like the following request, with one more twist:

So, I want to set, or conditionally set, pandas DataFrame column values. The added complexity is that, instead of addressing the DataFrame columns with string constants (`df['data1']`), I need to address them with variables (`df[var_for_data1]`), because my column names are constructed.

Here is a much simplified example to explain what I want:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'data1': np.random.randn(100), 'data2': np.random.randn(100)})
print(df.head())

Col = 'data1'
print(df[Col].head())
df.data1 = df.data1 + .1
print(df[Col].head())
# so far so good, now how to do above with variable dataframe column name `Col`
#df.Col = df.Col + .1
```

The question is in the code: so far so good, but how do I do the above with the variable DataFrame column name `Col`?

The next question is how to add a condition to the above assignment, say to apply it only where `df.data1 >= .25 and df.data1 <= .35`, again expressed using the variable column name `Col`.

xpt
  • use subscripting with square brackets, `df[data1]`; this will work even with variable string names – EdChum Jul 06 '16 at 15:46
  • if `Col` is a string then yes, it will work, try it. Think of a df as a dict of columns, so in effect you're looking up the column that matches that name – EdChum Jul 06 '16 at 15:48
  • Thanks! Hmm... it didn't work for me before. Hold on... trying to find out what was wrong... – xpt Jul 06 '16 at 15:49
  • wrap the conditions in parentheses, `df[(df[col] >=.25) & (df[col] <= .35)]`; note that you need to use bitwise `&` here – EdChum Jul 06 '16 at 16:08
  • Thanks. That works: `df[(df[Col] >=.25) & (df[Col] <= .35)] = df[(df[Col] >=.25) & (df[Col] <= .35)]+.1`. Any reason you're only commenting rather than answering? – xpt Jul 06 '16 at 16:40
  • I don't like to answer unless I'm confident it's fully correct; I'll post an answer in a bit – EdChum Jul 06 '16 at 16:53
  • @EdChum oh, in that case, PLEASE go ahead. I'd like to give you full credit for your help. Thanks. My real script is still having the problem, but that's an entirely different story; I'll try to figure out the problem and ask in another thread if necessary. – xpt Jul 06 '16 at 17:03

1 Answer


You can use square brackets to access a column by its name as a string rather than as an attribute. I also strongly recommend that you drop the habit of accessing columns as attributes, as it can lead to confusing behaviour: for example, if you have a column named `sum`, `df.sum` returns the DataFrame's `sum` method rather than the column `'sum'`.
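A small sketch of that pitfall, using a made-up DataFrame that happens to have a column called `sum` (the column names and values are illustrative, not taken from the question):

```python
import pandas as pd

# Illustrative frame with a column whose name clashes with a DataFrame method.
df = pd.DataFrame({"sum": [1, 2, 3], "data1": [0.1, 0.2, 0.3]})

# Attribute access resolves to the bound DataFrame.sum method, not the column.
print(callable(df.sum))  # True

# Square-bracket access looks the label up among the columns, so it also
# works when the name is held in a variable.
Col = "sum"
print(df[Col].tolist())  # [1, 2, 3]

# Attribute access works for a non-clashing name only when it is written
# literally; df.Col would look for a column literally named "Col".
print(df.data1.equals(df["data1"]))  # True
print(hasattr(df, "Col"))            # False
```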

So `df[Col] = df[Col] + 1` will work as long as the column name exists (otherwise reading `df[Col]` raises a `KeyError`).
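Since the question's column names are constructed at runtime, here is a minimal sketch assuming a hypothetical naming scheme `f"data{n}"`. It uses the dict-like membership test (`Col in df` checks column labels) to guard the update:

```python
import pandas as pd

df = pd.DataFrame({"data1": [0.0, 0.5, 1.0], "data2": [1.0, 2.0, 3.0]})

# Hypothetical: the column name is built at runtime, as in the question.
n = 1
Col = f"data{n}"

# `in` on a DataFrame checks the column labels (dict-like behaviour).
if Col in df:
    df[Col] = df[Col] + 1  # equivalently: df[Col] += 1

print(df[Col].tolist())  # [1.0, 1.5, 2.0]

# Reading a column that does not exist raises KeyError.
missing = "data3"
try:
    df[missing] + 1
except KeyError:
    print(f"no column named {missing!r}")
```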

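Regarding your second question: comparing a column against a scalar (e.g. `df[Col] >= .25`) returns a boolean Series, and to combine several such conditions you use the bitwise operators `&`, `|` and `~` for and, or and not respectively. Python's `and`/`or`/`not` don't work here because they try to reduce the whole Series to a single `True`/`False`. When combining more than one condition you need to wrap each one in parentheses because of operator precedence: `&` binds more tightly than the comparison operators, so without parentheses `df[Col] >= .25 & df[Col] <= .35` is parsed as `df[Col] >= (.25 & df[Col]) <= .35`.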
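A short sketch of how the element-wise operators behave on a toy column (the values are made up so that exactly one row falls inside `[.25, .35]`):

```python
import pandas as pd

df = pd.DataFrame({"data1": [0.1, 0.3, 0.5]})
Col = "data1"

# Each comparison yields a boolean Series; & combines them element-wise.
# The parentheses are required because & binds tighter than >= and <=.
mask = (df[Col] >= 0.25) & (df[Col] <= 0.35)
print(mask.tolist())  # [False, True, False]

# | is element-wise "or", ~ is element-wise "not".
outside = (df[Col] < 0.25) | (df[Col] > 0.35)
print((~mask).equals(outside))  # True

# Python's `and` tries to turn the first Series into a single bool and fails.
try:
    (df[Col] >= 0.25) and (df[Col] <= 0.35)
except ValueError as exc:
    print("ValueError:", exc)
```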

So:

`df[(df[Col] >= .25) & (df[Col] <= .35)]`

should work; this masks the DataFrame to only the rows where both conditions are met. Note that this only selects those rows; it does not modify them.
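For the conditional assignment itself, one option (a sketch, not necessarily what the answerer had in mind) is to pass both the row mask and the variable column name to `.loc`, so only `Col` changes in the matching rows. By contrast, the form from the comments, `df[mask] = df[mask] + .1`, adds `.1` to every column of those rows (here it would touch `data2` too). `Series.between` is shown as an equivalent way to build the mask, since it is inclusive on both ends by default:

```python
import pandas as pd

df = pd.DataFrame({"data1": [0.1, 0.3, 0.5], "data2": [1.0, 2.0, 3.0]})
Col = "data1"

# Build the row mask from the variable column name.
mask = (df[Col] >= 0.25) & (df[Col] <= 0.35)

# Series.between(left, right) includes both endpoints by default,
# so it selects the same rows as the explicit >= / <= version.
alt = df[Col].between(0.25, 0.35)

# .loc[rows, column] updates only column Col in the selected rows;
# every other row and every other column is left untouched.
df.loc[mask, Col] = df.loc[mask, Col] + 0.1

print(df)
```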

EdChum