5

I have a dataframe of 11 columns and I want to create a new 0,1 column based on values in two of those columns.

I have already used np.where to create other columns, but it doesn't work for this one.

train["location"] = np.where(3750901.5068 <= train["x"] <= 3770901.5068 
and -19268905.6133 <= train['y'] <= -19208905.6133, 1, 0)

I get this error: ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

Luca Bezerra
principe

2 Answers

8

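You need parentheses around each condition and & instead of "and". If you want to match specific values, you can also use isin. Note that isin tests whether each element equals one of the listed values; it does not test whether the element falls within a range. Documentation for pandas.DataFrame.isin: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.isin.html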

For example:

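import numpy as np
import pandas as pd
df = pd.DataFrame({'a': [100, 110, 120, 111, 109], 'b': [120, 345, 124, 119, 127]})
df['c'] = np.where(df['a'].isin([100, 111]) & df['b'].isin([120, 128]), 1, 0)  # 1 where a is 100 or 111 and b is 120 or 128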

In your case, matching those exact values would be:

train["location"] = np.where(train["x"].isin([3750901.5068, 3770901.5069]) & train["y"].isin([-19268905.6133, -19268905.6132]), 1, 0)
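Since isin checks exact membership while the question asks about a range, a small comparison may help. This is only a sketch on made-up data (the frame df and its values are invented for illustration); Series.isin and Series.between are existing pandas methods.

```python
import numpy as np
import pandas as pd

# Toy data, invented for illustration only.
df = pd.DataFrame({"a": [100, 105, 111, 120]})

# isin: 1 only where the value equals one of the listed values.
df["exact"] = np.where(df["a"].isin([100, 111]), 1, 0)

# between: 1 where the value lies in the closed interval [100, 111].
df["in_range"] = np.where(df["a"].between(100, 111), 1, 0)

print(df)
```

The row with 105 gets 0 in the exact column but 1 in the in_range column, which is the behaviour a bounding-box test on x and y would need.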
Kartikeya Sharma
4

I'm not sure you even need np.where here. To combine two Series element-wise with a logical AND, use & instead of and. See: Logical operators for boolean indexing in Pandas

Also, Python evaluates the chained comparison 3750901.5068 <= train["x"] <= 3770901.5068 as (3750901.5068 <= train["x"]) and (train["x"] <= 3770901.5068), which again contains and and won't work. So you'll need to either split it explicitly into (3750901.5068 <= train["x"]) & (train["x"] <= 3770901.5068) or use Series.between, e.g. train["x"].between(3750901.5068, 3770901.5068). between includes both endpoints by default; newer pandas versions take inclusive="both" in place of inclusive=True. See: How to select rows in a DataFrame between two values, in Python Pandas?

You'll also need parentheses around each comparison that you pass to &, because & binds more tightly than <=.
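To make the two points above concrete, here is a minimal sketch. The train frame and the bounds are toy stand-ins invented for illustration, not the asker's data.

```python
import pandas as pd

# Toy stand-in for the asker's frame; values are invented.
train = pd.DataFrame({"x": [1.0, 5.0, 9.0], "y": [-3.0, -5.0, -1.0]})

# Chained comparison: Python evaluates it as (2 <= x) and (x <= 8),
# and `and` asks for the truth value of a whole Series.
try:
    2 <= train["x"] <= 8
    chained_failed = False
except ValueError:
    chained_failed = True

# Element-wise version: parenthesize each comparison, combine with &.
mask = (
    (2 <= train["x"]) & (train["x"] <= 8)
    & (-6 <= train["y"]) & (train["y"] <= -2)
)

print(chained_failed)
print(mask.tolist())
```

Only the middle row satisfies both the x and the y bounds, so the mask is True there and False elsewhere.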

So the end result should look like

train["location"] = train["x"].between(3750901.5068, 3770901.5068) & train["y"].between(-19268905.6133, -19208905.6133)

This will give you a Series of bools (True and False values). These already behave like 1s and 0s in arithmetic. If you really want 0s and 1s, you can pick a solution from here. For example, train["location"] = train["location"].astype(int)
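As a rough sketch of that last step, again on an invented toy frame rather than the real data, the boolean mask can be converted with astype(int), and np.where gives the same 0/1 values if you prefer that form:

```python
import numpy as np
import pandas as pd

# Toy stand-in for the asker's frame; values and bounds are invented.
train = pd.DataFrame({"x": [1.0, 5.0, 9.0], "y": [-3.0, -5.0, -1.0]})

# Boolean mask: True where both x and y fall inside their intervals.
in_box = train["x"].between(2, 8) & train["y"].between(-6, -2)

# Option 1: cast the bools to integers.
train["location"] = in_box.astype(int)

# Option 2: the np.where form from the question, fed a proper mask.
via_where = np.where(in_box, 1, 0)

print(train)
```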

Kevin Wang
  • 1
    Since you've linked to that question, I also recommend going through [my answer](https://stackoverflow.com/a/54358361/4909087) to that question. – cs95 Apr 03 '19 at 21:14
  • wow, that's a thorough answer. Reading it made me realize that my answer here is incorrect, since `foo < train['y'] < bar` will be translated into `foo < train['y'] and train['y'] < bar`. – Kevin Wang Apr 03 '19 at 21:24
  • 1
    You can use [`Series.between`](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.between.html) instead. – cs95 Apr 03 '19 at 21:28