
Here's what I have in my dataframe-

RecordType    Latitude    Longitude    Name
  L             28.2N        70W       Jon
  L             34.3N        56W       Dan
  L             54.2N        72W       Rachel

Note: The dtype of all the columns is object.

Now, in my final dataframe, I only want to include those rows in which the Latitude and Longitude fall in a certain range (say 24 < Latitude < 30 and 79 < Longitude < 87).

My idea is to apply a function to all the values in the Latitude and Longitude columns to first get float values like 28.2, etc. and then to compare the values to see if they fall into my range. So I wrote the following-

def numbers(value):
    return float(value[:-1])

result[u'Latitude'] = result[u'Latitude'].apply(numbers)
result[u'Longitude'] = result[u'Longitude'].apply(numbers)

But I get the following warning-

Warning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy

I'm having a hard time understanding this since I'm new to Pandas. What's the best way to do this?

cs95
kev

2 Answers


If you don't want to modify df, I would suggest getting rid of the apply and vectorising this. One option is using eval.

u = df.assign(Latitude=df['Latitude'].str[:-1].astype(float))
u['Longitude'] = df['Longitude'].str[:-1].astype(float)

df[u.eval("24 < Latitude < 30 and 79 < Longitude < 87")]
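
For reference, here is a minimal, self-contained sketch of the eval approach. The sample rows are hypothetical (two of them are chosen so that something actually falls inside the range), and the names mask and filtered are only for illustration.

```python
import pandas as pd

# Hypothetical sample data shaped like the question's dataframe.
df = pd.DataFrame({
    "RecordType": ["L", "L", "L", "L"],
    "Latitude": ["28.2N", "34.3N", "25.1N", "29.9N"],
    "Longitude": ["70W", "56W", "80.5W", "86W"],
    "Name": ["Jon", "Dan", "Rachel", "Ana"],
})

# Build a numeric copy; df itself keeps its original string columns.
u = df.assign(
    Latitude=df["Latitude"].str[:-1].astype(float),
    Longitude=df["Longitude"].str[:-1].astype(float),
)

# eval returns a boolean Series that can be used to index the original df.
mask = u.eval("24 < Latitude < 30 and 79 < Longitude < 87")
filtered = df[mask]
print(filtered)
```

Because the mask is computed on u, the rows returned from df still show the original "28.2N"-style strings.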

You have more options using Series.between:

u = df['Latitude'].str[:-1].astype(float)
v = df['Longitude'].str[:-1].astype(float)

# pandas >= 1.3 takes inclusive='neither'; older versions used inclusive=False
df[u.between(24, 30, inclusive='neither') & v.between(79, 87, inclusive='neither')]
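
As a rough illustration of those options, the sketch below shows how the inclusive argument changes what happens at the boundaries. It assumes pandas 1.3 or later, where inclusive accepts the strings "both", "neither", "left" and "right"; the sample values are made up.

```python
import pandas as pd

# Hypothetical latitudes sitting exactly on and between the bounds.
lat = pd.Series(["24.0N", "28.2N", "30.0N"]).str[:-1].astype(float)

strict = lat.between(24, 30, inclusive="neither")  # 24 <  lat <  30
closed = lat.between(24, 30, inclusive="both")     # 24 <= lat <= 30 (the default)
left = lat.between(24, 30, inclusive="left")       # 24 <= lat <  30

print(strict.tolist(), closed.tolist(), left.tolist())
```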
cs95
  • I liked the `Series.between` option better, works like a charm. Just wanted to ask one more question: there is also a `date` column where I want to only select the rows above a certain year, say `2000`. Is there a command for that? – kev Jan 19 '19 at 21:03
  • @kev Assuming it is a `datetime` column, you can use `df['date'].dt.year > 2000` to get a boolean condition for that. – cs95 Jan 19 '19 at 21:04
  • Dates are in this format: `20013012`, where `2001` is the year, `30` the day and `12` the month. – kev Jan 19 '19 at 21:08
  • @kev OK, then try this: `df['date'].astype(str).str[:4].astype(int) > 2000` – cs95 Jan 19 '19 at 21:08
  • `lat = df[u'Latitude'].str[:-1].astype(float)` ... `long = df[u'Longitude'].str[:-1].astype(float)` ... `date = df[u'Date'].str[:4].astype(int)` ... `result = df[date>2000 & lat.between(24, 30) & long.between(79, 87)]` ... For some reason, this isn't working. – kev Jan 19 '19 at 21:23
  • @kev Overloaded bitwise operators have higher precedence, so you need extra parentheses: `df[(date>2000) & lat.between(24, 30) & long.between(79, 87)]` – cs95 Jan 19 '19 at 21:28
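
Putting the comment thread together, a combined filter might look like the sketch below. The data is hypothetical, the Date column is assumed to hold strings in the year-day-month layout described above, and lon is used in place of long only to keep the name short.

```python
import pandas as pd

# Hypothetical rows; Date is YYYYDDMM as described in the comments.
df = pd.DataFrame({
    "Date": ["20013012", "19991506", "20050103"],
    "Latitude": ["28.2N", "25.0N", "45.0N"],
    "Longitude": ["80W", "85W", "82W"],
})

lat = df["Latitude"].str[:-1].astype(float)
lon = df["Longitude"].str[:-1].astype(float)
year = df["Date"].astype(str).str[:4].astype(int)

# & binds more tightly than >, so the comparison needs its own parentheses.
result = df[(year > 2000) & lat.between(24, 30) & lon.between(79, 87)]

# Alternative: parse the column as real dates, then read the year off.
parsed_year = pd.to_datetime(df["Date"], format="%Y%d%m").dt.year
print(result)
```

Without the parentheses, `year > 2000 & lat.between(24, 30)` is parsed as `year > (2000 & lat.between(24, 30))`, which is not the intended condition.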

As for why Pandas threw that particular "A value is trying to be set on a copy of a slice..." warning and how to avoid it:

First, using this syntax should prevent the warning:

result.loc[:,'Latitude'] = result['Latitude'].apply(numbers)

Pandas gave you the warning because it cannot tell whether result is an independent dataframe or a slice of another one. If result was created by filtering or slicing a larger dataframe, the assignment may modify a temporary copy rather than the data you intended. The article you referenced (http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy) gives examples of why this can cause unexpected problems in certain situations.

Pandas recommends syntax that makes the target of the assignment explicit, so that your dataframe ends up being modified in the manner you expect. The code I wrote above using .loc tells Pandas to assign to that column of result directly. If the warning still appears, result is most likely a slice of another dataframe, and creating it with an explicit .copy() should resolve it.
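
Here is a minimal sketch of one way to sidestep the warning, assuming result was produced by filtering another dataframe (the question does not show how result was created, so the filter on RecordType is hypothetical). Taking an explicit .copy() gives result its own data, after which a plain column assignment is unambiguous.

```python
import pandas as pd

def numbers(value):
    """Strip the trailing hemisphere letter and convert to float."""
    return float(value[:-1])

# Hypothetical source dataframe.
df = pd.DataFrame({
    "RecordType": ["L", "L", "X"],
    "Latitude": ["28.2N", "34.3N", "54.2N"],
    "Longitude": ["70W", "56W", "72W"],
})

# Assumption: result is a filtered subset. The .copy() makes it independent of df.
result = df[df["RecordType"] == "L"].copy()

# Assigning to a column of an explicit copy leaves df untouched.
result["Latitude"] = result["Latitude"].apply(numbers)
result["Longitude"] = result["Longitude"].apply(numbers)
print(result)
```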

James Dellinger
  • I've written a post about this warning [here](https://stackoverflow.com/a/53954986/4909087). – cs95 Jan 19 '19 at 22:01