
I have a data frame whose coordinate columns contain no missing values. I would like to convert a subset of it to spatial coordinates using sp. When I use

subset_of_data <- data[data$variable == x, ]
coordinates_from_data = subset(subset_of_data, select=c("S_X", "S_Y"))
coordinates(coordinates_from_data) <- c("S_X", "S_Y")

I get:

Error in `coordinates<-`(`*tmp*`, value = c("S_X", "S_Y")) : 
coordinates are not allowed to contain missing values

But when I use subset, there is no problem:

subset_of_data <- subset(data, data$variable == x)
coordinates_from_data = subset(subset_of_data, select=c("S_X", "S_Y"))
coordinates(coordinates_from_data) <- c("S_X", "S_Y")

I don't get the error.

Any idea why it is so?

Antonin
  • What is tmp? Did you use any regex in your data prep? – Gray Jul 23 '20 at 19:06
  • *tmp* is used internally by R. I don't know the details very well, but see https://stackoverflow.com/questions/28770882/documentation-for-tmp-in-r and https://cran.r-project.org/doc/manuals/r-release/R-lang.html You can see what coordinates<- is doing here: https://stackoverflow.com/a/32586069/762435 And no, I haven't used regex in my data prep, but lots of other steps. What do you have in mind? – Antonin Jul 24 '20 at 06:49
  • Hi! It's not easy to answer your question without the data, but please note that `The real subset function (subset.data.frame()) removes missing values in the condition`. Are you sure that there is no NA in the coordinates? Maybe in the rows where the condition is equal to NA – agila Jul 24 '20 at 09:54
  • Thank you for your comment! Sorry that I cannot share the data. Yes, you're right: by condition on other columns that are NA, I "add" empty/NA values in the coordinates columns! There is no NA in the coordinate columns, but there are NAs in the other, conditioning columns, as described e.g. in r-bloggers.com/subsetting-in-the-presence-of-nas So the solution is subset_of_data <- data[data$variable == x & !is.na(data$variable), ]. Thanks for your help! Do you want to write the answer? If not, I'll do it. – Antonin Jul 24 '20 at 14:06
  • Hi! I think the best solution is that you write the answer with (maybe) a small example of the data (like 2 rows) showing the problem. – agila Jul 24 '20 at 15:13

1 Answer


It has nothing to do with sp; it is just how subsetting works in R. Let's take an example:

library(sp)  # provides coordinates<-
df <- data.frame(city = c("Paris", "Berlin", NA),
                 x_coordinate = c(48.8589507, 52.5069312, 50.8550625), 
                 y_coordinate = c(2.27702, 13.1445501, 4.3053501))
df
    city x_coordinate y_coordinate
1  Paris     48.85895      2.27702
2 Berlin     52.50693     13.14455
3   <NA>     50.85506      4.30535

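If we turn this data frame into coordinates, it works, since the coordinate columns contain no NA. We work on a copy so that df stays a plain data frame for the next steps:

df_sp <- df
coordinates(df_sp) <- c("x_coordinate", "y_coordinate")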

Now imagine that we want to convert only a subset of df to coordinates, e.g., only Paris. If we do:

sub_df = df[df$city == "Paris", ]

We get:

    city x_coordinate y_coordinate
1  Paris     48.85895      2.27702
NA  <NA>           NA           NA

Converting this to coordinates no longer works. The city column contains an NA, so df$city == "Paris" evaluates to NA for that row, and indexing with [ turns an NA in the row index into a row filled with NA, including the coordinate columns.

coordinates(sub_df) <- c("x_coordinate", "y_coordinate")
Error in `coordinates<-`(`*tmp*`, value = c("x_coordinate", "y_coordinate" : 
  coordinates are not allowed to contain missing values
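
To see where the NA row comes from, it helps to look at the logical index itself. A minimal sketch in base R, reusing the toy data from above (the names keep and sub_df are only for illustration, and sp is not needed here):

```r
# Same toy data as in the example above (a plain data.frame).
df <- data.frame(city = c("Paris", "Berlin", NA),
                 x_coordinate = c(48.8589507, 52.5069312, 50.8550625),
                 y_coordinate = c(2.27702, 13.1445501, 4.3053501))

# Comparing NA with a value gives NA, not FALSE.
keep <- df$city == "Paris"
print(keep)  # TRUE FALSE NA

# An NA in a logical row index yields a row filled with NA,
# so the coordinate columns of the result now contain NA too.
sub_df <- df[keep, ]
print(nrow(sub_df))                 # 2
print(anyNA(sub_df$x_coordinate))   # TRUE
```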

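subset() works differently: it treats an NA in the condition as FALSE, so the row with the missing city is dropped and the conversion succeeds:

sub_df_2 = subset(df, df$city == "Paris")
coordinates(sub_df_2) <- c("x_coordinate", "y_coordinate")
sub_df_2
          coordinates  city
1 (48.85895, 2.27702) Paris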

Another option is to exclude the NA explicitly when using [:

sub_df_3 = df[df$city == "Paris" & !is.na(df$city), ]
coordinates(sub_df_3) <- c("x_coordinate", "y_coordinate")
sub_df_3
          coordinates  city
1 (48.85895, 2.27702) Paris
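
Two other base R idioms also avoid the NA row, and a quick check before calling coordinates() can catch the problem early. This is only a sketch on the same toy data, not something taken from the question; sub_which and sub_in are illustrative names:

```r
# Same toy data as in the example above (a plain data.frame).
df <- data.frame(city = c("Paris", "Berlin", NA),
                 x_coordinate = c(48.8589507, 52.5069312, 50.8550625),
                 y_coordinate = c(2.27702, 13.1445501, 4.3053501))

# which() keeps only the positions where the condition is TRUE,
# so the NA from the comparison is silently dropped.
sub_which <- df[which(df$city == "Paris"), ]

# %in% never returns NA: a missing city simply does not match.
sub_in <- df[df$city %in% "Paris", ]

# Guard to run before coordinates(): fail early if any coordinate is missing.
coord_cols <- c("x_coordinate", "y_coordinate")
stopifnot(all(complete.cases(sub_which[, coord_cols])))
stopifnot(all(complete.cases(sub_in[, coord_cols])))
```

Which form to prefer is mostly a matter of taste; the explicit !is.na() version makes the intent most visible, while which() is the shortest.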

For Python users

The [ operator in pandas behaves differently: comparing NaN with a value gives False, so the row with the missing city is simply dropped:

import pandas as pd
import numpy as np

df = pd.DataFrame({'city': ['Paris', 'Berlin', np.nan],
                   'x_coordinate': [48.8589507, 52.5069312, 50.8550625],
                   'y_coordinate': [2.27702, 13.1445501, 4.3053501]})

print(df[df["city"] == 'Paris'])

    city  x_coordinate  y_coordinate
0  Paris     48.858951       2.27702
Antonin