
I am a beginner in R programming. I am trying to retrieve site names from a data frame that contains X and Y coordinate ranges and site names, and copy them into a different data frame of individual points.

    FD <- matrix(data = c(rep(1, 500), rep(0, 500),
                          rnorm(1000, mean = 550000, sd = 4000),
                          rnorm(1000, mean = 6350000, sd = 20000),
                          rep(NA, 1000)),
                 ncol = 4, nrow = 1000, byrow = FALSE)
    colnames(FD) <- c('Survival', 'X', 'Y', 'Site')
    FD <- as.data.frame(FD)

    # one row per site: Xmin, Xmax, Ymin, Ymax (same order as `sites` below)
    shpxt <- matrix(c(526654.7, 526810.5, 6309098, 6309187,
                      530405.4, 530692,   6337699, 6338056,
                      580432.7, 580541.9, 6380246, 6380391,
                      585761.3, 585847.6, 6379665, 6379759,
                      584192.1, 584279.4, 6382358, 6382710,
                      583421.2, 583492.4, 6379356, 6379425,
                      532395.5, 532515.3, 6336421, 6336587,
                      534694.6, 534791.2, 6335620, 6335740,
                      536749.8, 536957.5, 6337584, 6338130,
                      590049.6, 590419.4, 6372232, 6372432,
                      580443,   580756.5, 6386342, 6386473,
                      575263.9, 575413.7, 6380416, 6380530,
                      584625.1, 584753.9, 6381009, 6381335),
                    ncol = 4, nrow = 13, byrow = TRUE)
    sites <- c("Brandbaeltet", "Brusaa", "Granly", "Jerup Strand", "Knasborgvej",
               "Milrimvej", "Overklitten", "Oversigtsareal", "Sandmosen",
               "Strandby", "Troldkaer", "Vaagholt", "Videsletengen")
    colnames(shpxt) <- c("Xmin", "Xmax", "Ymin", "Ymax")
    shpxt <- as.data.frame(shpxt)
    shpxt["Sites"] <- sites

My approach uses a nested for loop:

    tester <- function(FD, shpxt) {                          # open function
      for (i in 1:nrow(FD)) for (j in 1:nrow(shpxt)) {       # open loop
        if (FD[i,2] >= shpxt[j,1] | FD[i,2] <= shpxt[j,2] &
            FD[i,3] >= shpxt[j,3] | FD[i,3] <= shpxt[j,4]) { # open consequent
          FD[i,4] = shpxt[j,5]
          break
        } else {                                             # close consequent, open alternative
          FD[i,4] <- NA
        }                                                    # close alternative
      }                                                      # close loop
    }                                                        # close function

    tester(FD, shpxt)

In essence, for each row i of FD I want to find the site whose X and Y ranges contain that point's coordinates, and copy the site name into FD$Site in row i. When I run the loop on my real data, I get the following error message:

    test(FD, shpxt)
    Error in if (FD[i, 2] >= shpxt[j, 1] | FD[i, 2] <= shpxt[j, 2] & FD[i,  : 
      missing value where TRUE/FALSE needed

How do I fix the loop so that it copies the matching site name into FD?

Kind Regards Thøger

Asked by Thoegernh; edited by rafa.pereira

2 Answers


You want to merge two data frames using a range match between the key columns (a so-called non-equi join). Here are two solutions.

Using sqldf

    library(sqldf)

    output <- sqldf("select * from FD left join shpxt
                     on (FD.X >= shpxt.Xmin and FD.X <= shpxt.Xmax and
                         FD.Y >= shpxt.Ymin and FD.Y <= shpxt.Ymax)")

Using data.table

    library(data.table)

    # convert both data frames to data.tables (by reference)
    setDT(FD)
    setDT(shpxt)

    output <- FD[shpxt, on = .(X >= Xmin, X <= Xmax,                # x range
                               Y >= Ymin, Y <= Ymax), nomatch = NA, # y range
                 .(Survival, X, Y, Xmin, Xmax, Ymin, Ymax, Sites)]  # columns in the output

There are other ways to solve this problem, as you can find in other SO questions here and here.
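One base-R alternative that needs no extra package is to compare every point with every site box at once using `outer()`, then take the first matching site for each point. The sketch below assumes the question's column names and uses a small hypothetical test set (three of the question's sites plus five made-up points). It builds a logical matrix with `nrow(FD)` rows and `nrow(shpxt)` columns, so memory grows with the product of the two table sizes, which is modest for 1,000 points and 13 sites.

```r
# Three-site subset of the question's shpxt table
shpxt <- data.frame(
  Xmin  = c(526654.7, 530405.4, 580432.7),
  Xmax  = c(526810.5, 530692.0, 580541.9),
  Ymin  = c(6309098, 6337699, 6380246),
  Ymax  = c(6309187, 6338056, 6380391),
  Sites = c("Brandbaeltet", "Brusaa", "Granly"),
  stringsAsFactors = FALSE
)

# Hypothetical test points: one inside each site, one outside all sites,
# and one with a missing X coordinate
FD <- data.frame(
  Survival = c(1, 1, 0, 0, 1),
  X = c(526700, 530500, 580500, 550000, NA),
  Y = c(6309150, 6337800, 6380300, 6350000, 6337800),
  Site = NA_character_,
  stringsAsFactors = FALSE
)

# inside[i, j] is TRUE when point i lies inside the box of site j
inside <- outer(FD$X, shpxt$Xmin, ">=") & outer(FD$X, shpxt$Xmax, "<=") &
          outer(FD$Y, shpxt$Ymin, ">=") & outer(FD$Y, shpxt$Ymax, "<=")

# Column index of the first matching site in each row (NA when none);
# which() treats NA entries (missing coordinates) as FALSE
first_hit <- apply(inside, 1, function(row) which(row)[1])

# Indexing with NA yields NA, so unmatched points get no site name
FD$Site <- shpxt$Sites[first_hit]
print(FD)
```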

P.S. Keep in mind that a for loop is not necessarily the best solution.
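If you do want to keep the loop, here is a minimal repaired sketch of the question's `tester()`, assuming the column names used in the question. It changes three things. First, the four bounds are joined with `&&`: the original `a | b & c | d` parses as `a | (b & c) | d`, so almost every point satisfies the first site's condition. Second, `isTRUE()` turns an `NA` comparison into `FALSE`, which avoids "missing value where TRUE/FALSE needed" (most likely caused by `NA` coordinates or bounds in the real data). Third, the function returns `FD`, because R functions modify a local copy of their arguments. The three sites and five points below are a hypothetical test set, not the question's data.

```r
# Three-site subset of the question's shpxt table
shpxt <- data.frame(
  Xmin  = c(526654.7, 530405.4, 580432.7),
  Xmax  = c(526810.5, 530692.0, 580541.9),
  Ymin  = c(6309098, 6337699, 6380246),
  Ymax  = c(6309187, 6338056, 6380391),
  Sites = c("Brandbaeltet", "Brusaa", "Granly"),
  stringsAsFactors = FALSE
)

# Hypothetical test points: one inside each site, one outside all sites,
# and one with a missing X coordinate
FD <- data.frame(
  Survival = c(1, 1, 0, 0, 1),
  X = c(526700, 530500, 580500, 550000, NA),
  Y = c(6309150, 6337800, 6380300, 6350000, 6337800),
  Site = NA_character_,
  stringsAsFactors = FALSE
)

tester <- function(FD, shpxt) {
  FD$Site <- NA_character_                  # default: point lies in no site
  for (i in seq_len(nrow(FD))) {
    for (j in seq_len(nrow(shpxt))) {
      # All four bounds must hold, so they are joined with && rather than |
      inside <- FD$X[i] >= shpxt$Xmin[j] && FD$X[i] <= shpxt$Xmax[j] &&
                FD$Y[i] >= shpxt$Ymin[j] && FD$Y[i] <= shpxt$Ymax[j]
      # isTRUE() maps an NA result (missing value) to FALSE,
      # so `if` never receives NA
      if (isTRUE(inside)) {
        FD$Site[i] <- shpxt$Sites[j]
        break                                 # keep the first matching site
      }
    }
  }
  FD                                          # return the updated copy
}

FD <- tester(FD, shpxt)
print(FD)
```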

Answered by rafa.pereira
This worked brilliantly, thank you very much! I'll try to be more exact when asking questions from now on. – Thoegernh May 11 '16 at 10:59

Here's a failed attempt in base R; perhaps someone can help correct it.

    getSite <- function(x, y) {
      return(shpxt[x >= shpxt['Xmin'] & x <= shpxt['Xmax'] &
                   y >= shpxt['Ymin'] & y <= shpxt['Ymax'], "Sites"])
    }

Testing it:

    p <- c(Survival = 0, X = shpxt[2, 1], Y = shpxt[2, 3])
    getSite(p[['X']], p[['Y']])

correctly returns

    [1] "Brusaa"

However,

    FD$Site <- apply(FD, 1, function(point) {getSite(point[['X']], point[['Y']])})

fails with

    Error in `$<-.data.frame`(`*tmp*`, "Site", value = character(0)) : replacement has 0 rows, data has 1000
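The `apply()` call fails because `getSite()` returns `character(0)` for any point that lies in no site. With the random example data almost no point falls inside a site, so `apply()` hands back an empty vector (or a list, when only some points match) instead of one value per row. A minimal fix, sketched below under the assumption that taking the first matching site is acceptable, returns `NA` when nothing matches and uses `mapply()` over the X and Y columns; unlike `apply()`, this does not convert the data frame to a matrix first. The `boxes` argument and the small test data set are hypothetical additions for illustration.

```r
# Three-site subset of the question's shpxt table
shpxt <- data.frame(
  Xmin  = c(526654.7, 530405.4, 580432.7),
  Xmax  = c(526810.5, 530692.0, 580541.9),
  Ymin  = c(6309098, 6337699, 6380246),
  Ymax  = c(6309187, 6338056, 6380391),
  Sites = c("Brandbaeltet", "Brusaa", "Granly"),
  stringsAsFactors = FALSE
)

# Hypothetical test points: one inside each site, one outside all sites,
# and one with a missing X coordinate
FD <- data.frame(
  Survival = c(1, 1, 0, 0, 1),
  X = c(526700, 530500, 580500, 550000, NA),
  Y = c(6309150, 6337800, 6380300, 6350000, 6337800),
  Site = NA_character_,
  stringsAsFactors = FALSE
)

# Return the first site whose box contains (x, y), or NA if there is none.
# `boxes` is a hypothetical extra argument replacing the global shpxt.
getSite <- function(x, y, boxes) {
  hit <- which(x >= boxes$Xmin & x <= boxes$Xmax &
               y >= boxes$Ymin & y <= boxes$Ymax)  # which() drops NA
  if (length(hit) == 0) NA_character_ else boxes$Sites[hit[1]]
}

# mapply() walks X and Y in parallel and gets exactly one value per point,
# so the result always has nrow(FD) elements
FD$Site <- mapply(getSite, FD$X, FD$Y, MoreArgs = list(boxes = shpxt))
print(FD)
```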

Answered by Andrew Lavers