
Suppose I have a file A with columns (id, x, y) and another file B with columns (ID, xmin, xmax, ymin, ymax), with about 50,000 rows in A and about 3,000 rows in B.

What I need is to add an additional column to A where each row is a vector composed of all the B$ID[j] for which A$x[i] is between B$xmin[j] and B$xmax[j] and, simultaneously, A$y[i] is between B$ymin[j] and B$ymax[j]. This vector will have a min dimension of 1 and a max dimension of 4.

(Essentially, I have a grid and I want to know which cells of the grid the elements of A fall in. They will always fall in at least one cell and at most 4.)

How can I express this?

Thanks for your help

MKR
  • Add a little bit of sample data (in a copy/pasteable format!) and we'll help you out. Just 2 or 3 rows each from A and B should be enough (make sure it illustrates your problem: have at least one of the A rows fall in multiple Bs). – Gregor Thomas Apr 03 '18 at 20:13
  • FILE A id x y 1 260.4550924434 89.8983020755 2 197.9838097272 89.8112793203 3 207.6767638271 89.7078607093 – user7721228 Apr 03 '18 at 21:28
  • Hi Gregor, here are a few lines of A and B. All the A rows fall at least in the first row of B (making a better working example takes more time; I am working on it). FILE A id x y 17104 249.0186836277 43.446412271 17242 247.9431897463 43.1708448005 17244 247.5192501032 43.1687854768 17270 248.8112825761 43.1179951788 17274 245.7833601702 43.1128365782 FILE B ID xmin xmax ymin ymax 247132 245.27472527472528 249.23076923076923 -43.5 -40.5 247069 245.30973451327435 248.49557522123894 19.5 22.5 247111 245.30973451327435 248.49557522123894 -22.5 -19.5 – user7721228 Apr 03 '18 at 21:40

2 Answers


Not very proud of this but it works:

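    library(data.table)

    # Toy data: A holds the points, B holds the rectangular cells
    A <- data.table(id = c(1,1,1,1,1,2,2,2,2,2,2), x = c(1:5, 2:7), y = c(3:7, 4:9))
    B <- data.table(ID = c(1,2), xmin = c(1,2), xmax = c(5,7), ymin = c(3,4), ymax = c(7,9))

    # For each row of A, keep the IDs of the B rows whose box contains (x, y)
    A$newcol <- apply(A, 1, function(rowA) B$ID[apply(B, 1, function(rowB)
      rowA["x"] >= rowB["xmin"] & rowA["x"] <= rowB["xmax"] &
      rowA["y"] >= rowB["ymin"] & rowA["y"] <= rowB["ymax"])])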

I will work on finding the data.table / dplyr alternative which will be, I hope, nicer and more generic
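As one possible shape for that data.table alternative, here is a minimal sketch using a non-equi join on the toy data above. It is not the asker's real data, the helper column `rid` and the names `hits` and `ids_per_row` are made up for illustration, and it has only been reasoned through on this small example:

```r
library(data.table)

# Toy data (same values as above, stored as doubles so the join types match)
A <- data.table(id = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2),
                x  = as.numeric(c(1:5, 2:7)),
                y  = as.numeric(c(3:7, 4:9)))
B <- data.table(ID = c(1, 2), xmin = c(1, 2), xmax = c(5, 7),
                ymin = c(3, 4), ymax = c(7, 9))

# Hypothetical helper column: remembers each row's position in A
A[, rid := .I]

# Non-equi join: one output row per (A row, containing B cell) pair
hits <- B[A, on = .(xmin <= x, xmax >= x, ymin <= y, ymax >= y),
          .(rid = i.rid, ID = x.ID), nomatch = NULL]

# Collect the IDs per A row; a row with no containing cell gets an empty vector
ids_per_row <- unname(split(hits$ID, factor(hits$rid, levels = A$rid)))

# Wrapping in .() stores the whole list as a single list column
A[, newcol := .(ids_per_row)]
A[, rid := NULL]
```

The join avoids the nested `apply()` calls, which should matter more as A and B grow, though timings on the real files would need to be checked.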

Frostic
  • Hi Max Ft, I tried your solution but I get a lot of warnings: "longer object length is not a multiple of shorter object length". In addition, the result is still NULL when I look at A$newcol. – user7721228 Apr 04 '18 at 09:53
  • I did not name the `ID` of the table B as you did... I corrected it. – Frostic Apr 04 '18 at 14:16

Here you go. I could not test this with your data, however, so there might be an error.

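    # Assumes A and B are plain data frames already loaded in the workspace
    getIDs <- function(x, y) {
      found <- c()
      # Loop over every row of B; seq_len(nrow(B)) yields 1, 2, ..., nrow(B)
      for (j in seq_len(nrow(B))) {
        if (x >= B[j, "xmin"] && x <= B[j, "xmax"] &&
            y >= B[j, "ymin"] && y <= B[j, "ymax"]) {
          found <- append(found, B[j, "ID"])
        }
      }
      return(found)
    }
    # apply() passes each (x, y) row as a vector: x[1] is the x value, x[2] is the y value
    A$NewCol <- apply(A[, c("x", "y")], 1, function(x) getIDs(x[1], x[2]))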

I suggest you check out this related question: "Call apply-like function on each row of dataframe with multiple arguments from each row".
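In the spirit of that question, a row-wise call with several arguments can also be sketched with base R's `Map()`, which walks `A$x` and `A$y` in parallel and always returns a list. The data frames below are made-up toy values, and this `getIDs` is a vectorised variant of the function above, not the original:

```r
# Made-up toy data, only for illustration
A <- data.frame(id = 1:3, x = c(1, 3, 6), y = c(3, 5, 8))
B <- data.frame(ID = c(1, 2), xmin = c(1, 2), xmax = c(5, 7),
                ymin = c(3, 4), ymax = c(7, 9))

# Vectorised over the rows of B: tests one point (x, y) against every cell at once
getIDs <- function(x, y) {
  B$ID[x >= B$xmin & x <= B$xmax & y >= B$ymin & y <= B$ymax]
}

# One call per row of A; the result is a list, stored as a list column
A$NewCol <- Map(getIDs, A$x, A$y)
```

Because `Map()` never simplifies its result, the new column stays a list even when every point happens to fall in the same number of cells.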

ekatko1
  • Unrelated, but: I never realized a data frame column could contain any type of object, including a list, a data.frame and so on, as long as the elements of the column are all of the same type. Thanks! – Frostic Apr 03 '18 at 21:29
  • I am not sure I follow. I must also add a loop to go through the rows of A, right? – user7721228 Apr 03 '18 at 21:42
  • The apply() function does that loop implicitly. It applies the specified function to each row in the matrix (specified by the "1" argument; "2" would mean for every column). The matrix is the subset of your A dataframe with columns x and y. – ekatko1 Apr 03 '18 at 21:58
  • I tried to follow your suggestions. I do not get an error, but when I look at what is inside A$NewCol I get NULL. – user7721228 Apr 04 '18 at 07:50
  • OK, I think I now understand what you are doing, except for the last part of apply: getIDs(x[1], x[2]). I imagine that it should be getIDs(x[1], y[2]), but I still don't grasp the meaning of [1] and [2]. – user7721228 Apr 04 '18 at 07:59
  • Hi, I modified your code a bit and now it seems to work: `getIDs <- function(x, y) { found <- c() for ( j in 1:nrow(B) ) { if (x >= B[j,"xmin"] && x <= B[j,"xmax"] && y >= B[j,"ymin"] && y <= B[j,"ymax"] ) { found <- append(found, B[j, "ID"]) } } return(found) }` `A$NewCol <- apply( A[, c("x", "y")], 1, function(x, y) getIDs(x[1], x[2]))` I still do not grasp the meaning of x[1], x[2]. – user7721228 Apr 04 '18 at 11:18
  • Ah, I see the curly brace was in the wrong place! you could re-write that function as: function(row) getIDs(row[1], row[2]) as I realize now that using x for two purposes is confusing. The apply function loops over rows of the matrix formed by a subset of your dataframe featuring only the columns x and y. Each row gets passed as an argument to the function specified within the call (function(row) getIDs(row[1], row[2])). So row[1] is the first value in the row (i.e. x) and row[2] is the second (i.e. y). Makes sense now? – ekatko1 Apr 04 '18 at 17:18
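
To make the last comment concrete, here is a small sketch with made-up values (the `pts` data frame is hypothetical, not the asker's data) showing what `apply()` hands to the function for each row or column:

```r
# Hypothetical two-point data frame, only for illustration
pts <- data.frame(x = c(1, 3), y = c(10, 30))

# MARGIN = 1: the function is called once per row; row[1] is x, row[2] is y
row_sums <- apply(pts[, c("x", "y")], 1, function(row) row[1] + row[2])

# MARGIN = 2: the function is called once per column instead
col_sums <- apply(pts[, c("x", "y")], 2, sum)
```

Here `row_sums` holds one value per point and `col_sums` one value per column, which is the difference between the "1" and "2" arguments discussed above.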