-1

So I have 2 data sets in R: table 1 and table 2.

These are 2 .CSV files, with the same first column name called Date. However most of the dates are different, but some are the same.

What I would like to do is match the cells in that Date column that are identical in both tables, and then from that row where that cell is contained, i would like to return the five rows before it and the five rows after it in 10 different tables. So each row goes to a different new output table.

Basically I would like to run a regression on each ith row table afterwards, with some other column in that table..

I was thinking using head() and tail() but i couldnt figure out the index part to where to start..

thank you. this is a simple example. when match row 3 in table 1 with row 1 in table 2, how would i go about taking the 3 rows in table 1 below that matched row.

>table 1
>Date      price1  price2  price3  
>11/02/13   4.4        4.22    4.12  
>11/04/13   4.23       4.09    4.10
>12/01/13   4.01       4.27    4.14
>12/02/13   4.1        4.23    4.16
>12/02/13   4.65       4.23    4.17
>11/01/13   4.4        4.22    4.12
>11/07/13   4.23       4.09    4.10
>12/09/13   4.01       4.27    4.14
>12/12/13   4.1        4.23    4.16
>12/15/13   4.65       4.23    4.17

>table 2
>Date      price1  price2  price3
>12/01/13   2.4        2.22    9.12
>08/04/13   4.23       7.09    6.10
>12/01/13   1.01       6.27    6.14
>12/08/13   6.1        3.23    4.16
>12/05/13   4.65       2.23    3.17
>11/02/13   4.4        4.22    4.12
>07/04/13   4.23       4.09    4.10
>09/01/13   4.01       4.27    4.14
>01/02/13   4.1        4.23    4.16
>11/05/13   4.65       4.23    4.17
uniXVanXcel
  • 807
  • 1
  • 10
  • 25

1 Answers1

0

Using the example data you provided:

table1 <- read.table(header=TRUE,text="
Date      price1  price2  price3  
11/02/13   4.4        4.22    4.12  
11/04/13   4.23       4.09    4.10
12/01/13   4.01       4.27    4.14
12/02/13   4.1        4.23    4.16
12/02/13   4.65       4.23    4.17
11/01/13   4.4        4.22    4.12
11/07/13   4.23       4.09    4.10
12/09/13   4.01       4.27    4.14
12/12/13   4.1        4.23    4.16
12/15/13   4.65       4.23    4.17
")

table2 <- read.table(header=TRUE,text="
Date      price1  price2  price3
12/01/13   2.4        2.22    9.12
08/04/13   4.23       7.09    6.10
12/01/13   1.01       6.27    6.14
12/08/13   6.1        3.23    4.16
12/05/13   4.65       2.23    3.17
11/02/13   4.4        4.22    4.12
07/04/13   4.23       4.09    4.10
09/01/13   4.01       4.27    4.14
01/02/13   4.1        4.23    4.16
11/05/13   4.65       4.23    4.17
")

First save the dates that match like this:

mydates <- merge(table1, table2, by="Date")$Date

Now write a helper function to get some specified dates and its neighboring rows from a given table:

getneighbors <- function(table, x, neigh=5){
    r <- -neigh:neigh
    lines <- which(table[,1] %in% x)
    o <- outer(lines, r, `+`)
    a <- apply(o, 2, pmin, nrow(table))
    a <- apply(a, 2, pmax, 1)
    L <- lapply(seq(ncol(a)), function(j)table[a[,j],])
    names(L) <- gsub("-","_",paste0("Neigh",r))
    L
}

This functions creates a list of dataframes, where each dataframe has the neighboring matches of table in x (using the first column of table). The number of neighbors to get is specified with neigh, default 5 (up and below).

Note that neighbors calculated simply by adding or subtracting may lie at invalid row numbers, hence the calls to apply(..., pmin) and apply(..., pmax). We assume that neighbors at rows 0 or negative are in fact the first row, and neighbors at rows beyond the size of the table are to be substituted by the last row.

With that in mind, there will be some duplicates, especially with small sample data:

> getneighbors(table1, mydates)

$Neigh_5
        Date price1 price2 price3
1   11/02/13    4.4   4.22   4.12
1.1 11/02/13    4.4   4.22   4.12

$Neigh_4
        Date price1 price2 price3
1   11/02/13    4.4   4.22   4.12
1.1 11/02/13    4.4   4.22   4.12

$Neigh_3
        Date price1 price2 price3
1   11/02/13    4.4   4.22   4.12
1.1 11/02/13    4.4   4.22   4.12

$Neigh_2
        Date price1 price2 price3
1   11/02/13    4.4   4.22   4.12
1.1 11/02/13    4.4   4.22   4.12

$Neigh_1
      Date price1 price2 price3
1 11/02/13   4.40   4.22   4.12
2 11/04/13   4.23   4.09   4.10

$Neigh0
      Date price1 price2 price3
1 11/02/13   4.40   4.22   4.12
3 12/01/13   4.01   4.27   4.14

$Neigh1
      Date price1 price2 price3
2 11/04/13   4.23   4.09   4.10
4 12/02/13   4.10   4.23   4.16

$Neigh2
      Date price1 price2 price3
3 12/01/13   4.01   4.27   4.14
5 12/02/13   4.65   4.23   4.17

$Neigh3
      Date price1 price2 price3
4 12/02/13    4.1   4.23   4.16
6 11/01/13    4.4   4.22   4.12

$Neigh4
      Date price1 price2 price3
5 12/02/13   4.65   4.23   4.17
7 11/07/13   4.23   4.09   4.10

$Neigh5
      Date price1 price2 price3
6 11/01/13   4.40   4.22   4.12
8 12/09/13   4.01   4.27   4.14

The result is a list, where each element is named NeighX or Neigh_X, indicating neighboring rows X below or X above, respectivelly. There are 11 elements by default, if you want only 10, ignore Neigh0, which stands for the rows that match precisely the ones indicated.

You can save the result in a list object, say, L <- getneighbors(table1, mydates), and access each dataframe using L[["Neigh3"]].

Ferdinand.kraft
  • 12,579
  • 10
  • 47
  • 69
  • well how about returning only the 5th row? or 4th etc..? instead of the 5 above? only one selected one thanks – uniXVanXcel Sep 01 '13 at 21:43
  • @Rmything If you have a different question, please make a new post -- instead of unaccepting a valid answer. – Ferdinand.kraft Sep 01 '13 at 22:14
  • actually it is the question i asked, but he responded only to a part of it. But you re right for the effort he put into it, i shouldn't unaccept it! i just wasn't sure how to get that detail checked out.thanks for your comment. – uniXVanXcel Sep 01 '13 at 22:16
  • @Rmything, sorry I didn't notice you wanted the rows spread out in different tables. I'll edit the answer. One more question: what to do about duplicates? say you matched rows 1 and 3. Neighbors are 0:6 and 0:8? Not all tables will be full... – Ferdinand.kraft Sep 01 '13 at 22:22
  • well, whatever is easier for you. Actually let's say we have 30 matches between the 2 tables, meaning, if 1 row matches exactly with 1 other row in table 2, then this match would create 10 tables right? If we had 1 more than 1 match in the 2nd table then we could either disregard it or not. Thus, in the end if we have 30 matches between the 2 tables, we actually would get 10 tables of 30 rows each.ex:row 5th above, for all 30 matches would get 1 table.(30 rows).same for row 4th.etc.. thank you so much for taking your time on this. I just began R, and it's quite tricky for such a task. thanks! – uniXVanXcel Sep 01 '13 at 22:34
  • hello buddy . still here? :) – uniXVanXcel Sep 01 '13 at 23:22
  • sorry bro, i needed to uncheck the question as I am not sure if you were still around and I kinda need some explanation on how to do this. I am very new to R (5days), so dont get me wrong, i totally appreciate your help. if you could let me know how to make it for separates tables, then i would give you your props. If nobody ever answers it, I will still check the question in your favor. – uniXVanXcel Sep 02 '13 at 00:49
  • @Rmything, edited the question. Feel free to ask for any further explanations, it's a pleasure to help. :-) – Ferdinand.kraft Sep 02 '13 at 01:37
  • amazing man!! great to meet ppl who would work for free like you just did. I hope to get good one day and give back like you re doing..thanks again! all i can give is a check. hope could give more. will get back to you if anything else.thanks.. – uniXVanXcel Sep 02 '13 at 01:48