I would like to compute the pairwise difference in a date.time variable between every pair of rows within each group.
I have a data frame with a "site" factor, a dummy "species" variable, and a POSIXct "date.time" variable, sorted in ascending order of date.time within each site. Each row within a site is a different species, since I'm interested in the time difference between different species visiting a site. Here is a dput() of the data (3 columns, 50 rows):
df <- structure(list(site = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L,
4L, 4L, 4L, 4L, 4L, 5L, 5L, 5L, 5L, 5L, 5L), .Label = c("act_0041",
"act_0048", "ACT0009", "ACT0035", "ACT0041"), class = "factor"),
species = c(12, 14, 28, 6, 34, 29, 27, 22, 35, 9, 16, 2,
32, 33, 6, 2, 29, 10, 34, 9, 22, 28, 32, 23, 33, 6, 10, 27,
12, 34, 32, 31, 10, 30, 6, 14, 35, 8, 23, 32, 12, 34, 22,
1, 13, 18, 6, 34, 27, 11), date.time = structure(c(1531454862,
1535035906, 1535348634, 1536254587, 1537580136, 1539047529,
1539335947, 1542708373, 1545597646, 1548570870, 1548862522,
1548970932, 1548970934, 1530624228, 1536088381, 1536270537,
1538374649, 1538705865, 1543254377, 1544755701, 1545263758,
1546425304, 1546490305, 1530393638, 1531013434, 1532049165,
1537459670, 1545803958, 1546142278, 1560118590, 1560203862,
1530431347, 1531031939, 1533129189, 1533975327, 1534157098,
1535229634, 1535594837, 1536352632, 1536355007, 1536397768,
1536707407, 1537231673, 1562873882, 1531454862, 1531595641,
1536254587, 1537732697, 1538760001, 1540317399), class = c("POSIXct",
"POSIXt"), tzone = "")), row.names = c(NA, -50L), class = c("tbl_df",
"tbl", "data.frame"))
I want to compute the time difference between every pair of rows (i.e. every pair of different species) within each group, and create a data frame or matrix containing the group (e.g. "site"), each pairwise species combination, and the time difference between them. For example, for site a I would have the time difference between species a and b, b and c, a and c, and so on. If I can avoid duplicating interactions (e.g. having both "site a, species b and c" and "site a, species c and b"), that would be great. Here is an example of the desired output for the first site:
structure(list(site = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = "act_0041", class = "factor"),
species_interaction = structure(c(1L, 2L, 7L, 12L, 10L, 8L,
6L, 5L, 11L, 13L, 3L, 4L, 9L, 14L, 15L, 17L, 16L), .Label = c("12,12",
"12,14", "12,16", "12,2", "12,22", "12,27", "12,28", "12,29",
"12,32", "12,34", "12,35", "12,6", "12,9", "28,14", "28,28",
"28,34", "28,6"), class = "factor"), time.diff..mins. = c(NA,
350L, 6502L, 150L, 2065L, 52L, 630L, 542L, 2584L, 241L, 340L,
3689L, 201L, 31L, NA, 28L, 356L)), class = "data.frame", row.names =c(NA,-17L))
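A minimal base-R sketch of the general idea, assuming the rows within each site are already sorted by date.time (as described above) and that the differences are wanted in minutes. It uses `split()` to separate the sites and `combn()` on the row indices of each site, so each unordered species pair appears exactly once; it returns every within-site pair rather than reproducing the exact rows of the example output. The small `df` is a five-row subset of the data above, and `pairwise_diffs` is a hypothetical helper name.

```r
# Five-row subset of the question's data (first rows of two sites)
df <- data.frame(
  site = factor(c("act_0041", "act_0041", "act_0041", "act_0048", "act_0048")),
  species = c(12, 14, 28, 33, 6),
  date.time = as.POSIXct(c(1531454862, 1535035906, 1535348634,
                           1530624228, 1536088381),
                         origin = "1970-01-01", tz = "UTC")
)

# Hypothetical helper: all unordered pairs of rows within one site
pairwise_diffs <- function(d) {
  if (nrow(d) < 2) return(NULL)   # a single row has no pairs
  idx <- combn(nrow(d), 2)        # 2 x choose(n, 2); each column is one pair
  i <- idx[1, ]
  j <- idx[2, ]
  data.frame(
    site = d$site[i],
    species_interaction = paste(d$species[i], d$species[j], sep = ","),
    # rows are sorted by date.time, so row j is never earlier than row i
    time.diff.mins = as.numeric(difftime(d$date.time[j], d$date.time[i],
                                         units = "mins"))
  )
}

# Split by site, compute the pairs per site, then stack the pieces
result <- do.call(rbind, lapply(split(df, df$site), pairwise_diffs))
rownames(result) <- NULL
result
```

With the full data, `result` would have one row per within-site pair, e.g. choose(13, 2) = 78 rows for act_0041.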
I've been trying combn() and apply(), but I only get a matrix of empty integers and the values 1, 2, 3, etc. I suspect I'm missing something about handling the date.time column, and possibly about how the df is displayed.
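library(dplyr)  # for %>% and group_by()
df %>% group_by(site)  # prints a grouped copy; df itself is not modified
# all pairs of row indices over the whole data frame, as a (pairs x 2) matrix
comb <- t(combn(nrow(as.data.frame(df$species)), 2))
# for each pair, subtract every column except the first (site)
dx <- apply(comb, 1, function(x) df[x[1], -1] - df[x[2], -1])
dt <- cbind(comb, dx)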
I have been trying to apply this example, but I think I'm missing a simple piece. Any help would be great!
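Since a matrix was also mentioned as an acceptable output, here is a second hedged sketch that builds one difference matrix per site with `outer()`, keeping only the upper triangle so each pair appears once. Unlike the attempt above, the pairs are formed within each site rather than across all 50 rows. The five-row subset and the name `diff_matrices` are illustrative assumptions.

```r
# Five-row subset of the question's data (first rows of two sites)
df <- data.frame(
  site = factor(c("act_0041", "act_0041", "act_0041", "act_0048", "act_0048")),
  species = c(12, 14, 28, 33, 6),
  date.time = as.POSIXct(c(1531454862, 1535035906, 1535348634,
                           1530624228, 1536088381),
                         origin = "1970-01-01", tz = "UTC")
)

# One matrix per site: entry [a, b] is the time from row a to row b, in minutes
diff_matrices <- lapply(split(df, df$site), function(d) {
  secs <- as.numeric(d$date.time)                # seconds since the epoch
  m <- outer(secs, secs, function(a, b) (b - a) / 60)
  dimnames(m) <- list(d$species, d$species)      # label rows/cols by species
  m[lower.tri(m, diag = TRUE)] <- NA             # keep each pair only once
  m
})

diff_matrices$act_0041
```

The matrix form is compact for inspecting one site, while the long data frame from `combn()` is usually easier to filter or join afterwards.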