I am trying to work out how to generate new columns based on the first and last occurrence of each value in a grouping column. My data looks like this:
DF <- structure(list(CHR = c(1, 1, 1, 1, 1, 1),
                     SNP = c("rs2494631", "rs4648637", "rs2494627", "rs11122119", "rs1844583", "rs2292242"),
                     BP = c(2399149, 2401364, 2402499, 6768856, 8383469, 8385059),
                     KBdist = c(NA, 2215, 1135, 4366357, 1614613, 1590),
                     locus = c(1, 1, 1, 2, 3, 3)),
                .Names = c("CHR", "SNP", "BP", "KBdist", "locus"),
                row.names = c(NA, 6L),
                class = "data.frame")
> DF
CHR SNP        BP      KBdist  locus
1   rs2494631  2399149 NA      1
1   rs4648637  2401364 2215    1
1   rs2494627  2402499 1135    1
1   rs11122119 6768856 4366357 2
1   rs1844583  8383469 1614613 3
1   rs2292242  8385059 1590    3
What I am trying to achieve is: for all rows that share the same locus, set start to the BP of the first row of that locus, and set stop to the BP of the last row of that locus. This would yield output that looks like this:
CHR SNP        BP      KBdist  locus start   stop
1   rs2494631  2399149 NA      1     2399149 2402499
1   rs4648637  2401364 2215    1     2399149 2402499
1   rs2494627  2402499 1135    1     2399149 2402499
1   rs11122119 6768856 4366357 2     6768856 6768856
1   rs1844583  8383469 1614613 3     8383469 8385059
1   rs2292242  8385059 1590    3     8383469 8385059
I have been playing around with the answer to a similar question I posed ("Combining an ifelse statement with shift data.table function in R") and with the shift function from data.table, but to no avail. Any help would be greatly appreciated!
Thanks.
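For reference, the rule described in the question (first BP per locus as start, last BP per locus as stop) could be sketched as follows. This is only one possible approach, not necessarily the best one: it uses base R's ave() rather than data.table's shift, and it assumes the rows are already sorted by BP within each locus, as they are in the example data. The KBdist column is left out here only to keep the sketch short.

```r
# Rebuild the example data from the question (KBdist omitted for brevity).
DF <- data.frame(
  CHR   = c(1, 1, 1, 1, 1, 1),
  SNP   = c("rs2494631", "rs4648637", "rs2494627",
            "rs11122119", "rs1844583", "rs2292242"),
  BP    = c(2399149, 2401364, 2402499, 6768856, 8383469, 8385059),
  locus = c(1, 1, 1, 2, 3, 3),
  stringsAsFactors = FALSE
)

# ave() applies FUN within each locus group and recycles the single
# returned value across all rows of that group.
# Assumption: rows are ordered by BP within each locus, so the first and
# last elements of each group are the intended start and stop.
DF$start <- ave(DF$BP, DF$locus, FUN = function(x) x[1])
DF$stop  <- ave(DF$BP, DF$locus, FUN = function(x) x[length(x)])

print(DF)
```

If the rows were not guaranteed to be sorted, replacing x[1] and x[length(x)] with min(x) and max(x) would be a reasonable alternative, under the assumption that start and stop are meant to be the smallest and largest BP in each locus.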