Adding new columns to an R data frame based on multiple columns within the data frame

Question

I have data on tourism interactions with individually identified whales. For each encounter I have the whale ID, the date, and the time:

Id    Date     Time  
A   20110527    10:42
A   20110527    11:24
A   20110527    11:52
A   20110603    10:29
A   20110603    10:59
B   20110503    11:23
B   20110503    11:45
B   20110503    12:05
B   20110503    12:17

I would now like to add two additional columns: one that numbers the days on which each individual was encountered, and one that numbers the encounters within each day, as follows:

Id     Date     Time  Day   Encounter
A   20110527    10:42   1   1
A   20110527    11:24   1   2
A   20110527    11:52   1   3
A   20110603    10:29   2   1
A   20110603    10:59   2   2
B   20110503    11:23   1   1
B   20110503    11:45   1   2
B   20110503    12:05   1   3
B   20110503    12:17   1   4

Is this possible? Any help would be greatly appreciated!

score 2 · Answer 1 · edited Mar 30 '16 at 08:04

We could use data.table. Convert the 'data.frame' to a 'data.table' with setDT(df1). Grouped by 'Id', we match 'Date' against the unique values of 'Date' to create the 'Day' column. Then we group by 'Id' and 'Date' and assign (:=) the sequence of rows to 'Encounter'.

library(data.table)
setDT(df1)[, Day:= match(Date, unique(Date)), by = Id
         ][, Encounter := seq_len(.N), by = .(Id, Date)]
df1
#    Id     Date  Time Day Encounter
#1:  A 20110527 10:42   1         1
#2:  A 20110527 11:24   1         2
#3:  A 20110527 11:52   1         3
#4:  A 20110603 10:29   2         1
#5:  A 20110603 10:59   2         2
#6:  B 20110503 11:23   1         1
#7:  B 20110503 11:45   1         2
#8:  B 20110503 12:05   1         3
#9:  B 20110503 12:17   1         4
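
To illustrate the match/unique step on its own, here is a minimal sketch on a small vector (the vector `dates` and the result names are made up for illustration, not part of the answer). It shows that the numbering follows the order of first appearance; sorting the unique values first is one way to number chronologically when the rows are not already in date order.

```r
# Hypothetical dates for a single individual, deliberately out of order
dates <- c(20110603L, 20110527L, 20110603L, 20110527L)

# Position of each date among the unique dates, in order of first appearance
day_first_seen <- match(dates, unique(dates))
day_first_seen
# [1] 1 2 1 2

# Sorting the unique dates first numbers the days chronologically instead
day_chrono <- match(dates, sort(unique(dates)))
day_chrono
# [1] 2 1 2 1
```

For the question's data, which is already ordered by date within each Id, both variants give the same 'Day' values.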

data

df1 <- structure(list(Id = c("A", "A", "A", "A", "A", 
 "B", "B", "B", 
"B"), Date = c(20110527L, 20110527L, 20110527L, 
 20110603L, 20110603L, 
 20110503L, 20110503L, 20110503L, 20110503L), 
 Time = c("10:42", 
 "11:24", "11:52", "10:29", "10:59", "11:23", "11:45", "12:05", 
 "12:17")), .Names = c("Id", "Date", "Time"),
  class = "data.frame", row.names = c(NA, -9L))

score 1 · Answer 2 · edited May 23 '17 at 11:50

Here is a reproducible example:

df <- structure(list(
  Id = structure(c(1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L),
                 .Label = c("A", "B"), class = "factor"),
  Date = c(20110527L, 20110527L, 20110527L, 20110603L,
           20110603L, 20110503L, 20110503L, 
           20110503L, 20110503L),
  Time = structure(c(2L, 5L, 7L, 1L, 3L, 4L, 6L, 8L, 9L),
                   .Label = c("10:29", "10:42", "10:59", "11:23", "11:24", "11:45", "11:52", "12:05", "12:17"), class = "factor")),
  .Names = c("Id",  "Date", "Time"), class = "data.frame", row.names = c(NA, -9L))

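Then one can use dplyr to number the encounters within each Id and Date (note that this adds only the 'Encounter' column, not 'Day'):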

library(dplyr)
group_by(df, Id, Date) %>% mutate(Encounter=1:n()) %>% ungroup()

Source: local data frame [9 x 4]

Id     Date   Time Encounter
(fctr)    (int) (fctr)     (int)
1      A 20110527  10:42         1
2      A 20110527  11:24         2
3      A 20110527  11:52         3
4      A 20110603  10:29         1
5      A 20110603  10:59         2
6      B 20110503  11:23         1
7      B 20110503  11:45         2
8      B 20110503  12:05         3
9      B 20110503  12:17         4
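
The dplyr call above produces 'Encounter' only. A possible way to also get 'Day' in the same pipeline is sketched below. It assumes the match/unique idea for numbering days, and it rebuilds the data with plain character columns for brevity; the object name `res` is made up for illustration.

```r
library(dplyr)

# Same values as the question's table (character columns instead of factors)
df <- data.frame(
  Id   = c("A", "A", "A", "A", "A", "B", "B", "B", "B"),
  Date = c(20110527L, 20110527L, 20110527L, 20110603L, 20110603L,
           20110503L, 20110503L, 20110503L, 20110503L),
  Time = c("10:42", "11:24", "11:52", "10:29", "10:59",
           "11:23", "11:45", "12:05", "12:17"),
  stringsAsFactors = FALSE
)

res <- df %>%
  group_by(Id) %>%
  # Day: position of each date among the individual's dates (first appearance)
  mutate(Day = match(Date, unique(Date))) %>%
  # Regroup by individual and date, then number the rows within each group
  group_by(Id, Date) %>%
  mutate(Encounter = row_number()) %>%
  ungroup()

res
```

The second group_by() replaces the first grouping rather than adding to it, so 'Encounter' restarts at 1 for every Id and Date combination.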

score 1 · Answer 3 · answered Mar 30 '16 at 09:10

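Or in base R, using ave and by.

I used the data posted by Vincent Bonhomme. The data should be sorted by Id and Date, because unlist(by(...)) returns its values in group order rather than in the original row order: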

# Function to number the days per individual using factor levels
# (the integer codes of a factor follow its sorted levels, so they run 1, 2, ...)
foo <- function(x) as.numeric(factor(x))

# Add the columns Day & Encounter
df$Day <- unlist(by(df$Date, list(df$Id), FUN = foo))
df$Encounter <- ave(seq_len(nrow(df)), list(df$Id, df$Date), FUN = seq_along)
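
If sorting the data first is inconvenient, a possible variation is to use ave for both columns, since ave writes each group's result back to that group's original row positions. This is only a sketch, not the answer's method; the shuffled row order below is made up to show the effect.

```r
# Same values as the question's table, with the rows deliberately shuffled
df <- data.frame(
  Id   = c("B", "A", "A", "B", "A", "B", "A", "B", "A"),
  Date = c(20110503L, 20110527L, 20110603L, 20110503L, 20110527L,
           20110503L, 20110603L, 20110503L, 20110527L),
  Time = c("11:23", "10:42", "10:29", "11:45", "11:24",
           "12:05", "10:59", "12:17", "11:52"),
  stringsAsFactors = FALSE
)

# Day: chronological rank of each date within an individual
df$Day <- ave(df$Date, df$Id, FUN = function(x) match(x, sort(unique(x))))

# Encounter: running count within each Id and Date, in current row order
df$Encounter <- ave(seq_len(nrow(df)), df$Id, df$Date, FUN = seq_along)

df
```

Note that 'Encounter' counts rows in the order they appear, so the times still need to be in ascending order within each day for the numbering to be chronological.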
