Let's say I have a large dataframe with a 'soundfile' column and 'sound_start' and 'sound_end' columns giving the times (in seconds) at which a particular bird is vocalising. Vocalisations vary significantly in length. An example of the dataframe is sound_df below. Each row of sound_df represents one vocalisation, and every sound file is the same length (300 seconds in the real data, 15 seconds in this example).
## setup example dataframe
id <- c("soundfile_1", "soundfile_2", "soundfile_3")
# times are stored as numbers (not strings) so they can be compared later
sound_df <- data.frame(
  soundfile   = rep(id, each = 2),
  sound_start = c(0, 8.0, 3.3, 11.7, 4.6, 13.1),
  sound_end   = c(3.2, 14.1, 3.8, 12.8, 5.9, 14.8)
)
print(sound_df)
    soundfile sound_start sound_end
1 soundfile_1         0.0       3.2
2 soundfile_1         8.0      14.1
3 soundfile_2         3.3       3.8
4 soundfile_2        11.7      12.8
5 soundfile_3         4.6       5.9
6 soundfile_3        13.1      14.8
I then want to create a new dataframe in which each sound file is divided into 3-second chunks, with a 'present' column recording whether any vocalisation from sound_df overlaps each chunk.
The results produced for sound_df should be as follows:
     soundfile start end present
1  soundfile_1     0   3     yes
2  soundfile_1     3   6     yes
3  soundfile_1     6   9     yes
4  soundfile_1     9  12     yes
5  soundfile_1    12  15     yes
6  soundfile_2     0   3      no
7  soundfile_2     3   6     yes
8  soundfile_2     6   9      no
9  soundfile_2     9  12     yes
10 soundfile_2    12  15     yes
11 soundfile_3     0   3      no
12 soundfile_3     3   6     yes
13 soundfile_3     6   9      no
14 soundfile_3     9  12      no
15 soundfile_3    12  15     yes
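For reference, one possible way to get this output in base R is sketched below. It is only a sketch under some assumptions: the times are numeric, every file is exactly `file_length` seconds long, and a vocalisation counts as present in a chunk only if it strictly overlaps it (a call ending exactly on a chunk boundary is not counted in the next chunk). The names `file_length`, `chunk_size`, `chunks` and `overlaps` are illustrative and not part of the original data.

```r
# Example data from the question, with times stored as numbers
sound_df <- data.frame(
  soundfile   = rep(c("soundfile_1", "soundfile_2", "soundfile_3"), each = 2),
  sound_start = c(0, 8.0, 3.3, 11.7, 4.6, 13.1),
  sound_end   = c(3.2, 14.1, 3.8, 12.8, 5.9, 14.8)
)

file_length <- 15  # assumed: seconds per file (300 in the real data)
chunk_size  <- 3   # assumed: chunk length in seconds

# Build one row per (soundfile, chunk); 'start' varies fastest within each file
chunk_starts <- seq(0, file_length - chunk_size, by = chunk_size)
chunks <- expand.grid(start = chunk_starts,
                      soundfile = unique(sound_df$soundfile),
                      stringsAsFactors = FALSE)
chunks$end <- chunks$start + chunk_size
chunks <- chunks[, c("soundfile", "start", "end")]

# A chunk is marked present if any call in the same file overlaps it:
# the call starts before the chunk ends AND ends after the chunk starts
overlaps <- mapply(function(f, s, e) {
  any(sound_df$soundfile == f &
        sound_df$sound_start < e &
        sound_df$sound_end > s)
}, chunks$soundfile, chunks$start, chunks$end, USE.NAMES = FALSE)

chunks$present <- ifelse(overlaps, "yes", "no")
print(chunks)
```

This loops over every chunk and scans all of sound_df each time, which is simple but may be slow for a very large dataframe; an interval-join function such as `data.table::foverlaps` might scale better there.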