
I have a column time_bin that is based on cumulative radiocarbon dates, and I need to fill the gaps in the time_bin sequence. In the example data below, that means adding rows for 2700 and 3100. This will be applied to many different data sets with different gaps, so it needs to be automated. It will have to expand the size of the data frame; it's fine if the values in the other columns are just NA for now, as I think I know how to populate them with what I need once the rows exist.

The time_bin column is created using mutate() with ceiling(), as shown below, so maybe the gaps could be filled at that point rather than later.

I can create the vector I need (called seq below), but I'm not sure how to get it into the data frame.

If there's a tidyverse approach, rather than the vector-based one I've used, that would be great too.

So far I have:

library(dplyr)

data <- structure(list(cumulative.time = c(2458.09948930625, 2580.22242330625, 
                                          2707.31373980624, 2839.71214840625, 2977.77505230625, 3121.87854830625
)), row.names = c(NA, -6L), class = c("tbl_df", "tbl", "data.frame"
))

# Round each cumulative date up to the next 100-year bin
data <- data %>% mutate(time_bin = ceiling(cumulative.time / 100) * 100)

max <- max(data$time_bin, na.rm = TRUE)

min <- min(data$time_bin, na.rm = TRUE)

seq <- seq(from = min, to = max, by = 100)

Thanks people!

Paul Tansley
2 Answers

We can use complete() from tidyr to fill in a sequence running from the minimum time_bin value to the maximum in steps of 100; bins with no matching row get NA in the other columns.

tidyr::complete(data, time_bin = seq(min(time_bin), max(time_bin), by = 100))

# time_bin cumulative.time
#     <dbl>           <dbl>
#1     2500           2458.
#2     2600           2580.
#3     2700             NA 
#4     2800           2707.
#5     2900           2840.
#6     3000           2978.
#7     3100             NA 
#8     3200           3122.
Ronak Shah
  • Hi Ronak, thanks for such a quick reply but I get the following error with your answer: Error: Column name `time_bin` must not be duplicated. – Paul Tansley Sep 25 '20 at 10:44
  • Are you using it on the data that you have shared or some other data? – Ronak Shah Sep 25 '20 at 10:46
  • I've got it now, I was just being an idiot. I was running it piped but hadn't taken the second data out, so it was running data %>% complete(data, ...). Works fine now. Thanks – Paul Tansley Sep 25 '20 at 13:17
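As the last comment notes, when complete() is used in a pipe the data argument comes from the pipe, so it must not be repeated inside the call. Below is a minimal sketch, assuming dplyr and tidyr are installed and the date column is named cumulative.time, that folds the question's mutate()/ceiling() step and complete() into one pipeline and wraps it in a function so it can be reused across data sets. fill_time_bins and its width argument are hypothetical names, not part of either package.

```r
library(dplyr)
library(tidyr)

# Hypothetical helper: round each cumulative date up to the next multiple
# of `width`, then insert NA rows for any empty bins in between
fill_time_bins <- function(df, width = 100) {
  df %>%
    mutate(time_bin = ceiling(cumulative.time / width) * width) %>%
    # data comes from the pipe, so only the column spec goes here
    complete(time_bin = seq(min(time_bin), max(time_bin), by = width))
}

# Example data from the question
data <- tibble(cumulative.time = c(2458.09948930625, 2580.22242330625,
                                   2707.31373980624, 2839.71214840625,
                                   2977.77505230625, 3121.87854830625))

filled <- fill_time_bins(data)
filled
```

Changing width (for example to 50) changes the bin size in both the ceiling() step and the filled sequence, so the two always stay consistent.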

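This calls for a join. If we make your seq variable into a data.frame, a right join with data keeps every bin in the sequence and fills NA where data has no matching row; arrange() then puts the rows back in time_bin order.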

library(dplyr)
seq <- data.frame(time_bin = seq(from = min, to = max, by = 100))
data %>% right_join(seq) %>% arrange(time_bin)
Joining, by = "time_bin"
# A tibble: 8 x 2
  cumulative.time time_bin
            <dbl>    <dbl>
1           2458.     2500
2           2580.     2600
3             NA      2700
4           2707.     2800
5           2840.     2900
6           2978.     3000
7             NA      3100
8           3122.     3200
Ben Norris
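For comparison, the same join can be written in base R with merge(), with no packages needed. This is a minimal sketch using the question's example data; bins and filled are illustrative names.

```r
# Example data from the question, as a plain data.frame
data <- data.frame(cumulative.time = c(2458.09948930625, 2580.22242330625,
                                       2707.31373980624, 2839.71214840625,
                                       2977.77505230625, 3121.87854830625))
data$time_bin <- ceiling(data$cumulative.time / 100) * 100

# One-column data frame holding every bin from the first to the last
bins <- data.frame(time_bin = seq(min(data$time_bin), max(data$time_bin), by = 100))

# all.x = TRUE keeps every row of bins (like right_join(data, bins));
# merge() sorts by time_bin by default, so no separate arrange() is needed
filled <- merge(bins, data, by = "time_bin", all.x = TRUE)
filled
```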