Here is a sample of my dataframe:

df3 <- data.frame(
  Frame = c(219388, 219389, 219390, 211387, 211388, 211389),
  Time  = c("2020-06-05 13:26:39", "2020-06-05 13:26:39", "2020-06-05 13:26:39",
            "2020-06-05 13:26:39", "2020-06-05 13:26:39", "2020-06-05 13:26:39"),
  task  = c("hop", "hop", "hop", "vj", "vj", "vj"),
  limb  = c("L", "L", "L", "R", "R", "R"),
  trial = c("trial1", "trial1", "trial1", "trial2", "trial2", "trial2")
)

I want to add NA rows at specific positions in the Frame and Time columns (the number of NA rows to add will vary in my real dataset). I also need the task, limb, and trial columns to carry on through those rows (i.e. hop, L, trial1 continue even on the NA rows). My expected output looks like this:

> df3 
Frame             Time               task     limb    trial   
219388    2020-06-05 13:26:39        hop       L      trial1
219389    2020-06-05 13:26:39        hop       L      trial1
219390    2020-06-05 13:26:39        hop       L      trial1
NA                 NA                hop       L      trial1
NA                 NA                hop       L      trial1
NA                 NA                hop       L      trial1
211387    2020-06-05 13:26:39        vj        R      trial2
211388    2020-06-05 13:26:39        vj        R      trial2
211389    2020-06-05 13:26:39        vj        R      trial2
NA                 NA                vj        R      trial2
NA                 NA                vj        R      trial2

I've tried insertRows from the berryFunctions package, but this sets the whole row to NA, and I need the task, limb, and trial columns to continue.

insertRows(df3, r=c(3:5), new=NA, rcurrent=FALSE)

Any help or suggestions would be much appreciated, thank you!

mpvalenc

1 Answer

We could use group_split on the 'task' to 'trial' columns to split the data into a list of data.frames, then loop over that list with map2, passing the number of NA rows to add for each group as the second argument. For each group, slice the first row, set 'Frame' and 'Time' to NA, replicate that row with uncount, and append it to the group's original rows with bind_rows. Because we are using map2_dfr, the list is row-bound into a single data.frame

library(dplyr) #1.0.0
library(purrr)
library(tidyr)
df3 %>%
     group_split(across(task:trial)) %>%
     map2_dfr(c(3, 2), ~ 
         slice(.x, 1) %>% 
         mutate(across(Frame:Time, ~NA)) %>% 
         uncount(.y) %>% 
         bind_rows(.x, .))
# A tibble: 11 x 5
#    Frame Time                task  limb  trial 
#    <dbl> <chr>               <chr> <chr> <chr> 
# 1 219388 2020-06-05 13:26:39 hop   L     trial1
# 2 219389 2020-06-05 13:26:39 hop   L     trial1
# 3 219390 2020-06-05 13:26:39 hop   L     trial1
# 4     NA <NA>                hop   L     trial1
# 5     NA <NA>                hop   L     trial1
# 6     NA <NA>                hop   L     trial1
# 7 211387 2020-06-05 13:26:39 vj    R     trial2
# 8 211388 2020-06-05 13:26:39 vj    R     trial2
# 9 211389 2020-06-05 13:26:39 vj    R     trial2
#10     NA <NA>                vj    R     trial2
#11     NA <NA>                vj    R     trial2

The group_split function is similar to base R split, except that it has an option to keep or drop the grouping variables in the resulting list of data.frames (and it does not name the list elements). The idea is to split the data into a list of data.frame chunks in which the grouping columns hold the same values. This splits the dataset automatically, without having to specify by hand the rows after which the NA rows should be added.
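
For comparison, the same split-and-append idea can be sketched in base R alone. This is a minimal sketch rather than the method used above: pad_chunk and n_na are hypothetical names introduced for illustration, and n_na is assumed to hold one count per group, in the order that split returns the groups (sorted by group label, not by first appearance in the data).

```r
df3 <- data.frame(
  Frame = c(219388, 219389, 219390, 211387, 211388, 211389),
  Time  = rep("2020-06-05 13:26:39", 6),
  task  = c("hop", "hop", "hop", "vj", "vj", "vj"),
  limb  = c("L", "L", "L", "R", "R", "R"),
  trial = c("trial1", "trial1", "trial1", "trial2", "trial2", "trial2")
)

# Hypothetical: number of NA rows to append to each group, in split() order
n_na <- c(3, 2)

# One data.frame per task/limb/trial combination that occurs in the data
chunks <- split(df3, list(df3$task, df3$limb, df3$trial), drop = TRUE)

# Hypothetical helper: append n rows that keep the grouping columns
# but have Frame and Time set to NA
pad_chunk <- function(chunk, n) {
  if (n == 0) return(chunk)
  filler <- chunk[rep(1, n), ]      # copy the group's first row n times
  filler$Frame <- NA_real_          # blank the measurement columns
  filler$Time  <- NA_character_
  rownames(filler) <- NULL
  rbind(chunk, filler)
}

# Pad each chunk with its own count, then stack the chunks back together
padded <- do.call(rbind, Map(pad_chunk, chunks, n_na))
rownames(padded) <- NULL
```

Changing the values in n_na changes how many NA rows each group receives, which is the part to edit when the counts differ from group to group.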


Also, if the number of NAs to add is the same for every group, another option is group_by with summarise (as of dplyr 1.0.0, summarise can return more than one row per group)

df3  %>%
     group_by(across(task:trial)) %>%
     summarise(across(everything(), ~ c(., rep(NA, 3))))
# A tibble: 12 x 5
# Groups:   task, limb, trial [2]
#   task  limb  trial   Frame Time               
#   <chr> <chr> <chr>   <dbl> <chr>              
# 1 hop   L     trial1 219388 2020-06-05 13:26:39
# 2 hop   L     trial1 219389 2020-06-05 13:26:39
# 3 hop   L     trial1 219390 2020-06-05 13:26:39
# 4 hop   L     trial1     NA <NA>               
# 5 hop   L     trial1     NA <NA>               
# 6 hop   L     trial1     NA <NA>               
# 7 vj    R     trial2 211387 2020-06-05 13:26:39
# 8 vj    R     trial2 211388 2020-06-05 13:26:39
# 9 vj    R     trial2 211389 2020-06-05 13:26:39
#10 vj    R     trial2     NA <NA>               
#11 vj    R     trial2     NA <NA>               
#12 vj    R     trial2     NA <NA>      
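
In newer dplyr releases (1.1.0 and later), returning several rows per group from summarise is deprecated in favour of reframe. A sketch of the same idea with reframe, assuming dplyr >= 1.1.0 is installed; n_na is a hypothetical name for the constant number of NA rows:

```r
library(dplyr)  # assumes dplyr >= 1.1.0, where reframe() is available

df3 <- data.frame(
  Frame = c(219388, 219389, 219390, 211387, 211388, 211389),
  Time  = rep("2020-06-05 13:26:39", 6),
  task  = c("hop", "hop", "hop", "vj", "vj", "vj"),
  limb  = c("L", "L", "L", "R", "R", "R"),
  trial = c("trial1", "trial1", "trial1", "trial2", "trial2", "trial2")
)

# Hypothetical: the same number of NA rows is appended to every group
n_na <- 3

padded <- df3 %>%
  group_by(task, limb, trial) %>%
  # append n_na NAs to each non-grouping column, group by group
  reframe(across(everything(), ~ c(.x, rep(NA, n_na))))
```

As with the summarise version, the grouping columns are moved to the front of the result.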

Also, with berryFunctions, after creating the NA rows with insertRows, fill the columns of interest (fill is from tidyr)

library(berryFunctions)
insertRows(df3, r=4:6, new=NA, rcurrent= FALSE) %>% 
       insertRows(., r = 10) %>%
       fill(task:trial)
#    Frame                Time task limb  trial
#1  219388 2020-06-05 13:26:39  hop    L trial1
#2  219389 2020-06-05 13:26:39  hop    L trial1
#3  219390 2020-06-05 13:26:39  hop    L trial1
#4      NA                <NA>  hop    L trial1
#5      NA                <NA>  hop    L trial1
#6      NA                <NA>  hop    L trial1
#7  211387 2020-06-05 13:26:39   vj    R trial2
#8  211388 2020-06-05 13:26:39   vj    R trial2
#9  211389 2020-06-05 13:26:39   vj    R trial2
#10     NA                <NA>   vj    R trial2
#11     NA                <NA>   vj    R trial2
akrun
  • Group split method worked. Could you explain the code a little further please? I'm fairly new to coding in R. This was just a sample of my data so I want to know what I need to edit in order to place NA's appropriately where I want. Thank you! – mpvalenc Jun 25 '20 at 23:23
  • I like the berryFunctions method best. Can you explain how I would do this if my data set is larger (i.e. 500 rows) and I want to set rows 50-100 to NA, rows 150-200 to NA, rows 250-300 to NA, etc.? Thank you! – mpvalenc Jun 25 '20 at 23:27
  • @mpvalenc. The berryFunctions insertRows seems to be good if we want to insert NAs for a particular sequence of rows. Otherwise, the row numbers need to be readjusted. That was the reason I added a second insertRows statement. Regarding group_split, it splits by groups. Based on the data shown, that seems to be the case for you. I added some more explanation. Hope it helps – akrun Jun 26 '20 at 00:00
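
One possible reading of the follow-up about a 500-row dataset is that existing rows 50-100, 150-200 and 250-300 should have Frame and Time blanked, rather than new rows being inserted. Under that assumption, plain base R indexing is enough. The data below is made up for illustration, and big, starts and ends are hypothetical names:

```r
# Hypothetical 500-row dataset with the same kind of columns as df3
big <- data.frame(
  Frame = seq_len(500),
  Time  = rep("2020-06-05 13:26:39", 500),
  task  = rep(c("hop", "vj"), each = 250)
)

# Hypothetical row ranges to blank: 50-100, 150-200, 250-300
starts <- c(50, 150, 250)
ends   <- c(100, 200, 300)

# Expand each start/end pair into its row numbers
rows <- unlist(Map(seq, starts, ends))

# Set only Frame and Time to NA; task is left untouched on those rows
big[rows, c("Frame", "Time")] <- NA
```

If new rows have to be inserted instead, the row positions shift after every insertion, so the group-wise approaches in the answer are likely easier to manage.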