I have a dataset of 240 cases and want to insert a blank row after each existing row. This would leave me with 480 rows, half of them filled and the other half empty (which I then want to fill with data myself).

Example of data

  id groep_MNC zkhs fbeh    pgebdat    p_age pgesl
1  3         1    1    1 1955-12-01 42.50000     1
2  5         1    1    1 1943-04-09 55.16667     1
3  7         1    1    1 1958-04-10 40.25000     1
4 10         1    1    1 1958-04-17 40.25000     1
5 12         1    1    2 1947-11-01 50.66667     1
6 14         1    1    2 1952-02-02 46.41667     1

Ideally, 'id' should be copied into each new row, so the result looks like this:

    id groep_MNC zkhs fbeh    pgebdat    p_age pgesl
1    3         1    1    1 1955-12-01 42.50000     1
2    3        NA   NA   NA         NA       NA    NA
3    5         1    1    1 1943-04-09 55.16667     1
4    5        NA   NA   NA         NA       NA    NA
5    7         1    1    1 1958-04-10 40.25000     1
6    7        NA   NA   NA         NA       NA    NA
7   10         1    1    1 1958-04-17 40.25000     1
8   10        NA   NA   NA         NA       NA    NA
9   12         1    1    2 1947-11-01 50.66667     1
10  12        NA   NA   NA         NA       NA    NA
11  14         1    1    2 1952-02-02 46.41667     1
12  14        NA   NA   NA         NA       NA    NA

I've tried copying all the rows with this code:

mydf_long <- mydf[rep(1:nrow(mydf), each = 2),]

But as you can see, that is not even close to what I want to end up with.

Edit: Thanks for the edits and comments. I need to transform my original data into a format suitable for multilevel analyses. However, the data is still quite messy, so approaches that initially worked on a small subset of my data didn't work on the full set. For more background, see my other questions:

Reshape/gather function to create dataset ready for multilevel analysis

Tidy up and reshape messy dataset (reshape/gather/unite function)?

R - replace values by row given some statement in if loop with another value in same df

Since I have relatively few partner variables, I now want to create blank rows and fill them in with the partner data.

Hannie
  • How do you plan to "fill in" the blank rows you create? – A5C1D2H2I1M1N2O1R2T1 Sep 17 '17 at 09:45
  • I think this is your answer: https://stackoverflow.com/questions/16453452/how-can-i-add-rows-to-an-r-data-frame-every-other-row – gst Sep 17 '17 at 09:58
  • Possible duplicate of [How can I add rows to an R data frame every other row?](https://stackoverflow.com/questions/16453452/how-can-i-add-rows-to-an-r-data-frame-every-other-row) – gst Sep 17 '17 at 09:59
  • What are you trying to accomplish? Maybe making and filling blank rows isn't the best approach, but it's difficult to judge without knowing the full story. – lebelinoz Sep 17 '17 at 11:02

3 Answers

We can duplicate each row and then set the rows with even row numbers to NA.

# Duplicate every row, then blank out the second copy (the even-numbered rows)
dt2 <- dt[rep(seq_len(nrow(dt)), each = 2), ]
dt2[seq_len(nrow(dt2)) %% 2 == 0, ] <- NA

head(dt2)
    id groep_MNC zkhs fbeh    pgebdat    p_age pgesl
1    3         1    1    1 1955-12-01 42.50000     1
1.1 NA        NA   NA   NA       <NA>       NA    NA
2    5         1    1    1 1943-04-09 55.16667     1
2.1 NA        NA   NA   NA       <NA>       NA    NA
3    7         1    1    1 1958-04-10 40.25000     1
3.1 NA        NA   NA   NA       <NA>       NA    NA
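
The question asks for id to be carried into each new row, whereas the code above blanks the whole row. A minimal sketch of one way to keep it, assuming the same column layout as dt (the three-row data frame here is a shortened stand-in, and blank_rows is a name introduced only for this illustration):

```r
# Shortened stand-in for dt; pgebdat is character, as read.table would give it
dt <- data.frame(
  id        = c(3L, 5L, 7L),
  groep_MNC = c(1L, 1L, 1L),
  pgebdat   = c("1955-12-01", "1943-04-09", "1958-04-10"),
  p_age     = c(42.5, 55.16667, 40.25),
  stringsAsFactors = FALSE
)

# Duplicate every row
dt2 <- dt[rep(seq_len(nrow(dt)), each = 2), ]

# Even positions hold the second copy of each row
blank_rows <- seq_len(nrow(dt2)) %% 2 == 0

# Blank every column except id in those rows
dt2[blank_rows, names(dt2) != "id"] <- NA

# Replace the 1, 1.1, 2, 2.1, ... row names with plain 1..n
rownames(dt2) <- NULL
dt2
```

Selecting columns by name rather than by position keeps the sketch working if id is not the first column.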

DATA

dt <- read.table(text = "  id groep_MNC zkhs fbeh    pgebdat    p_age pgesl
1  3         1    1    1 1955-12-01 42.50000     1
2  5         1    1    1 1943-04-09 55.16667     1
3  7         1    1    1 1958-04-10 40.25000     1
4 10         1    1    1 1958-04-17 40.25000     1
5 12         1    1    2 1947-11-01 50.66667     1
6 14         1    1    2 1952-02-02 46.41667     1",
                 header = TRUE, stringsAsFactors = FALSE)
www

Try this:

library(dplyr)

# For each id, append a row that repeats the id and is NA everywhere else
df %>%
  group_by(id) %>%
  do(rbind(., c(.$id, rep(NA, ncol(df) - 1)))) %>%
  ungroup() %>%
  data.frame()

Output:

   id groep_MNC zkhs fbeh    pgebdat    p_age pgesl
1   3         1    1    1 1955-12-01 42.50000     1
2   3        NA   NA   NA       <NA>       NA    NA
3   5         1    1    1 1943-04-09 55.16667     1
4   5        NA   NA   NA       <NA>       NA    NA
5   7         1    1    1 1958-04-10 40.25000     1
6   7        NA   NA   NA       <NA>       NA    NA
7  10         1    1    1 1958-04-17 40.25000     1
8  10        NA   NA   NA       <NA>       NA    NA
9  12         1    1    2 1947-11-01 50.66667     1
10 12        NA   NA   NA       <NA>       NA    NA
11 14         1    1    2 1952-02-02 46.41667     1
12 14        NA   NA   NA       <NA>       NA    NA

Sample data:

require(data.table)
df <- fread("id groep_MNC zkhs fbeh    pgebdat    p_age pgesl
              3         1    1    1 1955-12-01 42.50000     1
              5         1    1    1 1943-04-09 55.16667     1
              7         1    1    1 1958-04-10 40.25000     1
             10         1    1    1 1958-04-17 40.25000     1
             12         1    1    2 1947-11-01 50.66667     1
             14         1    1    2 1952-02-02 46.41667     1")
www
  • @HannekeLettinga - Your question asks for a copied 'id', but your original sample output showed the row names being repeated. I'm not sure which you want, but in case you want the ID values themselves repeated, with the rest of the row blank, this will solve your question. Otherwise, ycw has a great answer for creating completely blank rows at every other index. – www Sep 17 '17 at 18:29
  • Thanks Ryan. You are right that my original sample didn't show what I actually want. Thank you for clarifying and for your response. When I try to run your code I get an error though, which I don't directly understand: Error in as.Date.numeric(value) : 'origin' must be supplied. I don't know where the problem with dates comes from, do you have any idea? – Hannie Sep 19 '17 at 13:56
  • @HannekeLettinga - Hi, you're welcome. You're right, that error typically occurs when dealing with date conversion. That seems odd that it would pop up here, since my code doesn't directly handle any dates/class conversions. It's likely that error is coming from a different cause. To help, I've included some sample data that you can do a smaller scale test with. After importing the sample data I've provided into R, note the classes of the columns in the sample data with sapply(df,class), and make sure your real data matches the classes of that sample data. That should help. – www Sep 19 '17 at 16:07
  • @www I know this is old, but can you explain why this loops through every row? I've been trying to understand how to add multiple rows at once but I'm unsure why the do here goes through the entire df. – aarsmith Mar 24 '23 at 22:17
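
On the 'origin' must be supplied error raised in the comments: one possible cause, assumed here because the real data is not shown, is that pgebdat is stored as Date rather than character, so that appending a plain numeric vector as a row clashes with the date column. A sketch that sidesteps this by letting R build the blank rows itself (the two-row df and the name idx are illustrative only):

```r
# Hypothetical stand-in for the real data, with pgebdat stored as Date (an assumption)
df <- data.frame(
  id      = c(3L, 5L),
  zkhs    = c(1L, 1L),
  pgebdat = as.Date(c("1955-12-01", "1943-04-09"))
)

# Row positions interleaved with NA: 1, NA, 2, NA, ...
idx <- as.vector(rbind(seq_len(nrow(df)), NA))

# Indexing rows with NA yields an all-NA row and keeps each column's class
out <- df[idx, ]

# Copy the id into the blank rows and tidy the row names
out$id <- rep(df$id, each = 2)
rownames(out) <- NULL

# Check that the column classes match those of the input
sapply(out, class)
```

Because no values are converted by hand, the date column stays a Date and the blank rows hold a date-typed NA.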
Another option using dplyr:

library(dplyr)
df %>%
  split(df$id) %>%
  Map(rbind, ., NA) %>%
  do.call(rbind, .) %>%
  mutate(id = rep(df$id, each = 2))

Or you can use map_dfr from purrr:

library(purrr)
df %>%
  group_by(id) %>%
  map_dfr(rbind, NA) %>%
  mutate(id = rep(df$id, each = 2))

Result:

# A tibble: 12 x 7
      id groep_MNC  zkhs  fbeh    pgebdat    p_age pgesl
   <int>     <int> <int> <int>      <chr>    <dbl> <int>
 1     3         1     1     1 1955-12-01 42.50000     1
 2     3        NA    NA    NA       <NA>       NA    NA
 3     5         1     1     1 1943-04-09 55.16667     1
 4     5        NA    NA    NA       <NA>       NA    NA
 5     7         1     1     1 1958-04-10 40.25000     1
 6     7        NA    NA    NA       <NA>       NA    NA
 7    10         1     1     1 1958-04-17 40.25000     1
 8    10        NA    NA    NA       <NA>       NA    NA
 9    12         1     1     2 1947-11-01 50.66667     1
10    12        NA    NA    NA       <NA>       NA    NA
11    14         1     1     2 1952-02-02 46.41667     1
12    14        NA    NA    NA       <NA>       NA    NA
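
A further variant that stays within dplyr, sketched under the assumption that each input row should get exactly one blank partner row. The helper columns .row and .blank and the three-row df are introduced only for this illustration:

```r
library(dplyr)

# Shortened stand-in for df
df <- tibble(
  id      = c(3L, 5L, 7L),
  zkhs    = c(1L, 1L, 1L),
  pgebdat = c("1955-12-01", "1943-04-09", "1958-04-10")
)

# Original rows, tagged with their position
filled <- df %>% mutate(.row = row_number(), .blank = FALSE)

# One id-only row per original row; bind_rows() fills the missing columns with NA
blank <- df %>% select(id) %>% mutate(.row = row_number(), .blank = TRUE)

out <- bind_rows(filled, blank) %>%
  arrange(.row, .blank) %>%   # each blank row sorts directly after its original
  select(-.row, -.blank)

out
```

Sorting on the row position instead of on id keeps the original order even when the ids are unsorted or repeated.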
acylam
  • Nice answer. How and where can I best master purrr? The vignettes? – NelsonGon Jan 28 '19 at 16:52
  • @NelsonGon The vignettes are good places to start. I find the "Intro to purrr" (https://emoriebeck.github.io/R-tutorials/purrr/) and the cheat sheets also quite helpful (https://purrr.tidyverse.org/) – acylam Jan 28 '19 at 17:18
  • @NelsonGon The first link is actually more for creating column lists and nested data frames. This lesson in datacamp might be more useful as an in-depth overview: https://www.datacamp.com/courses/intermediate-functional-programming-with-purrr – acylam Jan 28 '19 at 17:23
  • oh, thanks. Hopefully datacamp doesn't use video tutorials. I find reading easier. Thanks again! – NelsonGon Jan 28 '19 at 17:26