I have a vector, `missing`, of sample IDs that must be present in my data frame (the function I apply to it fails otherwise) but currently are not.

For each element of `missing`, I want to append a row to the data frame that contains the ID, with NA in all the other columns.

Based on other Stack Overflow posts that only cover adding empty rows, I am currently doing this:

for (element in missing) {
    df[nrow(df)+1,] <- NA
    df[nrow(df),1] <- element
}

Is there a simpler and faster way to do this? The loop is already slow for 1000 missing elements, and I may have to handle many more later.

Jub
  • It would be better to do a `merge()` and let that function create the missing NA values. Adding one row at a time is very inefficient. It's easier to help you if you include a simple [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input and desired output that can be used to test and verify possible solutions. – MrFlick Oct 25 '21 at 17:09
  • Jub, do either of the answers resolve your question? – r2evans Oct 27 '21 at 21:08

2 Answers

1) Using the built-in anscombe data frame, this appends two rows with -1 and -3 in the x1 column; add_row fills the remaining columns with NA.

library(tibble)
new <- c(-1, -3)
add_row(anscombe, x1 = new)

giving:

   x1 x2 x3 x4    y1   y2    y3    y4
1  10 10 10  8  8.04 9.14  7.46  6.58
2   8  8  8  8  6.95 8.14  6.77  5.76
3  13 13 13  8  7.58 8.74 12.74  7.71
4   9  9  9  8  8.81 8.77  7.11  8.84
5  11 11 11  8  8.33 9.26  7.81  8.47
6  14 14 14  8  9.96 8.10  8.84  7.04
7   6  6  6  8  7.24 6.13  6.08  5.25
8   4  4  4 19  4.26 3.10  5.39 12.50
9  12 12 12  8 10.84 9.13  8.15  5.56
10  7  7  7  8  4.82 7.26  6.42  7.91
11  5  5  5  8  5.68 4.74  5.73  6.89
12 -1 NA NA NA    NA   NA    NA    NA
13 -3 NA NA NA    NA   NA    NA    NA

2) Here is a base R solution. new is from (1).

(Working on a copy leaves anscombe intact, which makes debugging easier. If overwriting anscombe is acceptable, omit the first line and replace anscombe2 with anscombe.)

anscombe2 <- anscombe
anscombe2[nrow(anscombe2) + seq_along(new), "x1"] <- new
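
To connect this base approach to the question's setup, here is a minimal sketch. The data frame `df` and the vector `missing` below are made-up stand-ins for the asker's data, not taken from the question; the technique is the same out-of-range row assignment as above.

```r
# Hypothetical stand-ins for the asker's data (not from the question)
df <- data.frame(id = 1:3, value = c(1.5, 2.5, 3.5))
missing <- c(7L, 9L)

# Work on a copy so the original stays available for comparison
df2 <- df

# A single vectorised assignment replaces the loop: indexing rows past
# nrow(df2) creates them, and every column not assigned to is filled with NA
df2[nrow(df2) + seq_along(missing), "id"] <- missing

print(df2)
```

Because all new rows are created in one call, the data frame is extended once rather than once per missing ID.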

3) Using the dplyr package we can use rows_insert (tibble() is re-exported by dplyr, so it is available too). new is from (1).

library(dplyr)
rows_insert(anscombe, tibble(x1 = new))
Sam Firke
G. Grothendieck

Sample data:

samp <- data.frame(id = 1:10, val1 = 11:20, val2 = 21:30)
missing <- c(11, 13, 15)
  1. Merge:

    merge(samp, data.frame(id = missing), by = "id", all = TRUE)
    #    id val1 val2
    # 1   1   11   21
    # 2   2   12   22
    # 3   3   13   23
    # 4   4   14   24
    # 5   5   15   25
    # 6   6   16   26
    # 7   7   17   27
    # 8   8   18   28
    # 9   9   19   29
    # 10 10   20   30
    # 11 11   NA   NA
    # 12 13   NA   NA
    # 13 15   NA   NA
    
  2. Row-bind with an external package (either data.table or dplyr):

    data.table::rbindlist(list(samp, data.frame(id = missing)), use.names = TRUE, fill = TRUE)
    dplyr::bind_rows(samp, data.frame(id = missing))
    
  3. Row-bind with base R, a little more work:

    samp0 <- samp[rep(1, length(missing)),,drop = FALSE][NA,]
    samp0$id <- missing
    rownames(samp0) <- NULL
    rbind(samp, samp0)
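
If this is needed in more than one place, the base R variant could be wrapped in a small function. The sketch below is only an illustration: the helper name `add_missing_ids` and its `id_col` argument are hypothetical, and it assumes the ID column holds values of a type compatible with the new IDs.

```r
# Hypothetical helper (name and arguments are illustrative, not from the answer)
add_missing_ids <- function(df, ids, id_col = "id") {
  # Indexing rows with NA yields one all-NA row per index, keeping column types
  filler <- df[rep(NA_integer_, length(ids)), , drop = FALSE]
  filler[[id_col]] <- ids
  out <- rbind(df, filler)
  rownames(out) <- NULL  # drop the "NA", "NA.1", ... row names from the filler
  out
}

samp <- data.frame(id = 1:10, val1 = 11:20, val2 = 21:30)
missing <- c(11, 13, 15)
result <- add_missing_ids(samp, missing)
tail(result, 4)
```

Unlike `merge()`, which sorts on the key by default, this keeps the new IDs at the end of the data frame in the order they were given.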
    
r2evans