Hey :) I am trying to clean up some data and am struggling to find an easy solution. This is my dataset (dput output):

structure(list(sample = c(1, NA, NA, 2, NA, NA, 3, NA, NA, 4, 
NA, NA, 5, NA, NA, 6, NA, NA, 7, NA, NA, 8, NA, NA, 9, NA, NA, 
10, NA, NA, 11, NA, NA, 12, NA, NA, 13, NA, NA, 14, NA, NA, 15, 
NA, NA, 16, NA, NA, 17, NA, NA, 18, NA, NA, 19, NA, NA, 20, NA, 
NA), well = c("C1", "C3", "C5", "D1", "D3", "D5", "E1", "E3", 
"E5", "F1", "F3", "F5", "C7", "C9", "C11", "D7", "D9", "D11", 
"E7", "E9", "E11", "F7", "F9", "F11", "C13", "C15", "C17", "D13", 
"D15", "D17", "E13", "E15", "E17", "F13", "F15", "F17", "C19", 
"C21", "C23", "D19", "D21", "D23", "E19", "E21", "E23", "F19", 
"F21", "F23", "G1", "G3", "G5", "H1", "H3", "H5", "I1", "I3", 
"I5", "J1", "J3", "J5"), interp_conc = c(456582, 299611, 338462, 
449737, 395905, 546031, 511817, 473617, 455924, 408370, 461656, 
429297, 277609, 264949, 404073, 353142, 277509, 246494, 122663, 
163873, 169455, 188879, 192751, 255511, 185383, 205396, 187415, 
1897500, 1988346, 1854167, 365514, 295724, 262695, 270446, 241531, 
209386, 223774, 255885, 181214, 420567, 482818, 443318, 262886, 
220969, 283763, 229457, 261859, 202067, 226157, 177300, 215454, 
481414, 586233, 383855, 218949, 226852, 244989, 192648, 228195, 
201096)), row.names = c(NA, -60L), class = c("tbl_df", "tbl", 
"data.frame"))

It's data from an experiment done in triplicate. This means the first three rows are sample 1, the next three rows are sample 2, and so on.

So what I need is a function that, whenever it finds an NA, fills it with the value from the row above. Is there something like this in R? I was not able to find one.

What I tried to do instead was to add another column, "condition", using the mutate function. Since the experiment was performed five times, I was hoping that the 12-element vector would just be recycled across the 60 rows. This was my attempt:

temp %>% mutate(condition = c("UT", "UT", "UT",
                              "Stimuli", "Stimuli","Stimuli",
                              "Inhib1", "Inhib1","Inhib1",
                              "Inhib2", "Inhib2", "Inhib2"))

But since dplyr::mutate does not seem to recycle a vector of this length, this did not work either.

Going with this second approach would have the advantage that it directly adds crucial information that I would otherwise have to add in a second step. My original idea was to first solve the sample column issue and then, using if statements, add the experimental condition...

Does anyone have any idea how I could solve this problem?

Lena
    try `tidyr::fill` or `zoo::na.locf` – AnilGoyal Jun 04 '21 at 15:50
    see [this](https://stackoverflow.com/questions/2776135/last-observation-carried-forward-in-a-data-frame) or [this](https://stackoverflow.com/questions/7735647/replacing-nas-with-latest-non-na-value) – AnilGoyal Jun 04 '21 at 15:52

2 Answers

Assuming that the non-NA entries never decrease and are not negative (as in your example), you could do cummax(ifelse(is.na(x), 0, x)), where x is the vector you want to transform (it looks like temp$sample in what you have provided).

The logic: cummax(), the cumulative max function, returns at each position the largest value seen so far in the vector. However, it doesn't handle NA values well: once it meets an NA, every later result is NA too. This is what the ifelse() call is for. We use ifelse() to replace each NA with 0, then use cummax() to carry the largest value encountered so far into those positions.

Example:

x <- c(1, NA, NA, 2, NA, NA, NA, 3, NA, 4)

cummax(ifelse(is.na(x), 0, x))
## [1] 1 1 1 2 2 2 2 3 3 4
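
To illustrate why the non-decreasing assumption matters, here is a minimal base R sketch. The helper name locf is hypothetical (it is not from any package); it applies the same cummax() idea to the positions of the non-NA entries instead of their values, so it should also work when the values go down.

```r
# A vector whose non-NA values decrease: the value-based trick breaks here
x <- c(3, NA, 1, NA, 2)
cummax(ifelse(is.na(x), 0, x))
## [1] 3 3 3 3 3

# Hypothetical helper: carry the last non-NA value forward by tracking
# the index of the most recent non-NA entry rather than its value
locf <- function(x) {
  idx <- cummax(ifelse(is.na(x), 0L, seq_along(x)))
  idx[idx == 0L] <- NA  # leading NAs have nothing to carry forward
  x[idx]
}

locf(x)
## [1] 3 3 1 1 2
```

Indices always increase, so cummax() on them is safe regardless of what the values look like.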
Aaron Montgomery

You can use either of these solutions, as suggested in the comments (df here is the data frame from the question):

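library(dplyr)
library(zoo)

# Carry the last non-NA sample number forward.
# na.rm = FALSE keeps any leading NAs, so the column length never changes.
df %>%
  mutate(sample = na.locf(sample, na.rm = FALSE))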

# A tibble: 60 x 3
   sample well  interp_conc
    <dbl> <chr>       <dbl>
 1      1 C1         456582
 2      1 C3         299611
 3      1 C5         338462
 4      2 D1         449737
 5      2 D3         395905
 6      2 D5         546031
 7      3 E1         511817
 8      3 E3         473617
 9      3 E5         455924
10      4 F1         408370
# ... with 50 more rows

Or

library(tidyr)

df %>%
  fill(sample, .direction = "down")

# A tibble: 60 x 3
   sample well  interp_conc
    <dbl> <chr>       <dbl>
 1      1 C1         456582
 2      1 C3         299611
 3      1 C5         338462
 4      2 D1         449737
 5      2 D3         395905
 6      2 D5         546031
 7      3 E1         511817
 8      3 E3         473617
 9      3 E5         455924
10      4 F1         408370
# ... with 50 more rows
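
The "condition" column from the question could probably be added in the same pipeline. The sketch below is only an illustration under stated assumptions: the small data frame and its value column are made up as a stand-in for the real data, and the conditions are assumed to come in the fixed order UT, Stimuli, Inhib1, Inhib2, three rows each, repeating for every run. Since mutate() does not recycle a 12-element vector, rep() builds a vector of the full length explicitly.

```r
library(dplyr)
library(tidyr)

# Made-up stand-in for the question's data: 8 samples in triplicate (24 rows)
temp <- data.frame(
  sample = as.vector(rbind(1:8, NA, NA)),  # 1, NA, NA, 2, NA, NA, ...
  value  = seq_len(24)                     # placeholder measurements
)

# Assumed order of conditions within one run of the experiment
conditions <- c("UT", "Stimuli", "Inhib1", "Inhib2")

result <- temp %>%
  fill(sample, .direction = "down") %>%
  # each = 3 repeats every condition for its triplicate;
  # length.out = n() repeats the whole pattern down to the last row
  mutate(condition = rep(conditions, each = 3, length.out = n()))

head(result, 6)
```

With the 60-row data from the question, the same call would repeat the 12-row pattern five times.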
Anoushiravan R