I have a question about my R script. I have a data set with many series that contain NA and numeric values. I would like to replace each NA with 0 from the first numeric value onward, but keep the NAs that occur before the series has started.

As shown below, in the second column for example, I would like to keep the first two NAs but replace the fourth value with 0.

example

Here is my script, but it doesn't work:

my actual script

I would be grateful for any suggestions.

Many thanks

ER

2 Answers

In case you, or anyone else, want to avoid for loops:

# example dataset
df = data.frame(x1 = c(23,NA,NA,35),
                x2 = c(NA,NA,45,NA),
                x3 = c(4,34,NA,5))

# function to replace NAs with 0 once the series has started:
# cumsum(!is.na(x)) stays 0 until the first non-NA value, so leading NAs are kept
f = function(x) { x[is.na(x) & cumsum(!is.na(x)) != 0] = 0; x }

# apply function and save as dataframe
data.frame(sapply(df, f))

#   x1 x2 x3
# 1 23 NA  4
# 2  0 NA 34
# 3  0 45  0
# 4 35  0  5
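
To see why f leaves the leading NAs alone, it may help to build its mask one step at a time on the second column. This is only an illustrative breakdown; the names started and to_zero are made up here and are not part of the function above.

```r
# second column of the example dataset
x <- c(NA, NA, 45, NA)

# TRUE from the first non-NA value onward: cumsum counts the non-NA values seen so far
started <- cumsum(!is.na(x)) != 0   # FALSE FALSE  TRUE  TRUE

# only NAs that come after the series has started get replaced
to_zero <- is.na(x) & started       # FALSE FALSE FALSE  TRUE

x[to_zero] <- 0
x                                   # NA NA 45  0
```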

Or using tidyverse and the same function f:

library(tidyverse)

df %>% map_df(f)

# # A tibble: 4 x 3
#     x1    x2    x3
#   <dbl> <dbl> <dbl>
# 1   23.   NA     4.
# 2    0.   NA    34.
# 3    0.   45.    0.
# 4   35.    0.    5.
AntoniosK

If this is your dataset:

ORIGINAL_DATA <- data.frame(X1 = c(23, NA, NA, 35), 
                            X2 = c(NA, NA, 45, NA), 
                            X3 = c(4, 34, NA, 5))

This could probably work:

for (i in seq_len(ncol(ORIGINAL_DATA))) {
  for (j in seq_len(nrow(ORIGINAL_DATA))) {
    if (!is.na(ORIGINAL_DATA[j, i])) {
      # First non-NA value found: replace every NA from this row down with 0
      rows <- j:nrow(ORIGINAL_DATA)
      ORIGINAL_DATA[rows, i] <- ifelse(is.na(ORIGINAL_DATA[rows, i]), 0, ORIGINAL_DATA[rows, i])

      # Stop scanning this column (assigning to j does not end a for loop in R)
      break
    }
  }
}
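
The same idea (find the first non-NA row, then zero out the NAs below it) can probably also be written per column without the nested loop. The following is a rough sketch under that assumption; fill_after_start is a hypothetical helper name, not something from base R.

```r
ORIGINAL_DATA <- data.frame(X1 = c(23, NA, NA, 35),
                            X2 = c(NA, NA, 45, NA),
                            X3 = c(4, 34, NA, 5))

# hypothetical helper: replace NAs with 0 after the first non-NA value of a vector
fill_after_start <- function(x) {
  first <- which(!is.na(x))[1]      # position of the first non-NA value
  if (is.na(first)) return(x)       # column is all NA: the series never started
  x[is.na(x) & seq_along(x) > first] <- 0
  x
}

# apply to every column; the [] keeps ORIGINAL_DATA a data frame
ORIGINAL_DATA[] <- lapply(ORIGINAL_DATA, fill_after_start)
ORIGINAL_DATA

#   X1 X2 X3
# 1 23 NA  4
# 2  0 NA 34
# 3  0 45  0
# 4 35  0  5
```

A column that contains only NA is returned unchanged, which matches the requirement of keeping NAs for a series that has not started.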