Replacing NA's with a specific condition in R

Question

In case 2017 is NA and columns of 2015 and 2016 have value, I want to assign average of them to 2017 based on the same row.

Index   2015            2016            2017
1       NA              6355698         10107023
2       13000000        73050000        NA
4       NA              NA              NA
5       10500000        NA              8000000
6       331000000       659000000       1040000000
7       55500000        NA              32032920
8       NA              NA              20000000
9       2521880         5061370         7044288
...

Here is that I tried, didn't work!

ind <- which(is.na(df), arr.ind=TRUE)
df[ind] <- rowMeans(df,  na.rm = TRUE)[ind[,1]]

Also if we have values in 2015 and 2017 columns and 2016 is NA, I want to assign average of them to the column of 2016 based on the same row. Any help would be appreciated!

Maurits Evers · Answer 1 · 2018-07-17T10:17:49.943

Disclaimer: I'm not entirely clear on what your expected output is. My solution below is based on the assumption that you want to replace NA values with either the mean of all values for every year or with the mean value of all values for every Index.

Here is a tidyverse option first spreading from wide to long, replacing NAs with the mean value per year, and finally converting back from long to wide.

library(tidyverse)
df %>%
    gather(year, value, -Index) %>%
    group_by(year) %>%
    mutate(value = ifelse(is.na(value), mean(value, na.rm = T), value)) %>%
    spread(year, value)
## A tibble: 8 x 4
#  Index     `2015`     `2016`      `2017`
#  <int>      <dbl>      <dbl>       <dbl>
#1     1 115507293.   6355698.   10107023.
#2     2  13000000. 223472356.  186197372.
#3     4 115507293. 223472356.  186197372.
#4     5 115507293. 223472356.    8000000.
#5     6 331000000. 659000000. 1040000000.
#6     7 115507293. 223472356.   32032920.
#7     8 115507293. 223472356.   20000000.
#8     9   2521880.   5061370.    7044288.

Note that here we replace NAs with mean value per year. If instead you want to replace NAs with the mean value per Index value, simply replace group_by(year) with group_by(Index):

df %>%
    gather(year, value, -Index) %>%
    group_by(Index) %>%
    mutate(value = ifelse(is.na(value), mean(value, na.rm = T), value)) %>%
    spread(year, value)
## A tibble: 8 x 4
## Groups:   Index [8]
#  Index     `2015`     `2016`      `2017`
#  <int>      <dbl>      <dbl>       <dbl>
#1     1   8231360.   6355698.   10107023.
#2     2  13000000.  13000000.   13000000.
#3     4       NaN        NaN         NaN
#4     5   8000000.   8000000.    8000000.
#5     6 331000000. 659000000. 1040000000.
#6     7  32032920.  32032920.   32032920.
#7     8  20000000.  20000000.   20000000.
#8     9   2521880.   5061370.    7044288.

Update

To only replace NAs in column 2017 with the row average based on the 2015,2016 values you can do

df <- read_table("Index   2015            2016            2017
1       NA              6355698         10107023
2       13000000        73050000        NA
4       NA              NA              NA
5       10500000        NA              8000000
6       331000000       659000000       1040000000
7       55500000        NA              32032920
8       NA              NA              20000000
9       2521880         5061370         7044288")


df %>%
    mutate(`2017` = ifelse(is.na(`2017`), 0.5 * (`2015` + `2016`), `2017`))
## A tibble: 8 x 4
#  Index    `2015`    `2016`      `2017`
#  <int>     <int>     <int>       <dbl>
#1     1        NA   6355698   10107023.
#2     2  13000000  73050000   43025000.
#3     4        NA        NA         NA
#4     5  10500000        NA    8000000.
#5     6 331000000 659000000 1040000000.
#6     7  55500000        NA   32032920.
#7     8        NA        NA   20000000.
#8     9   2521880   5061370    7044288.

Sample data

df <- read_table("Index   2015            2016            2017
1       NA              6355698         10107023
2       13000000        NA              NA
4       NA              NA              NA
5       NA              NA              8000000
6       331000000       659000000       1040000000
7       NA              NA              32032920
8       NA              NA              20000000
9       2521880         5061370         7044288")

Should it be `group_by(Index)`? As OP is searching rowMeans? — Roman, Jul 17 '18 at 10:02
@Jimbou Hmm. Yes, perhaps. I'm a bit confused, because in that case `Index = 4` will have all `NA`s. I'll make a note. — Maurits Evers, Jul 17 '18 at 10:04
Thanks guys! I edited the sample data because the previous one didn't have what I want you show you as an example — kimi-finn379, Jul 17 '18 at 10:09
@kimi-finn379 So you only want to replace `NA`s for the year 2017? — Maurits Evers, Jul 17 '18 at 10:12
For 2017 and 2016. To clarify average of columns must be based on the same row as well — kimi-finn379, Jul 17 '18 at 10:14
Not clear. *"In case 2017 is NA and columns of 2015 and 2016 have value, I want to assign average of them to 2017 based on the same row."* So you only want to replace `NA` values in column `2017`. `NA`s in columns `2015` and `2016` will stay the same. Correct? — Maurits Evers, Jul 17 '18 at 10:16
@kimi-finn379 I've included an update where I show how to replace `NA`s in column `2017` with the row average from the `2015`/`2016` values. — Maurits Evers, Jul 17 '18 at 10:18
Yes, this is exactly what I want. Sorry for the bad explanation. — kimi-finn379, Jul 17 '18 at 10:20
In case I want to remove rows which have 2 or 3 NA's in the column of 2015, 2016, 2017. What can I do? — kimi-finn379, Jul 17 '18 at 10:27
@kimi-finn379 To remove rows with `>= 2` `NA`s per row you can do `df[apply(df, 1, function(x) sum(is.na(x)) < 2), ]` — Maurits Evers, Jul 17 '18 at 11:43

Replacing NA's with a specific condition in R

1 Answers1

Update

Sample data