Replace values in a dataset based on more than one condition in R

Question

I have a dataset that looks something like this:

age  Year  f.pop   f.dc 
1    1990      0      1
5    2001    200      4
1    1990    400      2 
1    2001     50      3
5    2001      0      3

I want it to look like this:

age  Year  f.pop  f.dc 
1    1990    400     1
5    2001    200     4
1    1990    400     2
1    2001     50     3 
5    2001    200     3

Basically, I want to replace the zero values in the f.pop column of my dataset with the f.pop value from a row that matches on two other columns (age and Year). The f.dc column is largely irrelevant to this question, but I want to emphasize that these rows are not identical and must remain separate.

Here's my attempt:

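# For each row with f.pop == 0, copy the non-zero f.pop from a row
# with the same age and Year
for (i in seq_along(usbd$f.pop)) {
  if (usbd$f.pop[i] == 0) {
    index <- which(usbd$age == usbd$age[i] & usbd$Year == usbd$Year[i])
    # index includes row i itself, so keep only the non-zero matches
    nonzero <- usbd$f.pop[index][usbd$f.pop[index] != 0]
    if (length(nonzero) > 0) usbd$f.pop[i] <- nonzero[1]
  }
}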

But this is incredibly slow. There must be a more efficient way.

The answers to "Conditional replacement of values in a data.frame" are useful, but I'm not sure how to apply them to two conditions when the replacement value comes from a different row than the one being replaced.
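One way to handle two conditions whose matching rows have different indices is to treat it as a lookup: build a single key from age and Year, then use match() to find, for each zero, a row with the same key and a non-zero f.pop. This is a minimal base-R sketch, not the accepted answer's method; it assumes the data frame is called usbd (as in the loop above), and the paste()-based key is a choice made for this illustration.

```r
# Sample data from the question (assumed to be named usbd, as in the loop)
usbd <- data.frame(
  age   = c(1, 5, 1, 1, 5),
  Year  = c(1990, 2001, 1990, 2001, 2001),
  f.pop = c(0, 200, 400, 50, 0),
  f.dc  = c(1, 4, 2, 3, 3)
)

# Lookup table: the rows that already carry a non-zero f.pop
lookup <- usbd[usbd$f.pop != 0, c("age", "Year", "f.pop")]

# Combine both conditions into one key so a single match() call finds,
# for every row, a lookup row with the same age AND the same Year
key_usbd   <- paste(usbd$age, usbd$Year, sep = "_")
key_lookup <- paste(lookup$age, lookup$Year, sep = "_")

# Replace only the zeros; all other rows (and f.dc) are left untouched
zero <- usbd$f.pop == 0
usbd$f.pop[zero] <- lookup$f.pop[match(key_usbd[zero], key_lookup)]

usbd
```

match() returns the first hit, so if a group had several different non-zero values this would silently pick one, and a zero with no non-zero partner would become NA.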

score 2 · Accepted Answer · edited Jul 27 '15 at 19:31


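We could use data.table to replace the 0 values in 'f.pop', assuming each 'age'/'Year' group contains exactly one non-zero 'f.pop' value. Convert the data.frame to a data.table with setDT(df1), group by age and Year (by = .(age, Year)), and within each group assign to 'f.pop' its non-zero value (f.pop := f.pop[f.pop != 0]). Because that value has length 1, it is recycled to every row of the group, so the zeros are overwritten while the rows themselves stay separate.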

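library(data.table)
# Convert to data.table in place, then replace f.pop in each (age, Year) group
# with that group's single non-zero value
setDT(df1)[, f.pop := f.pop[f.pop != 0], by = .(age, Year)]
df1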
#   age Year f.pop f.dc
#1:   1 1990   400    1
#2:   5 2001   200    4
#3:   1 1990   400    2
#4:   1 2001    50    3
#5:   5 2001   200    3

data

df1 <- structure(list(age = c(1L, 5L, 1L, 1L, 5L), Year = c(1990L, 2001L, 
1990L, 2001L, 2001L), f.pop = c(0L, 200L, 400L, 50L, 0L), f.dc = c(1L, 
4L, 2L, 3L, 3L)), .Names = c("age", "Year", "f.pop", "f.dc"), 
class =  "data.frame", row.names = c(NA, -5L))
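The one-liner relies on each (age, Year) group having exactly one non-zero f.pop. A sketch that checks that assumption before assigning, using standard data.table syntax (the counts table, the n_nonzero column, and the error message are illustrative additions, not part of the original answer):

```r
library(data.table)

# Sample data from the answer
df1 <- data.frame(
  age   = c(1L, 5L, 1L, 1L, 5L),
  Year  = c(1990L, 2001L, 1990L, 2001L, 2001L),
  f.pop = c(0L, 200L, 400L, 50L, 0L),
  f.dc  = c(1L, 4L, 2L, 3L, 3L)
)
setDT(df1)

# Count the non-zero f.pop values in each (age, Year) group
counts <- df1[, .(n_nonzero = sum(f.pop != 0)), by = .(age, Year)]

# The grouped assignment assumes exactly one non-zero value per group
if (all(counts$n_nonzero == 1L)) {
  df1[, f.pop := f.pop[f.pop != 0], by = .(age, Year)]
} else {
  stop("some (age, Year) groups do not have exactly one non-zero f.pop")
}

df1
```

If a group had several non-zero values, or none at all, the grouped := would receive a replacement vector whose length may not match the group, so failing early with an explicit message is arguably easier to debug.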

answered Jul 27 '15 at 19:16 by akrun · edited by David Arenburg

@DavidArenburg Would the `.(age, Year)` work in 1.9.4? – akrun Jul 27 '15 at 19:31

So even if not, are you suggesting installing the devel version just to avoid writing `list`? :) I think it will work, lemme check – David Arenburg Jul 27 '15 at 19:32
@DavidArenburg Alright, thanks :-) I was trying to promote the devel version as it includes a lot of new functions. – akrun Jul 27 '15 at 19:34

Anyway according to [the documentation](https://github.com/Rdatatable/data.table) this feature was already added in v 1.9.4. Other than that, I hope Arun/Matt will release v 1.9.6 any time soon – David Arenburg Jul 27 '15 at 19:38
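Inside data.table's `[`, `.()` is an alias for `list()`, so `by = .(age, Year)` and `by = list(age, Year)` should produce the same grouping. A small sketch comparing the two; the helper make_df() is hypothetical and only exists to get two fresh copies, since `:=` modifies a table in place:

```r
library(data.table)

# Hypothetical helper: a fresh copy of the answer's sample data
make_df <- function() {
  data.table(
    age   = c(1L, 5L, 1L, 1L, 5L),
    Year  = c(1990L, 2001L, 1990L, 2001L, 2001L),
    f.pop = c(0L, 200L, 400L, 50L, 0L),
    f.dc  = c(1L, 4L, 2L, 3L, 3L)
  )
}

# Same grouped assignment, once with the .() alias and once with list()
a <- make_df()
a[, f.pop := f.pop[f.pop != 0], by = .(age, Year)]

b <- make_df()
b[, f.pop := f.pop[f.pop != 0], by = list(age, Year)]

# Compare the results column by column
all.equal(a, b)
```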