I have a dataset that looks something like this:

age  Year  f.pop   f.dc 
1    1990      0      1
5    2001    200      4
1    1990    400      2 
1    2001     50      3
5    2001      0      3

I want it to look like this:

age  Year  f.pop  f.dc 
1    1990    400     1
5    2001    200     4
1    1990    400     2
1    2001     50     3 
5    2001    200     3 

Basically, I want to replace zero values in the f.pop column of my dataset with f.pop values of rows that match in two other columns (Year and age). The f.dc column is largely irrelevant to this question, but I want to emphasize that these rows are not identical and must remain separate.

Here's my attempt:

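for (i in seq_along(usbd$f.pop)) {
  if (usbd$f.pop[i] == 0) {
    iage  <- usbd$age[i]
    iyear <- usbd$Year[i]
    # rows in the same (age, Year) group that hold a non-zero f.pop
    index <- which(usbd$age == iage & usbd$Year == iyear & usbd$f.pop != 0)
    # copy the first non-zero match, if there is one
    if (length(index) > 0) usbd$f.pop[i] <- usbd$f.pop[index[1]]
  }
}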

But this is incredibly slow. There must be a more efficient way.

The question "Conditional replacement of values in a data.frame" is useful, but I'm not sure how to apply it to two conditions with potentially different indices.

heo

1 Answer

We could use data.table to replace the '0' values in 'f.pop', assuming the non-zero 'f.pop' value is unique within each 'age', 'Year' group. Convert the 'data.frame' to a 'data.table' with setDT(df1), group by age and Year (by = .(age, Year)), and assign to 'f.pop' the non-zero value of 'f.pop' in each group (f.pop := f.pop[f.pop != 0]).

library(data.table)
setDT(df1)[, f.pop:= f.pop[f.pop!=0] , by = .(age, Year)]
df1
#   age Year f.pop f.dc
#1:   1 1990   400    1
#2:   5 2001   200    4
#3:   1 1990   400    2
#4:   1 2001    50    3
#5:   5 2001   200    3
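
If a group can hold more than one non-zero row, or only zeros, the one-liner above may not recycle cleanly. A minimal sketch of a more defensive variant follows, assuming that taking the first non-zero value per group is acceptable; the table `dt` and its extra rows are made up for illustration and are not part of the question's data.

```r
library(data.table)

# Hypothetical example data: group (1, 1990) has two non-zero rows,
# group (9, 2005) has only zeros.
dt <- data.table(
  age   = c(1L, 1L, 1L, 5L, 5L, 9L),
  Year  = c(1990L, 1990L, 1990L, 2001L, 2001L, 2005L),
  f.pop = c(0L, 400L, 400L, 200L, 0L, 0L)
)

# Within each (age, Year) group, use the first non-zero f.pop;
# if the group has no non-zero value, keep the original values.
dt[, f.pop := {
  nz <- f.pop[f.pop != 0]
  if (length(nz) > 0L) rep(nz[1L], .N) else f.pop
}, by = .(age, Year)]
```

The all-zero group is left untouched here; whether it should instead become NA depends on the data.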

data

df1 <- structure(list(age = c(1L, 5L, 1L, 1L, 5L), Year = c(1990L, 2001L, 
1990L, 2001L, 2001L), f.pop = c(0L, 200L, 400L, 50L, 0L), f.dc = c(1L, 
4L, 2L, 3L, 3L)), .Names = c("age", "Year", "f.pop", "f.dc"), 
class =  "data.frame", row.names = c(NA, -5L))
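
For readers without data.table, the same grouped fill can be sketched in base R with ave(). This is an alternative to the approach above rather than part of it; the helper name `fill_zero` is hypothetical, and it again assumes the first non-zero value in each group is the one to copy.

```r
# Same data as in the question.
df <- data.frame(
  age   = c(1L, 5L, 1L, 1L, 5L),
  Year  = c(1990L, 2001L, 1990L, 2001L, 2001L),
  f.pop = c(0L, 200L, 400L, 50L, 0L),
  f.dc  = c(1L, 4L, 2L, 3L, 3L)
)

# Hypothetical helper: fill a group with its first non-zero value,
# or return the group unchanged if it has none.
fill_zero <- function(x) {
  nz <- x[x != 0]
  if (length(nz) > 0) rep(nz[1], length(x)) else x
}

# ave() applies the helper within each age/Year combination
# and returns the values in the original row order.
df$f.pop <- ave(df$f.pop, df$age, df$Year, FUN = fill_zero)
```

Because ave() works on whole groups rather than row by row, it avoids the repeated which() scans of the loop in the question.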
akrun
  • @DavidArenburg Would `.(age, Year)` work in 1.9.4? – akrun Jul 27 '15 at 19:31
  • So even if not, do you suggest to install the devel version only in order to avoid writing `list`? :) I think it will work, lemme check – David Arenburg Jul 27 '15 at 19:32
  • @DavidArenburg Alright, thanks :-) I was trying to promote the devel version as it includes a lot of new functions. – akrun Jul 27 '15 at 19:34
  • Anyway according to [the documentation](https://github.com/Rdatatable/data.table) this feature was already added in v 1.9.4. Other than that, I hope Arun/Matt will release v 1.9.6 any time soon – David Arenburg Jul 27 '15 at 19:38