1

I'm an unexperienced user of R and I need to create quite a complicated stuff. My dataset looks like this :

dataset

a,b,c,d,e are different individuals. I want to complete the D column as follows : At the last line for each individual in the col A, D = sum(C)/(B-1).

Expected results should look like :

results

D4=sum(C2:C4)/(B4-1)=0.5

D6=sum(C5:C6)/(B6-1)=1, etc.

I attempted to deal with it with something like :

for(i in 2:NROW(dataset)){
  dataset[i,4]<-ifelse(
   (dataset[i,1]==data1[i-1,1]),sum(dataset[i,3])/(dataset[i,2]-1),NA
  )
}

But it is obviously not sufficient, as it computes the D value for all the rows and not only the last for each individual, and it does not calculate the sum of C values for this individual.

And I really don't know how to figure it out. Do you guys have any advice ? Many thanks.

Lef Lef
  • 11
  • 2
  • Welcome to SO. Please [have a look at this link](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) – Sotos May 16 '17 at 09:40
  • You haven't provided us with enough information, here's the first data.frame df <- data.frame(A=c("a","a","a","b","b","c","c","c","d","e","e"), B=c(3,3,3,2,2,3,3,3,1,2,2), C=c(NA,1,0,NA,1,NA,0,1,NA,NA,0)) modify your question to tell us what the second one looks like? – DarrenRhodes May 16 '17 at 14:34

2 Answers2

0

If I understood your question correctly, then this is one approach to get to the desired result:

df <- data.frame(
A=c("a","a","a","b","b","c","c","c","d","e","e"),
B=c(3,3,3,2,2,3,3,3,1,2,2),
C=c(NA,1,0,NA,1,NA,0,1,NA,NA,0), 
stringsAsFactors = FALSE)

for(i in 2:NROW(df)){
  df[i,4]<-ifelse(
    (df[i,1]!=df[i+1,1] | i == nrow(df)),sum(df[df$A == df[i,1],]$C, na.rm=TRUE)/(df[i,2]-1),NA
  )
}

This code results in the following table:

   A B  C  V4
1  a 3 NA  NA
2  a 3  1  NA
3  a 3  0 0.5
4  b 2 NA  NA
5  b 2  1 1.0
6  c 3 NA  NA
7  c 3  0  NA
8  c 3  1 0.5
9  d 1 NA NaN
10 e 2 NA  NA
11 e 2  0 0.0

The ifelse first tests if the individual of the current row of column A is different than the individual in the next row OR if it's the last row.

If it is the last row with this individual it takes the sum of column C (ignoring the NAs) of the rows with the individual present in column A divided by the value in column B minus one.

Otherwise it puts an NA in the fourth column.

jkrainer
  • 413
  • 5
  • 10
0

Using dplyr you can try generating D for all rows and then remove where not required:

dftest %>%
  group_by(A,B) %>%
  dplyr::mutate(D = sum(C, na.rm=TRUE)/(B-1)) %>%
  dplyr::mutate(D = if_else(row_number()== n(), D, as.double(NA)))

which gives:

Source: local data frame [11 x 4]
Groups: A, B [5]

       A     B     C     D
   <chr> <dbl> <dbl> <dbl>
1      a     3    NA    NA
2      a     3     1    NA
3      a     3     0   0.5
4      b     2    NA    NA
5      b     2     1   1.0
6      c     3    NA    NA
7      c     3     0    NA
8      c     3     1   0.5
9      d     1    NA   NaN
10     e     2    NA    NA
11     e     2     0   0.0
Aramis7d
  • 2,444
  • 19
  • 25