3

I have two lists:

list 1:

id   name  age
1    jake  21
2    ashly 19
45   lana  18
51   james 23
5675 eric  25

list 2 (tv watch):

id  hours 
1   1.1
1   3
1   2.5
45  5.6
45  3
51  2  
51  1
51  2

this is just an example, the real lists are very big :list 1 - 5000 id's, list 2/3/4 - has more then 1 million rows (not a unique id).

I need for every list 2 and up to calculate average/sum/count to every id and add the value to list 1. notice that I need the calculation saved in another list with different row numbers.

example:

list 1:
    id   name  age  tv_average
    1    jake  21   2.2
    2    ashly 19   n/a
    45   lana  18   4.3
    51   james 23   1.6667
    5675 eric  25   n/a

this are my tries:

for (i in 1:nrow(list2)) {
  p <- subset(list2,list2$id==i)
  list2$tv_average[i==list2$id] <- sum(p$hours)/(nrow(p))
}

error: out of 22999 rows it only work on 21713 rows.

anat
  • 705
  • 2
  • 7
  • 20
  • You need to be more precise in your language and share data reproducibly. Both of these objects look like they are class `data.frame`, a special type of class `list`. `subset()` can work on data frames, but not on lists. This will all be cleared up if you share data reproducibly, either using `dput()` or sharing code to create sample data. [Please read this link about how to make good reproducible examples](http://stackoverflow.com/q/5963269/903061) and then edit your question to make your data reproducible. – Gregor Thomas Dec 22 '16 at 18:19
  • If you'd like to try more to solve on your own, it seems like this is a simple two-step process. 1. Create your average/sum/count for each id in list 2 - see the FAQ [How to sum a variable by group?](http://stackoverflow.com/q/1660124/903061). 2. Join that result to your list 1 - see the FAQ [How to join (merge) data frames in R?](http://stackoverflow.com/q/1299871/903061). You want a left-join in this case. – Gregor Thomas Dec 22 '16 at 18:23
  • You don't have to share real data. You just have to share the data you have above **reproducibly**, either using `dput()` or sharing code to create sample data. **Please read the link in my first comment** to learn about that. Also, as I said, *probably* you have data frames, not just lists. And subset works fine on data frames. I just can't tell for sure because your data is not shared reproducibly. – Gregor Thomas Dec 22 '16 at 18:36

1 Answers1

2

Try this

#Sample Data   
data1 = structure(list(id = c(1L, 2L, 45L, 51L, 5675L), name = structure(c(3L, 
1L, 5L, 4L, 2L), .Label = c("ashly", "eric", "jake", "james", 
"lana"), class = "factor"), age = c(21L, 19L, 18L, 23L, 25L) 
), .Names = c("id", 
"name", "age"), row.names = c(NA, -5L), class = "data.frame")

data2 = structure(list(id = c(1L, 1L, 1L, 3L, 45L, 45L, 51L, 51L, 51L, 
53L), hours = c(1.1, 3, 2.5, 10, 5.6, 3, 2, 1, 2, 6)), .Names = c("id", 
"hours"), class = "data.frame", row.names = c(NA, -10L))

# Use aggregate to calculate Average, Sum, and Count and Merge
merge(x = data1,
      y = aggregate(hours~id, data2, function(x)
            c(mean = mean(x),
              sum = sum(x),
              count = length(x))),
      by = "id",
      all.x = TRUE)
#    id  name age hours.mean hours.sum hours.count
#1    1  jake  21   2.200000  6.600000    3.000000
#2    2 ashly  19         NA        NA          NA
#3   45  lana  18   4.300000  8.600000    2.000000
#4   51 james  23   1.666667  5.000000    3.000000
#5 5675  eric  25         NA        NA          NA
d.b
  • 32,245
  • 6
  • 36
  • 77
  • @allinr Please include with your answer an explanation of what your code is doing so it will be more helpful for future readers. – Barker Dec 22 '16 at 22:02