how to calculate a specific subset in dataframe in r and save the calculation in another list

Question

I have two lists:

list 1:

id   name  age
1    jake  21
2    ashly 19
45   lana  18
51   james 23
5675 eric  25

list 2 (tv watch):

this is just an example, the real lists are very big :list 1 - 5000 id's, list 2/3/4 - has more then 1 million rows (not a unique id).

I need for every list 2 and up to calculate average/sum/count to every id and add the value to list 1. notice that I need the calculation saved in another list with different row numbers.

example:

list 1:
    id   name  age  tv_average
    1    jake  21   2.2
    2    ashly 19   n/a
    45   lana  18   4.3
    51   james 23   1.6667
    5675 eric  25   n/a

this are my tries:

for (i in 1:nrow(list2)) {
  p <- subset(list2,list2$id==i)
  list2$tv_average[i==list2$id] <- sum(p$hours)/(nrow(p))
}

error: out of 22999 rows it only work on 21713 rows.

You need to be more precise in your language and share data reproducibly. Both of these objects look like they are class `data.frame`, a special type of class `list`. `subset()` can work on data frames, but not on lists. This will all be cleared up if you share data reproducibly, either using `dput()` or sharing code to create sample data. [Please read this link about how to make good reproducible examples](http://stackoverflow.com/q/5963269/903061) and then edit your question to make your data reproducible. — Gregor Thomas, Dec 22 '16 at 18:19
If you'd like to try more to solve on your own, it seems like this is a simple two-step process. 1. Create your average/sum/count for each id in list 2 - see the FAQ [How to sum a variable by group?](http://stackoverflow.com/q/1660124/903061). 2. Join that result to your list 1 - see the FAQ [How to join (merge) data frames in R?](http://stackoverflow.com/q/1299871/903061). You want a left-join in this case. — Gregor Thomas, Dec 22 '16 at 18:23
You don't have to share real data. You just have to share the data you have above **reproducibly**, either using `dput()` or sharing code to create sample data. **Please read the link in my first comment** to learn about that. Also, as I said, *probably* you have data frames, not just lists. And subset works fine on data frames. I just can't tell for sure because your data is not shared reproducibly. — Gregor Thomas, Dec 22 '16 at 18:36

d.b · Accepted Answer · 2017-08-14T15:14:52.400

Try this

#Sample Data   
data1 = structure(list(id = c(1L, 2L, 45L, 51L, 5675L), name = structure(c(3L, 
1L, 5L, 4L, 2L), .Label = c("ashly", "eric", "jake", "james", 
"lana"), class = "factor"), age = c(21L, 19L, 18L, 23L, 25L) 
), .Names = c("id", 
"name", "age"), row.names = c(NA, -5L), class = "data.frame")

data2 = structure(list(id = c(1L, 1L, 1L, 3L, 45L, 45L, 51L, 51L, 51L, 
53L), hours = c(1.1, 3, 2.5, 10, 5.6, 3, 2, 1, 2, 6)), .Names = c("id", 
"hours"), class = "data.frame", row.names = c(NA, -10L))

# Use aggregate to calculate Average, Sum, and Count and Merge
merge(x = data1,
      y = aggregate(hours~id, data2, function(x)
            c(mean = mean(x),
              sum = sum(x),
              count = length(x))),
      by = "id",
      all.x = TRUE)
#    id  name age hours.mean hours.sum hours.count
#1    1  jake  21   2.200000  6.600000    3.000000
#2    2 ashly  19         NA        NA          NA
#3   45  lana  18   4.300000  8.600000    2.000000
#4   51 james  23   1.666667  5.000000    3.000000
#5 5675  eric  25         NA        NA          NA

@allinr Please include with your answer an explanation of what your code is doing so it will be more helpful for future readers. — Barker, Dec 22 '16 at 22:02

how to calculate a specific subset in dataframe in r and save the calculation in another list

1 Answers1