How to select rows by criteria connected with two tables

Question

I have a table with two columns, "name" and "grade". Values in the "name" column can repeat several times. To illustrate the problem, here is a short example table:

list <- data.frame(
  name  = c("Natalia", "Alex", "Adam", "Natalia", "Natalia", "Alex", "Natalia", "Adam"),
  grade = c(5, 6, 5, 4, 5, 4, 3, 4)
)

I'd like to get a data frame with two columns: the unique values from the "name" column in the first, and the sum of grades for each name in the second. I created the first column like this:

n_occur <- data.frame(table(list$name))

and it works: I get a column of unique names from the original table.
Unfortunately, I have no idea how to sum the grades for each name. It should be more or less something like the pseudocode below, but I don't know R syntax well, so it's a bit hard for me.

sum(list$grade) where (list$name == n_occur$Var1)

I think that I should combine filter with select somehow, but I didn't manage to do that.

Is this what you are looking for? http://stackoverflow.com/questions/1660124/how-to-sum-a-variable-by-group – Gopala, Jul 18 '16 at 14:14

score 1 · Accepted Answer · edited Jul 18 '16 at 16:41

Is this what you are looking for?

library(dplyr)
list%>%
   group_by(name)%>%
   summarise(sum(grade))
#Source: local data frame [3 x 2]

#     name sum(grade)
#   (fctr)      (dbl)
#1    Adam          9
#2    Alex         10
#3 Natalia         17
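
If you would rather not load a package, the same per-name sum can probably be had in base R. This is only a minimal sketch using `aggregate()`; the name `grades_df` is my own stand-in for the question's `list` (chosen so that base `list()` is not masked), and the data is copied from the question.

```r
# Toy data copied from the question; `grades_df` is a hypothetical name
# used instead of `list` so the base function list() is not masked.
grades_df <- data.frame(
  name  = c("Natalia", "Alex", "Adam", "Natalia", "Natalia", "Alex", "Natalia", "Adam"),
  grade = c(5, 6, 5, 4, 5, 4, 3, 4)
)

# Formula interface: sum `grade` within each distinct value of `name`.
totals <- aggregate(grade ~ name, data = grades_df, FUN = sum)
print(totals)
```

The result has one row per name, with the summed grades in the `grade` column, so it should agree with the dplyr output above.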

edited Jul 18 '16 at 16:41

akrun

answered Jul 18 '16 at 14:14

rar

Thanks! I have one more question. Let's say we have one more column, e.g. the number of the week in which the person got the grade. I'd like to group the data so that, for each person and each week, I have the sum of their grades. – Natalia Jul 19 '16 at 07:36
Actually, I managed to do that, so I'll share the solution in case anyone needs it in the future. First I converted the data frame to a data.table: `library(data.table); dt <- setDT(list)`. Then I used .SD like this: `dt <- dt[, lapply(.SD, sum), by = c("week", "name")]`, which gave me what I needed ;) – Natalia Jul 19 '16 at 08:43
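
For anyone who prefers to stay with dplyr, as in the accepted answer, the per-person, per-week sum can be sketched by grouping on both columns. The `week` values below are invented purely for illustration (the comments never show the real ones), `grades_df` and `total` are hypothetical names, and the `.groups` argument assumes dplyr 1.0 or later.

```r
library(dplyr)

# Toy data from the question plus a made-up `week` column (an assumption,
# not the asker's real data).
grades_df <- data.frame(
  name  = c("Natalia", "Alex", "Adam", "Natalia", "Natalia", "Alex", "Natalia", "Adam"),
  grade = c(5, 6, 5, 4, 5, 4, 3, 4),
  week  = c(1, 1, 1, 1, 2, 2, 2, 2)
)

# One output row per (name, week) pair, holding the sum of that pair's grades.
weekly <- grades_df %>%
  group_by(name, week) %>%
  summarise(total = sum(grade), .groups = "drop")

print(weekly)
```

Naming the summary column (`total` here) avoids the awkward `sum(grade)` column name and makes the result easier to use in later steps.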
