Sapply with two conditions

Question

I have two tables. Which have the kind of formatting shown below. One of it is table A as such:

students|Test Score|Year
A       |  100     |1993
B       |   81     |1992
C       |   92     |1992
D       |   88     |1993

Another table B I have looks like this:

Class | Students | Year
1     | {A,D}    |1993
2     | {B,C}    |1992

I would like to perform some sort of manipulation in R where by I can search for the students listed in the array under the column in table B from table A and tabulate the scores into the following format:

Class | Students | Mean Score
1     | {A,D}    |   94
2     | {B,C}    |   86.5

Is there any formula which I can use to do the searching and then merge those results by some manipulation in R?

I know the above can be done with:

B$MeanScore <- sapply(strsplit(gsub("[{}]","", B$Students), split=","),
   function(x) mean(A$Test.Score[A$Students %in% x]))

But is there a way for me to add a second condition which is to match the year as well. The year of the class as well as the year of the test.

Is that really an array, or is that a string value with bracket characters. Please provide you sample input data in a more [reproducible format](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) to make it easier to help you. If that is a string, probably best to split those out into rows (maybe with the [splitstackshape](https://cran.r-project.org/package=splitstackshape) package) and to a proper join. — MrFlick, Apr 26 '17 at 18:56
It would be easier if table B is in a long format. Then you can merge with table A to get `Class` and `aggregate()` for the mean. — jogo, Apr 26 '17 at 18:59
http://stackoverflow.com/questions/24442669/duplicating-observations-of-a-dataframe-but-also-replacing-specific-variable-va — jogo, Apr 26 '17 at 19:06
http://stackoverflow.com/questions/13773770/split-comma-separated-column-into-separate-rows — jogo, Apr 26 '17 at 19:53

Evan Friedland · Answer 1 · 2017-06-23T14:57:25.533

In full agreement with jogo here:

A <- data.frame(students = c("A","B","C","D"), `Test Score` = c(100,81,92,88), Year = c(1993,1992,1992,1993))
A
#  students Test.Score Year
#1        A        100 1993
#2        B         81 1992
#3        C         92 1992
#4        D         88 1993

B <- data.frame(Class = c(1,2), Students = c("{A,D}","{B,C}"), Year = c(1993,1992))
B
#  Class Students Year
#1     1    {A,D} 1993
#2     2    {B,C} 1992

colnames(A) # taking note of the case sensitive "students" and "Year"
#[1] "students"   "Test.Score" "Year"   

s <- strsplit(gsub("[{}]","",B$Students), ",")
B.long <- data.frame(students = unlist(s), 
                     Class = rep(B$Class, sapply(s, length)), 
                     Year = rep(B$Year, sapply(s, length)))
B.long
#Students Class Year
#1        A     1 1993
#2        D     1 1993
#3        B     2 1992
#4        C     2 1992

Newdf <- merge.data.frame(A, B.long, c("Year","students"))
#Year students Test.Score Class
#1 1992        B         81     2
#2 1992        C         92     2
#3 1993        A        100     1
#4 1993        D         88     1

aggregate(Test.Score ~ Year + Class, Newdf, mean)
#Year Class Test.Score
#1 1993     1       94.0
#2 1992     2       86.5

@user7729135 Does this answer you question? – Evan Friedland Jun 23 '17 at 14:57 — Evan Friedland, Jun 23 '17 at 14:57

Sapply with two conditions

1 Answers1