0

I have two tables. Which have the kind of formatting shown below. One of it is table A as such:

students|Test Score|Year
A       |  100     |1993
B       |   81     |1992
C       |   92     |1992
D       |   88     |1993

Another table B I have looks like this:

Class | Students | Year
1     | {A,D}    |1993
2     | {B,C}    |1992

I would like to perform some sort of manipulation in R where by I can search for the students listed in the array under the column in table B from table A and tabulate the scores into the following format:

Class | Students | Mean Score
1     | {A,D}    |   94
2     | {B,C}    |   86.5

Is there any formula which I can use to do the searching and then merge those results by some manipulation in R?

I know the above can be done with:

B$MeanScore <- sapply(strsplit(gsub("[{}]","", B$Students), split=","),
   function(x) mean(A$Test.Score[A$Students %in% x]))

But is there a way for me to add a second condition which is to match the year as well. The year of the class as well as the year of the test.

Jaap
  • 81,064
  • 34
  • 182
  • 193
user7729135
  • 399
  • 1
  • 3
  • 11
  • 1
    Is that really an array, or is that a string value with bracket characters. Please provide you sample input data in a more [reproducible format](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) to make it easier to help you. If that is a string, probably best to split those out into rows (maybe with the [splitstackshape](https://cran.r-project.org/package=splitstackshape) package) and to a proper join. – MrFlick Apr 26 '17 at 18:56
  • 1
    It would be easier if table B is in a long format. Then you can merge with table A to get `Class` and `aggregate()` for the mean. – jogo Apr 26 '17 at 18:59
  • http://stackoverflow.com/questions/24442669/duplicating-observations-of-a-dataframe-but-also-replacing-specific-variable-va – jogo Apr 26 '17 at 19:06
  • http://stackoverflow.com/questions/13773770/split-comma-separated-column-into-separate-rows – jogo Apr 26 '17 at 19:53

1 Answers1

0

In full agreement with jogo here:

A <- data.frame(students = c("A","B","C","D"), `Test Score` = c(100,81,92,88), Year = c(1993,1992,1992,1993))
A
#  students Test.Score Year
#1        A        100 1993
#2        B         81 1992
#3        C         92 1992
#4        D         88 1993

B <- data.frame(Class = c(1,2), Students = c("{A,D}","{B,C}"), Year = c(1993,1992))
B
#  Class Students Year
#1     1    {A,D} 1993
#2     2    {B,C} 1992

colnames(A) # taking note of the case sensitive "students" and "Year"
#[1] "students"   "Test.Score" "Year"   

s <- strsplit(gsub("[{}]","",B$Students), ",")
B.long <- data.frame(students = unlist(s), 
                     Class = rep(B$Class, sapply(s, length)), 
                     Year = rep(B$Year, sapply(s, length)))
B.long
#Students Class Year
#1        A     1 1993
#2        D     1 1993
#3        B     2 1992
#4        C     2 1992

Newdf <- merge.data.frame(A, B.long, c("Year","students"))
#Year students Test.Score Class
#1 1992        B         81     2
#2 1992        C         92     2
#3 1993        A        100     1
#4 1993        D         88     1

aggregate(Test.Score ~ Year + Class, Newdf, mean)
#Year Class Test.Score
#1 1993     1       94.0
#2 1992     2       86.5
Evan Friedland
  • 3,062
  • 1
  • 11
  • 25