0

I have two tables with students' data. The first table contains the grades that the students achieved in three individual courses:

student_id    course             grade
1             English            6
1             maths              8
1             biology            6
2             English            5
2             maths              7
2             biology            6.5

The second table contains the students' average grade (over the three courses).

student_id    average_grade
1             6.7
2             6.2

I want a new table that looks like this, containing both the average grade and the English grade:

student_id    average_grade     English
1             6.7               6
2             6.2               5

How do I get this third table?

Johanna
  • 1,019
  • 2
  • 9
  • 20

4 Answers4

3
library(tidyverse)
df1<-data.frame(studentid = c(1,1,1,2,2,2), course = c('Eng', 'maths', 'bio','Eng' ,'maths', 'bio' ), grade = c(6,8,6,5,7,6.5))
df2<-data.frame(studentid = c(1,2), average_grade = c(6.7,6.2))



inner_join(df1, df2) %>% 
  spread(course, grade) %>% 
  select(studentid,average_grade,Eng)

Joining, by = "studentid"
  studentid average_grade Eng
1         1           6.7   6
2         2           6.2   5
vsb
  • 428
  • 6
  • 15
2

Perhaps this way, for example:

library(tidyverse)
d1 <- data.frame(id = c(1,1,2,2), course = c("English", "Math", "English", "Math"), grade = c(6,8,5,7))
d2 <- data.frame(id = c(1,2), avg = c(6.7, 6.2))
merge(d1, d2) %>% filter(course == "English") %>% spread(course, grade)

  id avg English
1  1 6.7       6
2  2 6.2       5
erocoar
  • 5,723
  • 3
  • 23
  • 45
  • Thanks for your reply! It isn't working for me, however. I get: Error: could not find function %>%. Also, I don't want the Math grade, only the English grade. – Johanna Feb 20 '18 at 11:38
  • 1
    In that case use `merge(d1, d2) %>% filter(course == "English") %>% spread(course, grade)` - and for your error, make sure the `tidyverse` package is installed and loaded – erocoar Feb 20 '18 at 11:41
1

This will do

df1=tibble(id=c(1,1,1,2,2,2),course=c("English","maths","biology","English","maths","biology"),
           grade=c(6,8,6,5,7,6.5))
df2=tibble(id=c(1,2),average_grade=c(6.7,6.2))
df0=df1%>%group_by(id,course)%>%summarize(English=mean(grade))%>%filter(course=="English")
merge(df0,df2,by="id")
Antonios
  • 1,919
  • 1
  • 11
  • 18
1

Of all the resources I have come across, I think this is one of the absolute best I have seen for merging data frames.

By using the merge function and its optional parameters:

Inner join: merge(df1, df2) will work for these examples because R automatically joins the frames by common variable names, but you would most likely want to specify merge(df1, df2, by = "CustomerId") to make sure that you were matching on only the fields you desired. You can also use the by.x and by.y parameters if the matching variables have different names in the different data frames.

Outer join: merge(x = df1, y = df2, by = "CustomerId", all = TRUE)

Left outer: merge(x = df1, y = df2, by = "CustomerId", all.x = TRUE)

Right outer: merge(x = df1, y = df2, by = "CustomerId", all.y = TRUE)

Cross join: merge(x = df1, y = df2, by = NULL)

How to join (merge) data frames (inner, outer, left, right)?

ASH
  • 20,759
  • 19
  • 87
  • 200