
I have data loaded from a csv file (mydata.csv).

mydata <- read.csv('mydata.csv')

The two columns I want to work with (mydata['name'] and mydata['score']) contain data like this:

name     score
sally     5
peter     10
sally     50
peter     25
mandy     100
mandy     0

The real data set has more than three names; I only show three here as an example. What I want to do is get the 10 names with the highest average score and store that result.

Also, which data type is best for storing the result (array, list, etc.) if I want to graph these points (name, score) with ggplot, using x for name and y for score?

jumpman8947

1 Answer


I create the data frame and limit myself to the 2 names with the highest average score, instead of your original 10, because the sample data only contains three names:

  df <- data.frame(name = c('sally','peter','sally','peter','mandy','mandy'), score = c(5,10,50,25,100,0))

  library("dplyr")
  FinalOutput <- df %>%
    group_by(name) %>% # group by name
    summarise(avg_score = mean(score)) %>% # create the variable "avg_score", which holds the mean score for each name
    top_n(2, avg_score) %>% # select the top 2 by avg_score; change it to 10 with your real data
    arrange(desc(avg_score)) # sort in descending order so the names with the highest avg_score come first

Here is the output:

 # A tibble: 2 x 2
 #   name  avg_score
 #   <fct>     <dbl>
 #1 mandy      50.0
 #2 sally      27.5
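
If you would rather not depend on dplyr, the same group, average, sort, and truncate steps can be sketched in base R. This is only an illustration on the sample data; the names avg, top and n are my own, not something from your code. Note that head() returns exactly n rows, whereas top_n() keeps ties.

```r
# Same sample data as above
df <- data.frame(
  name  = c("sally", "peter", "sally", "peter", "mandy", "mandy"),
  score = c(5, 10, 50, 25, 100, 0)
)

# Mean score per name; aggregate() returns a data frame with columns name and score
avg <- aggregate(score ~ name, data = df, FUN = mean)
names(avg)[2] <- "avg_score"

# Sort by descending average and keep the first n rows
n <- 2  # assumption: use 10 with the real data
top <- head(avg[order(-avg$avg_score), ], n)
print(top)
```

The result is an ordinary data frame, so it can be written out or plotted the same way as the dplyr version.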

To save it:

 write.csv(FinalOutput,file="FinalOutput.csv")  

And to plot it:

 library("ggplot2")
 print(ggplot(data = FinalOutput, aes(x = name, y = avg_score)) + geom_point())
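
On the data type question: ggplot2 takes a data frame as its data argument, so a data frame (or tibble) with one row per name is the natural thing to store. By default the x axis is ordered alphabetically, not by score. A minimal sketch of ordering it by average score instead, assuming a small stand-in data frame (top_scores is a hypothetical name, filled with the averages from the sample data) and using bars rather than points:

```r
library(ggplot2)

# Hypothetical stand-in for the summarised result, using the sample averages
top_scores <- data.frame(
  name      = c("mandy", "sally"),
  avg_score = c(50, 27.5)
)

# reorder() sorts the x axis by descending avg_score instead of alphabetically
p <- ggplot(top_scores, aes(x = reorder(name, -avg_score), y = avg_score)) +
  geom_col() +
  labs(x = "name", y = "average score")

print(p)
```

Swap geom_col() back to geom_point() if you prefer points; the reorder() call works the same either way.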
Shirin Yavari