0

I have multiple .csv files, each of which has a column (called Data) that I want to compare across files. But first, I have to group the values in a column of each file. In the end I want one graph with multiple colored "lines", each showing the mean value per group for one file. I describe the process I use to get the graph below. It works for a single file, but I don't know how to add multiple "lines" from multiple files to one graph using ggplot.

This is what I got so far:

data = read.csv(file="my01data.csv",header=FALSE, sep=",")

A single .csv file looks like the following, but without the header row:

 ID Data Range 
 1,63,5.01   
 2,61,5.02  
 3,65,5.00  
 4,62,4.99
 5,62,4.98  
 6,64,5.01  
 7,71,4.90  
 8,72,4.93  
 9,82,4.89  
10,82,4.80  
11,83,4.82  
10,85,4.79   
11,81,4.80 

After getting the data I group it with the following lines:

data["Group"] <- NA
data[(data$Range>4.95), "Group"] <- 5.0
data[(data$Range>4.85 & data$Range<4.95), "Group"] <- 4.9
data[(data$Range>4.75 & data$Range<4.85), "Group"] <- 4.8
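
As a side note, the strict comparisons above leave values that sit exactly on a boundary (4.85 or 4.95) as NA. A minimal sketch of the same binning with cut() follows; the example rows are made up for illustration, and putting a boundary value into the upper bin (right = FALSE) is an assumption about what is wanted:

```r
# Example rows in the same shape as the file (ID, Data, Range); values are illustrative.
data <- data.frame(
  ID    = 1:6,
  Data  = c(63, 61, 71, 72, 82, 85),
  Range = c(5.01, 4.95, 4.90, 4.85, 4.80, 4.79)
)

# Bin edges and the label for each bin.
# right = FALSE makes every interval closed on the left, so 4.85 falls
# into [4.85, 4.95) and 4.95 falls into [4.95, Inf).
breaks <- c(4.75, 4.85, 4.95, Inf)
labels <- c(4.8, 4.9, 5.0)

# labels = FALSE returns the integer bin index, which is then mapped to a label.
bins <- cut(data$Range, breaks = breaks, labels = FALSE, right = FALSE)
data$Group <- labels[bins]

print(data)
```

Values below 4.75 still end up as NA here, just as in the original lines.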

The final data looks like this:

myTable <- "ID Data Range Group
        1     63   5.01   5.00
        2     61   5.02   5.00
        3     65   5.00   5.00
        4     62   4.99   5.00
        5     62   4.98   5.00
        6     64   5.01   5.00 
        7     71   4.90   4.90
        8     72   4.93   4.90
        9     72   4.89   4.90
       10     82   4.80   4.80
       11     83   4.82   4.80
       10     85   4.79   4.80
       11     81   4.80   4.80"
myData <- read.table(text=myTable, header = TRUE)

To plot this dataframe I use the following lines:

 ( pplot <- ggplot(data=myData, aes(x=Group, y=Data)) 
  + stat_summary(fun.y = mean, geom = "line", color='red') 
  + xlab("Group") 
  + ylab("Data")
 )

Which results in a graph like this:

[Image: the resulting graph]

schande

3 Answers

3

I assume you have the names of your .csv-files stored in a vector named file_names. Then you can run the following code and should get a different line for each file:

library(ggplot2)
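# The files have no header row, so the column names are set while reading
data_list <- lapply(file_names, read.csv, header = FALSE, sep = ",",
                    col.names = c("ID", "Data", "Range"))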

data_list <- lapply(seq_along(data_list), function(i){
  df <- data_list[[i]]
  df$Group <- round(df$Range, 1)
  df$DataNumber <- i
  df
  })

finalTable <- do.call(rbind, data_list)
finalTable$DataNumber <- factor(finalTable$DataNumber)

ggplot(finalTable, aes(x=Group, y=Data, group = DataNumber, color = DataNumber)) + 
  stat_summary(fun.y = mean, geom = "line") + 
  xlab("Group") + 
  ylab("Data")

How it works
First the different datasets are read with read.csv into a list data_list. Then each data.frame in that list is assigned a Group. I used round here with digits = 1, which rounds to one decimal place (I figured that's what you are doing).
Then also a unique number (in this case simply the index of the list) is assigned to each data.frame. After that the list is combined to one data.frame with rbind and then DataNumber is turned into a factor (prettier for plotting). Finally I added DataNumber as a group and color variable to the plot.
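
To check the numbers behind the lines without plotting, here is a minimal sketch of the same steps on two small in-memory data frames. The values are made up and stand in for two files; aggregate() is only used to show the per-file group means that stat_summary draws with fun.y = mean.

```r
# Two small data frames standing in for two .csv files (illustrative values).
data_list <- list(
  data.frame(ID = 1:4, Data = c(60, 64, 70, 74), Range = c(5.01, 4.99, 4.91, 4.88)),
  data.frame(ID = 1:4, Data = c(30, 34, 40, 44), Range = c(5.02, 4.98, 4.90, 4.89))
)

# Add the group and a per-file index, as in the code above.
data_list <- lapply(seq_along(data_list), function(i) {
  df <- data_list[[i]]
  df$Group <- round(df$Range, 1)
  df$DataNumber <- i
  df
})

# Stack everything into one data frame and make the file index a factor.
finalTable <- do.call(rbind, data_list)
finalTable$DataNumber <- factor(finalTable$DataNumber)

# Mean of Data per file and group: one row per point on each line.
group_means <- aggregate(Data ~ Group + DataNumber, data = finalTable, FUN = mean)
print(group_means)
```

Each row of group_means corresponds to one point of one colored line in the plot.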

kath
  • Thanks for the fast response. I appreciate your explanation below your solution! Sadly I do not have my .csv-files stored in a vector. Would you be so kind as to tell me how to save them in one? – schande May 10 '18 at 13:06
  • You need the names of the csv-files in a vector. For example: `file_names <- c("my01data.csv", "my02data.csv", "my03data.csv")` – kath May 10 '18 at 13:09
  • I think that I have not fully understood the lapply() function, because I wanted to use it to add the column names to every file: `data_list <- lapply(data_list, colnames(data_list) <-c("ID", "Data", "Range"))` – schande May 10 '18 at 13:41
  • `data_list <- lapply(data_list, function(df) colnames(df) <-c("ID", "Data", "Range")) ` should work. See also this [SO-answer](https://stackoverflow.com/a/7141669/5892059) for a great introduction to the lapply-family – kath May 10 '18 at 13:43
  • Thank you again! Now I get the following error: Error in df$Range : $ operator is invalid for atomic vectors – schande May 10 '18 at 13:58
  • When adding the different files, is it okay to add the absolute path instead of just the name, like: `file_names <- c("~Files/Results/01data.csv", "~Files/Results/02data.csv")` – schande May 10 '18 at 14:06
  • Sorry, my bad! The function has to return df, so: `data_list <- lapply(data_list, function(df){ colnames(df) <-c("ID", "Data", "Range"); df})` – kath May 10 '18 at 14:06
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/170789/discussion-between-kath-and-schande). – kath May 10 '18 at 14:06
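
Following up on the comments, a minimal sketch of one way to build file_names from a folder and name the columns while reading. The temporary folder and the two tiny files are assumptions made only so the example runs on its own; in practice csv_dir would point at the folder that holds the real files.

```r
# Write two tiny header-less files into a temporary folder (illustrative data).
csv_dir <- tempfile("csvdemo")
dir.create(csv_dir)
writeLines(c("1,63,5.01", "2,71,4.90"), file.path(csv_dir, "my01data.csv"))
writeLines(c("1,33,5.02", "2,41,4.89"), file.path(csv_dir, "my02data.csv"))

# Collect all .csv files in the folder; full.names = TRUE keeps the path,
# so read.csv can find the files from any working directory.
file_names <- list.files(csv_dir, pattern = "\\.csv$", full.names = TRUE)

# Read each file and set the column names in the same step.
data_list <- lapply(file_names, read.csv, header = FALSE,
                    col.names = c("ID", "Data", "Range"))

# Name the list elements after the files to keep track of their origin.
names(data_list) <- basename(file_names)
str(data_list)
```

Paths with a directory part work the same way as bare file names, since read.csv only needs a path it can resolve.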
0

You can add another line by using stat_summary again; you can define the data and aes argument to any other dataset:

#some pseudo data for testing
my_other_data <- myData 
my_other_data$Data <- my_other_data$Data * 0.5 

pplot <- ggplot(data=myData, aes(x=Group, y=Data)) + 
    stat_summary(fun.y = mean, geom = "line", color='red') +
    stat_summary(data=my_other_data, aes(x=Group, y=Data), 
           fun.y = mean, geom = "line", color='green') +
    xlab("Group") +
    ylab("Data")
pplot
Octopus
0

Why not create a classifying column ("Class")?

myTable1$Class <- "table1"

myTable1 
   "ID Data Range Group Class
    1     63   5.01   5.00    table1
    2     61   5.02   5.00    table1
    3     65   5.00   5.00    table1"

myTable2$Class <- "table2"

myTable2 
    "ID Data Range Group Class
    1     63   5.01   5.00    table2
    2     61   5.02   5.00    table2
    3     65   5.00   5.00    table2" 

Then merge the data frames:

dfBIND <- rbind(myTable1, myTable2)

So that you can ggplot with a grouping or coloring variable

# Map Class to both group and color so each table gets its own colored line
pplot <- ggplot(data=dfBIND, aes(x=Group, y=Data, group=Class, color=Class)) +
  stat_summary(fun.y = mean, geom = "line") +
  xlab("Group") +
  ylab("Data")
TiFr3D