I have a list of data frames called mylist. The list elements are named after people, and each data frame contains columns of data for that person (date, height, weight, etc.).

names(mylist[1])
[1] "John"
names(mylist[2])
[1] "Susan"


mylist[[1]]
[1] name  date height weight ....
    John  1950 1.81   78
    John  1948 1.60   60
    John  1935 1.50   55

mylist[[2]]
[1] name  date height weight ....
    Susan 1985 1.40   40
    Susan 1995 1.45   60
    Susan 1990 1.25   40

I want to create one boxplot per metric: one for height, one for weight, and so on. Each metric's boxplot should include the data for all people. For example, the boxplot for height should contain a box for John, one for Susan, etc.

Here is my attempt at the loop, but it does not work:

for(s in 3:21) {
boxplot(x=for(i in 1:99){ mylist[[i]][s]}))
}

Update: I applied nograpes' solution. However, the code he suggested:

ggplot(melted.df,aes(x=name,y=value)) + 
geom_boxplot() + facet_grid(variable~.,scales='free')

stacks the boxplots on top of each other, and with 16 metrics the plot is unreadable. A better idea is therefore to create 16 separate plots, one for each metric.

I've been looking for solutions, and one option is to run this code:

tomelt<-data.frame(c(daily[1],daily[2],daily[3])) # create a data.frame with name, date and the variable to be plotted

melted.df<-melt(tomelt,id.vars=c('name', 'date')) #convert to long form 
ggplot(melted.df,aes(x=name,y=value)) + geom_boxplot() #plot 

16 times, each time changing the code to use a different metric (column) of the data frame, but obviously this is not efficient at all.

Do you have any idea how to create a for loop to do this?

Frank
user2794659
  • It would be helpful if you gave us the output of `dput(mylist)` so we could just cut and paste your data into our sessions. – nograpes Sep 22 '13 at 02:17
  • Thank you nograpes! The problem is that the data is huge! It has 16 metrics and more than 50,000 observations. – user2794659 Sep 22 '13 at 02:20
  • Well, there is a [nice FAQ](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) that shows you some techniques for reducing your data for questions. But, briefly, you can use `dput(head(mylist))` and it will give us only the first six in your list. – nograpes Sep 22 '13 at 02:23
  • @user2794659, have a look at http://bit.ly/SORepro Then use `reproduce(mylist)` – Ricardo Saporta Sep 22 '13 at 02:40
  • Error in reproduce(splitted) : could not find function "is.data.table" – user2794659 Sep 22 '13 at 02:53
  • Finally I solved it... we need the print() for ggplot to work inside a loop! – user2794659 Sep 22 '13 at 09:24
  • Do you see how you created a little sample for yourself called `tomelt` of the first few data points to allow you to test your code? That is exactly what we needed to help you. All you have to do is post the output of `dput(tomelt)`. – nograpes Sep 22 '13 at 12:26
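
The `print()` fix mentioned in the comments can be sketched roughly as follows. Inside a `for` loop, ggplot objects are not printed automatically, so each plot needs an explicit `print()`. This is only a minimal sketch: the toy `mylist` below uses made-up values with the column names from the question, and looping over the wide data frame with the `.data` pronoun (available in ggplot2 3.0.0 and later) is one possible approach, not the asker's actual code.

```r
library(ggplot2)

# Toy stand-in for the question's list (values are made up for illustration).
mylist <- list(
  John  = data.frame(name = "John",  date = c(1950, 1948, 1935),
                     height = c(1.81, 1.60, 1.50), weight = c(78, 60, 55)),
  Susan = data.frame(name = "Susan", date = c(1985, 1995, 1990),
                     height = c(1.40, 1.45, 1.25), weight = c(40, 60, 40))
)

# Stack the per-person data frames into one.
one.df <- do.call(rbind, mylist)

# Every column except the identifiers is treated as a metric (an assumption).
metrics <- setdiff(names(one.df), c("name", "date"))

for (m in metrics) {
  p <- ggplot(one.df, aes(x = name, y = .data[[m]])) +
    geom_boxplot() +
    ylab(m)
  print(p)  # without print(), nothing is drawn inside the loop
}
```

Each iteration draws one plot with a box per person, so 16 metrics give 16 separate plots instead of one stacked figure.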

2 Answers

There are a lot of things you seem to be missing here. First, if you have 16 metrics and 50,000 observations, with 10 observations per person, that is 5,000 people × 16 metrics, so you will get 80,000 boxplots. Perhaps you are interested in only the first few people or something.

You have a lot to learn about for loops. You should definitely pick up a guide about R and try a few more basic things first.

With the for loop, I think you intended to do something like this:

for(s in 3:ncol(mylist[[1]])) { 
  for(i in 1:length(mylist)){ 
    boxplot(mylist[[i]][s])
  }
}

But even that wouldn't work well: each plot would overwrite the last one, so you would have to arrange them in a grid with par(mfrow=c(num.rows,num.cols)). There are much better options, though. Look at ?boxplot, especially the examples; several of them apply to your situation. Also consider ggplot2. For example, try this code, which plots your data neatly in rows and columns without for loops.

# Stick your list together.
one.df<-do.call(rbind,mylist)
# Convert to "long-form".
library(reshape)
melted.df<-melt(one.df,id.vars=c('name','date'))
# Plot with ggplot2: one facet row per metric, each with its own y scale.
library(ggplot2)
ggplot(melted.df,aes(x=name,y=value)) + 
geom_boxplot() + facet_grid(variable~.,scales='free')
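
For comparison, the base-graphics route mentioned above (the formula interface of `boxplot` plus `par(mfrow=...)`) might look something like this. It is a sketch under assumptions: the toy `mylist` holds made-up values, and treating every column other than `name` and `date` as a metric is a guess about the real data.

```r
# Toy stand-in for the question's list (values are made up for illustration).
mylist <- list(
  John  = data.frame(name = "John",  date = c(1950, 1948, 1935),
                     height = c(1.81, 1.60, 1.50), weight = c(78, 60, 55)),
  Susan = data.frame(name = "Susan", date = c(1985, 1995, 1990),
                     height = c(1.40, 1.45, 1.25), weight = c(40, 60, 40))
)

one.df  <- do.call(rbind, mylist)
metrics <- setdiff(names(one.df), c("name", "date"))

# One panel per metric, side by side; remember the old settings to restore them.
old.par <- par(mfrow = c(1, length(metrics)))
res <- list()
for (m in metrics) {
  # The formula groups the metric by person, giving one box per name.
  res[[m]] <- boxplot(one.df[[m]] ~ one.df$name,
                      main = m, xlab = "name", ylab = m)
}
par(old.par)
```

With many people per panel the x-axis labels get crowded, so this layout is mostly useful for a subset of names.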


nograpes
  • Hi, nograpes! Thank you for the answer! I tried the code for ggplot2 plot but it says "Error in layout_base(data, rows, drop = drop) : At least one layer must contain all variables used for facetting" – user2794659 Sep 22 '13 at 03:24
I did this recently. I have a data frame called "Event", and I want to plot each of columns 3 to 10 on the y-axis of its own boxplot (8 in total) against column 1. You have to build a new data frame ("boxdf") with the same column names on each iteration, which is clumsy, but it works:

for (g in 3:ncol(Event))
    {
        # File name built from the column name and its index.
        SaveBox = paste0("boxplot_",colnames(Event[g]),"_",g,".png")
        # Temporary two-column data frame with fixed names for the formula.
        boxdf <- data.frame(Event$event_type,Event[g])
        colnames(boxdf) = c("event_type","ycol")
        boxplot(ycol~event_type,data=boxdf, main = colnames(Event[g]), xlab="type: 0:non-event, 1:event", ylab=colnames(Event[g]),col=c("blue","red"))
        # Copy the on-screen plot to a PNG file, then close the PNG device.
        dev.copy(png,SaveBox)
        dev.off()
        rm(boxdf)
    }

It also saves each boxplot to the current directory. The save lines are:

        SaveBox = paste0("boxplot_",colnames(Event[g]),"_",g,".png")
        ...
        dev.copy(png,SaveBox)
        dev.off()
Gecko17k
  • Initially I tried to just refer to the y-axis as as.name(colnames(Event[g])) as in: boxplot(as.name(colnames(Event[g]))~event_type,data=Event...) but that didn't work. – Gecko17k Aug 06 '15 at 09:04
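
One way to avoid the temporary data frame is to build the formula from the column name with `reformulate()` from the stats package. The `as.name(...)` attempt presumably fails because the response side of the formula then evaluates to a symbol rather than a numeric column. The sketch below uses a hypothetical `Event` data frame with made-up columns (`id`, `speed`, `depth`); only `event_type` comes from the answer.

```r
# Hypothetical sample data; column names other than event_type are invented.
set.seed(1)
Event <- data.frame(event_type = rep(c(0, 1), each = 5),
                    id         = 1:10,
                    speed      = rnorm(10),
                    depth      = rnorm(10))

for (g in 3:ncol(Event)) {
  metric <- colnames(Event)[g]
  # Builds e.g. speed ~ event_type from the two column names.
  f <- reformulate("event_type", response = metric)
  boxplot(f, data = Event, main = metric,
          xlab = "type: 0:non-event, 1:event", ylab = metric,
          col = c("blue", "red"))
}
```

Because the formula refers to the real column, `data=Event` can be passed directly and no renaming is needed.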