
I have a data set that looks like this:

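set.seed(1)                          # reproducible random depths
species <- "ABC"
ind <- rep(1:4, each = 24)           # 4 individuals, 24 rows each
hour <- rep(0:23, 4)                 # hour of day, 0-23
depth <- runif(length(ind), 1, 50)   # random depths between 1 and 50 m

# data.frame() on the vectors keeps depth numeric, so no as.numeric() step is needed
df <- data.frame(species, ind, hour, depth)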
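Why avoid `cbind()` inside `data.frame()`? A quick check, as a sketch that assumes R 4.0 or later for the printed classes: `cbind()` returns a matrix, and a matrix holds a single type, so all four columns become character. In R versions before 4.0, `data.frame()` then turned those character columns into factors, and `as.numeric()` on a factor returns the internal level codes rather than the depths. Even in R 4.x, `hour` stays character, so the boxes would be ordered as text ("0", "1", "10", "11", ...).

```r
species <- "ABC"
ind <- rep(1:4, each = 24)
hour <- rep(0:23, 4)
depth <- runif(length(ind), 1, 50)

# cbind() builds a matrix, and a matrix holds a single type,
# so every column (depth included) is coerced to character
m <- cbind(species, ind, hour, depth)
class(m[, "depth"])

# data.frame() on the original vectors keeps each column's own type
df <- data.frame(species, ind, hour, depth)
sapply(df, class)
```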

In this example there are 4 individuals in the column "ind", each with 24 rows (one per hour). In reality there are many more individuals, and they don't always have the same number of rows: some have thousands of rows of data, while others have only a few.
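Unequal group sizes are not a problem for R's grouping functions. A minimal sketch with hypothetical, deliberately unbalanced data (the row counts in `n_rows` are invented for illustration): `table()` reports how many rows each individual has, and `split()` returns one data frame per individual whatever its size.

```r
set.seed(1)
# Hypothetical unbalanced data: individuals with very different numbers of rows
n_rows <- c(500, 12, 80, 3)
ind <- rep(seq_along(n_rows), times = n_rows)
hour <- sample(0:23, length(ind), replace = TRUE)
depth <- runif(length(ind), 1, 50)
df2 <- data.frame(ind, hour, depth)

# Number of rows per individual
table(df2$ind)

# split() returns a named list with one data frame per individual
pieces <- split(df2, df2$ind)
sapply(pieces, nrow)
```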

What I would like is an outer loop or function that selects all the rows for each individual ("ind") and generates a boxplot of depth by hour.

This is the idea I have in mind:

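# Boxplot of depth by hour for one individual's rows
plot.boxplot <- function(data) {
  boxplot(depth ~ hour, data = data, xlab = "Hour of day", ylab = "Depth (m)")
}

ids <- unique(df$ind)
individual <- vector("list", length(ids))   # list to hold each individual's rows

par(mfrow = c(2, 2), mar = c(5, 4, 3, 1))
for (i in seq_along(ids)) {
  individual[[i]] <- df[df$ind == ids[i], ]   # all rows for the i-th individual
  plot.boxplot(individual[[i]])
}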

I realize this loop may not be the right approach; I am still learning. I can make the boxplot for one individual at a time, but I would like a faster, more efficient way to select the data for each individual and to create or store the boxplot results. This will be very useful once I have many more individuals (instead of doing them one at a time). Thanks a lot in advance.

user1626688

1 Answer


What about something like this?

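par(mfrow=c(2,2))   # 2 x 2 grid: one panel per individual
invisible(
  by(df,df$ind,     # split df into one data frame per value of ind
    function(x)     # x holds the rows of the current individual only
      boxplot(depth~hour,data=x,xlab="Hour of day",ylab="Depth (m)")
    )
)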

To provide some explanation: this draws one boxplot for each group of rows in df defined by df$ind. by() also collects the value returned by each boxplot() call (the statistics used to draw it) and would print all of that to the console; the invisible() wrapper suppresses the printout.
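The same `by()` call can store results instead of drawing them. A minimal sketch: `plot = FALSE` is an argument of `boxplot()` that the formula method forwards, and it returns the statistics without plotting; `by()` keeps one result per individual. The example data are regenerated here so the block runs on its own.

```r
set.seed(1)
df <- data.frame(ind = rep(1:4, each = 24),
                 hour = rep(0:23, 4),
                 depth = runif(96, 1, 50))

# plot = FALSE computes the boxplot statistics without drawing anything;
# by() returns one result per level of df$ind
box_stats <- by(df, df$ind, function(x)
  boxplot(depth ~ hour, data = x, plot = FALSE))

# Each element is the list boxplot() returns: $stats (5 rows x one column per hour),
# $n (observations per hour), $out (outliers), $names (hour labels), ...
box_stats[[1]]$stats
box_stats[[1]]$n
```

A stored result can be drawn later with `bxp(box_stats[[1]])`.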

thelatemail
  • Wow! Thanks a lot, this is great! One quick question: if I want to include the name of each individual (for example, ind 1, ind 2, etc.) as a title, what would be the best way to do it? I tried including main=paste(df$ind) inside the boxplot function, but it is not giving me a title... – user1626688 Dec 12 '12 at 03:36
  • @user1626688 - try something like this replacing the boxplot line inside the above function: `boxplot(depth~hour,data=x,xlab="Hour of day",ylab="Depth (m)",main=paste("ind=",x$ind[1],sep=""))` and you should be good to go. – thelatemail Dec 12 '12 at 04:42
  • Can you split on two columns when using the by function in the code above? For example, select not only each individual but also a column that matches, say, the season for that individual, resulting in a boxplot of depth vs. hour of day for each ind x season combination? – user1626688 Dec 12 '12 at 05:03
  • @user1626688 - you can access any variables for the group once inside the `by` function - see my answer here: http://stackoverflow.com/questions/13792951/analyze-by-row-groups-in-r/13793258#13793258 which will be a good guide I believe. – thelatemail Dec 12 '12 at 05:07
  • But only one variable at a time? What if I want to use two variables (individual AND season) to split the data, instead of just individual or just season? I like your other example, but I am not sure it answers my question. – user1626688 Dec 12 '12 at 06:05
  • @user1626688 - `by` works like `by(dataframe,vars.to.split.on,function)` so your example would be `by(df,df[c("ind","season")],function)` – thelatemail Dec 12 '12 at 06:21
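Putting the comments together, a sketch that adds a per-panel title and splits on two columns at once. The `season` column is hypothetical (the example data has none) and is invented here. Inside the function, `x` holds only the current group's rows, so the title is taken from `x` (its first row) rather than from `df$ind`, which is the full column. `paste0()` is shorthand for `paste(..., sep = "")`.

```r
set.seed(1)
# Hypothetical data: the 'season' column is invented for this example
df <- data.frame(ind = rep(1:4, each = 48),
                 season = rep(c("summer", "winter"), each = 24, times = 4),
                 hour = rep(0:23, 8),
                 depth = runif(192, 1, 50))

# One panel per ind x season combination (4 x 2 = 8 here)
par(mfrow = c(2, 4))
invisible(
  by(df, df[c("ind", "season")], function(x) {
    # x holds only the current group's rows, so its first row supplies the title
    boxplot(depth ~ hour, data = x, xlab = "Hour of day", ylab = "Depth (m)",
            main = paste0("ind=", x$ind[1], ", season=", x$season[1]))
  })
)
```

The `mfrow` grid only holds a fixed number of panels; with more groups than cells, R starts a new page once the grid is full.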