
Let's say I have the following dataset (posted as a screenshot).

I want to make boxplots of each water logger's values on the same graph. Everywhere I checked, people have a factor variable to use. However, I don't have a factor; the water logger number is the column name. I can do this with the generic boxplot command, boxplot(data$colname1, data$colname2, data$colname3, and so on), but how can I do it with better graphics, as in ggplot2?

coffeinjunky
Sate Ahmad

2 Answers


Without actual data, it is difficult to show you the exact code you need to use, but after having a glimpse at that png, I would suggest you try something along the following lines:

library(reshape2)
library(ggplot2)

df <- melt(your_data)
ggplot(df, aes(x=variable, y=value)) + geom_boxplot()

This code probably needs some adjustment. If it doesn't work and the adjustments are not obvious, please post some example data in a way that makes it easy for us to use it. Data from a screenshot would imply we have to manually copy-paste each and every number, which few would be willing to do.

To clarify the general procedure: melt "stacks" all your columns on top of each other and adds a variable called variable, which refers to the old column name. You can hand this over to ggplot and say that the different values of variable should be on the x axis, which is what you want. For instance, have a look at women:

head(women)
  height weight
1     58    115
2     59    117
3     60    120
4     61    123
5     62    126
6     63    129

str(women)
'data.frame':   15 obs. of  2 variables:
 $ height: num  58 59 60 61 62 63 64 65 66 67 ...
 $ weight: num  115 117 120 123 126 129 132 135 139 142 ...

You see that women is a dataframe with 15 observations and two columns, height and weight.

Now, let's melt them:

df <- melt(women)

head(df)
  variable value
1   height    58
2   height    59
3   height    60
4   height    61
5   height    62
6   height    63

str(df)
'data.frame':   30 obs. of  2 variables:
 $ variable: Factor w/ 2 levels "height","weight": 1 1 1 1 1 1 1 1 1 1 ...
 $ value   : num  58 59 60 61 62 63 64 65 66 67 ...

Now you see it has 30 observations, and two columns: variable and value. variable identifies the old columns.

Let's hand this over to ggplot:

ggplot(df, aes(x=variable, y=value)) + geom_boxplot()

yields:

[Image: side-by-side boxplots of height and weight]

Here you have boxplots for both columns in the original women dataset.
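If you would rather not depend on reshape2, the same stacking step can be sketched with base R's stack(). This is an alternative to melt, not what the code above uses. Note that stack() names its columns values and ind rather than value and variable, so the aes() mapping changes accordingly:

```r
library(ggplot2)

# Base-R alternative to melt(): stack() puts all columns on top of each other.
# The result has two columns: `values` (the numbers) and `ind` (the old column name).
df_base <- stack(women)

# Inspect the long format; `ind` is a factor with one level per original column.
str(df_base)

# Same boxplot as before, mapped to stack()'s column names.
p_base <- ggplot(df_base, aes(x = ind, y = values)) + geom_boxplot()
p_base
```

The plot should match the one produced from the melted data; only the column names differ.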

coffeinjunky
  • Thanks a lot. Sorry, I am new to Stack Overflow and did not understand how to upload a dummy dataset. I tried the melt procedure, but something is wrong. I used the time variable (date-time) as the id variable (and I have 42000 observations). The id variable was not replicated past row 42000 and shows NA there. (Can you tell me how to upload a dummy dataset?) – Sate Ahmad Mar 07 '16 at 12:52
  • See e.g. http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example , in particular `dput` might be helpful. – coffeinjunky Mar 07 '16 at 12:58
  • Never mind, I finally managed to do it. I think the melt function could not handle the POSIXlt class of my date/time column. After converting the date/time back to factor class, it was able to melt the data into long form. Thanks a lot! This was surely helpful. – Sate Ahmad Mar 07 '16 at 13:03
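
For readers who hit the same problem as in the comments above, here is a minimal sketch of melting with a date-time id column. The data frame and its column names (time, logger_1, logger_2) are made up for illustration, since the original data was never posted. It assumes that converting the timestamps to POSIXct is an acceptable alternative to the factor conversion the asker used:

```r
library(reshape2)
library(ggplot2)

# Hypothetical stand-in for the logger data: one timestamp column
# plus one column of readings per logger (random dummy values).
set.seed(1)
loggers <- data.frame(
  time     = seq(as.POSIXct("2016-03-01 00:00:00", tz = "UTC"),
                 by = "hour", length.out = 48),
  logger_1 = rnorm(48, mean = 10),
  logger_2 = rnorm(48, mean = 12)
)

# If the timestamps are stored as POSIXlt, convert them to POSIXct first
# (this is a no-op here because the column is already POSIXct).
loggers$time <- as.POSIXct(loggers$time)

# Keep `time` as the id variable; every other column is stacked.
long <- melt(loggers, id.vars = "time",
             variable.name = "logger", value.name = "water_level")

# One box per logger, as in the answer above.
p_long <- ggplot(long, aes(x = logger, y = water_level)) + geom_boxplot()
p_long
```

Keeping the timestamps as POSIXct rather than factor means they can still be used as a time axis later, for example in a line plot of the same long data.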

Here's another answer based on the same principle as coffeinjunky's, but more specific to your data set. Since you didn't provide the data set, I created a dummy data set with similar column names:

d <- data.frame(x=rep(0,8))
d$`Logger 1_Water_Level` <- c(1,2,3,4,5,3,4,5)
d$`Logger 2_Water_Level` <- c(7,9,2,6,8,9,2,3)

You need to reshape the data set so that you get a factor variable that identifies the loggers. Assuming that you have two loggers and that their data are stored in columns 2 and 3, you can use the following code to go from the wide format your data is stored in (i.e. a separate column for each logger) to the long format that ggplot2 needs (i.e. a single column of water level measurements, with each logger identified by a number in a column called Logger):

d_long <- reshape(d, varying=2:3, direction="long", timevar="Logger",v.names="Water_Level", times=1:2)
d_long$Logger <- as.factor(d_long$Logger)
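
Before plotting, it can help to confirm that the reshape did what is described above. The following sketch rebuilds the same dummy data so that it runs on its own, then checks the row count and the per-logger summaries:

```r
# Rebuild the dummy data and the long format from above.
d <- data.frame(x = rep(0, 8))
d$`Logger 1_Water_Level` <- c(1, 2, 3, 4, 5, 3, 4, 5)
d$`Logger 2_Water_Level` <- c(7, 9, 2, 6, 8, 9, 2, 3)

d_long <- reshape(d, varying = 2:3, direction = "long",
                  timevar = "Logger", v.names = "Water_Level", times = 1:2)
d_long$Logger <- as.factor(d_long$Logger)

# The two logger columns are now stacked, so there are twice as many rows.
nrow(d_long)

# Number of measurements contributed by each logger.
table(d_long$Logger)

# Median water level per logger; these are the centre lines of the boxplots.
logger_medians <- tapply(d_long$Water_Level, d_long$Logger, median)
logger_medians
```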

And now you can plot the measurements using ggplot2:

library(ggplot2)

# One box per logger, with water level on the y axis
p <- ggplot(d_long, aes(x=Logger, y=Water_Level))
p <- p + geom_boxplot()
p
m.soskuthy
  • Thanks a lot for your time. I am sure this would work. But the melt option shown by coffeinjunky is much easier for me personally. – Sate Ahmad Mar 07 '16 at 14:02