
There is something I don't understand. I have this data frame:

    Var1        Freq
1   2008-05     1
2   2008-07     7
3   2008-08     5
4   2008-09     3

I need to insert a row in the second position, for example:

2008-06     0

I followed this (Add a new row in specific place in a dataframe). First step: add an index column; second step: append the new rows, giving each an index number; third step: sort by the index.

df$ind <- seq_len(nrow(df))
df <- rbind(df,data.frame(Var1 = "2008-06", Freq = "0",ind=1.1))
df <- df[order(df$ind),]

OK, everything seems fine. Although I don't know why a column called "row.names" has appeared, I get:

    row.names   Var1       Freq   ind
 1      1       2008-05     1      1 
 2      5       2008-06     0      1.1
 3      2       2008-07     7      2
 4      3       2008-08     5      3
 5      4       2008-09     3      4

Now, I plot it, with ggplot2.

ggplot(df, aes(y = Freq, x = Var1)) + geom_bar()

Here is the problem: on the x axis, "2008-06" is placed at the end, after "2008-09" (i.e. as if it still had index 5). In other words, the plot behaves as if the data frame had not been sorted, even though it looks sorted when printed.

Where am I going wrong? Thanks for your help.

jonathan

2 Answers


Try this:

df$Var1 <- factor(df$Var1, df$Var1[order(df$ind)])

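ggplot2 orders a discrete axis by the levels of the factor, not by the row order of the data frame. If you want ggplot2 to order the labels differently, you have to specify the level order yourself.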
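To see why sorting the rows did not change the plot, it may help to inspect the factor levels directly. This is a minimal base-R sketch, assuming Var1 was a factor in the original session (the default for strings in data.frame() before R 4.0.0); stringsAsFactors = TRUE is set explicitly so it behaves the same on newer versions. The names levels_before and levels_after are introduced only for illustration.

```r
# Rebuild the question's data with Var1 as a factor (assumption: this is
# what happened implicitly in the original session).
df <- data.frame(Var1 = c("2008-05", "2008-07", "2008-08", "2008-09"),
                 Freq = c(1, 7, 5, 3),
                 stringsAsFactors = TRUE)
df$ind <- seq_len(nrow(df))
df <- rbind(df, data.frame(Var1 = "2008-06", Freq = 0, ind = 1.1,
                           stringsAsFactors = TRUE))
df <- df[order(df$ind), ]

# The rows are sorted, but rbind() appended the new level after the
# existing ones, so "2008-06" is still the last level.
levels_before <- levels(df$Var1)

# Rebuild the factor so that its levels follow the row order.
df$Var1 <- factor(df$Var1, levels = as.character(df$Var1[order(df$ind)]))
levels_after <- levels(df$Var1)
```

Printing levels_before and levels_after shows the level order before and after the fix; the second is what ggplot2 would use for the axis.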

You might also want to look into converting Var1 to some sort of date class, then dispensing with the index variable altogether. This would make things clearer, I think. The zoo package has a nice class for representing months of a given year, and you could use it for Var1. For example:

library(zoo)
df$Var1 <- as.yearmon(df$Var1)
# Freq is numeric here; a quoted "0" would turn the whole column into character
df <- rbind(df, data.frame(Var1 = as.yearmon("2008-06"), Freq = 0))

Now you can just order your data frame by Var1 without having to worry about keeping an index:

> df[order(df$Var1), ]
      Var1 Freq
1 May 2008    1
5 Jun 2008    0
2 Jul 2008    7
3 Aug 2008    5
4 Sep 2008    3

A plot in ggplot2 will turn out as expected:

ggplot(df, aes(as.Date(Var1), Freq)) + geom_bar(stat="identity")

The resulting plot.

Note that you do have to convert Var1 to Date, since ggplot2 doesn't understand yearmon objects on its own.
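If you would rather not depend on zoo, a base-R sketch of the same idea is to append a day to each string before calling as.Date(). A "%Y-%m" string on its own has no day, and as.Date() typically returns NA for it, which leaves order() with nothing to sort on. The column name Month below is an assumption introduced for illustration.

```r
# The question's data after the new row was appended (unsorted).
df <- data.frame(Var1 = c("2008-05", "2008-07", "2008-08", "2008-09", "2008-06"),
                 Freq = c(1, 7, 5, 3, 0),
                 stringsAsFactors = FALSE)

# Pin every month to its first day so each string is a complete date.
df$Month <- as.Date(paste0(df$Var1, "-01"), format = "%Y-%m-%d")

# Dates sort chronologically, so no helper index column is needed.
df <- df[order(df$Month), ]
```

The Month column can then be mapped to the x aesthetic in place of Var1.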

Peyton
  • In my experience, if you pass factors to ggplot2 and care about their order, it's better to create an ordered factor: `factor(df$Var1, df$Var1[order(df$ind)], ordered=TRUE)`. However, I agree with using Date class here. – Roland Jun 03 '13 at 18:35
  • For sorting by date, I've tested: `df[order(as.Date(df$Var1, format="%Y-%m")),]`, which prints my data frame without any errors, but also without any sorting... – jonathan Jun 04 '13 at 09:27
  • I don't really understand what this command does, but it works! What do you mean about ggplot2? I'm lost... :/ – jonathan Jun 04 '13 at 10:24
  • I updated my answer to provide an example of using dates. Jonathan, that should make things a bit clearer. – Peyton Jun 05 '13 at 02:56

It is because somewhere along the way you got a factor in the mix. This produces what you're after (without the rownames column):

df <- read.table(text="    Var1        Freq
1   2008-05     1
2   2008-07     7
3   2008-08     5
4   2008-09     3", header=TRUE, stringsAsFactors = FALSE)

df$ind <- seq_len(nrow(df))
# Freq is numeric here; a quoted "0" would turn the whole column into character
df <- rbind(df, data.frame(Var1 = "2008-06", Freq = 0, ind = 1.1, stringsAsFactors = FALSE))
df <- df[order(df$ind),]

# stat = "identity" plots Freq as given instead of counting rows
ggplot(df, aes(y = Freq, x = Var1)) + geom_bar(stat = "identity")

Notice the stringsAsFactors = FALSE? It keeps Var1 as character instead of converting it to a factor.

As far as the order goes, if you already have factors (as you do), you need to reorder the factor levels. If you want more detailed info, see this post.
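As a small illustration of reordering an existing factor, here is a base-R sketch. It relies on the assumption that the labels are "YYYY-MM" strings, so sorting them alphabetically also sorts them chronologically; the names var1 and var1_sorted are hypothetical.

```r
# A factor whose levels are out of chronological order, as happens after
# rbind() appends a new level at the end.
var1 <- factor(c("2008-05", "2008-07", "2008-08", "2008-09", "2008-06"),
               levels = c("2008-05", "2008-07", "2008-08", "2008-09", "2008-06"))

# Rebuild the factor with sorted levels. The values stay the same;
# only the order of the levels changes.
var1_sorted <- factor(var1, levels = sort(levels(var1)))
```

For labels that do not sort naturally as text, the levels would have to be listed explicitly in the desired order.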

Tyler Rinker