
I'm writing some very simple ggplot2 code. I have a large data frame with two columns: one of dates and one of percentages.

# Snippet of the df; it goes on for 5,000+ rows

      date        percent
1     1997-04-15  0.78
2     1997-04-16  0.77
3     1997-04-17  0.77
4     1997-04-18  0.77
5     1997-04-21  0.77

# Also the dput() of the df (not sure if I did this right)
structure(list(date = structure(c(9966, 9967, 9968, 9969, 9972, 
9973, 9974, 9975, 9976, 9979, 9980, 9981, 9982, 9983, 9986), class = "Date"), 
percent = c("0.78", "0.77", "0.77", "0.77", "0.77", "0.79", 
"0.79", "0.79", "0.79", "0.79", "0.79", "0.79", "0.79", "0.79", 
"0.79")), .Names = c("date", "percent"), row.names = c(NA, 
15L), class = "data.frame")

Currently my ggplot() call is simple:

ggplot( short_df, aes( date, percent ) ) + geom_line()

I plotted a small snippet of the df to get an idea of how the full plot will look, and was greeted by this:

[Image: geom_line() plot of short_df]

When I use geom_point() instead, the plot seems fine.

My second question: when I plot the entire df, the y axis seems to include every percent value:

[Image: plot of the full df with every percent value on the y axis]

I added scale_y_discrete(breaks = pretty(DF$percent)) to the previous code, and with short_df the ticks seem to be spaced fine:

[Image: plot of short_df with scale_y_discrete() breaks]

However, when I do the same on the full df, the y axis shows only one tick:

[Image: plot of the full df with a single y-axis tick]

I also get a warning:

Warning message: In pretty.default(BSD$percent) : NAs introduced by coercion

14likd1
    To make this [reproducible](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example), can you `dput` a representative sample of your data? My guess is that you have issues with types in your data (e.g. a factor column that should be numeric), and we'd need to see some of the actual data to know what's up – camille Sep 06 '18 at 16:19
  • Just to add an explanation to @camille's request: don't dput your full data frame. E.g. use `dput(head(your_df, 15))` (15 was just chosen at random) – tjebo Sep 06 '18 at 16:21
  • Do you mean dput it into Stack Overflow, or in the code? Sorry, I don't follow. Update: just added it, not sure if I did it correctly. – 14likd1 Sep 06 '18 at 16:27
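Following tjebo's suggestion, here is a minimal sketch of producing a small reproducible sample. `your_df` is a hypothetical stand-in for the real 5,000+ row data frame (its values are made up). The printed `dput()` text is what you would paste into the question; the last lines only confirm that evaluating that text rebuilds equivalent data.

```r
# Hypothetical stand-in for the real data frame (values are made up)
your_df <- data.frame(
  date = seq(as.Date("1997-04-15"), by = "day", length.out = 100),
  percent = rep(c(0.78, 0.77, 0.79), length.out = 100)
)

# Keep only the first 15 rows; 15 is arbitrary
sample_df <- head(your_df, 15)

# This printed text is what goes into the question
dput(sample_df)

# Sanity check: evaluating the dput() text gives back equivalent data
dput_text <- paste(capture.output(dput(sample_df)), collapse = "\n")
rebuilt <- eval(parse(text = dput_text))
print(all.equal(rebuilt, sample_df))
```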

1 Answer


Your problem is that "percent" is of type character:

str(short_df)

'data.frame':   15 obs. of  2 variables:
 $ date   : Date, format: "1997-04-15" "1997-04-16" "1997-04-17" "1997-04-18" ...
 $ percent: chr  "0.78" "0.77" "0.77" "0.77" ...
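If you want to check column types in code rather than by reading `str()` output, one option is to ask for each column's class. This is a sketch: `col_classes` is just an illustrative name, and the data frame is rebuilt from the question's `dput()` output with `stringsAsFactors = FALSE` so that `percent` stays character on older R versions too.

```r
# Rebuild the 15-row sample from the question's dput() output
short_df <- data.frame(
  date = as.Date(c(9966, 9967, 9968, 9969, 9972, 9973, 9974, 9975,
                   9976, 9979, 9980, 9981, 9982, 9983, 9986),
                 origin = "1970-01-01"),
  percent = c("0.78", "0.77", "0.77", "0.77", "0.77", "0.79", "0.79",
              "0.79", "0.79", "0.79", "0.79", "0.79", "0.79", "0.79", "0.79"),
  stringsAsFactors = FALSE
)

# First class of every column, as a named character vector
col_classes <- vapply(short_df, function(col) class(col)[1], character(1))
print(col_classes)
```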

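As a result, ggplot treats "percent" as a categorical (discrete) variable. By default, geom_line() groups observations by every discrete variable in the plot, so it draws a separate line for each distinct "percent" value and does not connect points across categories. Converting "percent" to numeric fixes the problem: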
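To see this grouping directly, you can inspect the data ggplot2 builds for the layer with `layer_data()`. A sketch, assuming ggplot2 is installed and rebuilding `short_df` from the question's `dput()`: with a character `percent` there is one group per distinct value, while after conversion all rows share a single group.

```r
library(ggplot2)

# Rebuild the 15-row sample from the question's dput() output
short_df <- data.frame(
  date = as.Date(c(9966, 9967, 9968, 9969, 9972, 9973, 9974, 9975,
                   9976, 9979, 9980, 9981, 9982, 9983, 9986),
                 origin = "1970-01-01"),
  percent = c("0.78", "0.77", "0.77", "0.77", "0.77", "0.79", "0.79",
              "0.79", "0.79", "0.79", "0.79", "0.79", "0.79", "0.79", "0.79"),
  stringsAsFactors = FALSE
)

# Character y: geom_line() gets one group per distinct percent string
chr_groups <- layer_data(ggplot(short_df, aes(date, percent)) + geom_line())$group
print(table(chr_groups))

# Numeric y: no discrete variables, so every row lands in the same group
num_df <- transform(short_df, percent = as.numeric(percent))
num_groups <- layer_data(ggplot(num_df, aes(date, percent)) + geom_line())$group
print(unique(num_groups))
```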

short_df$percent <- as.numeric(short_df$percent)

ggplot(short_df, aes( date, percent ) ) + geom_line()
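The question's `pretty()` warning ("NAs introduced by coercion") hints that some `percent` strings in the full data may not be valid numbers; that is an assumption, since only 15 rows are shown. `as.numeric()` silently turns such entries into `NA` (with a warning), so a check like the following can list them before converting the whole column. `find_non_numeric()` is a hypothetical helper and the example vector is made up.

```r
# Hypothetical helper: positions of non-missing strings that fail numeric conversion
find_non_numeric <- function(x) {
  converted <- suppressWarnings(as.numeric(x))
  which(is.na(converted) & !is.na(x))
}

# Made-up example values; "." and "" cannot be parsed as numbers,
# while a genuine NA is not reported as a conversion failure
percent_chr <- c("0.78", "0.77", ".", "0.79", "", NA)
bad <- find_non_numeric(percent_chr)
print(bad)
print(percent_chr[bad])
```

On the real data you would call it on the full column (e.g. `find_non_numeric(BSD$percent)`) and decide how to handle those rows before running `as.numeric()`.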

[Image: geom_line() plot of short_df after converting percent to numeric]

Incidentally, the version of the plot that uses geom_point() is not correct. You can see that ggplot is plotting every unique value of "percent" (again, the behavior for character/categorical data types). With "percent" converted to numeric data, ggplot correctly calculates a series of well-spaced axis ticks.
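For the second question, once `percent` is numeric the y scale is continuous, so the question's `scale_y_discrete()` would become `scale_y_continuous()`. Explicit breaks are optional, since ggplot2 already chooses well-spaced ones, but if you want to set them yourself, here is a sketch assuming ggplot2 is installed and using the question's 15-row sample:

```r
library(ggplot2)

# Rebuild the 15-row sample from the question's dput() output
short_df <- data.frame(
  date = as.Date(c(9966, 9967, 9968, 9969, 9972, 9973, 9974, 9975,
                   9976, 9979, 9980, 9981, 9982, 9983, 9986),
                 origin = "1970-01-01"),
  percent = c("0.78", "0.77", "0.77", "0.77", "0.77", "0.79", "0.79",
              "0.79", "0.79", "0.79", "0.79", "0.79", "0.79", "0.79", "0.79"),
  stringsAsFactors = FALSE
)
short_df$percent <- as.numeric(short_df$percent)

# pretty() on a numeric vector returns evenly spaced, rounded values
y_breaks <- pretty(short_df$percent)

# Continuous (not discrete) y scale for numeric data
p <- ggplot(short_df, aes(date, percent)) +
  geom_line() +
  scale_y_continuous(breaks = y_breaks)
print(p)
```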

jdobres