I have monthly data on sales volumes:

YearMonth     Sales Count
2010-04       300
2010-05       342
2010-06       425

and I just want to draw a line graph in R to observe the trend.

I use ggplot2 in R:

ggplot(data,
   aes(x = YearMonth, y = `Sales Count`)) +
   geom_line()

However, R gives me an error message:

geom_path: Each group consists of only one observation. 
Do you need to adjust the group aesthetic?

I tried many ways to convert the variable "YearMonth" to a numeric variable, but none of them work.

Because the data was generated in python, I checked the data type using:

data.dtypes

and it returns

YearMonth           object
Sales Count         int64
dtype: object

I tried to convert it using

data['YearMonth'] = pd.to_datetime(data['YearMonth'])

but it converts everything to the first day of the month, i.e. the data now looks like:

YearMonth        Sales Count
2010-04-01       300
2010-05-01       342
2010-06-01       425

Because the x-axis should show each month rather than the first day of each month, is there any way to keep just the month and plot it as a numeric or datetime variable?

Many thanks!!

EDITS

Actually, when I plot it in R, the x-axis only shows years such as 2010 and 2011. So the issue above does not matter if we can change what is shown on the x-axis. Is there a way to define the x-axis labels, for example showing 2010 April, 2010 May, rather than just the year?

SOLUTION

Combining the answers from @Jon Spring and @ThomasPepperz, the following code gives me exactly what I want:

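library(ggplot2)

# Append a day so that "2010-04" can be parsed as a full date
data[['YearMonth']] = lubridate::ymd(paste(data[['YearMonth']], 1))

ggplot(data, aes(YearMonth, `Sales Count`)) + 
  geom_line() +
  # One tick every six months, labelled like "2010 Apr"
  scale_x_date(date_breaks = "6 months",
               date_labels = "%Y %b") +
  theme(axis.text.x = element_text(angle=90, hjust=1))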
IceAloe
  • Add `group = 1`: `ggplot(df,aes(x = YearMonth, y = SalesCount, group =1)) + geom_line()`. If you google your error message, it should lead you to the marked post. – Ronak Shah Mar 20 '19 at 02:03
  • For me it displays what you have in `YearMonth` column on X-axis which is `2010-04`, `2010-05`. What do you need ? – Ronak Shah Mar 20 '19 at 02:14
  • @RonakShah, thanks a lot for the fast reply! group=1 works when there are only a few points. My problem is that I have too many observations, so the x-axis labels all overlap... I know we can add a date or time break if the x-value is stored as datetime. But since it is stored as character here, can we still display only every few months rather than all the months on the x-axis? – IceAloe Mar 20 '19 at 02:17
  • we can aggregate the data and then plot. Can you post few more rows with some overlap dates and explain what you would like in the output. I'll reopen the post. – Ronak Shah Mar 20 '19 at 02:20
  • Oh sorry for the confusion, what I mean by overlap is that there are too many labels on the axis so I can't see which month corresponds to which data, like in this post: https://stackoverflow.com/questions/37080756/too-many-values-on-x But the method in that post doesn't work for me, because my data is character... – IceAloe Mar 20 '19 at 02:33

2 Answers

data$date = lubridate::ymd(paste(data$YearMonth, 1))

library(ggplot2)
ggplot(data, aes(date, Sales_Count)) + 
  geom_line() +
  scale_x_date(date_breaks = "month",
               date_labels = "%Y %b")

Jon Spring
  • Thanks a lot!! This works perfectly for me - the x-axis is now very clear after I change the date_breaks to 6 months. May I ask, (1) what does "paste" in the ymd function do, and (2) what does "%b" mean in date_labels? Thank you so much for your patience! – IceAloe Mar 20 '19 at 13:54
  • `paste` is a base R function that concatenates strings; lubridate's `ymd` function expects a year then month then day. R has a variety of codes for date formatting, for instance %y is YY and %Y is YYYY. %b happens to be abbreviated month: https://www.stat.berkeley.edu/~s133/dates.html – Jon Spring Mar 20 '19 at 16:58
  • Thank you so much for the detailed explanation and references, @Jon Spring! This really solves my problem, and teaches me how it is solved. Sorry for another question - I am trying to learn R and am reading the documentation on `paste`: paste (..., sep = " ", collapse = NULL). It says the second argument is how we want to separate the data. Does `1` in your code mean we don't need to separate the data? – IceAloe Mar 20 '19 at 18:00
  • No, the 1 is getting pasted on afterwards, with default `sep`aration of a space, so that "2010-04" is transformed into "2010-04 1". Lubridate is pretty robust and will treat spaces, dashes, slashes, etc. as separators between terms, so it should consistently turn that into the right date. – Jon Spring Mar 20 '19 at 18:05
  • Ohh this makes a lot of sense! Thank you so much for all the detailed explanations, Jon Spring! – IceAloe Mar 21 '19 at 00:18
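
The `paste` and format-code explanations in the comments above can be checked with a small base-R sketch. The sample values are hypothetical, and `as.Date` with an explicit format is used here as a stand-in for `lubridate::ymd`, so this illustrates the idea rather than the exact call from the answer.

```r
# Hypothetical sample values in the question's "YYYY-MM" layout
year_month <- c("2010-04", "2010-05", "2010-06")

# paste() joins its arguments with sep = " " by default,
# so the 1 is appended after a space: "2010-04 1", "2010-05 1", ...
with_day <- paste(year_month, 1)

# Base-R stand-in for the parsing step: spell out the layout explicitly
dates <- as.Date(with_day, format = "%Y-%m %d")

# %Y is the four-digit year and %m the two-digit month
labels_numeric <- format(dates, "%Y-%m")

# %b is the abbreviated month name; its text depends on the locale
labels_named <- format(dates, "%Y %b")
```

Because the day is only added so that the string can be parsed, the axis labels can still show just year and month through the format codes.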

Try:

df$YearMonth = lubridate::as_date(as.character(df$YearMonth), '%Y-%m')
df$month = lubridate::month(df$YearMonth)

Use 'lubridate' to convert to a date object and then use month() to extract only the month and store it as a new variable.
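
As a hedged base-R sketch of the same idea (convert first, then extract), assuming the column holds strings like "2014-06": the sample values are hypothetical, and `as.Date` and `format` are swapped in for the lubridate calls. It shows that a year-month string needs a day before base R can parse it, and that the month number alone does not distinguish between years.

```r
# Hypothetical sample values: the same month in three different years
year_month <- c("2014-06", "2015-06", "2017-06")

# Without a day component, base R cannot build a complete date and gives NA
no_day <- as.Date(year_month, format = "%Y-%m")

# Appending a day first yields a complete, parseable date
dates <- as.Date(paste0(year_month, "-01"))

# The month number alone is the same for all three rows
month_only <- as.integer(format(dates, "%m"))

# Keeping the year in the label tells the rows apart
year_and_month <- format(dates, "%Y-%m")
```

If the x-axis has to separate 2014-06 from 2015-06, the full date (or a year-and-month label) is the value to plot, not the bare month.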

ThomasPepperz
  • Thanks a lot for the suggestion! It makes a lot of sense, but there are two problems with it: (1) lubridate behaves oddly here. It converts 2004-06 to 2020-04-06 and I get a warning: Warning message: 72 failed to parse. (2) The second line only gives me the month, for example, it gives 6 rather than 2014-06, which is what I need to put on the x-axis. The reason is, there are also data from 2015-06, 2017-06, etc. If I only extract the month, I can't distinguish between the years... – IceAloe Mar 20 '19 at 02:49
  • If your data is of the format "2010-04" it shouldn't fail to parse. Ensure that your data is as originally posted and not as 'YYYY-MM-DD' – ThomasPepperz Mar 20 '19 at 02:52
  • Try turning the x-axis labels sideways to make more room if you need to. `ggplot(...)+...+ theme(axis.text.x = element_text(angle=60, hjust=1))` Also, `lubridate::months(df, format = " ", labels = TRUE)` will produce abbreviated month labels. – ThomasPepperz Mar 20 '19 at 02:59
  • This helps a lot and I have been looking for a way to turn the x-axis labels to an angle. Thank you so much!! – IceAloe Mar 20 '19 at 13:56
  • Regarding the abbreviation, I got an error message... Error: 'months' is not an exported object from 'namespace:lubridate' – IceAloe Mar 20 '19 at 13:58