
I have a CSV data file I got from Strava. The dates are formatted like "Apr 21, 2020, 7:43:57 PM". I want to create an ordered bar graph where each bar is a month.

But either the bars get smaller when using scale_x_discrete(limits=df_cleaned$Date)

or R reorders everything when I'm not using scale_x_discrete. The expected output is exactly that but without R reordering by alphabetical order or changing bar sizes.

I succeeded in creating a date column in the format "Apr 2020" using

df_cleaned <- df %>%
    separate(Activity.Date, into = c("DayMonth", "Year", "Hour"), sep=",") %>%
    mutate(Year = substr(Year, 2, 5),
        Month = substr(DayMonth, 1, 3),
        Date = paste(Month, Year),...)

When I look at df_cleaned$Date, the data is exactly in the order that I want. A solution proposed online is:

df_cleaned$Date <- factor(df_cleaned$Date, levels = df_cleaned$Date)

but it outputs

Error in 'levels<-'('*tmp*', value = as.character(levels)) : factor level [2] is duplicated

Another solution offered is using as.Date(df$Activity.Date, "%b %d, %Y") but all it does is transform everything into a bunch of NA.

r2evans
  • Since `"Apr ..."` is a string, nothing will plot it as a number; you need to convert it (as you discuss). Without that, ggplot will sort the data alphabetically, which obviously does not work well when you get `Apr` before `Jan`. The way out of this is to correctly convert to `Date`- or `POSIXt`-class objects. Your `as.Date` expression works for me: `as.Date("Apr 21, 2020, 7:43:57 PM", format="%b %d, %Y")` produces `"2020-04-21"`. Does _that_ expression produce an `NA` for you? – r2evans Mar 17 '23 at 17:23
  • Please provide sample data, see https://stackoverflow.com/q/5963269. – r2evans Mar 17 '23 at 17:24
  • Once you get the dates correctly recognized, don't use `scale_x_discrete`; instead use something like `scale_x_date(date_labels="%Y-%m")` (or `"%Y %b"`, over to you). – r2evans Mar 17 '23 at 17:26
  • In your `factor()` call, try replacing `levels = df_cleaned$Date)` with `levels = unique(df_cleaned$Date))`. – zephryl Mar 17 '23 at 17:26
  • You need to provide a [minimal, reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) of your data. – M-- Mar 17 '23 at 17:26
  • as.Date("Apr 21, 2020, 7:43:57 PM", format="%b %d, %Y") produces NA for me. I don't know how you are getting "2020-04-21" from it. @r2evans – Maxime F. Giguère Mar 17 '23 at 17:36
  • Likely duplicate [ggplot x-axis as date with hours](https://stackoverflow.com/questions/36460980/ggplot-x-axis-as-date-with-hours) – Ian Campbell Mar 17 '23 at 17:37
  • You might also find `lubridate::mdy_hms` helpful in parsing your date. – Ian Campbell Mar 17 '23 at 17:40
  • lubridate::mdy_hms works @IanCampbell But I still don't understand why as.Date("Apr 21, 2020, 7:43:57 PM", format="%b %d, %Y") produces NA – Maxime F. Giguère Mar 17 '23 at 17:45
  • "as.Date("Apr 21, 2020, 7:43:57 PM", format="%b %d, %Y") produces NA for me." Is it possible that your system is set for a non-English locale? `Sys.getlocale (category = "LC_ALL")` I wouldn't think that would cause an issue with date parsing, but it might be a possible explanation. https://stat.ethz.ch/R-manual/R-devel/library/base/html/strptime.html or https://stat.ethz.ch/R-manual/R-devel/library/base/html/locales.html – Jon Spring Mar 17 '23 at 17:47
  • @MaximeF.Giguère You get NA because the appropriate format is `as.Date("Apr 21, 2020, 7:43:57 PM",format = "%b %d, %Y, %I:%M:%S %p")`. At least in my locale (`en_US.UTF-8`). – Ian Campbell Mar 17 '23 at 17:51
  • Everything is set in French Canada in Sys.getlocale (category = "LC_ALL"), which might explain why it didn't work. A combination of mutate(Date = as.Date(lubridate::mdy_hms(Activity.Date))) and scale_x_date(date_labels="%Y-%m") solved it for me where the as.Date("Apr 21, 2020, 7:43:57 PM", format="%b %d, %Y") didn't. Thanks to everyone who helped. – Maxime F. Giguère Mar 17 '23 at 17:56
  • @IanCampbell, both `"%b %d, %Y, %I:%M:%S %p"` and `"%b %d, %Y"` work (when locale is not a factor), where the default action of `as.Date` is to truncate the rest of the string outside of what is matched. – r2evans Mar 17 '23 at 18:08
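
For reference, here is a minimal base-R sketch of the two ideas raised in the comments: getting a real `Date` out of the string, and building the factor with unique levels so months plot in chronological order. Because `%b` depends on the session locale, this sketch sidesteps `strptime`-style parsing and looks the month up in the built-in `month.abb` constant, which is always English. The sample vector `activity_date` and the names `month_start` and `month_label` are hypothetical stand-ins, not the asker's data or code.

```r
# Hypothetical sample in the Strava export format (not the asker's data).
activity_date <- c("Apr 21, 2020, 7:43:57 PM",
                   "Jan 3, 2020, 8:05:10 AM",
                   "Apr 30, 2020, 6:12:00 AM",
                   "Dec 15, 2019, 5:30:45 PM")

# Split "Apr 21, 2020, 7:43:57 PM" into "Apr 21", "2020", "7:43:57 PM".
parts <- strsplit(activity_date, ",\\s*")
month_day <- vapply(parts, `[`, character(1), 1)
year <- as.integer(vapply(parts, `[`, character(1), 2))

# month.abb is a built-in English constant, so this lookup does not
# depend on the session locale the way "%b" does.
month <- match(substr(month_day, 1, 3), month.abb)

# First day of each activity's month as a Date (ISO strings parse
# the same way in every locale).
month_start <- as.Date(sprintf("%04d-%02d-01", year, month))

# "Apr 2020"-style labels as a factor whose levels are unique and
# sorted chronologically rather than alphabetically.
label <- paste(month.abb[month], year)
month_label <- factor(label, levels = unique(label[order(month_start)]))

# Activities per month, in chronological order.
counts <- table(month_label)
print(counts)
```

Either column could then go on the x axis: `month_label` as a discrete axis that keeps its level order, or `month_start` with `scale_x_date(date_labels="%Y-%m")` as suggested above.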

0 Answers