I am using an online ONS inflation dataset and trying to chart it, but when I plot it with ggplot the x-axis is not in chronological order (the order looks random).

Here is my code, with the link to the dataset:

install.packages("tidyverse")
library("tidyverse")
install.packages("lubridate")
library("lubridate")

#webscraping the ONS inflation csv file
cpi<-read.csv(url("https://www.ons.gov.uk/generator?format=csv&uri=/economy/inflationandpriceindices/timeseries/d7g7/mm23"))

#removing rows 1 to 7 which contain descriptors, keeping this as a dataframe
cpi<-cpi[-c(1,2,3,4,5,6,7),,drop=FALSE]

#renaming columns as date and inflation
cpi<- cpi %>% rename(date=Title)
cpi<- cpi %>% rename(inflation=CPI.ANNUAL.RATE.00..ALL.ITEMS.2015.100)

#proper title characters for date
cut_cpi$date<- str_to_title(cut_cpi$date)

#subsetting cpi dataset in order to have only the data from the row of 2020 JAN to the last row
cut_cpi<- cpi[(which(cpi$date=="2020 JAN")):nrow(cpi),]

#plotting inflation in a line chart
ggplot(cut_cpi,aes(x=date,y=inflation,group=1,))+geom_line(colour="black")+labs(title="CPI inflation from January 2020") +theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1))

I think the problem is that the date column is a character rather than a date, but I cannot convert it to Date class.

I tried this:

cut_cpi$date <- as_factor(cut_cpi$date)
cut_cpi$date <- as_date(cut_cpi$date, format='%Y %b')

I checked the locale and it does not seem to be the problem:

> Sys.setlocale("LC_TIME")
[1] "English_United Kingdom.1252"
r2evans
S12
  • In Stack questions/answers, code blocks using "code fences" (usually triple backticks `\`\`\``) must put the fences on a line by themselves with no code with them, as in `\`\`\`\n`. The first code-fence for each block _may_ optionally include a language-hint, such as `\`\`\`r`. The hint itself is not shown, it is used for coloring the code. If there is real code after the backticks, it is interpreted as a language (ergo not show); if there is real code before the backticks, the backticks do not end the block. See https://stackoverflow.com/editing-help and https://meta.stackexchange.com/a/22189 – r2evans May 13 '22 at 13:49
  • Your x-axis is categorical, you need to convert to `Date`-class, and then use `scale_x_date` to format it (see its `date_breaks=` and `date_labels=` arguments). Relevant: https://stackoverflow.com/q/9322923/3358272, https://stackoverflow.com/q/6242955/3358272, https://stackoverflow.com/q/56557922/3358272, https://stackoverflow.com/q/65647998/3358272, https://stackoverflow.com/q/66606315/3358272 – r2evans May 13 '22 at 13:56
  • The component of this question *not* a duplicate of those five links is fixed by using `cpi$inflation <- as.numeric(cpi$inflation)`. – r2evans May 13 '22 at 13:58

2 Answers

You had two issues.

1- inflation was stored as a character, not a number, so it could not be plotted on a continuous y-axis.

2- date was stored as a character, not a date, so ggplot treated it as categorical and plotted it in alphabetical order. It has to be converted to a date so that it sorts chronologically; then format the scale so that it prints the date in the format you want.
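Both issues can be reproduced without downloading anything. The following is a minimal sketch on a small hand-made data frame; the name `toy` and its values are illustrative assumptions rather than the actual ONS figures, and month-name parsing assumes an English locale.

```r
library(lubridate)

# Hypothetical sample mimicking the ONS layout: both columns arrive as character
toy <- data.frame(
  date      = c("2020 JAN", "2020 FEB", "2020 MAR", "2020 APR"),
  inflation = c("1.8", "1.7", "1.5", "0.8"),
  stringsAsFactors = FALSE
)

# Character labels sort alphabetically (APR, FEB, JAN, MAR),
# which is the order a discrete x-axis uses
sort(toy$date)

# Parsing to Date and converting to numeric gives values that order correctly
toy$date_parsed <- as_date(parse_date_time(toy$date, orders = "Yb"))
toy$inflation   <- as.numeric(toy$inflation)
toy[order(toy$date_parsed), ]
```

Once both columns have the right class, ggplot can place them on continuous date and numeric scales.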

library("tidyverse")
library("lubridate")

#webscraping the ONS inflation csv file
cpi<-read.csv(url("https://www.ons.gov.uk/generator?format=csv&uri=/economy/inflationandpriceindices/timeseries/d7g7/mm23"))

#removing rows 1 to 7 which contain descriptors, keeping this as a dataframe
cpi<-cpi[-c(1,2,3,4,5,6,7),,drop=FALSE]

#renaming columns as date and inflation
cpi<- cpi %>% rename(date=Title)
cpi<- cpi %>% rename(inflation=CPI.ANNUAL.RATE.00..ALL.ITEMS.2015.100)
#proper title characters for date
#THIS FAILS: the cut_cpi data.frame has not been created yet at this point.
#The step is unnecessary, so it is commented out.
#cut_cpi$date<- str_to_title(cut_cpi$date)

#subsetting cpi dataset in order to have only the data from the row of 2020 JAN to the last row
cut_cpi<- cpi[(which(cpi$date=="2020 JAN")):nrow(cpi),]

#NEW: parse the "2020 JAN" strings into a date-time column and sort by it
cut_cpi<- cut_cpi %>%
  mutate(real_date_format= parse_date_time(date, orders = "%Y %b")) %>%
  arrange(desc(real_date_format))

#plotting inflation in a line chart

#NEW
# remove extra comma on aes
# converted inflation to numeric (was character)
# converted real_date_format to date (was datetime). scale_x_date breaks with datetime
ggplot(cut_cpi,aes(x=as_date(real_date_format), y=as.numeric(inflation),group=1))+
  geom_line(colour="black")+
  labs(title="CPI inflation from January 2020") +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1)) +
#NEW
  scale_x_date(date_breaks = "1 month", date_labels =  "%b %Y")
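As a variation, the conversions can be done once in the data frame instead of inside `aes()`, which keeps the plotting call shorter. This is only a sketch: `cut_cpi_demo` is a hypothetical stand-in for `cut_cpi` with made-up values, and it assumes an English locale for the month abbreviations.

```r
library(dplyr)
library(ggplot2)
library(lubridate)

# Hypothetical stand-in for cut_cpi (illustrative values, not downloaded data)
cut_cpi_demo <- tibble(
  date      = c("2020 JAN", "2020 FEB", "2020 MAR", "2020 APR"),
  inflation = c("1.8", "1.7", "1.5", "0.8")
)

# Convert both columns in place: character -> Date and character -> numeric
cut_cpi_demo <- cut_cpi_demo %>%
  mutate(
    date      = as_date(parse_date_time(date, orders = "Yb")),
    inflation = as.numeric(inflation)
  )

# With proper classes, aes() needs no conversions and no group = 1
p <- ggplot(cut_cpi_demo, aes(x = date, y = inflation)) +
  geom_line(colour = "black") +
  labs(title = "CPI inflation from January 2020") +
  scale_x_date(date_breaks = "1 month", date_labels = "%b %Y") +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1))
p
```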
Roger-123

You can try this:

ggplot(cut_cpi, aes(x = ym(date), y = as.numeric(inflation), group = 1)) +
  geom_line(colour = "black") + labs(title = "CPI inflation from January 2020") +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1)) +
  scale_x_date(date_breaks = "3 month")

You can change "3 month" to whatever interval you want.
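To illustrate how the break interval and label format can be varied, here is a small sketch on a made-up monthly series; `demo`, `base_plot`, `quarterly` and `half_yearly` are hypothetical names, and the values are not real inflation figures.

```r
library(ggplot2)

# Hypothetical monthly series; values are made up for illustration
demo <- data.frame(
  date      = seq(as.Date("2020-01-01"), by = "month", length.out = 24),
  inflation = seq(0.5, 5.1, length.out = 24)
)

base_plot <- ggplot(demo, aes(x = date, y = inflation)) + geom_line()

# A tick every three months, labelled with abbreviated month and year
quarterly <- base_plot +
  scale_x_date(date_breaks = "3 months", date_labels = "%b %Y")

# A tick every six months, labelled as two-digit month/year
half_yearly <- base_plot +
  scale_x_date(date_breaks = "6 months", date_labels = "%m/%y")

quarterly
half_yearly
```

The `date_labels` argument takes `strftime`-style format codes, so the month names it prints depend on the session locale.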

Claire