I have a data frame with an unusual layout:

The first column holds the state names (say I have 3 states); the remaining columns (say I have 5 columns in total) hold values recorded on different, non-consecutive dates. I want to plot each state's values over a continuous date range that runs from the earliest date to the latest.

The table looks like this:

state  2020-01-01  2020-01-05  2020-01-06  2020-01-10
AZ             NA       0.078       -0.06          NA
AK           0.09          NA          NA        0.10
MS           0.19        0.21          NA        0.38

"NA" means there is no data.

How do I produce a graph in which the x-axis runs continuously from 2020-01-01 to 2020-01-10, the values of the three states are drawn as points, and each state gets its own separate (segmented) y-axis?

Thank you.

  • It's more helpful to provide some of your data to make a [good reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). You can do this by providing just a little bit of your data via `dput(head(df))`. – AndrewGB Jun 16 '21 at 03:49

2 Answers

You can reshape the data into long format, which makes it easier to plot. By default, R rewrites column names that are not syntactically valid, such as names that start with a digit (2020-01-01 becomes X2020.01.01). When reading the data, set check.names = FALSE so that the column names are kept as-is.

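library(tidyverse)

# df is the wide data frame from the question, read with check.names = FALSE
df %>%
  # One row per state/date pair; cells that are NA are dropped
  pivot_longer(cols = -state,
               values_drop_na = TRUE) %>%
  # pivot_longer() stores the old column names in `name`; parse them as dates
  mutate(name = as.Date(name)) %>%
  ggplot(aes(name, value, color = state)) +
  geom_line()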
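
As a minimal sketch of the check.names point above, the question's table can be read from an inline string with base R. The inline `txt` string is an assumption standing in for a real file, and `df_mangled` is a name introduced only for comparison.

```r
# Inline copy of the question's table; with a real file, pass its path
# as the first argument instead of using `text =`.
txt <- "state 2020-01-01 2020-01-05 2020-01-06 2020-01-10
AZ NA 0.078 -0.06 NA
AK 0.09 NA NA 0.10
MS 0.19 0.21 NA 0.38"

# Default behaviour: names are passed through make.names(),
# so "2020-01-01" is rewritten to "X2020.01.01".
df_mangled <- read.table(text = txt, header = TRUE)
print(names(df_mangled))

# check.names = FALSE keeps the date strings intact, so they can
# later be parsed with as.Date() after pivoting to long format.
df <- read.table(text = txt, header = TRUE, check.names = FALSE)
print(names(df))
```

The same argument is accepted by read.csv(), and because the pivot uses `cols = -state`, the approach should not depend on how many date columns or states the file contains.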
AndrewGB
Ronak Shah

Here is a potential solution:

library(tidyverse)
# Create the example dataframe
dat1 <- tibble::tribble(
  ~state, ~`2020-01-01`, ~`2020-01-05`, ~`2020-01-06`, ~`2020-01-10`,
    "AZ",            NA,         0.078,         -0.06,            NA,
    "AK",          0.09,            NA,            NA,           0.1,
    "MS",          0.19,          0.21,            NA,          0.38
  )

# Transform and plot the data
dat1 %>% 
  pivot_longer(-c(state)) %>%
  mutate(dates = as.Date(name)) %>%
  ggplot(aes(x = dates, y = value)) +
  geom_point() +
  facet_grid(rows = vars(state)) +
  scale_x_date(date_breaks = "1 day", name = "") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1.05))

[Image: example_2.png]

Edit

To specify the start and end dates on the x-axis, you can pass limits to scale_x_date(), e.g.

dat1 %>% 
  pivot_longer(-c(state)) %>%
  mutate(dates = as.Date(name)) %>%
  ggplot(aes(x = dates, y = value)) +
  geom_point() +
  facet_grid(rows = vars(state)) +
  scale_x_date(date_breaks = "1 day", name = "",
               limits = as.Date(c("2019-12-25", "2020-01-15"))) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1.05))

[Image: example_3.png]
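
Instead of hard-coding the limits, they could also be derived from the data. The following is only a sketch in base R: `pad_days` and the one-week padding are hypothetical choices, and `date_cols` simply repeats the column names from the question.

```r
# Column names from the question's table (assumed to be ISO-formatted dates)
date_cols <- c("2020-01-01", "2020-01-05", "2020-01-06", "2020-01-10")
dates <- as.Date(date_cols)

# Hypothetical padding: one week on either side of the observed range
pad_days <- 7
x_limits <- range(dates) + c(-pad_days, pad_days)

# Every calendar day between the first and last observation,
# including days that have no recorded value
all_days <- seq(min(dates), max(dates), by = "day")

print(x_limits)
print(length(all_days))
```

`x_limits` is a length-2 Date vector, so it can be supplied as `limits = x_limits` in the scale_x_date() call above in place of the fixed dates.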

jared_mamrot
  • Thank you very much, Jared - this looks close to what I intend to produce. A few additional questions: 1. I want the horizontal axis to contain all dates from 2020-01-01 to 2020-01-10 (continuous), so that even dates with no data show up. 2. I have more than the 4 dates and 3 states listed in my question; how do I produce the data frame using your method? In any case, thank you very much for the quick answer. Very helpful. – OpenSource Guy Jun 16 '21 at 03:33
  • I've edited my answer to suit. You can control which dates are labelled on the x-axis by changing the `date_breaks` parameter, i.e. at the moment it's `date_breaks = "1 day"` but you can make it `date_breaks = "1 week"` or `date_breaks = "1 month"` (whatever suits your purpose). I also changed the angle of the date labels so that they wouldn't overlap, but you may not need that. – jared_mamrot Jun 16 '21 at 03:44
  • Combining your solution with Ronak's, I was able to produce exactly what I need. Thank you both very much. – OpenSource Guy Jun 16 '21 at 03:47
  • No problem, please see [what to do when someone answers my question](https://stackoverflow.com/help/someone-answers) and consider accepting the answer if your problem is solved. – jared_mamrot Jun 16 '21 at 03:49
  • Yep - I edited my answer - the code is in the "# Transform and plot the data" section – jared_mamrot Jun 16 '21 at 03:50
  • Just saw that :). – OpenSource Guy Jun 16 '21 at 03:51
  • By the way, what if I need the axis to start at a specific date earlier than the earliest date in the data frame, and to end at another specific date later than the latest date in the data frame? (Can the starting and ending dates on the horizontal axis be specified?) Thank you again. – OpenSource Guy Jun 16 '21 at 03:58
  • You can specify limits; I'll edit my answer again to show you how – jared_mamrot Jun 16 '21 at 04:04