
Forgive me if this question is self-explanatory, but I am still getting to grips with some of R's features.

I am currently trying to use R to recreate a cumulative frequency line plot that I originally made in Excel.

Here is a picture of the graph I am trying to recreate

I think a lot of my problems are coming from having a lot of cells with no data, as I keep getting the warning:

Warning messages:
1: Removed 81 row(s) containing missing values (geom_path).
2: Removed 81 row(s) containing missing values (geom_path).
3: Removed 81 row(s) containing missing values (geom_path).

This is because each column represents a recording frequency which only occurred for 21 days, with a 20-day rest period between each recording period.

My data table

I have tried using geom_step() and geom_point() but I end up with these:

Graphic produced using geom_step

graphic produced with geom_point

When I use the geom_line() function the axes are created but nothing is plotted.

Graphic produced using geom_line

The dates on the x axis also look horrendous. I tried adding + theme(axis.text.x = element_text(angle = 90)) to rotate the labels, but it still looks terrible; I am not sure if it's just too many dates.

Here is the code I have been trying to get to work for the various geom functions:

ggplot() +
    geom_point(aes(x = Date, y = d2s1, group = 1), data = cf) +
    geom_point(aes(x = Date, y = d20s1, group = 1), data = cf) +
    geom_point(aes(x = Date, y = d10s1, group = 1), data = cf) +
    theme(axis.text.x = element_text(angle = 90))

ggplot() +
    geom_step(aes(x = Date, y = d2s1, group = 1), data = cf) +
    geom_step(aes(x = Date, y = d20s1, group = 1), data = cf) +
    geom_step(aes(x = Date, y = d10s1, group = 1), data = cf) +
    theme(axis.text.x = element_text(angle = 90))

ggplot() +
    geom_line(aes(x = Date, y = d2s1, group = 1), data = cf) +
    geom_line(aes(x = Date, y = d20s1, group = 1), data = cf) +
    geom_line(aes(x = Date, y = d10s1, group = 1), data = cf) +
    theme(axis.text.x = element_text(angle = 90))

I hope this all makes sense and thank you all in advance for any help you can provide!

I read in the data using read.csv("cf.csv").

I have attached the output of dput(cf) below.

structure(list(Date = c("08/11/2019", "09/11/2019", "10/11/2019", 
"11/11/2019", "12/11/2019", "13/11/2019", "14/11/2019", "15/11/2019", 
"16/11/2019", "17/11/2019", "18/11/2019", "19/11/2019", "20/11/2019", 
"21/11/2019", "22/11/2019", "23/11/2019", "24/11/2019", "25/11/2019", 
"26/11/2019", "27/11/2019", "28/11/2019", "29/11/2019", "30/11/2019", 
"01/12/2019", "02/12/2019", "03/12/2019", "04/12/2019", "05/12/2019", 
"06/12/2019", "07/12/2019", "08/12/2019", "09/12/2019", "10/12/2019", 
"11/12/2019", "12/12/2019", "13/12/2019", "14/12/2019", "15/12/2019", 
"16/12/2019", "17/12/2019", "18/12/2019", "19/12/2019", "20/12/2019", 
"21/12/2019", "22/12/2019", "23/12/2019", "24/12/2019", "25/12/2019", 
"26/12/2019", "27/12/2019", "28/12/2019", "29/12/2019", "30/12/2019", 
"31/12/2019", "01/01/2020", "02/01/2020", "03/01/2020", "04/01/2020", 
"05/01/2020", "06/01/2020", "07/01/2020", "08/01/2020", "09/01/2020", 
"10/01/2020", "11/01/2020", "12/01/2020", "13/01/2020", "14/01/2020", 
"15/01/2020", "16/01/2020", "17/01/2020", "18/01/2020", "19/01/2020", 
"20/01/2020", "21/01/2020", "22/01/2020", "23/01/2020", "24/01/2020", 
"25/01/2020", "26/01/2020", "27/01/2020", "28/01/2020", "29/01/2020", 
"30/01/2020", "31/01/2020", "01/02/2020", "02/02/2020", "03/02/2020", 
"04/02/2020", "05/02/2020", "06/02/2020", "07/02/2020", "08/02/2020", 
"09/02/2020", "10/02/2020", "11/02/2020", "12/02/2020", "13/02/2020", 
"14/02/2020", "15/02/2020", "16/02/2020", "17/02/2020"), d2s1 = c(6L, 
11L, 13L, 20L, 25L, 35L, 42L, 49L, 49L, 51L, 53L, 54L, 60L, 65L, 
69L, 73L, 76L, 80L, 85L, 86L, 86L, NA, NA, NA, NA, NA, NA, NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA), d10s2 = c(0L, 6L, 8L, 
10L, 11L, 14L, 14L, 15L, 18L, 19L, 21L, 21L, 22L, 22L, 24L, 24L, 
26L, 27L, 31L, 32L, 32L, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
NA, NA, NA, NA, NA, NA, NA, NA), d20s1 = c(NA, NA, NA, NA, NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
NA, NA, NA, 3L, 9L, 13L, 19L, 24L, 26L, 32L, 38L, 44L, 46L, 48L, 
50L, 56L, 62L, 64L, 64L, 73L, 83L, 92L, 99L, 105L, NA, NA, NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
NA, NA, NA, NA, NA, NA), d20s2 = c(NA, NA, NA, NA, NA, NA, NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
NA, 0L, 2L, 2L, 3L, 4L, 14L, 15L, 23L, 25L, 27L, 36L, 37L, 38L, 
43L, 43L, 45L, 47L, 50L, 53L, 56L, 57L, NA, NA, NA, NA, NA, NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
NA, NA, NA), d10s1 = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
NA, NA, NA, NA, NA, NA, NA, 2L, 15L, 19L, 22L, 33L, 34L, 37L, 
37L, 39L, 41L, 48L, 50L, 52L, 56L, 62L, 64L, 65L, 68L, 72L, 77L, 
84L), d2s2 = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
NA, NA, NA, NA, NA, 4L, 4L, 4L, 4L, 4L, 7L, 9L, 9L, 12L, 12L, 
14L, 17L, 17L, 23L, 24L, 24L, 24L, 26L, 26L, 30L, 33L)), class = "data.frame", row.names = c(NA, 
-102L))
Ronak Shah
Tom Wright
  • Welcome to Stack Overflow! Help us help you: provide a minimal reproducible example. In particular, it will be easier to help with access to your data. You can edit your question and paste in the output of the R command `dput(cf)` to provide that data in a format that is easy for others to use in their R session. – duckmayr Jan 30 '21 at 10:40
  • Thank you very much for the help, I have added the output to the post :) – Tom Wright Jan 30 '21 at 11:52
  • For `geom_line` you need to add a `group` into `aes`. I also want to second the point about converting the x axis from factor to date. – Dan Adams Jan 30 '21 at 13:22

1 Answer


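The function geom_step() has an argument na.rm, which is FALSE by default: missing values are still dropped, but with a warning. Changing this to TRUE drops them silently, which should get rid of the warnings you are seeing. Alternatively, you could change the NA data to zeroes, although those zeroes would then be drawn as if they were real observations.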
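As a rough sketch of the na.rm idea, assuming the ggplot2 and tidyr packages are installed: the data frame `cf_small` below is a hypothetical miniature of the question's `cf`, and the column names `series` and `count` are made up for illustration. Reshaping to long format is one possible way to draw every series with a single geom_step() call, and it lets you drop the empty cells before plotting. Date is left as text here to keep the focus on the NA handling.

```r
library(ggplot2)
library(tidyr)

# Hypothetical miniature of `cf`: two recording periods that never overlap,
# so each column is NA wherever the other one has data.
cf_small <- data.frame(
  Date  = c("08/11/2019", "09/11/2019", "10/11/2019", "11/11/2019"),
  d2s1  = c(6L, 11L, NA, NA),
  d20s1 = c(NA, NA, 3L, 9L)
)

# One row per (Date, series) pair; values_drop_na = TRUE discards the
# empty cells during the reshape.
cf_long <- pivot_longer(
  cf_small,
  cols = -Date,
  names_to = "series",
  values_to = "count",
  values_drop_na = TRUE
)

# group = series gives each recording period its own step line.
# na.rm = TRUE only silences the missing-value warning; it is redundant
# here because the NA rows are already gone, but harmless.
p <- ggplot(cf_long, aes(x = Date, y = count, group = series, colour = series)) +
  geom_step(na.rm = TRUE)

print(p)
```

With the NA rows removed before plotting, the "Removed 81 row(s)" warnings should no longer appear whichever geom you choose.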

The crowded x-axis is typical of what happens when the data is stored as a factor, rather than a date. This will be related to how you read in your data, which you haven't shown.
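A minimal sketch of the date conversion, assuming the Date column holds day/month/year text as in the posted dput() output. `cf_small` is a hypothetical three-row stand-in for `cf`, and the break spacing is only a suggestion.

```r
library(ggplot2)

# Hypothetical stand-in for the first rows of `cf`.
cf_small <- data.frame(
  Date = c("08/11/2019", "09/11/2019", "10/11/2019"),
  d2s1 = c(6L, 11L, 13L)
)

# The text is day/month/year, so the format has to be stated explicitly;
# as.Date() would otherwise not parse these strings as intended.
cf_small$Date <- as.Date(cf_small$Date, format = "%d/%m/%Y")

# With a real Date column, ggplot2 uses a continuous date axis and
# scale_x_date() controls how many labels are drawn.
# "1 day" suits this tiny sample; for the full 102-day range a wider
# spacing such as "2 weeks" is likely to be more readable.
p <- ggplot(cf_small, aes(x = Date, y = d2s1)) +
  geom_line() +
  scale_x_date(date_breaks = "1 day", date_labels = "%d %b") +
  theme(axis.text.x = element_text(angle = 90))

print(p)
```

The same conversion can be applied right after read.csv("cf.csv"), before any plotting code runs.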

Miff
  • Thank you very much for getting back to me! That is really helpful I will give that a try. I read in my data with the `read.csv("cf.csv")` function. – Tom Wright Jan 30 '21 at 11:23