
I have a problem that should be quite simple to fix but I haven't found any answers that are directly applicable to my situation.

I am trying to create a plot with geom_point that shows only the points matching one value of a character vector. y is a continuous numeric, x is a date, and the colour is mapped to a character vector.

Here's my sample data:

year    month   day attempt n_test
2019    6   22  1   NA
2019    7   13  2   n
2019    8   3   3   n
2019    8   20  4   n
2019    9   3   5   n
2019    9   4   6   n
2019    9   8   7   n
2019    9   11  8   p
2019    9   17  9   n
2019    10  3   10  n
2019    10  3   11  n
2019    10  11  12  c
2019    10  22  13  n
2019    10  25  14  n
2019    10  28  15  p
2019    11  6   16  c
2019    11  9   17  n
2019    11  25  18  n
2019    12  4   19  n
2019    12  8   20  n
2019    12  14  21  p
2019    12  17  22  n
2019    12  20  23  n

This is called 'ntest.csv'.

Here's my code:

ntest <- read.csv('ntest.csv', header = TRUE)
n_date <- ymd(paste(ntest$year, ntest$month, ntest$day, sep="-"))
ggplot(ntest, aes(n_date, y=attempt)) +
    geom_point(aes(colour = n_test), size = 3.5) +
    labs(x=NULL) +
    theme(legend.position="none",
          axis.text.x = element_text(color = "black", size = 10, angle=45),
          axis.text.y = element_text(color = "black", size = 10),
          axis.title.y = element_text(size = 13, vjust = 2)) +
    scale_x_date(date_breaks = "months" , date_labels = "%b-%y")

This gives the attached graph.

[Figure: ntestplot]

I want to only show the rows in my geom_point graph where n_test equals "p". So the same graph, with only the blue points. I've tried using

ntest %>% 
filter(n_test=="p")

before ggplot, but this results in:

"Error: Aesthetics must be either length 1 or the same as the data (3): x"

Any help would be greatly appreciated.

  • Hello :) In order for us to help you, please provide a [reproducible](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) example. For example, to produce a minimal data set, you can use `head()`, `subset()`. Then use `dput()` to give us something that can be put in R immediately. Alternatively, you can use base R datasets such as `mtcars`, `iris`, *etc*. The problem is that, at the moment, we can't copy/paste your sample data and run your code. – Paul Sep 14 '20 at 13:44
  • @Paul, what about this is not reproducible? The only thing it is lacking is an explicit call to `library(lubridate)`, but otherwise it has everything needed for a reprex: usable sample data, code attempted, and the error message. (If you're having problem using the sample data as provided, then on win/lnx try `read.table("clipboard", header=TRUE)` after highlighting the 24 rows of sample data and copying to the clipboard; for macos, use `pipe("pbpaste")` instead of `"clipboard"`.) – r2evans Sep 14 '20 at 13:50
  • (While I often make a distinction in similar comments about them needing to provide *unambiguous* data, often using `dput` or `data.frame`, that's suggested/necessary when: (1) there are embedded spaces in the data; (2) there are (or might/should be) `factor`s or `POSIXt` objects; or (3) numeric precision is in question. That does not appear to be a need here.) – r2evans Sep 14 '20 at 13:51
  • @r2evans Thanks for the tip with `read.table("clipboard", header=TRUE)`. I did not know about it and will use it in the future. Nevertheless, this tip is not mentioned in ["How to make a great R reproducible example"](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). Or at least, it is not used this way. So yes, it is reproducible but not in a straight way. – Paul Sep 14 '20 at 14:17
  • It is there, https://stackoverflow.com/a/15929240/3358272. What is your expectation of "reproducible in a straight way"? Just `dput` and `data.frame` (for table-like structs)? – r2evans Sep 14 '20 at 14:24
  • @r2evans I saw this one but as you can see, this part is handled by the people asking the question, not the one answering. Your expectations in terms of reproducibility are not the same as mine. It's fine and I hope my 1st comment was as useful as yours with the tip about `read.table("clipboard", header=TRUE)` – Paul Sep 14 '20 at 14:32
  • Sure, but it works equally well for answerers. I agree that the format provided here is third in my list of preferred formats (first two are *programmatic* and `dput`), but it is popular, easy, and generally only a problem when one of my previous gotchas is present (spaces, class, precision). When none of those are a concern, this format is both easily imported by us, the answerers, and easily registers on the human eye, something that `dput` (as good as it is) fails miserably at. There are workarounds, certainly. Not meaning to debate you on this, thanks for the discussion! – r2evans Sep 14 '20 at 14:36
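
As a side note to the discussion above, whitespace-separated sample data like that in the question can also be loaded without the clipboard by passing it to `read.table()` through its `text` argument. The sketch below uses only a handful of the question's rows, and the name `sample_text` is just an illustrative choice:

```r
# A few rows of the question's sample data, pasted in as a string.
# `sample_text` is a hypothetical name used only for this illustration.
sample_text <- "year month day attempt n_test
2019 6 22 1 NA
2019 7 13 2 n
2019 9 11 8 p
2019 10 11 12 c
2019 10 28 15 p"

# read.table() splits on whitespace by default and treats the
# literal string NA as a missing value.
ntest <- read.table(text = sample_text, header = TRUE)
str(ntest)
```

This only works cleanly because the data has no embedded spaces, factors, or date-time columns, which are the cases where `dput()` output is the safer choice.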

1 Answer

The problem here is that n_date is a standalone vector rather than a column of your data frame, so it is not filtered when you filter the data frame. It keeps all 23 values while the filtered data has only 3 rows, which is the length mismatch the error reports. The solution is simply to create it as a column in your data frame:

ntest$n_date <- lubridate::ymd(paste(ntest$year, ntest$month, ntest$day, sep="-"))

Now you can apply your filter and your plot will work (note that the points are red because there is now only one colour group):

library(dplyr)  # provides %>% and filter()
ggplot(ntest %>% filter(n_test == "p"), aes(n_date, y=attempt)) +
    geom_point(aes(colour = n_test), size = 3.5) +
    labs(x=NULL) +
    theme(legend.position="none",
          axis.text.x = element_text(color = "black", size = 10, angle=45),
          axis.text.y = element_text(color = "black", size = 10),
          axis.title.y = element_text(size = 13, vjust = 2)) +
    scale_x_date(date_breaks = "months" , date_labels = "%b-%y")
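
To make the length mismatch concrete, here is a minimal self-contained sketch that uses five rows taken from the question's sample data. It assumes ggplot2, dplyr and lubridate are installed; `only_p` and `p` are illustrative names, not part of the original code:

```r
library(ggplot2)
library(dplyr)
library(lubridate)

# Five rows from the question's sample data
ntest <- data.frame(
  year    = 2019,
  month   = c(9, 9, 10, 10, 12),
  day     = c(8, 11, 11, 28, 14),
  attempt = c(7, 8, 12, 15, 21),
  n_test  = c("n", "p", "c", "p", "p")
)

# A standalone vector is untouched by filter(): it keeps one value per
# original row, while the filtered data frame has fewer rows.
n_date <- ymd(paste(ntest$year, ntest$month, ntest$day, sep = "-"))
only_p <- ntest %>% filter(n_test == "p")
length(n_date)  # one value per original row
nrow(only_p)    # only the rows where n_test is "p"

# As a column, the date is filtered together with everything else.
only_p <- ntest %>%
  mutate(n_date = ymd(paste(year, month, day, sep = "-"))) %>%
  filter(n_test == "p")

p <- ggplot(only_p, aes(x = n_date, y = attempt)) +
  geom_point(aes(colour = n_test), size = 3.5) +
  scale_x_date(date_breaks = "months", date_labels = "%b-%y")
```

Printing `p` draws the plot. Using `mutate()` here is only one way to add the column; the `ntest$n_date <- ...` assignment shown above does the same job.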


Allan Cameron
  • Dang it, I was so close to finishing my answer (with effectively the same code/recommendation.) – r2evans Sep 14 '20 at 13:48
  • @r2evans maybe it would be a nice feature if SO had a "I'm working on this" button you could click so other folks could see if someone trustworthy was working on answering a question. Might stop people from wasting time writing the same answers. It's happened to me so many times... – Allan Cameron Sep 14 '20 at 13:51
  • While I agree in concept ... you and I both know that often the "I'm working on this" button is really *"post an incomplete (even inaccurate) answer as quickly as possible, then edit into something more substantial"*. :-) I suspect that if that suggestion were made on meta.SO ... it would be downvoted rather rapidly (they're even more downvote-happy than here on SO). – r2evans Sep 14 '20 at 13:52
  • @r2evans too true. Sadly the "fastest gun in the West" type questions tend to get more upvotes than those questions which require a fuller answer with a carefully thought-out exposition. Like you, I prefer the challenging questions, but I'm still new enough to be drawn into the points game a bit too much. – Allan Cameron Sep 14 '20 at 14:00
  • Thanks so much Allan. Your code with the filter inside the data argument of ggplot is especially useful, because now I can make my code more concise in many other graphs too! I have another question, though: in the graph you've exported, the font looks smoother than mine. The font style itself looks the same, but mine is more pixelated. I've seen this in other people's graphs and tried to make mine smooth too. Is this because you're working in Linux while I'm working in Windows 10? – nicholasflamel Sep 14 '20 at 14:22
  • @nicholasflamel I usually find that _my_ plots are pixelated compared to others here, so I don't know why it looks a bit less so in this answer. I'm using RStudio in Windows 7 on a 32-bit work PC at the moment, and the default plotting device doesn't provide anti-aliasing, so it looks quite pixelated depending on the image. You normally get better results by saving your plots as images or PDFs, or increasing the text size and making the plot bigger. – Allan Cameron Sep 14 '20 at 14:29
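
The suggestion in the last comment, saving the plot to a file instead of relying on the on-screen device, could look roughly like the sketch below. It uses `ggsave()` from ggplot2. The toy data, file names, dimensions and resolution are all arbitrary choices for illustration, and how smooth the result looks will depend on the graphics device available on your system:

```r
library(ggplot2)

# A stand-in plot; in practice this would be the filtered plot from the answer.
p <- ggplot(data.frame(x = 1:3, y = c(8, 15, 21)), aes(x, y)) +
  geom_point(size = 3.5)

# Write to the session's temporary directory so nothing is overwritten.
out_png <- file.path(tempdir(), "ntest_p.png")
out_pdf <- file.path(tempdir(), "ntest_p.pdf")

# A higher dpi gives a sharper raster image; PDF output is vector-based.
ggsave(out_png, plot = p, width = 7, height = 5, dpi = 300)
ggsave(out_pdf, plot = p, width = 7, height = 5)
```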