
I am a relative beginner with R, so please forgive me if I make conceptual errors here.

I'm trying to plot the number of petitions that are certified or denied ("C" and "D") over time, from 1992 to 2019. Each row of the data set is an individual petition, dated in YYYY-MM-DD format in the variable "DetermDate". "C" and "D" are character values of the variable "Determ". The code I used is:

ggplot(data = TAA, mapping = aes(x = DetermDate, y = frequency(Determ), color = Determ)) + 
  geom_line() + 
  scale_x_date(date_breaks = "2 years", date_labels = "%Y") +
  theme_clean()

The resulting graph is: [image not available]

Obviously, it's not very helpful. It shows that certifications and denials exist from 1992 to 2019, but that's about it. Again, I'm interested in the quantity of each over time. Any help at all would be greatly appreciated!

EDIT: Below is the output of head(TAA) from R.

 head(TAA)
  DetermDate           Company Name          City State   Zip    Workers                           Product Petitioner Determ EstNoWorkers
3 1992-03-06           Gleason Corp     Rochester    NY 14692 Production      Ctting and grinding machines    Workers      D           65
4 1992-02-28  Northwest Alloys, Inc          Addy    WA 99101 Production                   Metal magnesium    Workers      C          200
5 1992-03-06     Pan American World       Jamaica    NY 11430 Production                   Airline carrier    Workers      D         1100
6 1992-02-10     Potomac Sportswear   Martinsburg    WV 25401 Production                Childrens garments      Union      C           91
7 1992-02-18 Sage Drilling Co., Inc       Wichita    KS 67202 Production                 Oil, gas drilling    Workers      C           14
8 1992-02-18 Sage Drilling Co., Inc Oklahoma City    OK 73127 Production Oil, gas exploration and drilling    Workers      C           15

The data frame is mostly filler for the purposes of this question, with the focus being on "DetermDate" and "Determ". The row IDs on the side begin with "3" because I deleted NAs from a prior dirty data set. Thank you!

  • Can you use the dput() function to show us what your data TAA looks like? It's hard to help otherwise. – user438383 Mar 24 '20 at 20:14
  • Please make this a [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) – Conor Neilson Mar 24 '20 at 20:15
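As the comments suggest, dput() prints R code that rebuilds an object exactly, which is easier for others to paste into their session than printed head() output. A minimal sketch; TAA_small is a hypothetical stand-in holding only the two columns the question is about, not the real data:

```r
# Hypothetical stand-in for TAA with only the two relevant columns
TAA_small <- data.frame(
  DetermDate = as.Date(c("1992-03-06", "1992-02-28", "1992-03-06")),
  Determ     = c("D", "C", "D"),
  stringsAsFactors = FALSE
)

# dput() prints R code that recreates the object; paste that output into the question.
# With the real data this would be dput(head(TAA[, c("DetermDate", "Determ")]))
dput(head(TAA_small[, c("DetermDate", "Determ")]))
```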

1 Answer


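Your call of frequency() inside ggplot just returns a constant 1: frequency() is meant for time-series objects (it returns the number of observations per unit of time), and for a plain vector it simply returns 1 (try frequency(TAA$Determ)). The easiest fix is to precalculate the count of each Determ value per date and plot that. Try this: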

library(ggplot2)
library(dplyr)

TAA <- read.table(
text = '
id DetermDate           "Company Name"          City State   Zip    Workers                           Product Petitioner Determ EstNoWorkers
3 1992-03-06           "Gleason Corp"     Rochester    NY 14692 Production      "Ctting and grinding machines"    Workers      D           65
4 1992-02-28  "Northwest Alloys, Inc"          Addy    WA 99101 Production                   "Metal magnesium"    Workers      C          200
5 1992-03-06     "Pan American World"       Jamaica    NY 11430 Production                   "Airline carrier"    Workers      D         1100
6 1992-02-10     "Potomac Sportswear"   Martinsburg    WV 25401 Production                "Childrens garments"      Union      C           91
7 1992-02-18 "Sage Drilling Co., Inc"       Wichita    KS 67202 Production                 "Oil, gas drilling"    Workers      C           14
8 1992-02-18 "Sage Drilling Co., Inc" "Oklahoma City"    OK 73127 Production "Oil, gas exploration and drilling"    Workers      C           15
', header = TRUE)

# Prepare df for plotting and precalculate the count
TAA_plot <- TAA %>% 
  mutate(DetermDate = as.Date(DetermDate)) %>% 
  # Precalculate Frequency 
  count(DetermDate, Determ, name = "freq")
TAA_plot
#> # A tibble: 4 x 3
#>   DetermDate Determ  freq
#>   <date>     <fct>  <int>
#> 1 1992-02-10 C          1
#> 2 1992-02-18 C          2
#> 3 1992-02-28 C          1
#> 4 1992-03-06 D          2

ggplot(data = TAA_plot, mapping = aes(x = DetermDate, y = freq, color = Determ, group = Determ)) + 
  geom_line() + 
  scale_x_date(date_breaks = "2 years", date_labels = "%Y") 

  # theme_clean() comes from the ggthemes package; to use it,
  # run library(ggthemes) and add + theme_clean() to the plot

Created on 2020-03-25 by the reprex package (v0.3.0)
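With thousands of petitions spread over 1992 to 2019, one point per calendar day will likely give a very noisy line. If yearly totals are what you're after, one option is to count per calendar year instead. A minimal sketch, assuming only DetermDate and Determ matter; TAA_small and its rows are made-up stand-ins, not the real data. tidyr::complete() adds explicit zero counts for years in which a category has no petitions, so a line drops to 0 instead of skipping that year:

```r
library(ggplot2)
library(dplyr)
library(tidyr)

# Hypothetical stand-in for TAA (made-up rows, only the two relevant columns)
TAA_small <- data.frame(
  DetermDate = c("1992-03-06", "1992-02-28", "1993-05-01", "1993-07-15", "1994-01-20"),
  Determ     = c("D", "C", "C", "C", "D"),
  stringsAsFactors = FALSE
)

TAA_yearly <- TAA_small %>%
  mutate(
    DetermDate = as.Date(DetermDate),
    # Extract the calendar year as an integer
    Year = as.integer(format(DetermDate, "%Y"))
  ) %>%
  # Count certifications and denials per year instead of per day
  count(Year, Determ, name = "freq") %>%
  # Fill in Year/Determ combinations that never occur with a count of 0
  complete(Year, Determ, fill = list(freq = 0L))

p <- ggplot(TAA_yearly, aes(x = Year, y = freq, color = Determ, group = Determ)) +
  geom_line() +
  geom_point()
p
```

Note that complete(Year, Determ) only fills years that appear somewhere in the data; a year with no petitions at all would still be absent.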

stefan