I have this data set:

##     fips      SCC Pollutant Emissions  type year
## 4  09001 10100401  PM25-PRI    15.714 POINT 1999
## 8  09001 10100404  PM25-PRI   234.178 POINT 1999
## 12 09001 10100501  PM25-PRI     0.128 POINT 1999
## 16 09001 10200401  PM25-PRI     2.036 POINT 1999
## 20 09001 10200504  PM25-PRI     0.388 POINT 1999
## 24 09001 10200602  PM25-PRI     1.490 POINT 1999


'data.frame':   2096 obs. of  6 variables:
 $ fips     : chr  "24510" "24510" "24510" "24510" ...
 $ SCC      : chr  "10100601" "10200601" "10200602" "30100699" ...
 $ Pollutant: chr  "PM25-PRI" "PM25-PRI" "PM25-PRI" "PM25-PRI" ...
 $ Emissions: int  6 78 0 10 10 83 6 28 24 40 ...
 $ type     : chr  "POINT" "POINT" "POINT" "POINT" ...
 $ year     : int  1999 1999 1999 1999 1999 1999 1999 1999 1999 1999 ...

fips: A five-digit number (represented as a string) indicating the U.S. county
SCC: The name of the source as indicated by a digit string (see source code classification table)
Pollutant: A string indicating the pollutant
Emissions: Amount of PM2.5 emitted, in tons
type: The type of source (point, non-point, on-road, or non-road)
year: The year of emissions recorded

I am trying to make a plot in ggplot to see whether emissions have increased or decreased over the years for each type of source. I would also like to add a linear model to show the trend.

This is what I've done so far:

GGplotGraph <- ggplot(PM25Baltimore, aes(x = year, y = Emissions, group = year, colour = type))

GGplotGraph <- GGplotGraph + geom_line() + facet_wrap(~ type) + theme(legend.position = "none")

GGplotGraph <- GGplotGraph + geom_smooth(method = "lm", formula = Emissions ~ year, se = FALSE, aes(group = 1))

This is the graph I get, but I would like the lines to be continuous from 1999 to 2008.

[image: plot produced by the code above]

After doing some research on the topic, I understood that this happens because the grouping is wrong. I tried various combinations and converted the type column to a factor, but it still did not work.

The other problem I have is with the linear model. I receive this error:

Error in model.frame.default(formula = formula, data = data, weights = weight,  : 
  variable lengths differ (found for '(weights)')
Error in if (nrow(layer_data) == 0) return() : argument is of length zero

I found some explanations here, but my skills with debug, traceback, or recover are very limited.

I would like some advice on how to proceed or what to try next.

alecsx

1 Answer

First, I created some test data, since your example was a bit too short to reproduce the problem:

set.seed(18)
PM25Baltimore<-data.frame(
    type = rep(c("Non-Road","Nonpoint","on-road","point"), each=10*10),
    year = rep(1999:2008, 10*4),
    Emissions = runif(10*4*10, 0,500)
)
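
A quick check of the simulated data, sketched here with base R only, may help show why something has to be collapsed before drawing lines: every type/year combination holds several observations. The name `obs_per_cell` is made up for this illustration.

```r
# Recreate the simulated data from above so this block runs on its own
set.seed(18)
PM25Baltimore <- data.frame(
    type = rep(c("Non-Road", "Nonpoint", "on-road", "point"), each = 10 * 10),
    year = rep(1999:2008, 10 * 4),
    Emissions = runif(10 * 4 * 10, 0, 500)
)

# Count the observations in each type/year cell
# (obs_per_cell is an illustrative name, not part of the original code)
obs_per_cell <- table(PM25Baltimore$type, PM25Baltimore$year)
print(obs_per_cell)
```

With several values per cell, a plain geom_line() has no single y value per year to connect, which is the situation the summary step is meant to handle.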

I'm going to use stat_summary rather than group, so that the multiple observations for each type/year are collapsed to their mean value. I think group = year was what caused your "sawtooth" problem: geom_line connects points within a group, and with one group per year the line never runs from one year to the next. That gives the following plot:

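library(ggplot2)
ggplot(PM25Baltimore, aes(year, Emissions, color = type)) +
    facet_wrap(~ type) + theme(legend.position = "none") +
    # mean per type/year; use fun.y = "mean" on ggplot2 older than 3.3.0
    stat_summary(fun = "mean", geom = "line") +
    # the formula refers to the x and y aesthetics, not to column names
    geom_smooth(method = "lm", formula = y ~ x, se = FALSE, linetype = 3, color = "black")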

sample plot with averaged y values and regression lines
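
As a rough cross-check outside ggplot2, the two ingredients of the plot can also be computed with base R: the per-type/year means that the summary line connects, and a linear fit of Emissions on year for each type. This is only a sketch that assumes the simulated PM25Baltimore data from this answer; `yearly_means` and `slopes` are names made up for the example.

```r
# Recreate the simulated data from this answer so the block runs on its own
set.seed(18)
PM25Baltimore <- data.frame(
    type = rep(c("Non-Road", "Nonpoint", "on-road", "point"), each = 10 * 10),
    year = rep(1999:2008, 10 * 4),
    Emissions = runif(10 * 4 * 10, 0, 500)
)

# Mean emissions per type and year: one row per type/year combination
yearly_means <- aggregate(Emissions ~ type + year, data = PM25Baltimore, FUN = mean)

# Slope of a linear fit of Emissions on year, computed separately per type
slopes <- sapply(split(PM25Baltimore, PM25Baltimore$type), function(d) {
    coef(lm(Emissions ~ year, data = d))[["year"]]
})

print(head(yearly_means))
print(slopes)
```

A negative slope would suggest falling emissions for that type over the period. On random test data like this the slopes carry no meaning; the same code applied to the real data is what would answer the original question numerically.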

MrFlick
  • Thank you! It worked. I also think I was having problems plotting because of the character variables; they should have been factors. Anyway, everything is fine now. :) – alecsx May 19 '14 at 16:26