I am subsetting data from a file and then trying to plot a line in ggplot2. I only get points, even though I am using geom_point() + geom_line().

d1<-structure(list(year = structure(1:10, .Label = c("2001", "2002", 
"2003", "2004", "2005", "2006", "2007", "2008", "2009", "2010", 
"2011", "2012"), class = "factor"), val1 = c(42244L, 43161L, 
42444L, 43579L, 43424L, 45116L, 48003L, 48835L, 47856L, 50024L
), val2 = c(0L, 0L, 0L, 0L, 18L, 0L, 0L, 7L, 0L, 0L), val3 = c(109467L, 
112956L, 110623L, 125657L, 127560L, 137180L, 156412L, 164861L, 
174395L, 180413L), val4 = c(20381L, 18346L, 16636L, 18119L, 17173L, 
19234L, 22113L, 22624L, 23374L, 23280L), val5 = c(7056L, 6679L, 
6287L, 6261L, 7197L, 7581L, 10321L, 10535L, 10242L, 12080L), 
    val6 = c(12823L, 12056L, 11101L, 11428L, 12665L, 11783L, 
    9861L, 8250L, 7802L, 6775L), val7 = c(220L, 101L, 55L, 68L, 
    212L, 85L, 95L, 125L, 49L, 81L), val8 = c(694L, 2527L, 1066L, 
    1700L, 2976L, 1665L, 1229L, 1086L, 879L, 958L), val9 = c(12439L, 
    12698L, 15351L, 12771L, 13192L, 12420L, 13753L, 14943L, 14368L, 
    10404L), val10 = c(17819L, 18221L, 15643L, 19250L, 19326L, 
    20967L, 23658L, 27208L, 30526L, 34250L), val11 = c(20446L, 
    21236L, 19994L, 22489L, 23212L, 23792L, 25363L, 25036L, 25845L, 
    27074L), val12 = c(243589L, 247981L, 239200L, 261322L, 266955L, 
    279823L, 310808L, 323510L, 335336L, 345339L)), .Names = c("year", 
"val1", "val2", "val3", "val4", "val5", "val6", "val7", "val8", 
"val9", "val10", "val11", "val12"), 
row.names = c(NA, 10L), class = "data.frame")

and then I run

d2<-subset(d1[,c(1,2)]) #(here d1 is the main (csv)file)
ggplot(d2,aes(x=year,y=val1))+geom_line()+geom_point()
# geom_path: Each group consist of only one observation. Do you need to adjust the group aesthetic?

The same thing happens with qplot: with geom="line" it shows the same notice, but without geom="line" it shows points with no note or error.

qplot(y=val1,x=year,data=d2,geom="line")
# geom_path: Each group consist of only one observation. Do you need to adjust the group aesthetic?

Now when I create a data frame manually as

d2<-data.frame(year=c(2001,2002,2003,2004,2005,2006,
2007,2008,2009,2010,2011,2012),
value=c(20446,21236,19994,22489,23212,23792,25363,
25036,25845,27074,28878,31117))

I am able to plot the line. I cannot figure out what is wrong. Thanks.

MrFlick
Gaurav Chawla
  • You need to make a [reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). How are you subsetting the data frame? Exactly what do your plotting commands look like? Ideally you should post a minimal example we can copy/paste into R and get the same error. – MrFlick Sep 21 '14 at 03:13
  • Downvoted for posting error message with incomplete code (therefore unclear). – IRTFM Sep 21 '14 at 03:14
  • Done. Please tell me if the ambiguity still exists. – Gaurav Chawla Sep 21 '14 at 03:38
  • Your example is not complete because we have no idea what's in your `d1` data.frame. How about sharing `dput(d1)` or at least `dput(head(d1, 10))` – MrFlick Sep 21 '14 at 03:45
  • @Gaurav Chawla It is not clear why you used the `subset` command if you have already subsetted the dataset with `d1[,c(1,2)]`. Check the `class` of the manually created `d2$year` and of the one you subsetted. The former is `numeric` and the latter is `factor`. – akrun Sep 21 '14 at 04:03
  • I checked it; in both cases it comes out as factor. After `d2<-subset(d1[,c(1,2)])`, `class(d2$year)` gives `[1] "factor"`. After `d2<-d1[,c(1,2)]`, `class(d2$year)` gives `[1] "factor"`. – Gaurav Chawla Sep 21 '14 at 04:18
  • @Gaurav Chawla I was talking about the manually created `d2` and the subsetted `d2` – akrun Sep 21 '14 at 04:30
  • Thanks I got it now. – Gaurav Chawla Sep 21 '14 at 05:20

1 Answer

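For whatever reason, your years are stored as a factor in the data read from the csv, but they are numeric in your "manually created" data frame. Factors are used for categorical variables, which follow different plotting rules than continuous variables: with a factor on the x axis, ggplot2 puts each level in its own group, so every group holds a single observation and geom_line() has nothing to connect. That is what the geom_path message is telling you.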
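To see the grouping effect in isolation, here is a minimal sketch. It uses a small stand-in data frame `d` (an assumption for illustration, not your full `d1`) and shows one possible workaround that keeps the factor: mapping `group = 1` so that all rows fall into a single group.

```r
library(ggplot2)

# Small stand-in for the question's data: year stored as a factor.
d <- data.frame(
  year = factor(c("2001", "2002", "2003", "2004")),
  val1 = c(42244, 43161, 42444, 43579)
)

class(d$year)  # a factor, so ggplot2 treats the x axis as discrete

# With a discrete x, each level forms its own group of one observation.
# Mapping group = 1 places every row in a single group, so the line is
# drawn while the axis keeps its discrete year labels.
p <- ggplot(d, aes(x = year, y = val1, group = 1)) +
  geom_line() +
  geom_point()
print(p)
```

This keeps the year labels exactly as they appear in the data, at the cost of treating the years as evenly spaced categories rather than as numbers.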

You could do

ggplot(d2,aes(x=as.numeric(as.character(year)),y=val1))+geom_line()+geom_point()

to convert the year back into a number, but it would probably be better to figure out why it imported into R as a factor in the first place. Chances are you have bad data in there.
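One way to hunt for the bad data is sketched below. The file contents in `csv_text` are hypothetical (the stray `*` is invented for illustration, since the real csv is not shown), and `stringsAsFactors = TRUE` is set explicitly to mimic the default of R versions before 4.0.0.

```r
# Hypothetical file contents: one year entry is not a clean number.
csv_text <- "year,val1
2001,42244
2002,43161
2003*,42444
2004,43579"

# A single non-numeric entry makes the whole column non-numeric on import.
d <- read.csv(text = csv_text, stringsAsFactors = TRUE)
class(d$year)

# Convert via character, then list the entries that failed to parse.
year_chr <- as.character(d$year)
year_num <- suppressWarnings(as.numeric(year_chr))
year_chr[is.na(year_num)]

# After cleaning (here: dropping non-digit characters), store the column
# as a number once instead of converting inside aes() on every plot.
d$year <- as.numeric(gsub("[^0-9]", "", year_chr))
str(d)
```

Whether stripping characters is the right repair depends on what the offending values actually are, so inspect them before cleaning.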

MrFlick
  • Thank you, Sir, for figuring out the reason behind this problem. Maybe there is a problem in the data. For my own knowledge, may I ask why this happens? This is the first time I have encountered it. Why does the data behave differently once I have subsetted it, written it to a new file, and read it again? – Gaurav Chawla Sep 21 '14 at 05:21
  • That's because flat text files are not a good way to store data. They do not allow you to preserve metadata about data types. Every time you read them in, R must guess the class of each column. It's likely that when you subset your data you are removing the non-numeric values, so when you read it in the next time R will think the year is a number. – MrFlick Sep 21 '14 at 14:13
  • Sir, now when using as.numeric, the years on the x-axis come out as 2000, 2002.5, 2005, and so on. I am not getting the original year values on the x-axis, i.e. 2001, 2002, 2003, and so on. What should I do to fix this? Should I post it as a separate question? – Gaurav Chawla Sep 21 '14 at 14:55
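On the axis labels raised in the last comment: once year is numeric, ggplot2 chooses "pretty" breaks for a continuous scale, which is where values like 2002.5 come from. A possible fix, sketched here with a stand-in data frame `d` built from the first two columns of the question's data, is to set the breaks explicitly with `scale_x_continuous()`.

```r
library(ggplot2)

# Stand-in for the subsetted data, with year already numeric.
d <- data.frame(
  year = 2001:2010,
  val1 = c(42244, 43161, 42444, 43579, 43424,
           45116, 48003, 48835, 47856, 50024)
)

# Place one axis break at every year present in the data.
p <- ggplot(d, aes(x = year, y = val1)) +
  geom_line() +
  geom_point() +
  scale_x_continuous(breaks = d$year)
print(p)
```

With many years the labels may crowd each other; passing a thinner sequence such as `seq(2001, 2010, by = 2)` to `breaks` is one way to space them out.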