
Solved!

Thank you @Gregor and @eipi10. I solved this problem by

  1. Converting the data frame to long format:

    library(tidyr)
    dat.long <- poverty %>% gather(key, value, -Year)

  2. Filtering out all the NAs:

    dat.long <- dat.long[complete.cases(dat.long), ]

  3. Plotting the long data with ggplot.
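For anyone landing here later, the three steps can be sketched end to end as below. This is only a minimal sketch: the `poverty` data frame is made up for illustration, its short column names are hypothetical stand-ins for my real ones, and the legend title "Measure" is just an example. `labs(colour = ...)` is one way to handle the legend-title side request.

```r
library(tidyr)    # gather()
library(ggplot2)

# Made-up values for illustration; the real column names are much longer.
poverty <- data.frame(
  Year               = c(1990, 1995, 2000, 2005, 2010, 2015),
  Poverty2002        = c(60, NA, 55, NA, 50, NA),
  ExtremePoverty2002 = c(NA, 30, NA, 25, NA, 20),
  Poverty2015        = c(40, 35, NA, 25, NA, 10)
)

# 1. Stack every column except Year into key/value pairs.
dat.long <- gather(poverty, key, value, -Year)

# 2. Drop rows with NA so geom_line() connects the remaining points.
dat.long <- dat.long[complete.cases(dat.long), ]

# 3. One geom_line and one geom_point; colour = key also groups the lines.
p <- ggplot(dat.long, aes(Year, value, colour = key)) +
  geom_line() +
  geom_point() +
  labs(colour = "Measure")  # sets the legend title
p
```

Note that `gather()` still works but is superseded in current tidyr; `pivot_longer()` is the newer equivalent.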




Preface

Unlike in other questions, I have a rather odd data frame with several NA values:

IMAGE 1

When I plot the columns with geom_line:

ggplot(poverty, aes(Year))+
  geom_line(aes(y = poverty$Poverty..BM.2002.....of.world.population., colour = "2002 Poverty"))+
  geom_line(aes(y = poverty$Extreme.Poverty..BM.2002.....of.world.population., colour = "2002 Extreme Poverty")) +
  geom_line(aes(y = poverty$Less.than.1.90..per.day..World.Bank..2015......of.world.population., colour = "2015 Poverty"))

I get discontinuous lines:

Discontinued lines

Some SO posts suggest plotting the points first and then connecting them by group. That seemed like a good idea once I got something like this:

Scatter plot

Problem

How do I connect the dots? (Note: I don't have a factor column in my data frame, so I can't really do it with group=.)

Desired output:

Desired Output

Side Request

How do I change the legend title?

Thank you!

Xipu Li
  • Change your data to long format, then use a single geom_line and a single geom_point. **Never** put `data$` inside `aes()`; use bare column names. – Gregor Thomas Dec 04 '17 at 04:35
  • That sounds like a good idea, but I cannot find `as.long` (I'm new to R). Would you help me with that? I did some research and found that a package called `reshape2` can do it, but I don't know how to apply it to my case. Thank you, Gregor! – Xipu Li Dec 04 '17 at 04:48
  • Long format example: `library(tidyr); dat.long = poverty %>% gather(key, value, -Year)`. This will stack all columns, except `Year`. There are lots of tutorials on `tidyr` and `gather`. [Here's one](https://rpubs.com/bradleyboehmke/data_wrangling). Then `ggplot(dat.long, aes(Year, value, colour=key)) + geom_line()`. – eipi10 Dec 04 '17 at 05:02
  • Thank you @eipi10! I did some clean-up that got rid of the extra columns: `poverty2<- poverty[,3:6]`. Then I tried your `gather()` method and stored the result in a variable `long`. When I plotted it, however, the lines were still discontinuous. I got a warning: `Removed 30 rows containing missing values (geom_path).` – Xipu Li Dec 04 '17 at 05:16
  • You probably have missing data in some columns, causing breaks in the plotted lines. What happens if you remove the missing values with `dat.long = poverty %>% gather(key, value, -Year) %>% filter(!is.na(value))` and then plot the data? – eipi10 Dec 04 '17 at 05:18
  • Yes, there are NAs in my data frame. When I run that, I get `Error in filter(., !is.na(value)) : object 'value' not found`. Also, should I use `dat.long =` or `dat.long <-`? – Xipu Li Dec 04 '17 at 05:23
  • Never mind. I've sorted it out by using `dat.long<-dat.long[complete.cases(dat.long), ]`. Thank you all so much! – Xipu Li Dec 04 '17 at 05:32
  • `=` vs `<-` is just a matter of personal preference. – Gregor Thomas Dec 04 '17 at 15:19
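
A note on the `filter` error in the comments: a likely cause (an assumption, since the session isn't shown) is that `dplyr` was not attached, so `filter` resolved to `stats::filter`, which knows nothing about a `value` column. A minimal sketch of the suggested pipeline with `dplyr` loaded, on a made-up data frame with hypothetical columns `A` and `B`:

```r
library(dplyr)   # provides the data-frame filter() used below
library(tidyr)   # gather()

# Made-up data for illustration only.
poverty <- data.frame(
  Year = c(2000, 2005, 2010),
  A    = c(1, NA, 3),
  B    = c(NA, 5, 6)
)

dat.long <- poverty %>%
  gather(key, value, -Year) %>%   # stack A and B into key/value pairs
  filter(!is.na(value))           # dplyr::filter keeps rows where value is present
```

This should give the same rows as the `complete.cases()` approach when `value` is the only column containing NAs.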
