
I have a data set with the number of child deaths (cdeaths) listed by country and by year. How can I generate a separate scatter plot (cdeaths against year) for each country without doing it manually? I have 186 countries, so I was wondering whether there is a way to do this by factor.

The data set looks like this:

country     year    cdeaths 
Afghanistan 2015    50.5  
Afghanistan 2014    45.2  
Afghanistan 2011    39.9  
Afghanistan 2011    38.6  
Afghanistan 2010    34.3  
Afghanistan 2008    24  
Afghanistan 2006    19  
Afghanistan 2003    14.3  
Afghanistan 2002    15  
Afghanistan 1999    15  
Barbados    2015    99  
Barbados    2014    99.7   
Barbados    2013    98.6  
Barbados    2012    98.9  
Barbados    2012    100  
Barbados    2011    100  
Barbados    2008    100  
Barbados    2007    100  
Barbados    2006    100  
Barbados    2005    100    
Barbados    2004    100  
Barbados    2003    100  
Barbados    2002    100  
Barbados    2000    98  
Barbados    1999    91  
Barbados    1995    100  
Cambodia    2014    89  
Cambodia    2011    71.7  
Cambodia    2010    71  
Cambodia    2009    63  
Cambodia    2008    58  
Cambodia    2005    43.8  
Cambodia    2004    16.3  
Cambodia    2000    31.8  
Cambodia    1998    34  
Cambodia    1995    43.3  
Denmark 2016    94.4  
Denmark 2015    95.4  
Denmark 2014    95.9  
Denmark 2013    96.3  
Denmark 2012    98.2    
Denmark 2011    98.5  
Denmark 2010    98.5  
Denmark 2009    98.7  
Denmark 2007    97.8  
Denmark 2006    98.7  
Denmark 2005    98.8  
Denmark 2004    98.6  
Denmark 2003    98.9  
Denmark 2002    98.8  
Denmark 2001    98.9  
Denmark 2000    98.8  
Denmark 1999    98.7  
Denmark 1998    98.8  
Denmark 1997    98.3  
Estonia 2016    99.4  
Estonia 2015    99.5  
Estonia 2014    99.4  
Estonia 2013    99.4    
Estonia 2012    99.3  
Estonia 2011    99.4  
Estonia 2010    99.3  
Estonia 2009    99.2    
Estonia 2008    99.3  
Estonia 2007    99.4    
Estonia 2006    99.5  
Estonia 2006    99.8    
Estonia 2005    99.6  
Estonia 2005    99.8  
Estonia 2004    99.7  
Estonia 2004    99.8  
Estonia 2003    99.4  
Estonia 2003    99.7  
Estonia 2002    99.5  
Estonia 2002    99.6  
Estonia 2001    99.6  
Estonia 2001    99.7  
Estonia 2000    99.5  
Estonia 2000    99.7  
Estonia 1999    99.5    
Estonia 1999    99.6  
Estonia 1998    99.5    
Estonia 1998    99.6  
Estonia 1997    99.3  
Estonia 1997    99.5  
Estonia 1996    99.4  
Estonia 1996    99.6  
Estonia 1995    99.3  
Estonia 1995    99.5  
Estonia 1994    99.1  
Estonia 1994    99.3  
Estonia 1993    99.1  
Estonia 1993    99.1  
Estonia 1992    98.9  
Estonia 1992    99  
Gabon   2012    89.3  
Gabon   2000    85.5  

A picture of the output that I want is attached.

Stephan
Lisa
  • Please provide a sample of your data in a format that's easy for us to cut and paste into our R session to make this reproducible. Thanks :) – mysteRious Mar 29 '18 at 04:52
  • data set has 3 columns: country, year, cdeaths – Lisa Mar 29 '18 at 04:54
  • Still need a representative sample of the data, maybe with just a few countries. The image of the data frame is not useful because no one wants to type all your data back in. If you make it easy to start working on the problem lots of people will want to help :) Also, with your data, we can avoid problems like providing a solution for the wrong data type or data structure. – mysteRious Mar 29 '18 at 04:57
  • oh! i see! sorry i'm very new to stackoverflow and learning r just now. will provide sample asap – Lisa Mar 29 '18 at 04:59
  • No problem Lisa! I think most of us had the same initial struggle. Hang in there. – mysteRious Mar 29 '18 at 05:00
  • See also [How to make a great R reproducible example?](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) – Tung Mar 29 '18 at 05:01
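
As a rough illustration of what the comments above are asking for, here is one way a small, paste-ready sample could be shared. The rows are copied from the question; the object name `df` and the choice of rows are only assumptions for the sketch.

```r
# Sketch: a few rows from the question, typed inline so anyone can paste them.
df <- read.table(header = TRUE, stringsAsFactors = TRUE, text = "
country     year cdeaths
Afghanistan 2015 50.5
Afghanistan 2014 45.2
Barbados    2015 99
Barbados    2014 99.7
Gabon       2012 89.3
Gabon       2000 85.5
")

str(df)   # check the column types before sharing
dput(df)  # prints R code that recreates df; paste that output into the question
```

For a large data set, running `dput()` on a small subset (for example the first few rows of two or three countries) is usually enough.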

1 Answer

Here you go:

> library(dplyr)
> library(tibble)
> library(ggplot2)

> lisa <- read.table("D:/R/SO/lisa.txt",header=TRUE) # my path
> df <- as_tibble(lisa)
> df
# A tibble: 97 x 3
   country      year   sab
   <fct>       <int> <dbl>
 1 Afghanistan  2015  50.5
 2 Afghanistan  2014  45.2
 3 Afghanistan  2011  39.9
...etc
# ... with 87 more rows

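You can do what you're looking for with facet_wrap in ggplot2 (the third column is named sab in my copy of the data; use cdeaths if that is what yours is called):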

> df %>% group_by(country) %>% 
         ggplot(aes(x=year,y=sab)) + geom_point() + 
         facet_wrap(~country)

It produces this:

[Image: scatter plots of sab by year, one panel per country]

I haven't actually done this for a dataset with 100+ values of the categorical variable, so not sure how it will scale. There are lots of resources online that can help you tweak the ggplot2 parameters... just remember that it's like Photoshop... you always have to think in layers. Hope this helps!
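
If a single faceted figure turns out to be too crowded for 186 countries, one alternative is to draw one plot per country and write each to its own page of a PDF. The following is only a sketch under assumptions: the toy data frame stands in for the real data, the column names follow the question (`country`, `year`, `cdeaths`), and the output file name is hypothetical.

```r
library(ggplot2)

# Toy data standing in for the full data set (values taken from the question).
df <- data.frame(
  country = rep(c("Afghanistan", "Barbados", "Gabon"), each = 2),
  year    = c(2015, 2014, 2015, 2014, 2012, 2000),
  cdeaths = c(50.5, 45.2, 99, 99.7, 89.3, 85.5)
)

# split() returns a named list with one data frame per country;
# lapply() builds one scatter plot from each of them.
plots <- lapply(split(df, df$country), function(d) {
  ggplot(d, aes(x = year, y = cdeaths)) +
    geom_point() +
    ggtitle(as.character(d$country[1]))
})

# Write every plot to its own page of one PDF (hypothetical file name).
out_file <- file.path(tempdir(), "cdeaths_by_country.pdf")
pdf(out_file, width = 6, height = 4)
for (p in plots) print(p)
dev.off()
```

Inside a loop the plots have to be printed explicitly with `print()`, because ggplot objects are only drawn automatically at the top level of the console.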

mysteRious
  • hi! quick question. i already installed the package "dplyr", but there is still an error: could not find function "%>%" – Lisa Mar 29 '18 at 05:34
  • If that's happening, try installing `magrittr` - it comes over for free when you install and load `tidyverse` but I've noticed sometimes I have to load it up by itself. – mysteRious Mar 29 '18 at 05:42
  • what about group_by? which library is responsible for it? error message: group_by(., country) : could not find function "group_by" – Lisa Mar 29 '18 at 06:02
  • that's in `dplyr` which is also supposed to come with `tidyverse` – mysteRious Mar 29 '18 at 06:06
  • just make sure all variables after the pipe `%>%` match your variable names in `df` exactly. – mysteRious Mar 29 '18 at 06:07
  • i do have both dplyr and tidyverse already. would you have any other suggestions? so sorry for the trouble! – Lisa Mar 29 '18 at 06:15
  • yes, the variable names match. it's the function that i am having issues with. – Lisa Mar 29 '18 at 06:17
  • Hi Lisa, if you're still having issues, edit your original question and add the code you're running and the errors it's giving and we'll help get you through this :) I fell asleep after my last comment, so sorry for disappearing. – mysteRious Mar 29 '18 at 15:22
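
A common cause of "could not find function" errors like the ones in the comments above is that a package has been installed but not attached in the current R session. This has not been confirmed as the cause here; the sketch below just shows the check, assuming dplyr and ggplot2 are already installed.

```r
# Installing a package (once) does not attach it; every new session needs library().
# install.packages(c("dplyr", "ggplot2"))  # run once if not yet installed

library(dplyr)    # provides group_by() and re-exports the %>% pipe
library(ggplot2)  # provides ggplot(), geom_point() and facet_wrap()

# Check that the functions named in the error messages are now visible.
exists("%>%")
exists("group_by")
```

Both calls should return `TRUE` once the packages are attached. Running `library(tidyverse)` instead attaches dplyr and ggplot2 together.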