
I am trying to plot my dataset, marriage, which consists of State, Year, and Rate. However, I want to focus on specific year intervals so that the graph looks less congested.

library(tidyr)    # gather() and the %>% pipe
library(ggplot2)

marriage <- read.csv(file = "~/Desktop/datah.csv", header = TRUE, sep = ",", check.names = FALSE)
marriage
marriage <- marriage %>%
     gather(key = year, value = rate, `2017`:`1990`)
ggplot(marriage, aes(x = year, y = rate, group = State)) +
     geom_point(aes(color = State)) +
     geom_line(aes(color = State)) +
     theme_bw()

I have tried adding each of the following to the last line of the code above to limit the x and y axes:

+ylim(0,2)
+scale_x_continuous(limits=c(2000, 2005))
+xlim(2010, 2015)
+scale_x_continuous(breaks = seq(2000, 2005, 5))

But I get this error: Error: Discrete value supplied to continuous scale

I have also tried to turn it into a numeric

marriage$variable=as.numeric(levels(marriage$variable))[marriage$variable]

I get the following error: Error in $<-.data.frame(*tmp*, variable, value = numeric(0)) : replacement has 0 rows, data has 1071

Here are the first few lines of the data:

                  State 2017 2016 2015 2014 2013 2012 2011 2010 2009 2008 2007 2006 2005 2004 2003 2002 2001 2000 1999 1995 1990
1               Alabama  7.0  7.1  7.4  7.8  7.8  8.2  8.4  8.2  8.3  8.6  8.9  9.2  9.2  9.4  9.6  9.9  9.4 10.1 10.8  9.8 10.6
2                Alaska  6.9  7.1  7.4  7.5  7.3  7.2  7.8  8.0  7.8  8.4  8.5  8.2  8.2  8.5  8.1  8.3  8.1  8.9  8.6  9.0 10.2
3               Arizona  5.8  5.9  5.9  5.8  5.4  5.6  5.7  5.9  5.6  6.0  6.4  6.5  6.6  6.7  6.5  6.7  7.6  7.5  8.2  8.8 10.0
4              Arkansas  9.5  9.9 10.0 10.1  9.8 10.9 10.4 10.8 10.7 10.6 12.0 12.4 12.9 13.4 13.4 14.3 14.3 15.4 14.8 14.4 15.3
5           California   6.3  6.5  6.2  6.4  6.5  6.0  5.8  5.8  5.8  6.7  6.2  6.3  6.4  6.4  6.1  6.2  6.5  5.8  6.4  6.3  7.9
6              Colorado  7.3  7.4  6.8  7.1  6.5  6.8  7.0  6.9  6.9  7.4  7.1  7.2  7.6  7.4  7.8    8  8.2  8.3  8.2  9.0  9.8
7           Connecticut  5.6  5.6  5.3  5.4    5  5.2  5.5  5.6  5.9  5.4  5.5  5.5  5.8  5.8  5.5  5.7  5.4  5.7  5.8  6.6  7.9
8              Delaware  5.5  5.6  5.7    6  6.6  5.8  5.2  5.2  5.4  5.5  5.7  5.9  5.9  6.1    6  6.4  6.5  6.5  6.7  7.3  8.4
9  District of Columbia  8.2  8.1  8.2 11.8 10.8  8.4  8.7  7.6  4.7  4.1  4.2    4  4.1  5.2  5.1  5.1  6.2  4.9  6.6  6.1  8.2
10              Florida  7.8  8.1  8.2  7.3    7  7.2  7.4  7.3  7.5  8.0  8.5  8.6  8.9  9.0    9  9.4  9.3  8.9  8.7  9.9 10.9
11              Georgia  6.9  6.8  6.2  ---  ---  6.5  6.6  7.3  6.6  6.0  6.8  7.3  7.0  7.9    7  6.5  6.1  6.8  7.8  8.4 10.3
12               Hawaii 15.3 15.6 15.9 17.7 16.3 17.5 17.6 17.6 17.2 19.1 20.8 21.9 22.6 22.6   22 20.8 19.6 20.6 18.9 15.7 16.4
13                Idaho  7.8  8.1  8.2  8.4  8.2  8.2  8.6  8.8  8.9  9.5 10.0 10.1 10.5 10.8 10.9   11 11.2 10.8 12.1 13.1 13.9
subrinarafiq
  • Possible duplicate of [Plotting with ggplot2: "Error: Discrete value supplied to continuous scale" on categorical y-axis](https://stackoverflow.com/questions/29278153/plotting-with-ggplot2-error-discrete-value-supplied-to-continuous-scale-on-c) – MatthewR Nov 12 '19 at 23:32
  • It sounds like your `year` column is likely categorical after you `gather()`. The `convert` argument in `gather()` may help with this. – aosmith Nov 12 '19 at 23:33
  • @MatthewR I looked at that link and tried to implement it and got this error. `marriage$variable=as.numeric(levels(marriage$variable))[marriage$variable]` Error in $<-.data.frame(*tmp*, variable, value = numeric(0)) : replacement has 0 rows, data has 1071 – subrinarafiq Nov 13 '19 at 00:40
  • For folks to help you interpret the error messages beyond guessing, a [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) is going to be necessary – camille Nov 13 '19 at 00:55
  • I believe this question is really about reading in a file and not ggplot2. I think the symptom is the plot, but the problem is the file input. Try `na.strings = "---"` within `read.csv()`. –  Nov 13 '19 at 13:15
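
The two suggestions in the comments above (the `convert` argument of `gather()` and `na.strings = "---"` in `read.csv()`) can be combined in a minimal sketch. It assumes a three-state, four-year CSV sample cut down from the question's table; `csv_text`, `wide`, and `long` are illustrative names, not part of the original code.

```r
library(tidyr)

# Illustrative sample: a few cells copied from the question's table.
csv_text <- "State,2017,2016,2015,2014
Alabama,7.0,7.1,7.4,7.8
Georgia,6.9,6.8,6.2,---
Hawaii,15.3,15.6,15.9,17.7"

# na.strings = "---" turns the placeholder into NA, so every year column
# is read as numeric; check.names = FALSE keeps the names as "2017", etc.
wide <- read.csv(text = csv_text, check.names = FALSE, na.strings = "---")

# convert = TRUE runs type.convert() on the key column, so year becomes
# an integer instead of a character vector.
long <- gather(wide, key = year, value = rate, -State, convert = TRUE)

str(long)
```

With both columns numeric, continuous scales such as `scale_x_continuous()` and `ylim()` should no longer report a discrete value.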

3 Answers


Try

scale_x_continuous(breaks = seq(2000, 2015, 5))

Sorry, I can't comment; not enough rep.
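
A continuous x scale only works once year is numeric. As a minimal sketch of that idea, assuming a small hand-built long-format data frame (two states, 2000 to 2006, values taken from the question's table; `rates` and `p` are illustrative names):

```r
library(ggplot2)

# Illustrative long-format data: year is numeric, not character or factor.
rates <- data.frame(
  State = rep(c("Alabama", "Alaska"), each = 7),
  year  = rep(2000:2006, times = 2),
  rate  = c(10.1, 9.4, 9.9, 9.6, 9.4, 9.2, 9.2,
            8.9, 8.1, 8.3, 8.1, 8.5, 8.2, 8.2)
)

p <- ggplot(rates, aes(x = year, y = rate, group = State, color = State)) +
  geom_point() +
  geom_line() +
  # Put a tick mark at every year in the window of interest
  scale_x_continuous(breaks = seq(2000, 2005, 1)) +
  # Zoom to 2000-2005 without discarding the rows outside that range
  coord_cartesian(xlim = c(2000, 2005)) +
  theme_bw()

p
```

Passing `limits = c(2000, 2005)` to `scale_x_continuous()` would instead drop the rows outside the range (with a warning), whereas `coord_cartesian()` only zooms the view.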

Hansel Palencia

I think ggplot() will do a decent job of choosing sensible axis intervals automatically. If you want to let it do its own thing, try converting the years to dates. An easy way to do this is with make_date() from lubridate.

libraries

library(dplyr)
library(tidyr)
library(lubridate)
library(ggplot2)

read in data (edit based on question update)

Here is the data based on the edit. This should bring it in as you had it. Note that I am using a different approach than you did: reading from a text string rather than from your CSV file.

I think you will need to add na.strings = "---", as I did, so that the missing values are read in as NA.

Also, I had to put District of Columbia inside single quotes so that read.table() treats it as a single field. This could possibly be a problem you are having.

data <- "State 2017 2016 2015 2014 2013 2012 2011 2010 2009 2008 2007 2006 2005 2004 2003 2002 2001 2000 1999 1995 1990
Alabama  7.0  7.1  7.4  7.8  7.8  8.2  8.4  8.2  8.3  8.6  8.9  9.2  9.2  9.4  9.6  9.9  9.4 10.1 10.8  9.8 10.6
Alaska  6.9  7.1  7.4  7.5  7.3  7.2  7.8  8.0  7.8  8.4  8.5  8.2  8.2  8.5  8.1  8.3  8.1  8.9  8.6  9.0 10.2
Arizona  5.8  5.9  5.9  5.8  5.4  5.6  5.7  5.9  5.6  6.0  6.4  6.5  6.6  6.7  6.5  6.7  7.6  7.5  8.2  8.8 10.0
Arkansas  9.5  9.9 10.0 10.1  9.8 10.9 10.4 10.8 10.7 10.6 12.0 12.4 12.9 13.4 13.4 14.3 14.3 15.4 14.8 14.4 15.3
California   6.3  6.5  6.2  6.4  6.5  6.0  5.8  5.8  5.8  6.7  6.2  6.3  6.4  6.4  6.1  6.2  6.5  5.8  6.4  6.3  7.9
Colorado  7.3  7.4  6.8  7.1  6.5  6.8  7.0  6.9  6.9  7.4  7.1  7.2  7.6  7.4  7.8    8  8.2  8.3  8.2  9.0  9.8
Connecticut  5.6  5.6  5.3  5.4    5  5.2  5.5  5.6  5.9  5.4  5.5  5.5  5.8  5.8  5.5  5.7  5.4  5.7  5.8  6.6  7.9
Delaware  5.5  5.6  5.7    6  6.6  5.8  5.2  5.2  5.4  5.5  5.7  5.9  5.9  6.1    6  6.4  6.5  6.5  6.7  7.3  8.4
'District of Columbia'  8.2  8.1  8.2 11.8 10.8  8.4  8.7  7.6  4.7  4.1  4.2    4  4.1  5.2  5.1  5.1  6.2  4.9  6.6  6.1  8.2
Florida  7.8  8.1  8.2  7.3    7  7.2  7.4  7.3  7.5  8.0  8.5  8.6  8.9  9.0    9  9.4  9.3  8.9  8.7  9.9 10.9
Georgia  6.9  6.8  6.2  ---  ---  6.5  6.6  7.3  6.6  6.0  6.8  7.3  7.0  7.9    7  6.5  6.1  6.8  7.8  8.4 10.3
Hawaii 15.3 15.6 15.9 17.7 16.3 17.5 17.6 17.6 17.2 19.1 20.8 21.9 22.6 22.6   22 20.8 19.6 20.6 18.9 15.7 16.4
Idaho  7.8  8.1  8.2  8.4  8.2  8.2  8.6  8.8  8.9  9.5 10.0 10.1 10.5 10.8 10.9   11 11.2 10.8 12.1 13.1 13.9"

marriage <- read.table(textConnection(data), header = TRUE, na.strings = "---") %>%
  as_tibble() %>%
  rename_all(~ sub("X", "", .))

At this point, marriage should be your data. I will make one slight modification: converting year to a date.

marriage <- marriage %>%
  gather(key = year, value = rate, `2017`:`1990`) %>%
  # gather() returns year as character; make it a Date (1 January of that year)
  mutate(year = make_date(as.integer(year)))

plot

No changes to your plotting code. The axis was naturally handled.

ggplot(marriage, aes(x=year, y=rate, group=State)) +
  geom_point(aes(color=State)) +
  geom_line(aes(color=State)) +
  theme_bw()
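
To restrict a date axis like this one to a window of years, one option is `scale_x_date()`. The sketch below assumes a small hand-built data frame (`rates`, an illustrative name, with values taken from the question's table) rather than the full marriage data.

```r
library(ggplot2)
library(lubridate)

# Illustrative data: year stored as a Date (1 January of each year).
rates <- data.frame(
  State = rep(c("Alabama", "Alaska"), each = 7),
  year  = make_date(rep(2000:2006, times = 2)),
  rate  = c(10.1, 9.4, 9.9, 9.6, 9.4, 9.2, 9.2,
            8.9, 8.1, 8.3, 8.1, 8.5, 8.2, 8.2)
)

p <- ggplot(rates, aes(x = year, y = rate, group = State, color = State)) +
  geom_point() +
  geom_line() +
  # Limits must be Dates; rows outside them (here 2006) are dropped with a warning
  scale_x_date(
    limits      = as.Date(c("2000-01-01", "2005-01-01")),
    date_breaks = "1 year",
    date_labels = "%Y"
  ) +
  theme_bw()

p
```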

graph output

  • Getting the following error from this, Error in is_string(x) : object '2017' not found – subrinarafiq Nov 13 '19 at 00:44
  • I have updated the original question to reflect the first few lines of the data set, I did have the column under year but it would cause R to read the file incorrectly which is why I had to get rid of it from my csv file all together. I have a row of years and a column of states. – subrinarafiq Nov 13 '19 at 01:16
  • The dataset obviously goes to all 51 states, so I have added a few more states so you can follow the following issue. When I run your command after you define data I keep getting the error, "Error in scan(file = file, what = what, sep = sep, quote = quote, dec = dec, : line 9 did not have 22 elements". Line 9 does have 22 values. Georgia is the only state missing values and even when I arbitrarily put values in I still get the error. Any ideas on what is causing this? – subrinarafiq Nov 13 '19 at 03:56
  • When I put in more states the y-axis starts looking overwhelmed. I have done it with 30 states, and the y-axis lists every value of each state's marriage rate for the 21 years. So I think I still need to find a way to limit my axis or at least change the scales. – subrinarafiq Nov 14 '19 at 16:16
  • That is due to data types. It is still thinking these things are categorical. You need to convert x to date and make sure y is number. –  Nov 14 '19 at 17:17

The variable 'year' is stored as character after gather. You can convert it during the reshape (here updated to use pivot_longer):

    marriage <-
      marriage %>%
      pivot_longer(
        cols = `2017`:`1990`,
        names_to = 'year',
        values_to = 'rate'
      ) %>%
      mutate(
        year = as.numeric(year)
      )

The ggplot calls should run from there.
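
A fuller sketch of this approach, under a few assumptions: the file is read with `na.strings = "---"` so that every year column is numeric (columns of mixed types are one way to get a "No common type" error from `pivot_longer()`), the CSV sample is cut down from the question's table, and `names_transform` is available (tidyr 1.1.0 or later). The names `csv_text`, `wide`, `long`, and `recent` are illustrative.

```r
library(tidyr)

# Illustrative sample: a few cells copied from the question's table.
csv_text <- "State,2017,2016,2015,2014
Alabama,7.0,7.1,7.4,7.8
Georgia,6.9,6.8,6.2,---
Hawaii,15.3,15.6,15.9,17.7"

# Read "---" as NA so all year columns share the same numeric type.
wide <- read.csv(text = csv_text, check.names = FALSE, na.strings = "---")

# Reshape every column except State; convert the year names to integers.
long <- pivot_longer(
  wide,
  cols = -State,
  names_to = "year",
  values_to = "rate",
  names_transform = list(year = as.integer)
)

# With a numeric year, a window of years can be selected before plotting.
recent <- subset(long, year >= 2015 & year <= 2016)
recent
```

Filtering the rows first is an alternative to setting axis limits: the plot then only ever sees the years of interest.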

AHart
  • I get this error when I try to run that, Error in marriage %>% pivot_longer(cols = `2017`:`1990`, names_to = "year", : could not find function "%>%" – subrinarafiq Nov 13 '19 at 17:33
  • The pipe operator, `%>%` is from `magrittr`, part of the `tidyverse` – AHart Nov 13 '19 at 18:09
  • I installed the correct package now, but I get the following Error: No common type for `2017` and `2014` >. Call `rlang::last_error()` to see a backtrace. – subrinarafiq Nov 14 '19 at 05:35