
I want to plot census data to compare data for each race over multiple years.

My data frame has years 1950-2010 (every 10 years) as the rows and race as the columns. The data at the cross section is the percentage of that race in a given year.

I want my line graph to plot the years on the x axis and each race's percentage on the y axis. With my 5 "race" variables, there would be 5 lines of different colors, all plotted on the same graph.

I have watched videos and searched all over this site, but nothing I find works the way I want it to.

Edit: I refactored the code and built my own data frame instead of having it return a matrix.

However, I want the legend on the right side to be titled "Race" and list my 5 lines. I am working on getting one line to show up at all before doing the other 4.

[Image: plot returned from the new data frame]

Edit: This is what I have figured out so far in my code:

Allston <- ggplot(data = dataAllston, aes(Year, White.pct, group = 1)) +
  geom_point(aes(color = "orange")) +
  geom_line(aes(color = "orange"))

I want to scale the y axis from 0 to 1 in 0.2 increments and have the y axis say "Race" instead of the individual labels. And more than just relabeling, I want the graph to represent the actual increases and decreases, as opposed to the straight diagonal line going down that it shows now.

I think it will take me longer to learn how to make the reproducible code than it will to make tweaks.

[Image: new returned plot]

Edit:

dput(dataAllston)

returns

structure(list(Year = c(1950, 1960, 1970, 1980, 1990, 2000, 2010),
  White.pct = structure(7:1, .Label = c("57.0", "59.0", "63.0", "78.0", "90.8", "98.0", "98.3"), class = "factor"),
  BlackOrAA.pct = structure(c(2L, 1L, 3L, 4L, 5L, 4L, 4L), .Label = c("1.20", "1.30", "2.60", "5.00", "9.00"), class = "factor"),
  Hispanic.pct = structure(c(1L, 1L, 3L, 4L, 2L, 2L, 2L), .Label = c("0.00", "13.0", "3.10", "6.00"), class = "factor"),
  AsianOrPI.pct = structure(c(1L, 1L, 5L, 6L, 2L, 3L, 4L), .Label = c("0.00", "14.0", "18.0", "20.0", "3.20", "9.00"), class = "factor"),
  Other.pct = structure(c(2L, 1L, 3L, 4L, 5L, 4L, 4L), .Label = c("1.20", "1.30", "2.60", "5.00", "9.00"), class = "factor")),
  class = "data.frame", row.names = c(NA, -7L))
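For reference: every .pct column in the dput() output above is a factor whose levels are the percentage strings, sorted as text. Plotted directly, a factor goes on a discrete axis in level order, and converted with as.numeric() it yields the level codes rather than the values. A minimal base-R sketch, using only the White.pct values shown above (the names white, codes and values are just for illustration):

```r
# White.pct exactly as in the dput() output: values stored as factor levels
white <- factor(c("98.3", "98.0", "90.8", "78.0", "63.0", "59.0", "57.0"),
                levels = c("57.0", "59.0", "63.0", "78.0", "90.8", "98.0", "98.3"))

# as.numeric() on a factor returns the internal level codes, not the values
codes <- as.numeric(white)
print(codes)

# Converting through character first recovers the actual percentages
values <- as.numeric(as.character(white))
print(values)

# Equivalent idiom: convert the levels once, then index them by the factor
values2 <- as.numeric(levels(white))[white]
print(values2)
```

The codes come out as 7, 6, ..., 1, an evenly spaced decreasing sequence, which matches the straight diagonal line described in the question.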

result from dput(data)

tl124
  • Please provide the output of `dput(dataAllston)`; we can't copy and paste your dataset and check in our own session which code will work. See: https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example – dc37 Apr 05 '20 at 20:25

1 Answer


You first need to reshape your dataset into a longer format, for example with the pivot_longer function from tidyr. In the end, your data should look like this.

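Because your percentage columns are stored as factors (every column except Year), the first step (mutate_at) converts them to numbers, which is more appropriate for plotting. Going through as.character() matters: calling as.numeric() directly on a factor returns the internal level codes, not the values.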

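library(dplyr)
library(tidyr)

# Convert the factor percentage columns to numbers (via character, so the real
# values are kept rather than the factor level codes), then reshape to long
# format: one row per Year/race pair.
Reshaped_DF <- dataAllston %>%
  mutate_at(vars(ends_with(".pct")), ~ as.numeric(as.character(.))) %>%
  pivot_longer(-Year, names_to = "Races", values_to = "values")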

# A tibble: 35 x 3
    Year Races         values
   <dbl> <chr>          <dbl>
 1  1950 White.pct       98.3
 2  1950 BlackOrAA.pct    1.3
 3  1950 Hispanic.pct     0  
 4  1950 AsianOrPI.pct    0  
 5  1950 Other.pct        1.3
 6  1960 White.pct       98  
 7  1960 BlackOrAA.pct    1.2
 8  1960 Hispanic.pct     0  
 9  1960 AsianOrPI.pct    0  
10  1960 Other.pct        1.2
# … with 25 more rows
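mutate_at() still works, but since dplyr 1.0.0 it has been superseded by across(). A sketch of the same conversion and reshape written with across(), assuming dplyr (1.0 or later) and tidyr are installed; the dput() output from the question is pasted in so the block runs on its own:

```r
library(dplyr)
library(tidyr)

# dataAllston as posted via dput() in the question (percentages stored as factors)
dataAllston <- structure(list(Year = c(1950, 1960, 1970, 1980, 1990, 2000, 2010),
  White.pct = structure(7:1, .Label = c("57.0", "59.0", "63.0", "78.0", "90.8", "98.0", "98.3"), class = "factor"),
  BlackOrAA.pct = structure(c(2L, 1L, 3L, 4L, 5L, 4L, 4L), .Label = c("1.20", "1.30", "2.60", "5.00", "9.00"), class = "factor"),
  Hispanic.pct = structure(c(1L, 1L, 3L, 4L, 2L, 2L, 2L), .Label = c("0.00", "13.0", "3.10", "6.00"), class = "factor"),
  AsianOrPI.pct = structure(c(1L, 1L, 5L, 6L, 2L, 3L, 4L), .Label = c("0.00", "14.0", "18.0", "20.0", "3.20", "9.00"), class = "factor"),
  Other.pct = structure(c(2L, 1L, 3L, 4L, 5L, 4L, 4L), .Label = c("1.20", "1.30", "2.60", "5.00", "9.00"), class = "factor")),
  class = "data.frame", row.names = c(NA, -7L))

# Same two steps as above: factor -> character -> numeric, then wide -> long
Reshaped_DF <- dataAllston %>%
  mutate(across(ends_with(".pct"), ~ as.numeric(as.character(.x)))) %>%
  pivot_longer(-Year, names_to = "Races", values_to = "values")

print(Reshaped_DF)
```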

Then, you can plot it in ggplot2 by doing:

library(ggplot2)

ggplot(Reshaped_DF,aes(x = Year, y = values, color = Races, group = Races))+
  geom_line()+
  geom_point()+
  ylab("Percentage")
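The question also asks for a y axis from 0 to 1 in 0.2 steps and a legend titled "Race". A sketch of one way to do that, not part of the original answer: it assumes ggplot2 and tidyr are installed, dataAllston_num, long and share are hypothetical names, and the numbers are the values decoded from the question's dput() output. Since the data are percentages (0 to 100), they are divided by 100 to fit a 0 to 1 scale; to keep percentages instead, use breaks = seq(0, 100, by = 20) with limits = c(0, 100).

```r
library(ggplot2)
library(tidyr)

# Hypothetical numeric version of dataAllston (values decoded from the dput() output),
# stored as plain numbers so no factor conversion is needed
dataAllston_num <- data.frame(
  Year          = seq(1950, 2010, by = 10),
  White.pct     = c(98.3, 98.0, 90.8, 78.0, 63.0, 59.0, 57.0),
  BlackOrAA.pct = c(1.3, 1.2, 2.6, 5.0, 9.0, 5.0, 5.0),
  Hispanic.pct  = c(0, 0, 3.1, 6.0, 13.0, 13.0, 13.0),
  AsianOrPI.pct = c(0, 0, 3.2, 9.0, 14.0, 18.0, 20.0),
  Other.pct     = c(1.3, 1.2, 2.6, 5.0, 9.0, 5.0, 5.0)
)

# Wide -> long: one row per Year/race pair
long <- pivot_longer(dataAllston_num, -Year, names_to = "Races", values_to = "values")

# Rescale percentages (0-100) to proportions (0-1)
long$share <- long$values / 100

p <- ggplot(long, aes(x = Year, y = share, color = Races, group = Races)) +
  geom_line() +
  geom_point() +
  # y axis from 0 to 1 with a break every 0.2
  scale_y_continuous(limits = c(0, 1), breaks = seq(0, 1, by = 0.2)) +
  # one x tick per census year
  scale_x_continuous(breaks = seq(1950, 2010, by = 10)) +
  # legend title "Race" on the right-hand side
  labs(x = "Year", y = "Proportion", color = "Race")

print(p)
```

The legend title comes from labs(color = "Race"); the y label stays a quantity ("Proportion"), because race is mapped to color, not to the y axis.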

[Image: resulting plot]

Does this answer your question?

If not, please consider providing a reproducible example of your dataset that people can easily copy/paste. See this guide: How to make a great R reproducible example

RyanFrost
dc37
  • I did exactly that (renaming df to my dataFrame of course) and got "Error: No common type for `Year` and `White.pct` >." – tl124 Apr 05 '20 at 19:57
  • Thanks for your help - I am not sure if my code is reproducible because I have had to calculate out individual values. Do you mean supply how my data frame is created up to how I am plotting it? – tl124 Apr 05 '20 at 20:34
  • Thanks, just updated -- didn't see your other comment. I'm new to Stack and I'm not used to jumping around hahah. – tl124 Apr 05 '20 at 20:39
  • I cannot thank you enough for your patience... this whole forum thing on top of coding is a learning curve indeed. – tl124 Apr 05 '20 at 20:46
  • Much better now ;) Check my edited answer. You should get it to work now. Don't worry, you will get used to it one day; just take the time to read all the links people provide, they are really useful resources. – dc37 Apr 05 '20 at 20:53