
I have a dataset that shows median weekly earnings for people with different education levels over a 15-year period. I am trying to make a scatterplot of the values for each of two education levels, but for some reason my plot orders the y values of each education level separately and then stacks them.

Here is the current scatterplot; the problem is the order of the values on the y-axis. [plot image not included]

I'm not sure if this is a problem with how I gathered the original values or with how I am generating the plot. I have looked through this site and a few others, but I cannot figure out how to fix it. My code is below.

EdData <- read.csv("~/desktop/EdData.csv")

library(ggplot2)
library(tidyr)
library(dplyr)

# Reshape the two earnings columns into long format: one row per Year/Education pair
EdData_Long <- gather(EdData, "Education", "Earnings", Weekly.Earnings.HS.Only, Weekly.Earnings.College, na.rm = FALSE)

# Plot earnings by year, coloured by education level
ggplot(data = EdData_Long, aes(x = Year, y = Earnings, colour = Education)) + geom_point()

I'm pretty new to R, so I'm sorry if this is really basic. I promise I did try to find the answer before posting, but I do not even know the right terms to describe the problem I am having. Thanks in advance for any help you can offer.

In case it is helpful, I have posted the (very small) data set here

MrFlick
bbernicker
  • It doesn't look like your earnings columns were read in as numeric. R doesn't like "$" in numeric values. This will likely be fixed if you import your data properly. This might help: http://stackoverflow.com/questions/7337824/read-csv-file-in-r-with-currency-column-as-numeric/7338251#7338251 – MrFlick Mar 13 '17 at 20:22
  • I opened the csv in Excel and converted everything to "general" then re-saved it to produce my answer below. I would have just commented but wanted to add the graphic. – Dan Slone Mar 13 '17 at 20:57
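MrFlick's suggestion can also be handled inside R without re-saving the file: strip the "$" and "," characters and convert with as.numeric(). This is a minimal sketch; the column names come from the question, but the values below are hypothetical stand-ins for the real CSV, which may contain other characters that also need removing.

```r
# Hypothetical stand-in for EdData as read.csv() might return it:
# the currency columns arrive as text because of "$" and ",".
EdData <- data.frame(
  Year = c(2001, 2002, 2003),
  Weekly.Earnings.HS.Only = c("$575", "$592", "$601"),
  Weekly.Earnings.College = c("$1,012", "$1,037", "$1,055"),
  stringsAsFactors = FALSE
)

# Remove "$" and "," then convert; as.character() also covers factor columns,
# which read.csv() produced by default before R 4.0.0
to_number <- function(x) as.numeric(gsub("[$,]", "", as.character(x)))

earnings_cols <- c("Weekly.Earnings.HS.Only", "Weekly.Earnings.College")
EdData[earnings_cols] <- lapply(EdData[earnings_cols], to_number)

str(EdData)
```

Once the columns are numeric, the original gather() and ggplot() code should place the y values on a continuous axis.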

2 Answers


Your CSV file appears to be corrupt. I cleaned it up, keeping the same data, ran your code unchanged, and got this: [image: EdData corrected plot]

Is this what you were talking about?

Dan Slone
  • Yes, thank you very much. It never even occurred to me that it could be a problem with the way I imported the data. – bbernicker Mar 13 '17 at 23:20

You can use read_csv() from the readr package and specify that the columns are numeric; col_number() drops the non-numeric characters such as "$" and "," during the conversion:

library(readr)
EdData <- read_csv("EdData.csv",
                   col_types = cols(`Annual Difference` = col_number(),
                                    Tuition = col_number(),
                                    `Weekly Earnings College` = col_number(),
                                    `Weekly Earnings Difference` = col_number(),
                                    `Weekly Earnings HS Only` = col_number(),
                                    `Weekly Earnings No HS` = col_number()))
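The parser behind col_number() is also available on its own as readr's parse_number(), which is handy for checking what a given cell will become before reading the whole file. The strings below are hypothetical examples in the same currency format as the question's CSV:

```r
library(readr)

# Currency-formatted strings like those in the original CSV (hypothetical values)
raw <- c("$575", "$1,012", "$1,055")

# parse_number() skips the leading "$" and treats "," as a grouping mark
# under the default locale
earnings <- parse_number(raw)
earnings
```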

read_csv() preserves the spaces in column names (read.csv() converts them to dots), so you also need to modify the gather() call:

EdData_Long <- gather(EdData, "Education", "Earnings",
                      `Weekly Earnings HS Only`, `Weekly Earnings College`,
                      na.rm = FALSE)
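gather() still works, but since tidyr 1.0.0 it has been superseded by pivot_longer(). A minimal sketch of the same reshape and plot with pivot_longer(), assuming the earnings columns are already numeric; the small data frame uses hypothetical values, not the real data set:

```r
library(tidyr)
library(ggplot2)

# Hypothetical numeric data using the space-containing names read_csv() keeps
EdData <- data.frame(
  Year = c(2001, 2002, 2003),
  `Weekly Earnings HS Only` = c(575, 592, 601),
  `Weekly Earnings College` = c(1012, 1037, 1055),
  check.names = FALSE
)

# One row per Year/Education pair; column names go to "Education",
# values go to "Earnings"
EdData_Long <- pivot_longer(
  EdData,
  cols = c(`Weekly Earnings HS Only`, `Weekly Earnings College`),
  names_to = "Education",
  values_to = "Earnings"
)

# Same plot as in the question; print(p) to draw it
p <- ggplot(EdData_Long, aes(x = Year, y = Earnings, colour = Education)) +
  geom_point()
```

Because Earnings is numeric here, ggplot2 uses a continuous y scale and both education levels share one ordered axis.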
neilfws