
Every time I run the line "DHS <- mean(ahebachelors2008) - mean(ahebachelors1992)" I get NA as the result. Calculating mean(ahe2008) works, but calculating mean(ahebachelors2008) does not.

setwd("~/Google Drive/R Data")
data <- read.csv('cps92_08.csv')
year <- data$year
year1992 <- subset(data,year<2000)
year2008 <- subset(data,year>2000)
ahe1992 <- (year1992$ahe)
ahe2008 <- (year2008$ahe)
max(ahe1992)
min(ahe1992)
mean(ahe1992)
median(ahe1992)
sd(ahe1992)
max(ahe2008)
min(ahe2008)
mean(ahe2008)
median(ahe2008)
sd(ahe2008)

adjahe <- ahe1992*(215.2/140.3)
max(adjahe)
min(adjahe)
mean(adjahe)
median(adjahe)
sd(adjahe)

D <- mean(ahe2008) - mean(adjahe)

education <- data$bachelor
ahebachelors1992 <- subset(adjahe, education>0)
ahehighschool1992 <- subset(adjahe,education<1)
ahebachelors2008 <- subset(ahe2008,education>0)
ahehighschool2008 <- subset(ahe2008,education<1)

DHS <- mean(ahebachelors2008) - mean(ahebachelors1992)
smn
  • What's in `ahebachelors2008`? Is there any NA in there? run `which(is.na(ahebachelors2008))`? Or maybe there's just nothing in it at all? – iod Feb 02 '20 at 02:15
  • You probably have NA in your dataframe, you should use the argument `na.rm = TRUE` in your calculation of `mean`. Check the documentation `?mean()` – dc37 Feb 02 '20 at 02:17
  • Please make this question *reproducible*. This includes sample *unambiguous* data (e.g., `dput(head(x))` or `data.frame(x=...,y=...)`) and expected output. Refs: https://stackoverflow.com/questions/5963269, https://stackoverflow.com/help/mcve, and https://stackoverflow.com/tags/r/info. – r2evans Feb 02 '20 at 02:51
  • in ahebachelors2008 is the data for which average hourly earnings for people with a bachelors degree was calculated in 2008. i am so sorry if i am not answering your questions properly, i just started R two weeks ago so this is so so new for me! is there any information that i could share with you that will help you help me better? i ran the function which(is.na(ahebachelors2008)) there are 6594 data points in there. – smn Feb 02 '20 at 07:58
  • okay so i just checked the whole ahebachelors2008 and noticed that out of 6594 data points, only 2953 contain figures --- all the rest are NA. how would i fix that? – smn Feb 02 '20 at 08:09

2 Answers


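education is the same length as data, whereas ahe2008 is only a subset of data. So when you pass education as the condition on ahe2008, the condition is longer than the vector being subset, and every TRUE that falls past the end of ahe2008 produces an NA (there is no corresponding element in ahe2008 for those positions). The values that do come back may also be paired with the wrong rows' education flags, because the two vectors are not aligned row by row.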

Here's a simpler example:

d1 <- 1:5
d2 <- c(1:5, 1:5)
subset(d1, d2 == 1)
#> [1]  1 NA

Possible solutions would be to create separate bachelor vectors for each year, or to not continuously subset but just use multiple conditions where you need them.
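As a rough sketch of the first option, here is a made-up five-row stand-in for the CSV (the numbers are invented; the column names year, ahe and bachelor follow the question, and education1992 / education2008 are names introduced only for this illustration). Each year's education vector is taken from the same subset as its earnings, so the condition and the values have the same length:

```r
# Toy stand-in for cps92_08.csv; the numbers are invented for illustration.
data <- data.frame(
  year     = c(1992, 1992, 2008, 2008, 2008),
  ahe      = c(10, 20, 30, 40, 50),
  bachelor = c(0, 1, 1, 0, 1)
)

year1992 <- subset(data, year < 2000)
year2008 <- subset(data, year > 2000)

# Take the education flag from the same subset as the earnings,
# so the condition has exactly one element per earnings value.
adjahe        <- year1992$ahe * (215.2 / 140.3)
education1992 <- year1992$bachelor
education2008 <- year2008$bachelor

ahebachelors1992 <- adjahe[education1992 > 0]
ahebachelors2008 <- year2008$ahe[education2008 > 0]

DHS <- mean(ahebachelors2008) - mean(ahebachelors1992)
```

With matching lengths, neither bachelors vector picks up NAs from the subsetting step, so DHS is an ordinary number.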

If you're trying to avoid typing the full data$something every time, consider using with(), or even better - the dplyr package.

For example, all the code leading up to the last line could be replaced with this (assuming I didn't miss anything):

# bachelor is the column the question's code copies into `education`
DHS <- mean(with(data, ahe[year > 2000 & bachelor > 0])) -
       mean(with(data, ahe[year < 2000 & bachelor > 0] * (215.2/140.3)))

(If you're new to R, note that indexing with [] is a more direct way of doing what subset does.)
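One difference between the two forms is worth knowing when NAs are involved. The sketch below uses invented values (x, keep, by_bracket and by_subset are illustrative names) to show how each form treats a missing value in the condition:

```r
x    <- c(10, 20, 30, 40)
keep <- c(TRUE, NA, FALSE, TRUE)

by_bracket <- x[keep]          # an NA in the condition yields an NA element
by_subset  <- subset(x, keep)  # subset() drops elements whose condition is NA
```

So [] can introduce NAs when the condition itself contains missing values, while subset silently leaves those elements out.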

You might also want to consider using summary, which gives you the min, quartiles, median, mean, and max, leaving you with just sd to add manually:

summary(with(data,ahe[year>2000]))
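If you want the standard deviation alongside the summary figures, one possible approach is to combine the two into a single named vector. This is only a sketch on invented earnings values; the name stats is made up for the example:

```r
ahe <- c(12.5, 18.0, 21.3, 25.0, 40.2)  # invented values for illustration

# summary() returns six named statistics; append sd() as a seventh entry.
stats <- c(summary(ahe), SD = sd(ahe))
stats
```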
iod
  • Thank you so much for your comment! Can you help me with an example that uses multiple conditions? How would I formulate that in R? – smn Feb 02 '20 at 07:53
  • @sakina see my expanded answer above. – iod Feb 03 '20 at 14:23
  • oh my god! thank you so so so much. this was my first post on stackoverflow and i was not too sure if i would be able to get help! thank you so much! i will be sure to look around for an 'iod' commenting on my future posts. – smn Feb 04 '20 at 04:50

If the values you are calculating the mean of contain NA, the result will be NA. You can work around that by adding na.rm = TRUE to your mean calls:

DHS <- mean(ahebachelors2008, na.rm=TRUE) - mean(ahebachelors1992, na.rm=TRUE)
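A minimal illustration of what na.rm does, on an invented three-element vector (x, m_na and m_ok are names used only here):

```r
x <- c(5, NA, 7)

m_na <- mean(x)                # NA: one missing value makes the whole mean NA
m_ok <- mean(x, na.rm = TRUE)  # 6: the NA is dropped before averaging
```

Keep in mind that na.rm only discards the missing values. If the NAs were produced by a condition that is longer than the vector being subset, as the other answer describes for this question, the values that remain may still have been selected with mismatched education flags.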
asafpr