
I am trying to obtain descriptive statistics for specific variables (columns) in a data frame. I want the mean and standard deviation of two of them, polindex and log(gdp), but with two restrictions.

First, the estimates should use only the cases where polindex and log(gdp) both contain data at the same time. Any observation where either of them (or both) is NA should be excluded from both variables, so that both estimates are based on the same number of observations.

Second, the estimates should cover only the years 1960-2000; observations for any year before 1960 or after 2000 should be excluded.


zx8754

1 Answer


You can subset your original data frame, keeping only the rows that meet all of the following conditions:

!is.na(df$polindex) is TRUE
!is.na(df$log.gdp) is TRUE
df$year >= 1960 & df$year <= 2000 is TRUE
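The three conditions can also be combined into a single logical vector before subsetting, so both variables are guaranteed to share exactly the same rows. This is a minimal sketch, assuming the columns are named polindex, log.gdp and year as in the code below; the small data frame is hypothetical and only illustrates the filtering. complete.cases() is base R and returns TRUE for rows with no missing values in the columns it is given.

```r
# Hypothetical toy data; column names assumed to be polindex, log.gdp and year
df <- data.frame(
  year     = c(1955, 1960, 1970, 1980, 1990, 2000, 2005),
  polindex = c(3,    5,    NA,   7,    2,    4,    6),
  log.gdp  = c(7.1,  7.4,  7.9,  NA,   8.3,  8.6,  8.9)
)

# TRUE only where both variables are observed in the same row
both.observed <- complete.cases(df[, c("polindex", "log.gdp")])

# TRUE only for years 1960-2000 inclusive (FALSE if year itself is missing)
in.period <- !is.na(df$year) & df$year >= 1960 & df$year <= 2000

# One row filter shared by both variables, so both get the same number of observations
sample.df <- df[both.observed & in.period, ]
nrow(sample.df)
```

With this toy data, the 1955 and 2005 rows fall outside the period, and the 1970 and 1980 rows are dropped because one of the two variables is NA, leaving the rows for 1960, 1990 and 2000.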

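Here is code you could use to calculate the mean of polindex:

# TRUE where polindex is not missing
pol.index  <- !is.na(df$polindex)
# TRUE where log GDP (column log.gdp) is not missing
log.index  <- !is.na(df$log.gdp)
# TRUE for years 1960-2000 inclusive; FALSE if year itself is missing
year.index <- !is.na(df$year) & df$year >= 1960 & df$year <= 2000

# Keep only the rows meeting all three conditions, then average polindex
pol.mean   <- mean(df[pol.index & log.index & year.index, "polindex"])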
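The question also asks for the standard deviation, and for both variables on the same set of rows. A minimal sketch, assuming log.gdp already holds log(gdp) (if only a gdp column exists, it could be created first with df$log.gdp <- log(df$gdp)); the data frame below is hypothetical. It applies the same three conditions listed in the answer once and then computes mean, sd and n per column with sapply().

```r
# Hypothetical toy data; replace with your own data frame
df <- data.frame(
  year     = c(1955, 1960, 1970, 1980, 1990, 2000, 2005),
  polindex = c(3,    5,    NA,   7,    2,    4,    6),
  log.gdp  = c(7.1,  7.4,  7.9,  NA,   8.3,  8.6,  8.9)
)

# Both variables observed, and year within 1960-2000 inclusive
keep <- !is.na(df$polindex) & !is.na(df$log.gdp) &
        !is.na(df$year) & df$year >= 1960 & df$year <= 2000

vars <- c("polindex", "log.gdp")
sub  <- df[keep, vars]

# 3 x 2 matrix: rows are mean, sd and n; columns are the two variables
stats <- sapply(sub, function(x) c(mean = mean(x), sd = sd(x), n = length(x)))
stats
```

Because the filter is built once and reused, the n row is identical for both columns, which is the "same base number of observations" the question asks for.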
Tim Biegeleisen