I am trying to create a column that holds the mean of a variable within subgroups of my data set. In this case, the mean is the crime rate of each state, calculated from county-level observations, and I want to assign that number to every county in the state. Here is the code I wrote.

Create the new column

Data.Final$state_mean <- 0

Then calculate and assign the mean.

 for (j in range[1:3136]) 
{
      state <- Data.Final[j, "state"]
      Data.Final[j, "state_mean"] <- mean(Data.Final$violent_crime_2009-2014, 
      which(Data.Final[, "state"] == state))
}

This produces the following error:

Error in range[1:3137] : object of type 'builtin' is not subsettable

I would very much appreciate it if you could take a few minutes to help a beginner out.

  • Because of the dash, `violent_crime_2009-2014` isn't a standard column name. You'll need to use it in backticks, `Data.Final$\`violent_crime_2009-2014\`` or in quotes with `[`: `Data.Final[["violent_crime_2009-2014"]]` – Gregor Thomas Nov 21 '17 at 19:30
  • The error comes from using `[]` on `range`, which is a function. `range(1:3137)` doesn't give an error, but probably doesn't do what you're intending – dww Nov 21 '17 at 19:32

2 Answers

You've got a few problems:

  • `range[1:3136]` isn't valid syntax. `range(1:3136)` is valid syntax, but the `range()` function just returns the minimum and maximum. You don't need anything more than `1:3136`; just use `for (j in 1:3136)` instead.

  • Because of the dash, `violent_crime_2009-2014` isn't a standard column name. You'll need to use it in backticks, `` Data.Final$`violent_crime_2009-2014` ``, or in quotes with `[`: `Data.Final[["violent_crime_2009-2014"]]` or `Data.Final[, "violent_crime_2009-2014"]`.
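As a rough sketch of both fixes together, here is the loop rewritten for a small made-up data frame. The `toy` data and its values are hypothetical; only the column names mirror the question. Note also that the original call passes the `which(...)` result as a second argument to `mean()` instead of using it to subset the column, so the sketch subsets first.

```r
# Hypothetical toy data; only the column names follow the question.
toy <- data.frame(
  state = c("AL", "AL", "TX", "TX", "TX"),
  stringsAsFactors = FALSE
)
toy[["violent_crime_2009-2014"]] <- c(10, 20, 30, 40, 50)
toy$state_mean <- 0

# seq_len(nrow(toy)) avoids hard-coding the number of rows.
for (j in seq_len(nrow(toy))) {
  # Logical vector marking every row in the same state as row j.
  same_state <- toy$state == toy$state[j]
  # Subset the crime column to that state *before* taking the mean.
  toy$state_mean[j] <- mean(toy[["violent_crime_2009-2014"]][same_state])
}

toy$state_mean
# 15 15 40 40 40
```

This version runs, but it still recomputes each state's mean once per county.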

Also, your code is very inefficient: it recalculates the state mean on every iteration of the loop. Have a look at the Mean by Group R-FAQ. There are many faster and easier methods to get grouped means.

Without using extra packages, you could do

Data.Final$state_mean = ave(x = Data.Final[["violent_crime_2009-2014"]],
     Data.Final$state,
     FUN = mean)
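To see what `ave()` returns, here is a minimal sketch on a made-up data frame (the `toy` data is hypothetical and only borrows the question's column names). `ave()` returns a vector as long as its input, in the original row order, so it can be assigned straight to a new column.

```r
# Hypothetical toy data with deliberately interleaved states.
toy <- data.frame(
  state = c("TX", "AL", "TX", "AL", "TX"),
  stringsAsFactors = FALSE
)
toy[["violent_crime_2009-2014"]] <- c(30, 10, 40, 20, 50)

# One grouped mean per row, aligned with the original rows.
toy$state_mean <- ave(
  x = toy[["violent_crime_2009-2014"]],
  toy$state,
  FUN = mean
)

toy$state_mean
# 40 15 40 15 40
```

If the real column contains missing values, one option (an assumption about the data, not something stated in the question) is `FUN = function(v) mean(v, na.rm = TRUE)`.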

For friendlier syntax and greater efficiency, the data.table and dplyr packages are popular. You can see examples using them at the link above.

Gregor Thomas

Here is one of many ways this can be done (I'm sure someone will post a tidyverse answer soon if not before I manage to post):

# Data for my example:
data(InsectSprays)

# Note I have a response column and a column I could subset on
str(InsectSprays)

# Take the averages with the by var:
mn <- with(InsectSprays,aggregate(x=list(mean=count),by=list(spray=spray),FUN=mean))

# Map the means back to your data using the by var as the key to map on:
InsectSprays <- merge(InsectSprays,mn,by="spray",all=TRUE)
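The same two steps might look like this for data shaped like the question's. This is only a sketch: the `toy` data frame and the name `state_means` are hypothetical, and the column names are taken from the question.

```r
# Hypothetical toy data using the question's column names.
toy <- data.frame(
  state = c("TX", "AL", "TX", "AL", "TX"),
  stringsAsFactors = FALSE
)
toy[["violent_crime_2009-2014"]] <- c(30, 10, 40, 20, 50)

# Step 1: one row per state holding that state's mean.
state_means <- aggregate(
  x = list(state_mean = toy[["violent_crime_2009-2014"]]),
  by = list(state = toy$state),
  FUN = mean
)

# Step 2: join the means back onto the county rows by state.
toy <- merge(toy, state_means, by = "state", all.x = TRUE)
```

Keep in mind that `merge()` sorts the result by the key column by default, so the rows may come back in a different order than they went in.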

Since you mentioned you're a beginner, I'll just mention that you should avoid looping in R whenever you can. Vectorize your operations instead. The nice thing about using aggregate and merge is that the means are matched to rows by key, so you don't have to worry about mapping errors caused by an index shift inside a loop.

Cheers!

Nate