
I have a raw file of survey data, and I want to compute the mean of certain columns based on demographic information in another column. For example, the resident column is either 1 or 0, where 1 is in-state and 0 is out-of-state.

I want to get the mean of a response for all respondents whose resident column is 1.

I tried `grepl(1, rownames(resident))`, but I believe `grepl` only works on string values.

M J
  • Maybe have a look at the [How to calculate mean by group](https://stackoverflow.com/q/11562656/903061) FAQ? Or, since you say you want the mean of "certain columns" not just a single column, the more general [Apply several summary functions on several variables by group in one call?](https://stackoverflow.com/q/12064202/903061) FAQ. In both cases, for a new user I'd recommend the `dplyr` solutions. – Gregor Thomas Oct 19 '22 at 16:38
  • If you need more help, please share a little sample data, something like `dput(your_data[1:10, ])` for the first 10 rows. If you have a bunch of columns, use `dput(your_data[1:10, 1:5])` to restrict to the first 5 columns. – Gregor Thomas Oct 19 '22 at 16:40
  • `mean(subset(data, resident == 1)$response)` – Denny Chen Oct 19 '22 at 16:41
  • `rownames` are strings, but without seeing your data it's surprising that you are using the rownames in your attempt. However, even with strings you don't usually need `grep`; `grep` is for looking for patterns **inside** strings. If you are working with whole strings, then `==` or `%in%` works just fine, as it does with numbers. – Gregor Thomas Oct 19 '22 at 16:42
  • Please provide enough code so others can better understand or reproduce the problem. – Mark Davies Oct 19 '22 at 20:02
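
Since no sample data was posted, here is a minimal base R sketch of what the comments above suggest. The data frame `survey` and the response columns `q1` and `q2` are made up for illustration; only the `resident` column (1 = in-state, 0 = out-of-state) comes from the question. It filters rows with `==` rather than `grepl`, then shows a grouped version with `aggregate`.

```r
# Hypothetical survey data: `resident` is 1 (in-state) or 0 (out-of-state);
# `q1` and `q2` are made-up response columns standing in for the real ones.
survey <- data.frame(
  resident = c(1, 0, 1, 1, 0),
  q1       = c(4, 2, 5, 3, 1),
  q2       = c(10, 20, 30, 40, 50)
)

# Mean of one response column for in-state residents only.
# `survey$resident == 1` is a logical vector used to pick the matching rows.
in_state_q1 <- mean(survey$q1[survey$resident == 1], na.rm = TRUE)

# Means of several response columns for in-state residents.
in_state_means <- colMeans(
  survey[survey$resident == 1, c("q1", "q2")],
  na.rm = TRUE
)

# Means of the same columns for every resident group in one call.
by_group <- aggregate(cbind(q1, q2) ~ resident, data = survey, FUN = mean)

print(in_state_q1)
print(in_state_means)
print(by_group)
```

With the real data, swapping `survey`, `q1`, and `q2` for the actual data frame and column names should be enough. The `aggregate` call is the base R counterpart of the grouped `dplyr` approach in the linked FAQs.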

0 Answers