Error with rowSums using column names

Question

I am trying to segment Census data from fairly disaggregated data (e.g. age variables in 5-year groups) by creating summary variables based on aggregation (e.g. all males 18+ per county). My solution is rowSums, e.g. county$MalesOver18 <- rowSums(county[,c(68:87)]), where variables 68-87 sum to males 18+. This works fine. However, with 500 variables it is not efficient to count out the positions of my start and end columns.

But when I use my preferred solution, column names for rowSums (e.g. rowSums(county[,c(H76007:H76025)]), where the H variables are field names), I get one of two error messages:

Run with column names in quotes:

Error in "H76007":"H76025" : NA/NaN argument
In addition: Warning messages:
1: In `[.data.frame`(county, , c("H76007":"H76025")) : NAs introduced by coercion
2: In `[.data.frame`(county, , c("H76007":"H76025")) : NAs introduced by coercion

Run with column names not in quotes:

Error in `[.data.frame`(county, , c(H76007:H76025)) : object 'H76007' not found

I have tried using the na.rm argument and setting my variables as numeric (although they are already integers), with no result.

Any guidance? Thanks.

score 3 · Answer 1 · answered May 02 '13 at 03:11

When indexing a data.frame by column names, you can't use the : operator. With numeric values, : creates a sequence:

> 2:5
[1] 2 3 4 5

However, that doesn't work with character data, which is the error you were seeing:

> "foo":"bar"
Error in "foo":"bar" : NA/NaN argument
In addition: Warning messages:
...

So, what to do? I can think of two options:

1. Use grepl and a regular expression to identify the column names that you want to return. Here's a trivial example with the mtcars data:

> colsToOperateOn <- grepl("mpg|cyl", colnames(mtcars))
> head(mtcars[, colsToOperateOn], 2)
              mpg cyl
Mazda RX4      21   6
Mazda RX4 Wag  21   6

You would need to write a regex as complicated as necessary to get the columns you want.

2. Use which to identify the index of the starting and ending columns you want, and then turn those into a sequence:

> start <- which(colnames(mtcars) == "mpg")
> end <- which(colnames(mtcars) == "cyl")
> head(mtcars[, start:end], 2)
              mpg cyl
Mazda RX4      21   6
Mazda RX4 Wag  21   6

This may be a poor example since mpg and cyl are right next to one another, but it should prove the point.
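To tie both options back to rowSums, here is a minimal sketch on a toy data frame. The `county` table below is made up for illustration (its column names and values are assumptions, not the real Census fields), and the pattern `"^H760"` is just one possible regex.

```r
# Toy stand-in for the census table; names and values are invented.
county <- data.frame(
  name   = c("A", "B"),
  H76007 = c(1, 10),
  H76008 = c(2, 20),
  H76009 = c(3, 30),
  other  = c(100, 200)
)

# Option 1: logical selection by pattern (names starting with "H760").
byPattern <- grepl("^H760", colnames(county))
county$sumByPattern <- rowSums(county[, byPattern])

# Option 2: find the first and last positions, then build a sequence.
first <- which(colnames(county) == "H76007")
last  <- which(colnames(county) == "H76009")
county$sumByRange <- rowSums(county[, first:last])
```

Both new columns should hold the same row totals here, since the pattern and the range happen to pick the same three columns.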

score 2 · Accepted Answer · answered May 02 '13 at 03:18

The : operator cannot be used with character values. First obtain the column indices:

rowSums(county[,(which(names(county)=='H76007'):which(names(county)=='H76025'))])
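If this lookup is needed for many groups, it could be wrapped in a small function. The sketch below uses a hypothetical helper called `colRange` (not part of base R) and a made-up `county` table; it uses `match` instead of `which` so that a misspelled column name produces a readable error.

```r
# Toy data; column names and values are invented for illustration.
county <- data.frame(
  id     = c("A", "B"),
  H76007 = c(1, 10),
  H76008 = c(2, 20),
  H76009 = c(3, 30)
)

# Hypothetical helper: positions of all columns from `from` to `to`.
colRange <- function(df, from, to) {
  idx <- match(c(from, to), names(df))  # NA when a name is missing
  if (any(is.na(idx))) {
    stop("column not found: ",
         paste(c(from, to)[is.na(idx)], collapse = ", "))
  }
  idx[1]:idx[2]
}

malesOver18 <- rowSums(county[, colRange(county, "H76007", "H76009")])
```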

Nishanth

Is there a way to include multiple groups of columns, such as `rowSums(county[,(which(names(county)=='H76007'):which(names(county)=='H76025'))]), which(names(county) == 'H9E007'):which(names(county) == 'H9E025'))`. I know that the "," is not the correct way to do this. – NiuBiBang May 02 '13 at 14:30

Yes, use `c()` to concatenate the vectors. So `,` is fine, but enclose them within `c()` – Nishanth May 02 '13 at 14:35
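As a rough illustration of the `c()` suggestion, the sketch below combines two column ranges into one index vector before calling rowSums. The data frame is a toy example with invented values, and the `gap` column is only there to show that columns between the two ranges are skipped.

```r
# Toy data with two separate blocks of columns; values are made up.
county <- data.frame(
  H76007 = c(1, 10), H76008 = c(2, 20),
  gap    = c(100, 200),
  H9E007 = c(3, 30), H9E008 = c(4, 40)
)

nm <- names(county)

# Each which(...):which(...) yields one range; c() joins them.
cols <- c(
  which(nm == "H76007"):which(nm == "H76008"),
  which(nm == "H9E007"):which(nm == "H9E008")
)

total <- rowSums(county[, cols])
```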
