R - Function to count the number of columns in a data frame when a prefix is entered

Question

This is a question for school, but I have been working on it for hours and just need a point in the right direction. I am not asking for the full answer.

I was given a data frame with student grades for various assessments. I have to write a function that will result in the number of columns that either start with a given prefix or match the name entirely.

I was provided with the following framework:

assessmentCount <- function(df, assessmentNamePrefix)
{

}

I need to be able to write the code to get the exact results below when the following lines of code are executed:

assessmentCount(df,"hw")
# [1] 7

and

assessmentCount(df,"exam1")
# [1] 1

I've found that the following code, when run independently of the framework and with the [hw] written in, gives the correct number of 7:

my_columns <- df[, grep("^[hw]", names(df), value=TRUE)]
ncol(my_columns)

However, when I do the same with [exam1], I get an incorrect number of 3 because it includes columns for exam1, exam2, and exam3:

my_columns <- df[, grep("^[exam1]", names(df), value=TRUE)]
ncol(my_columns)

Even worse, when I put the code into the framework and replace the values with the variable assessmentNamePrefix, I get incorrect values of 8 for both tests.

assessmentCount <- function(df, assessmentNamePrefix)
{
  my_columns <- df[, grep("^[assessmentNamePrefix]", names(df), value=TRUE)]
  ncol(my_columns)  
}

I am very frustrated at this point and am not understanding what is going wrong. I do realize that this is a very basic question, but I'm at the beginning of a very basic R programming course. Could someone please point me in the right direction? It would be very much appreciated. Thank you :)

You need to learn some regex. When you are using `[exam1]`- this will match any letter between the square brackets. You could do `grep("^exam1", c("exam1", "exam2", "exam3"))` instead. And as mentioned in the comment above- just use `sum` on `grepl`. — David Arenburg, Sep 05 '16 at 20:25
Thank you both. I don't know why I keep jumping over simple code and thinking it has to be more difficult. I keep wanting to break everything down into bits and then analyze the bits. Of course, the more steps, the more mistakes. I will do some re-reading re: regular expression. Thanks :) — Titan552, Sep 05 '16 at 20:34
See also this http://stackoverflow.com/questions/31467732/does-r-have-function-startswith-or-endswith-like-python — David Arenburg, Sep 05 '16 at 20:37
I also found that my textbook suggests loading the stringr library and using str_detect instead of grepl. There are always so many ways of coding. `sum(str_detect(names(df), assessmentNamePrefix))` — Titan552, Sep 05 '16 at 20:57
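
To make the comments above concrete, here is a small sketch of why the bracketed patterns over-count and why the variable name inside the quotes is never substituted. The column names are hypothetical, chosen only for illustration, and are not the asker's actual data.

```r
# Hypothetical column names, for illustration only
cols <- c("name", "hw1", "hw2", "exam1", "exam2", "exam3", "midterm")

# "[exam1]" is a character class: ONE character that is e, x, a, m or 1.
# So every name starting with any of those characters matches.
sum(grepl("^[exam1]", cols))   # 4: exam1, exam2, exam3, midterm

# Without the brackets the pattern is the literal prefix "exam1"
sum(grepl("^exam1", cols))     # 1

# Inside quotes a variable name is just text, so this is again a
# character class made of the letters in "assessmentNamePrefix"
assessmentNamePrefix <- "hw"
sum(grepl("^[assessmentNamePrefix]", cols))  # 5, unrelated to "hw"

# Building the pattern with paste0() inserts the variable's value
pattern <- paste0("^", assessmentNamePrefix)
sum(grepl(pattern, cols))      # 2: hw1, hw2
```

With the real data frame the counts will differ, but the mechanism is the same: brackets turn the prefix into a set of single characters.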

score 3 · Accepted Answer · edited Jun 20 '20 at 09:12

You can use the base startsWith() function, which is faster and more convenient here than the regular expression grepl("^<prefix>", x). From ?startsWith:

startsWith() is equivalent to but much faster than

substring(x, 1, nchar(prefix)) == prefix or also

grepl("^prefix", x)

assessmentCount <- function(df, assessmentNamePrefix)
{
    sum(startsWith(names(df), assessmentNamePrefix))    
}
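
As a quick check, the function can be tried on a small stand-in data frame. The data frame below is hypothetical (the real one from the question is not shown), so the counts differ from the 7 and 1 expected in the assignment.

```r
assessmentCount <- function(df, assessmentNamePrefix) {
  # startsWith() returns one TRUE/FALSE per column name; sum() counts the TRUEs
  sum(startsWith(names(df), assessmentNamePrefix))
}

# Hypothetical one-row data frame, for illustration only
df <- data.frame(name = "A", hw1 = 90, hw2 = 85, exam1 = 70, exam2 = 80)

assessmentCount(df, "hw")      # 2
assessmentCount(df, "exam1")   # 1
assessmentCount(df, "quiz")    # 0, no column starts with "quiz"
```

Because the count comes from a logical vector rather than from subsetting the data frame, it also works when only one column (or none) matches.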

answered Sep 05 '16 at 20:33

Psidom

It is nice but for some reason I cannot access `startsWith()`. Any idea why? – kwicher Sep 05 '16 at 20:50
Old R version :) ... after update to 3.3.1 it works. – kwicher Sep 05 '16 at 20:58
`startsWith(x, prefix)` and `endsWith(x, suffix)` were introduced in R 3.3.0. – lmo Sep 06 '16 at 11:53
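
For anyone stuck on an R version older than 3.3.0, a fallback can be sketched from the substring() equivalence quoted in the answer. The helper names startsWithCompat and assessmentCountCompat are made up for this example and are not part of base R.

```r
# Hypothetical fallback for R < 3.3.0, based on the equivalence in ?startsWith
startsWithCompat <- function(x, prefix) {
  # Compare the first nchar(prefix) characters of each element with the prefix
  substring(x, 1, nchar(prefix)) == prefix
}

assessmentCountCompat <- function(df, assessmentNamePrefix) {
  sum(startsWithCompat(names(df), assessmentNamePrefix))
}

# Hypothetical data frame, for illustration only
df <- data.frame(name = "A", hw1 = 90, hw2 = 85, exam1 = 70)
assessmentCountCompat(df, "hw")     # 2
assessmentCountCompat(df, "exam1")  # 1
```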

score 2 · Answer 2 · answered Sep 05 '16 at 20:26

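Your regex is wrong. In "^[assessmentNamePrefix]" the variable name is just literal text inside the quotes, and the square brackets turn it into a character class that matches any single one of those letters. Build the pattern from the variable's value with paste0() instead:

sum(grepl(paste0("^", assessmentNamePrefix), names(df)))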
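
One caveat with the paste0() approach, sketched here with hypothetical column names: the prefix is interpreted as a regular expression, so a character such as "." in the prefix matches any character. startsWith() compares the text literally.

```r
# Hypothetical column names, for illustration only
cols <- c("hw.1", "hw.2", "hw21", "exam1")
prefix <- "hw."

# As a regex, "." matches any single character, so "hw21" is counted too
sum(grepl(paste0("^", prefix), cols))   # 3

# startsWith() treats the prefix as plain text
sum(startsWith(cols, prefix))           # 2
```

For prefixes made only of letters and digits, such as "hw" or "exam1", both approaches give the same count.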

kwicher
