
I have a data frame with the first column as a categorical identifier, the second column as a frequency value and the remaining columns as raw data counts. I want to multiply all the count columns by the frequency column, but not the first two.

All the raw count columns start with a capital letter followed by a full stop, e.g. "L.abd", "T.xyz", etc.

For example, if I use the code:

    require(dplyr)
    ID <- c(1,2,3,4,5,6)
    Freq <- c(0.1,0.2,0.3,0.5,0.1,0.3)
    L.abc <- c(1,1,1,3,1,0)
    L.ABC <- c(0,3,2,4,1,1)
    T.xyz <- c(1,1,1,1,0,1)
    F.ABC <- c(4,5,6,5,3,1)

    df <- as.data.frame(cbind(ID, Freq, L.abc, L.ABC, T.xyz, F.ABC))

    df_new <- df %>% mutate_each(funs(.*Freq), starts_with("L."))        

This creates a new data frame in which the columns starting with "L." have been multiplied by the corresponding frequency value, while the other columns are kept unchanged.

Is there a way to change the `starts_with` call so that it selects all columns that begin with a capital letter and a full stop? My attempts so far, using modifications such as "[A-Z].", have been unsuccessful.

Thanks in advance

Mr_J

2 Answers


For cases like this, `matches()` would be more appropriate:

    df %>%
        mutate_each(funs(. * Freq), matches("^[A-Z]\\.", ignore.case = FALSE))

Here, I am assuming that you want to select only the columns whose names start with a capital letter (`^[A-Z]`) followed by a full stop. The `.` has to be escaped (`\\.`); otherwise the regex treats it as "any single character".
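To see why the escape matters, here is a small base-R sketch that applies both patterns to the question's column names with `grepl()`. It assumes `grepl()` interprets a simple pattern like this the same way `matches()` does; `nms`, `escaped` and `unescaped` are illustrative names only.

```r
# Column names from the question's data frame
nms <- c("ID", "Freq", "L.abc", "L.ABC", "T.xyz", "F.ABC")

# Escaped: a capital letter followed by a literal full stop
escaped <- grepl("^[A-Z]\\.", nms)

# Unescaped: "." matches any single character, so "ID" and "Freq"
# (a capital letter plus any character) are matched as well
unescaped <- grepl("^[A-Z].", nms)

nms[escaped]    # "L.abc" "L.ABC" "T.xyz" "F.ABC"
nms[unescaped]  # all six names
```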

I am not changing anything except the `starts_with` part. In `mutate_each`, a function can be passed inside a `funs()` call; here, each column (`.`) selected by `matches()` is multiplied by the `Freq` column.
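For comparison, the same multiplication can be sketched in base R without dplyr. This is a swapped-in technique, not what the answer uses: the columns are picked with `grep()` and multiplied by `Freq` directly. The data is rebuilt with `data.frame()` rather than `as.data.frame(cbind(...))`, and `count_cols` is a hypothetical name chosen for illustration.

```r
df <- data.frame(
  ID    = c(1, 2, 3, 4, 5, 6),
  Freq  = c(0.1, 0.2, 0.3, 0.5, 0.1, 0.3),
  L.abc = c(1, 1, 1, 3, 1, 0),
  L.ABC = c(0, 3, 2, 4, 1, 1),
  T.xyz = c(1, 1, 1, 1, 0, 1),
  F.ABC = c(4, 5, 6, 5, 3, 1)
)

# Indices of columns named <capital letter>.<anything> (ID and Freq excluded)
count_cols <- grep("^[A-Z]\\.", names(df))

df_new <- df
# A data frame times a vector of length nrow(df) is recycled down each
# column, so every selected column is multiplied row-by-row by Freq
df_new[count_cols] <- df_new[count_cols] * df_new$Freq
df_new
```

Because `grep()` is case-sensitive by default, no `ignore.case` argument is needed here.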

According to `?select`:

`matches(x, ignore.case = TRUE)`: selects all variables whose name matches the regular expression `x`

EDIT: Added `ignore.case = FALSE`, following @docendodiscimus's comment.

akrun
  • You might want to use `matches("^[A-Z]\\.", ignore.case = FALSE)` since it defaults to TRUE and OP wants to match capital letters. Compare for example `select(iris, matches("^[a-z].*"))` and `select(iris, matches("^[a-z].*", ignore.case = FALSE))` – talat Aug 04 '15 at 11:51
  • @docendodiscimus Thanks, didn't check the default case. – akrun Aug 04 '15 at 12:33

I just answered a related question from another user: `mutate_each` will be deprecated in favor of `mutate_at`.

In your case the equivalent code is:

    df %>%
        mutate_at(vars(matches("^[A-Z]\\.", ignore.case = FALSE)), funs(. * Freq))

      ID Freq L.abc L.ABC T.xyz F.ABC
    1  1  0.1   0.1   0.0   0.1   0.4
    2  2  0.2   0.2   0.6   0.2   1.0
    3  3  0.3   0.3   0.6   0.3   1.8
    4  4  0.5   1.5   2.0   0.5   2.5
    5  5  0.1   0.1   0.1   0.0   0.3
    6  6  0.3   0.0   0.3   0.3   0.3
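Since then, `mutate_at()` has itself been superseded by `across()`, and `funs()` is deprecated. Assuming dplyr 1.0.0 or later, a sketch of the same operation with `mutate()` and `across()` could look like this, with the lambda `~ .x * Freq` taking the place of `funs(. * Freq)`:

```r
library(dplyr)

df <- data.frame(
  ID    = c(1, 2, 3, 4, 5, 6),
  Freq  = c(0.1, 0.2, 0.3, 0.5, 0.1, 0.3),
  L.abc = c(1, 1, 1, 3, 1, 0),
  L.ABC = c(0, 3, 2, 4, 1, 1),
  T.xyz = c(1, 1, 1, 1, 0, 1),
  F.ABC = c(4, 5, 6, 5, 3, 1)
)

# across() applies the lambda to every column whose name starts with a
# capital letter and a literal full stop; ignore.case = FALSE keeps the
# match case-sensitive (matches() ignores case by default).
# Inside the lambda, Freq refers to the data frame's Freq column.
df_new <- df %>%
  mutate(across(matches("^[A-Z]\\.", ignore.case = FALSE), ~ .x * Freq))

df_new
```

This should give the same values as the `mutate_at()` output above, with `ID` and `Freq` left untouched.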

Pablo Casas