I'm trying to replicate the solution for applying multiple functions in sapply posted on R-Bloggers, but I can't get it to work in the desired manner. I'm working with a simple data set, similar to the one generated below:

require(datasets)
crs_mat <- cor(mtcars)

# Triangle function
get_upper_tri <- function(cormat){
  cormat[lower.tri(cormat)] <- NA
  return(cormat)
}

require(reshape2)
crs_mat <- melt(get_upper_tri(crs_mat))

I would like to replace some text values across columns Var1 and Var2. The erroneous syntax below illustrates what I am trying to achieve:

crs_mat[,1:2] <- sapply(crs_mat[,1:2], function(x) {
 # Replace first phrase
 gsub("mpg","MPG",x), 
 # Replace second phrase
  gsub("gear", "GeArr",x)
 # Ideally, perform other changes
})

Naturally, the code is not syntactically correct and fails. To summarise, I would like to do the following:

  1. Go through all the values in first two columns (Var1 and Var2) and perform simple replacements via gsub.
  2. Ideally, I would like to avoid defining a separate function, as discussed in the linked post, and keep everything within the sapply call.
  3. I don't want a nested loop

I had a look at the broadly similar questions discussed here and here but, if possible, I would like to avoid using plyr. I'm also interested in replacing the column values rather than creating new columns, and I would like to avoid specifying any column names. With my existing data frame it is more convenient for me to use column numbers.

Edit

Following very useful comments, what I'm trying to achieve can be summarised in the solution below:

fun.clean.columns <- function(x, str_width = 15) {
  # Make character
  x <- as.character(x)
  # Replace various phrases
  x <- gsub("perc85","something else", x)
  x <- gsub("again", x)
  x <- gsub("more","even more", x)
  x <- gsub("abc","ohmg", x)
  # Clean spaces
  x <- trimws(x)
  # Wrap strings (str_wrap comes from the stringr package)
  x <- stringr::str_wrap(x, width = str_width)
  # Return object
  return(x)
}
mean_data[,1:2] <- sapply(mean_data[,1:2], fun.clean.columns)

I don't need this function in my global environment, so I can run rm after this, but an even nicer solution would squeeze all of this into the apply call itself.

Konrad
  • Can you elaborate on the other changes you want to do? How many replacements do you have? – Heroka Nov 02 '15 at 13:57
  • @Heroka, thanks for your interest. Let's assume that I want to make 10 replacements and other cosmetic changes like applying `trimws`. In principle, `gsub` is enough for me, but I want to be able to apply several successive `gsub` calls to the columns I decide to pass via `sapply`. – Konrad Nov 02 '15 at 14:00
  • The main problem is that you're not assigning anything, so no changes are made. – Heroka Nov 02 '15 at 14:01
  • If you are replacing multiple entries, why not use `mgsub` from `library(qdap)` – akrun Nov 02 '15 at 14:04
  • @Heroka I know, as indicated, the code is wrong but it conveys what I am trying to achieve. In simple words, I want to apply multiple operations to the list of columns *within* the `sapply`. In effect, it's a syntax problem. I could write a function and pass it to `sapply` but I don't want to. – Konrad Nov 02 '15 at 14:04
  • @Konrad see my answer, it's very close to what you wrote but syntactically correct and everything is within sapply. – Heroka Nov 02 '15 at 14:06
  • @akrun `mgsub` would work, but I'm interested in solving the problem via `sapply`. There is no specific reason for that, I just want to learn more about `sapply`. – Konrad Nov 02 '15 at 14:06
  • @Konrad I would not use `sapply` as the structure gets mangled. The output of `sapply` is a `matrix` when the results are of equal length. The main point of my code is to reduce the number of `gsub` calls. – akrun Nov 02 '15 at 14:09
  • Just to give an example `df1 <- data.frame(V1= LETTERS[1:5], V2= LETTERS[3:7]);sapply(df1, relevel, ref='D'); lapply(df1, relevel, ref='D')`. Even if you are assigning to the dataset i.e. `df1[] <- sapply(df1, ..` The class gets changed. – akrun Nov 02 '15 at 14:16
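
The structural difference raised in these comments can be seen in a small sketch. The data frame `df1` and the use of `toupper` here are illustrative assumptions, not taken from the question.

```r
# Hypothetical two-column data frame of strings
df1 <- data.frame(V1 = c("a", "b", "c"), V2 = c("c", "d", "e"),
                  stringsAsFactors = FALSE)

# sapply simplifies equal-length results into a character matrix
res_s <- sapply(df1, toupper)
is.matrix(res_s)

# lapply always returns a list with one element per column
res_l <- lapply(df1, toupper)
is.list(res_l)

# Assigning the list back with df1[] keeps the data frame structure
df1[] <- res_l
class(df1)
```

Assigning a matrix back into a data frame often still works, which is why the `sapply` version can appear fine, but the list returned by `lapply` maps onto columns one-to-one without an intermediate matrix.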

2 Answers


Here is the start of a solution for you; I think you're capable of extending it yourself. There are probably more elegant approaches available, but I don't see them at the moment.

crs_mat[,1:2] <- sapply(crs_mat[,1:2], function(x) {
  # Replace first phrase
  step1 <- gsub("mpg","MPG",x)
  # Replace second phrase. Note that this operates on the output of step1.
  step2 <- gsub("gear", "GeArr",step1)
  # Ideally, perform other changes
  return(step2)

  #or one nested line, not practical if more needs to be done
  #return(gsub("gear", "GeArr",gsub("mpg","MPG",x)))
})
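
One possible way to extend this to many replacements, sketched under assumptions: the data frame `dat` and the named vector `swaps` are hypothetical, and `lapply` is used instead of `sapply` following the comments about structure. The anonymous function body can hold as many statements as needed, each working on the already-modified vector.

```r
# Hypothetical data frame standing in for the melted correlation matrix
dat <- data.frame(Var1 = c(" mpg", "gear ", "cyl"),
                  Var2 = c("gear", "mpg", "hp"),
                  stringsAsFactors = FALSE)

# Named vector: names are patterns, values are replacements (illustrative)
swaps <- c(mpg = "MPG", gear = "GeArr")

dat[, 1:2] <- lapply(dat[, 1:2], function(x) {
  x <- as.character(x)
  # Apply each replacement in turn to the already-modified vector
  for (p in names(swaps)) {
    x <- gsub(p, swaps[[p]], x, fixed = TRUE)
  }
  # Further cosmetic steps can follow in the same body
  trimws(x)
})
```

Adding another replacement then only means adding an entry to `swaps`; the value of the last expression in the function body is what gets returned for each column.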
Heroka

We can use mgsub from library(qdap) to replace multiple patterns. Here, I loop over the first and second columns with lapply and assign the results back to crs_mat[,1:2]. Note that I am using lapply instead of sapply because lapply keeps the structure intact.

library(qdap)
crs_mat[,1:2] <- lapply(crs_mat[,1:2], mgsub, 
   pattern=c('mpg', 'gear'), replacement=c('MPG', 'GeArr'))
akrun
  • Thanks very much for your contribution. The solution would definitely work in the case of `gsub`, but I'm mostly interested in expanding the `sapply` syntax as I would like to capitalise on the available flexibility. For example, I may have to apply `gsub` 5 times, then `trimws` and finally `toupper`, or something along those lines. – Konrad Nov 02 '15 at 14:26
  • @Konrad I guess you have looked at the example posted showing `lapply` vs `sapply` – akrun Nov 02 '15 at 14:57
  • I did, but it's not clear to me whether it would be possible to draft an adequate syntax. I understand that I could define a new function `my_problematic_strings <- function(string_var) { # All the stuff I want to do }` and then pass it: `sapply(dta_with_columns, function(x) my_problematic_strings(x))`. Ideally, I wouldn't like to define this function as it will "sit" in the global environment for no reason. – Konrad Nov 02 '15 at 15:26
  • @Konrad I didn't understand how the `sapply` will be a better option than `lapply` based on what you mentioned. – akrun Nov 02 '15 at 15:30
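
If the concern is only that a helper would linger in the global environment, one option is to define it inside `local()`. This is a minimal sketch; the names `dta` and `clean_strings` are hypothetical and the replacements are borrowed from the question's edit for illustration.

```r
# Hypothetical data standing in for the asker's columns
dta <- data.frame(a = c(" abc ", "more"), b = c("x", "abc"),
                  stringsAsFactors = FALSE)

dta[, 1:2] <- local({
  # clean_strings exists only inside this local() block
  clean_strings <- function(x) {
    x <- gsub("abc", "ohmg", x)
    x <- gsub("more", "even more", x)
    trimws(x)
  }
  lapply(dta[, 1:2], clean_strings)
})

# Check whether the helper is visible from the calling environment
exists("clean_strings")
```

The block evaluates in its own temporary environment, so no `rm` call is needed afterwards; only the value of the final expression is assigned back to the columns.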