
I want to remove spaces from certain data frame variables in a for loop. I tried something like this:

for (j in 1:5) {
    df <- df %>%
        dplyr::mutate(paste0("var", j) = (gsub("[[:blank:]]", "", paste0("var", j))))
}

But I got this error:

Error: unexpected '=' in:
"    df <- df %>%
         dplyr::mutate(paste0("var", j) ="

In reality, I have more than 5 variables, and I plan to adjust the data using other functions, too. How can I ensure that the loop elements refer to specific column names, so that when I do something like df$loop_element, the variable that I want is extracted from the data frame?

Mikael Jagan
  • [Here](https://stackoverflow.com/questions/49469982/r-using-a-string-as-an-argument-to-mutate-verb-in-dplyr) is an example of how you can use strings (variables) in `mutate()` - look especially at the last code chunk in the accepted answer. – DaveArmstrong Mar 28 '23 at 12:11
  • Why do you want to do it in a loop? It would be less code if you did it not in a loop... if you share a little bit of sample data we can demonstrate. – Gregor Thomas Mar 28 '23 at 12:12

2 Answers


You can do this operation (and others that depend on column names) with dplyr's across() instead of a loop.

Here is a little example:

library(dplyr)  # provides tibble(), mutate(), across(), any_of() and %>%

df <- tibble(var1 = c("hey ", "jude "), 
             var2 = c("my ", "life "), 
             var3 = c(" hendrix", " is "), 
             var4 = c("pear ", "apple"), 
             var5 = c("", "bananas"))
vars <- paste0("var", 1:5)
df %>% 
  mutate(across(any_of(vars), \(x) gsub("[[:blank:]]", "", x)))
#> # A tibble: 2 × 5
#>   var1  var2  var3    var4  var5     
#>   <chr> <chr> <chr>   <chr> <chr>    
#> 1 hey   my    hendrix pear  ""       
#> 2 jude  life  is      apple "bananas"
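
If you do want to keep the loop from the question, here is a minimal sketch of one way it could work. The original fails to parse because the left-hand side of `=` inside a function call must be a plain name, not a call such as paste0("var", j). Even if it parsed, gsub() would be applied to the string "var1" rather than to the column. The sketch below assumes a reasonably recent dplyr (1.0 or later) and uses `:=` with `!!` for the column name and `.data[[ ]]` for its contents. The two-column df and the helper variable col are made up for illustration.

```r
library(dplyr)

# Small illustrative data frame (not from the question)
df <- tibble(var1 = c("hey ", "jude "),
             var2 = c(" my", "li fe"))

for (j in 1:2) {
  col <- paste0("var", j)  # column name held as a string
  df <- df %>%
    # `!!col :=` sets the name of the new column from the string;
    # `.data[[col]]` looks up the existing column by that string
    mutate(!!col := gsub("[[:blank:]]", "", .data[[col]]))
}

df
```

The across() version above is shorter, but this form may be handy if each step of the loop does something different.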

A single-line solution in base R:

df1 <- data.frame(matrix(rownames(mtcars), ncol = 8))
names(df1) <- paste0("var", 1:8)   
selvars <- paste0("var", 1:5) 

df1[, selvars] <- lapply(df1[, selvars], function(x) gsub("[[:blank:]]", "", x))

Programming in the tidyverse with column names held in variables is certainly easier than it used to be (it is no longer necessary to wrestle with arcane concepts such as quasiquotation), but standard indexing in base R is arguably even simpler for this type of task.

EDIT: Just for completeness, you could also use this approach in a loop if that makes it easier to understand:

for (j in 1:5) {
  df1[, paste0("var", j)] <- gsub("[[:blank:]]", "", df1[, paste0("var", j)])
}
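
On the df$loop_element part of the question: `$` does not evaluate what comes after it, so df$col looks for a column literally named "col". A small sketch of the usual alternative, `[[ ]]`, which accepts a string held in a variable, follows. The data frame df2 and the loop variable col are invented for this example.

```r
# Illustrative data frame (not from the question)
df2 <- data.frame(var1 = c("a b", "c d"),
                  var2 = c("e f", "g h"))

for (col in paste0("var", 1:2)) {
  # df2$col would be NULL here, because there is no column named "col";
  # df2[[col]] uses the value of col ("var1", then "var2") instead
  df2[[col]] <- gsub("[[:blank:]]", "", df2[[col]])
}

df2
```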
Knackiedoo