-4

Hello, I am trying to write a function that takes a data frame and fixes its column headers if a name contains a special character or space. The function seems to work, since the results are printed, but it does not save the changes to the original data frame. Any thoughts on how to fix this? The data I used to test it was a tbl_df, so I'm not sure whether that has something to do with why it is not updating correctly. Thanks.

nameChange <- function(df) {
  for(i in 1:length(colnames(df)[i])) {
  if(str_detect(colnames(df[i]),"[:punct:]|[:space:]") == TRUE) {
  #Could use "\\s" to find space
  names(df) <- str_replace_all(names(df)," *",'')
  names(df) <- str_replace_all(names(df),"-",'')
  #df <- df
  assign('df',df, envir=.GlobalEnv)
  #return(df)
  print("Worked")
  }

    else{
      print("Function did not replace anything")
    }
  }
}

This is the data I am using to test the function:

#data from: http://www.tableau.com/learn/tutorials/on-demand/getting-started-data
orders_path <- file.path("/Users/petergensler/Desktop/Global Superstore.xls")
order_table <- read_excel(orders_path, sheet = "Orders")
nameChange(order_table)

Once I call colnames on order_table you should be able to see that the hyphen in Product Sub-Category is removed, and all the spaces inside of each column name are no longer there.

Jonathan Carroll
petergensler
  • Just to check... `str_replace_all(names(df)," *",'')` and `str_replace_all(names(df),"-",'')` give you the desired output? – Weihuang Wong Aug 10 '16 at 23:20
  • Yes, I know that part of the function works correctly. I could have used the \\s to find the spaces in the strings or the [:space:] to detect them. My real issue is that when I pass this a tbl_df, it shows the changes made, but when I call colnames on the tbl_df, it doesn't look like the changes were saved. – petergensler Aug 10 '16 at 23:22
  • It says it worked but it still doesn't seem like it's saving the results. – petergensler Aug 10 '16 at 23:30
  • Perhaps a MWE would help. – Weihuang Wong Aug 10 '16 at 23:31
  • this is the file I am using...I am reading it in using these lines....orders_path <- file.path("/Users/petergensler/Desktop/Global Superstore.xls") order_table <- read_excel(orders_path, sheet = "Orders") link: http://www.tableau.com/learn/tutorials/on-demand/getting-started-data – petergensler Aug 10 '16 at 23:34
  • @petergensler Are you expecting it to modify the input data frame and have it saved to the same name so that if you called `nameChange(hey)` that the 'hey' data.frame would be modified? If so then you just have a misunderstanding of how things work. You're saving directly to 'df' so if you look in your global environment you should see the result saved literally as 'df' – Dason Aug 10 '16 at 23:49
  • @Dason Yes. I want the function to overwrite the dataframe I pass into it. My global environment does have the df variable in it. – petergensler Aug 10 '16 at 23:52
  • you reference `i` in that for loop conditional...It would be better to write `for(i in seq_along(df)){...}` – shayaa Aug 10 '16 at 23:58
  • @Dason So if I wanted to pass my dataframe into the function and just simply rewrite over it, what do I need to modify in my function to make it work like that? Any help would be appreciated. Thanks – petergensler Aug 11 '16 at 00:10
  • You say it works, but this: `for(i in 1:length(colnames(df)[i]))` should (and does) fail. – Jonathan Carroll Aug 11 '16 at 01:53
  • @jonathan-carroll OK, so then how would you make it work? I think the point of SO is to help people solve problems, not bash on them for their lack of understanding. When it was working it printed out my dataframe as a tibble with the column names changed, but when I called str() on it, it does not work. – petergensler Aug 11 '16 at 02:00
  • That wasn't bashing on anyone, just pointing out that there's more to the problem than stated. SO is NOT for *make this work* questions. You don't have a reproducible example, so we can't address what solution you might be after. Please provide an example input (from a clean workspace) and highlight the error you receive. – Jonathan Carroll Aug 11 '16 at 02:06
  • @JonathanCarroll added an example. let me know if it is not reproducible. You should be able to call str(order_table) and see the changes made in the column names to check if the changes worked. – petergensler Aug 11 '16 at 02:13
  • @petergensler Perhaps I should have said "minimal" reproducible. Few people will go get your exact data. For some tips, see http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example e.g. using a built-in data set (as @Weihuang Wong did) is helpful. In any case, I am able to reproduce the output "Worked" without modification if I have a variable `i=1` in my environment (and thus the loop failure I mentioned passes incorrectly). Please start a new R session and run your code again. – Jonathan Carroll Aug 11 '16 at 02:16
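
The loop issue raised in the comments above can be illustrated with a small sketch. The data frame and its column names are made up for illustration (they are not the Superstore data), and base R `grepl` is used here in place of `str_detect`, so treat this as an approximation of the original code rather than a copy of it.

```r
# Hypothetical stand-in for the Excel sheet: three columns, two with messy names.
df <- data.frame(a = 1:2, b = 3:4, c = 5:6)
names(df) <- c("Row ID", "Sub-Category", "Sales")

# With a leftover `i` in the workspace the original header evaluates, but
# colnames(df)[i] is a single name, so the range is just 1:1.
# (In a clean session `i` is undefined and the header stops with an error.)
i <- 1
stale_range <- 1:length(colnames(df)[i])

# seq_along(df) visits every column regardless of what else is defined.
full_range <- seq_along(df)

# The check itself is vectorised, so no loop is needed to find affected names.
needs_fix <- grepl("[[:punct:][:space:]]", names(df))
```

Here `stale_range` covers only the first column, `full_range` covers all three, and `needs_fix` flags the first two names.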

3 Answers

4

Your function can be simplified:

nameChange <- function(df) {
  names(df) <- str_replace_all(names(df), "[:punct:]|[:space:]",  "")
  return(df)
}

Example:

library(dplyr)
library(stringr)

df <- tbl_df(mtcars)
names(df)[1] <- "m p g"
names(df)[2] <- "c-y-l"
names(df)
#  [1] "m p g" "c-y-l" "disp"  "hp"    "drat"  "wt"    "qsec"  "vs"    "am"   
# [10] "gear"  "carb" 

df <- nameChange(df)
names(df)
#  [1] "mpg"  "cyl"  "disp" "hp"   "drat" "wt"   "qsec" "vs"   "am"   "gear"
# [11] "carb"
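
The reason the reassignment `df <- nameChange(df)` matters is that an R function works on its own copy of the argument. The sketch below shows this with a made-up two-column data frame; `clean_names` is a hypothetical base R variant of the function above (using `gsub` instead of `str_replace_all`), not the answer's exact code.

```r
# Hypothetical base R variant of nameChange(): gsub() instead of stringr.
clean_names <- function(df) {
  names(df) <- gsub("[[:punct:][:space:]]", "", names(df))
  df
}

# Made-up data with names similar to those in the question.
orders <- data.frame(x = 1:2, y = 3:4)
names(orders) <- c("Row ID", "Product Sub-Category")

# Calling without assigning prints the cleaned copy and then discards it.
clean_names(orders)
names_unassigned <- names(orders)

# Assigning the return value is what actually updates `orders`.
orders <- clean_names(orders)
names_assigned <- names(orders)
```

This matches what the question describes: the printed result looks right, but the original object keeps its old names until the return value is assigned back.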
petergensler
Weihuang Wong
  • That is not what I am trying to do at all. I want to define a function that takes in a dataframe, searches for characters based on the [:punct:]|[:space:] pattern, and saves the results to the dataframe I supplied it. – petergensler Aug 11 '16 at 00:22
  • The example above does exactly what you requested. df is the supplied data frame. The str_replace_all function searches for the patterned string (substitute your desired pattern here) and then returns df and saves the results to the original data frame. – Dave2e Aug 11 '16 at 02:43
0

Any other issues aside (mentioned in comments; the i in the for loop is undefined), the issue appears to be that you aren't assigning back to the original data object:

 assign('df',df, envir=.GlobalEnv)

assigns to the object df, which I'm guessing you'll find in your environment after successfully running the function.

Presumably you want

 dfname <- deparse(substitute(df))
 assign(dfname, df, envir=.GlobalEnv)

which seems to work on testing.
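
A minimal sketch of how those two lines might sit inside a complete function, with some assumptions of mine: the function name `clean_names_in_place` and the `envir = parent.frame()` default are hypothetical (the snippet above uses `.GlobalEnv`), and `gsub` stands in for `str_replace_all`. The name is captured before `df` is modified, because `substitute()` only returns the caller's expression while the argument is still an unmodified promise.

```r
clean_names_in_place <- function(df, envir = parent.frame()) {
  # Capture the caller's variable name first: once `df` is modified inside the
  # function it is an ordinary local variable and substitute() no longer
  # returns the original symbol.
  target <- deparse(substitute(df))
  names(df) <- gsub("[[:punct:][:space:]]", "", names(df))
  assign(target, df, envir = envir)
  invisible(df)
}

# Made-up data with names similar to those in the question.
orders <- data.frame(x = 1:2, y = 3:4)
names(orders) <- c("Row ID", "Product Sub-Category")
clean_names_in_place(orders)
names(orders)

# Limitation: anything other than a bare variable name does not round-trip.
# This call creates a new variable literally named "my_list$orders".
my_list <- list(orders = data.frame(x = 1:2))
names(my_list$orders) <- "Row ID"
clean_names_in_place(my_list$orders)
names(my_list$orders)   # unchanged
```

The second call is included to show why this kind of side effect is fragile: it only behaves as intended when the argument is a plain variable name in the target environment.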

Jonathan Carroll
  • i is supposed to be the total number of columns in the dataframe(which is 24). length(colnames(order_table)[]). so for each element starting at 1 to 24, when the IF statement is TRUE, then it gets executed. so it should increment each time it runs. – petergensler Aug 11 '16 at 02:32
  • @petergensler you have `for(i in 1:length(colnames(df)[i]))` in your question; that second `i` isn't defined. Anyway, what's wrong with `1:ncol(df)` or better yet `seq_along(df)`? – Jonathan Carroll Aug 11 '16 at 02:36
  • Sorry I think I posted a bad version of my code....if you write colnames(superstore[1]), it returns "Row ID" which is the first column name in the dataset – petergensler Aug 11 '16 at 02:40
  • This is presumably the best answer for what the OP is trying to achieve. I'm sorry that, nevertheless, I can't upvote it. A function with such side effects is just bad code. Future readers should not be encouraged to do this kind of stuff. As you pointed out in a comment, returning the object like in the answer by @WeihuangWong is the proper way to handle such situations. – RHertel Aug 11 '16 at 08:14
-2

Below is the code that answers my problem:

test1 <- function(df){
    names(df) <- str_replace_all(names(df), "[:punct:]|[:space:]","")
    df <<- df
    return(df)
}

There is nothing wrong with using df as an argument, but you do need to use the global assignment <<- operator so that this function can be called on pretty much any dataframe, and overwrite the existing df in your workspace. Thank you all for your help.

petergensler
  • Yeah, I'm going to be happy to see those downvotes pile up. While I'm happy for you to have found something that works, this doesn't answer your question of 'what is wrong with this?' `<<-` is equivalent to my answer and not a revelation to those who have answered you. It's regarded as very poor practice to use anyway. Returning the `df` and assigning is the standard R approach. – Jonathan Carroll Aug 11 '16 at 07:18