
I have multiple CSV files that contain the same information (such as "age") under different spellings of the variable name. To standardize them, I plan to read each file into a dataframe, standardize the column names, and then write the CSV back.

Therefore, I created a dictionary that looks like this:

[Image: the dictionary dataframe, with "old_name" and "new_name" columns]

I am struggling to find a way to do the following in R:

  1. Look through each of the colnames of the dataframe and compare each one to every "old_name" in the dictionary dataframe.
  2. If a match is found, replace the "old_name" with the corresponding "new_name".

Any help would be really useful!

Edit: the issue is not only with upper and lower case. For example, in some cases it could be: "years" instead of "age".

  • Have you tried `?match` to get which colnames to change? – Rui Barradas Dec 07 '19 at 16:10
  • [See here](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) on making an R question that folks can help with. That includes a sample of data, all necessary code, and a clear explanation of what you're trying to do and what hasn't worked. – camille Dec 07 '19 at 16:45
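
The `match` suggestion above could look roughly like the following. This is only a sketch: `rename_columns` is a hypothetical helper name, and the small `dictionary` and `df` objects are made-up sample data that assume the dictionary has `old_name` and `new_name` columns as described in the question. `match()` compares names exactly (case-sensitive), so every spelling, including cases like "years", needs its own row in the dictionary.

```r
# Hypothetical helper (not from the thread): rename columns via match().
rename_columns <- function(df, dictionary) {
  # Position of each column name within old_name; NA where there is no entry.
  idx <- match(names(df), as.character(dictionary$old_name))
  found <- !is.na(idx)
  # Replace only the matched names; all other columns keep their names.
  names(df)[found] <- as.character(dictionary$new_name[idx[found]])
  df
}

# Made-up sample data for illustration.
dictionary <- data.frame(
  old_name = c("Age", "agE", "years", "NamE"),
  new_name = c("age", "age", "age", "name"),
  stringsAsFactors = FALSE
)
df <- data.frame(years = c(43, 34), NamE = c("Michael", "Jim"), id = 1:2)

renamed <- rename_columns(df, dictionary)
names(renamed)
# "age" "name" "id"
```

Columns without a dictionary entry (here `id`) are left unchanged, which is usually what you want when only some headers are inconsistent.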

1 Answer


Here is a quick and dirty approach. I wrote a function so you can just change the arguments and quickly cycle through all your files. Using the stringi package is optional -- I'm using it to check whether the provided file name already ends in .csv, but you could remove that check if you decide it's unnecessary.

library(stringi)

dict <- data.frame(path=c('../csv1','../csv1','../csv2','../csv3','../csv3'),
                   old_name=c('Age','agE','Name','years','NamE'),
                   new_name=c('age','age','name','age','name'))

example_csv <- data.frame(Age=c(43,34,42,24),NamE=c('Michael','Jim','Dwight','Kevin'))


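standardizeColumnNames <- function(df,csvFileName,dictionary){
  # Build the new header vector, one entry per column of df
  colHeaders <- character(ncol(df))
  for(i in seq_len(ncol(df))){
    # Rows of the dictionary whose old_name matches this column name exactly
    index <- which(dictionary$old_name == names(df)[i])
    if(length(index) > 0){
      # Use the first match if the old name is listed more than once
      colHeaders[i] <- as.character(dictionary$new_name[index[1]])
    } else {
      # No dictionary entry: keep the original name
      colHeaders[i] <- names(df)[i]
    }
  }
  names(df) <- colHeaders

  # Append '.csv' if the file name does not already end with it
  if(stri_sub(csvFileName,-4) != '.csv'){
    csvFileName <- paste0(csvFileName,'.csv')
  }
  # row.names = FALSE keeps write.csv from adding an extra index column
  write.csv(df,csvFileName,row.names = FALSE)
}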

standardizeColumnNames(example_csv,'test_file_name',dict)
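
To cycle through several files at once, something along these lines might work. It is a minimal sketch under a few assumptions: `standardize_folder` is a hypothetical name, all the CSV files sit in one folder, the dictionary has `old_name` and `new_name` columns, and overwriting each file in place is acceptable. It uses `match()` instead of the loop above, and the demo data and temporary folder are made up for illustration.

```r
# Hypothetical sketch: apply the dictionary to every CSV file in one folder.
standardize_folder <- function(folder, dictionary) {
  files <- list.files(folder, pattern = "\\.csv$", full.names = TRUE)
  for (f in files) {
    # check.names = FALSE keeps the headers exactly as written in the file.
    df <- read.csv(f, check.names = FALSE, stringsAsFactors = FALSE)
    idx <- match(names(df), as.character(dictionary$old_name))
    found <- !is.na(idx)
    names(df)[found] <- as.character(dictionary$new_name[idx[found]])
    # Overwrites the original file; write elsewhere if you need a backup.
    write.csv(df, f, row.names = FALSE)
  }
  invisible(files)
}

# Demo in a temporary folder with made-up data.
demo_dir <- file.path(tempdir(), "csv_demo")
dir.create(demo_dir, showWarnings = FALSE)
write.csv(data.frame(years = c(43, 34), NamE = c("Michael", "Jim")),
          file.path(demo_dir, "a.csv"), row.names = FALSE)
demo_dict <- data.frame(old_name = c("years", "NamE"),
                        new_name = c("age", "name"),
                        stringsAsFactors = FALSE)

standardize_folder(demo_dir, demo_dict)
result <- read.csv(file.path(demo_dir, "a.csv"))
names(result)
# "age" "name"
```

One thing to watch for: if a single file contains two spellings that map to the same new name (say both "Age" and "years"), the result will have duplicate column names, so that case needs to be handled separately.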
ErrorJordan