I'm trying to merge all csv files in a folder, and have slightly different versions of some extracts. For whatever reason, subtle changes to column names seem to have been made. What is the best way to take on the task of matching and merging the data from these columns?
I tried using the code below, after examining this answer: How to import multiple .csv files at once?
tbl <-
list.files(path = "//folder/anotherfolder/",
pattern = "*.csv",
full.names = T) %>%
map_df(~read_csv(., col_types = cols(.default = "c")))
The resulting tbl
has way more columns than any of the original files, the culprit being the fact that I have columns with names as follows in the tbl
after merging the csvs:
I have many of these merges ahead of me, and noted that the differences in column names don't always manifest in the same form (i.e. sometimes the problem is a "/" or some other character).
Can I expect to have to make these corrections manually, or is there a means of implementing some code to match columns? I'm thinking that fuzzy matching would be difficult given the fact that "Email Address Type-2" should only be merged with "Email.Address.Type.2", and not with "Email Address-2", etc.
Is there a means of altering the code to strip all space or punctuation characters from the column titles? This would solve my problem, because the only issues appear to be with characters other than letter and numbers.
My attempt thus far at removing special characters fails with the code below. My idea here was to try and remove characters in the column names strings within the map_df
statement.
tbl <-
list.files(path = "//folder/anotherfolder/",
pattern = "*.csv",
full.names = T) %>%
map_df(~read_csv(., col_names = gsub("[^[:alnum:] ]", "", .),trim_ws=TRUE,
col_types = cols(.default = "c")))
One more attempt below, and I receive an error message
tbl <-
list.files(path = "//folder/anotherfolder/",
pattern = "*.csv",
full.names = T) %>%
map_df(~read_csv(., {colnames<-c(gsub("[^[:alnum:] ]", "", x=names(.)))},
col_names = colnames,
col_types = cols(.default = "c")))
The error message received says:
Error: $ operator is invalid for atomic vectors
Thank you in advance for any help you can provide!