How to rename variables imported from a CSV file?

Question

I have imported a CSV file into R using readr::read_csv. Many of the variable names in the CSV file contain spaces, and some are just numbers, e.g., 17, 18, etc. I would like to rename these variables to something more meaningful.

Snapshot of my data

I have tried the following code, for example:

rename(burkina, enum = Enumerator) 
rename(burkina, enum = `Enumerator`) 
rename(burkina, enum = "Enumerator") 
rename(burkina,test = `17`)

None of them worked. Instead, I got the following error:

Error in make.names(x) : invalid multibyte string 1

Please do not post an image of code/data/errors: it cannot be copied or searched (SEO), it breaks screen-readers, and it may not fit well on some mobile devices. Ref: https://meta.stackoverflow.com/a/285557/3358272 (and https://xkcd.com/2116/). Please just include the code or data (e.g., `dput(head(x))` or `data.frame(...)`) directly. — r2evans, Aug 27 '19 at 22:11
You can divide the task into steps: 1) get rid of '()' 2) replace space with underscore 3) replace numeric names with appropriate names...and so on. Design the steps as appropriate for actual data. — Shree, Aug 27 '19 at 22:16
does this help at all? https://stackoverflow.com/questions/14363085/invalid-multibyte-string-in-read-csv/29586283 — Ben Bolker, Aug 27 '19 at 22:17
@BenBolker I saw your link before posting here...wasn't of much help. — Anup, Aug 28 '19 at 01:38
It would be helpful if you could share a list of the current variable names. Right after you import the data with `burkina = read_csv(...)`, type `names(burkina)` and share the output in your original post. — user2363777, Aug 28 '19 at 02:20
Even more helpful, if you don't mind, would be to share the initial couple of rows of your dataframe. `dput(head(burkina, 5))` will create output that you can share here, and which, when assigned to a variable in R by one of us, will recreate the initial 5 rows of your dataframe. — user2363777, Aug 28 '19 at 02:22
If the file moved between operating systems (Linux -> Windows) or regions it could have the wrong encoding. I'm not an expert but you can change the encoding in Notepad++ or other software. Or you can use `iconv(x, to="ASCII//TRANSLIT")` in R to remove accents. — Simon Woodward, Aug 29 '19 at 00:41
@user2363777 I tried and this is what I got: `Error in dput(head(burkina, 5)) : invalid multibyte string at 'What would be the top three areas you would like help or training on: Expanding a business - First' ` — Anup, Aug 29 '19 at 03:20
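
The "invalid multibyte string" errors in this thread suggest that some column names contain bytes that are not valid in the session's encoding. Below is a minimal base R sketch of how one might detect and re-encode such names. The sample names, and the assumption that the file was saved as Latin-1, are hypothetical and not taken from the actual data.

```r
# Hypothetical column names; "\xe9" is a raw Latin-1 byte (e with acute accent),
# which is not valid UTF-8 on its own.
raw_names <- c("Enumerator", "17", "Caf\xe9 visits")

# validUTF8() flags the names that may trip functions such as make.names()
bad <- !validUTF8(raw_names)

# Assumption: the source file was saved as Latin-1. Re-encode to UTF-8.
fixed_names <- iconv(raw_names, from = "latin1", to = "UTF-8")

print(bad)
print(validUTF8(fixed_names))
```

If Latin-1 does turn out to be the encoding, readr can be told so at import time with `read_csv(file, locale = locale(encoding = "latin1"))`, and `readr::guess_encoding(file)` can help pick a candidate encoding.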

score 2 · Answer 1 · answered Aug 27 '19 at 23:12

For cases like these the function clean_names() from the janitor package comes in handy. For instance:

> head(iris)
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          5.1         3.5          1.4         0.2  setosa
2          4.9         3.0          1.4         0.2  setosa
3          4.7         3.2          1.3         0.2  setosa
4          4.6         3.1          1.5         0.2  setosa
5          5.0         3.6          1.4         0.2  setosa
6          5.4         3.9          1.7         0.4  setosa

> head(iris %>% janitor::clean_names())
  sepal_length sepal_width petal_length petal_width species
1          5.1         3.5          1.4         0.2  setosa
2          4.9         3.0          1.4         0.2  setosa
3          4.7         3.2          1.3         0.2  setosa
4          4.6         3.1          1.5         0.2  setosa
5          5.0         3.6          1.4         0.2  setosa
6          5.4         3.9          1.7         0.4  setosa

> head(iris %>% janitor::clean_names(case = "all_caps"))
  SEPAL_LENGTH SEPAL_WIDTH PETAL_LENGTH PETAL_WIDTH SPECIES
1          5.1         3.5          1.4         0.2  setosa
2          4.9         3.0          1.4         0.2  setosa
3          4.7         3.2          1.3         0.2  setosa
4          4.6         3.1          1.5         0.2  setosa
5          5.0         3.6          1.4         0.2  setosa
6          5.4         3.9          1.7         0.4  setosa

You can choose from a range of target cases; see ?janitor::clean_names.

Interesting method. Unfortunately, it did not work. Gave the following error: `Error in make.names(.) : invalid multibyte string 495` — Anup, Aug 28 '19 at 02:07
Sounds like you have some bad text coding or bad characters in your data. — Simon Woodward, Aug 29 '19 at 02:51

score 1 · Answer 2 · answered Aug 27 '19 at 22:19

The easiest way is to replace the column names using a character vector like this:

names(burkina) <- c("de", "draft_date", "submit_date", ...)
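
When only a few columns need new names, replacing the whole vector can be avoided. The following is a sketch in base R, using a made-up three-column data frame and a hypothetical `lookup` vector (new name = old name). Numeric names such as `17` are matched as plain strings, so they need no special handling here.

```r
# Made-up data frame standing in for the imported CSV
df <- data.frame(a = 1:2, b = 3:4, c = 5:6)
names(df) <- c("Enumerator", "17", "Draft date")

# Hypothetical lookup: new name on the left, current name on the right
lookup <- c(enum = "Enumerator", test = "17")

# Find the position of each current name; stop if any is missing
idx <- match(lookup, names(df))
stopifnot(!anyNA(idx))

# Overwrite only the matched names, leaving the rest untouched
names(df)[idx] <- names(lookup)
print(names(df))
```

Columns that are not listed in `lookup` keep their original names.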

Alternatively you can use a function to convert the names into something more friendly. I use this function.

library(readr)    # read_csv()
library(stringr)  # str_*() helpers
library(magrittr) # %>% pipe

# function to simplify vector of names
ensnakeify <- function(x) {
  x %>%
    iconv(to="ASCII//TRANSLIT") %>% # remove accents
    str_replace_na() %>% # convert NA to string
    str_to_lower() %>% # convert to lower case
    str_replace_all(pattern="%", replacement="pc") %>% # convert % to pc
    str_replace_all(pattern="[^[:alnum:]]", replacement=" ") %>% # convert remaining non-alphanumeric to space
    str_trim() %>% # trim leading and trailing spaces
    str_replace_all(pattern="\\s+", replacement="_") # convert remaining spaces to underscore
}

# function to simplify df column names
autosnake <- function(df){ # to use in pipe
  names(df) <- ensnakeify(names(df))
  df
}

burkina <- read_csv("Filename") %>% autosnake()
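
For readers without stringr installed, the same sequence of steps can be approximated in base R. This is a sketch rather than the answerer's function: the name `ensnakeify_base` and the sample names are made up for illustration, and the base functions `gsub()`, `tolower()` and `trimws()` are swapped in for the stringr calls.

```r
# Base R approximation of the name-cleaning steps (hypothetical helper name)
ensnakeify_base <- function(x) {
  x <- iconv(x, to = "ASCII//TRANSLIT")  # transliterate accents where possible
  x[is.na(x)] <- "NA"                    # convert NA to the string "NA"
  x <- tolower(x)                        # convert to lower case
  x <- gsub("%", "pc", x, fixed = TRUE)  # convert % to pc
  x <- gsub("[^[:alnum:]]", " ", x)      # remaining non-alphanumerics to space
  x <- trimws(x)                         # trim leading and trailing spaces
  gsub("\\s+", "_", x)                   # runs of whitespace to one underscore
}

# Made-up names resembling those described in the question
cleaned <- ensnakeify_base(c("Draft Date", "Submit (date)", "17", "% done"))
print(cleaned)
```

Note that a purely numeric name such as `17` passes through unchanged, so it is still not a syntactic R name and would need backticks or a manual rename afterwards.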

I now use janitor::clean_names() which does much the same thing as my function. — Simon Woodward, Mar 24 '22 at 21:31
