
I have imported a CSV file in R using readr::read_csv(). Many of the variable names contain spaces, and some are numbers, e.g., 17, 18, etc. I would like to rename these variables to something more meaningful.

[Snapshot of my data (image not included)]

For example, I have tried the following:

rename(burkina, enum = Enumerator) 
rename(burkina, enum = `Enumerator`) 
rename(burkina, enum = "Enumerator") 
rename(burkina,test = `17`)

None of them worked; instead, I got the following error:

Error in make.names(x) : invalid multibyte string 1
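For comparison, here is a minimal sketch on a small hypothetical data frame whose names are valid UTF-8 (the columns `Enumerator` and `17` are stand-ins, not the real data). Under that assumption, the backtick form of `dplyr::rename()` handles a numeric name, which suggests the error comes from the encoding of the original names rather than from the `rename()` syntax.

```r
library(dplyr)

# Hypothetical stand-in for `burkina`; check.names = FALSE keeps `17` as-is
df <- data.frame(Enumerator = c("A", "B"), `17` = c(3, 5), check.names = FALSE)

# Backticks let rename() refer to non-syntactic names such as `17`
renamed <- rename(df, enum = Enumerator, test = `17`)
names(renamed)
```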

Ben Bolker
Anup
  • Please do not post an image of code/data/errors: it cannot be copied or searched (SEO), it breaks screen-readers, and it may not fit well on some mobile devices. Ref: https://meta.stackoverflow.com/a/285557/3358272 (and https://xkcd.com/2116/). Please just include the code or data (e.g., `dput(head(x))` or `data.frame(...)`) directly. – r2evans Aug 27 '19 at 22:11
  • You can divide the task into steps: 1) get rid of '()', 2) replace spaces with underscores, 3) replace numeric names with appropriate names, and so on. Design the steps as appropriate for the actual data. – Shree Aug 27 '19 at 22:16
  • does this help at all? https://stackoverflow.com/questions/14363085/invalid-multibyte-string-in-read-csv/29586283 – Ben Bolker Aug 27 '19 at 22:17
  • @BenBolker I saw your link before posting here; it wasn't of much help. – Anup Aug 28 '19 at 01:38
  • It would be helpful if you could share a list of the current variable names. Right after you import the data with `burkina = read_csv(...)`, type `names(burkina)` and share the output in your original post. – user2363777 Aug 28 '19 at 02:20
  • Even more helpful, if you don't mind, would be to share the first few rows of your data frame. `dput(head(burkina, 5))` will create output that you can share here, and which, when assigned to a variable in R by one of us, will recreate the first 5 rows of your data frame. – user2363777 Aug 28 '19 at 02:22
  • If the file moved between operating systems (Linux -> Windows) or regions, it could have the wrong encoding. I'm not an expert, but you can change the encoding in Notepad++ or other software. Or you can use `iconv(x, to="ASCII//TRANSLIT")` in R to remove accents. – Simon Woodward Aug 29 '19 at 00:41
  • @user2363777 I tried and this is what I got: `Error in dput(head(burkina, 5)) : invalid multibyte string at 'What would be the top three areas you would like help or training on: Expanding a business - First' ` – Anup Aug 29 '19 at 03:20
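The steps suggested in the comments (re-encode the names, drop the parentheses, replace spaces with underscores, then give the numeric names meaningful labels) can be sketched in base R. This sketch assumes the file was saved as Latin-1, which is only a guess; the example names and the `lookup` table mapping `17` and `18` to labels are hypothetical.

```r
# Hypothetical raw names; "\xe9" is a Latin-1 byte that is not valid UTF-8
old <- c("Enumerator", "Submit Date", "Cost (USD)", "17", "Caf\xe9 visits")

# Step 0 (assumption: the file is Latin-1): re-encode so string functions accept the names
fixed <- iconv(old, from = "latin1", to = "UTF-8")

# Step 1: drop parentheses
fixed <- gsub("[()]", "", fixed)

# Step 2: trim, then turn runs of spaces into single underscores
fixed <- gsub(" +", "_", trimws(fixed))

# Step 3: replace numeric names using a hypothetical lookup table
lookup <- c("17" = "age", "18" = "household_size")
hit <- fixed %in% names(lookup)
fixed[hit] <- unname(lookup[fixed[hit]])

fixed
```

If the file is in fact UTF-8, step 0 would garble names that were already correct, so it is worth checking the encoding first (for example with `readr::guess_encoding()`).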

2 Answers


For cases like these, the function clean_names() from the janitor package comes in handy. For instance:

> head(iris)
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          5.1         3.5          1.4         0.2  setosa
2          4.9         3.0          1.4         0.2  setosa
3          4.7         3.2          1.3         0.2  setosa
4          4.6         3.1          1.5         0.2  setosa
5          5.0         3.6          1.4         0.2  setosa
6          5.4         3.9          1.7         0.4  setosa

> head(iris %>% janitor::clean_names())
  sepal_length sepal_width petal_length petal_width species
1          5.1         3.5          1.4         0.2  setosa
2          4.9         3.0          1.4         0.2  setosa
3          4.7         3.2          1.3         0.2  setosa
4          4.6         3.1          1.5         0.2  setosa
5          5.0         3.6          1.4         0.2  setosa
6          5.4         3.9          1.7         0.4  setosa

> head(iris %>% janitor::clean_names(case = "all_caps"))
  SEPAL_LENGTH SEPAL_WIDTH PETAL_LENGTH PETAL_WIDTH SPECIES
1          5.1         3.5          1.4         0.2  setosa
2          4.9         3.0          1.4         0.2  setosa
3          4.7         3.2          1.3         0.2  setosa
4          4.6         3.1          1.5         0.2  setosa
5          5.0         3.6          1.4         0.2  setosa
6          5.4         3.9          1.7         0.4  setosa

You can choose from a range of target cases; see ?janitor::clean_names.
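A minimal sketch with names closer to the question's (a hypothetical data frame, not the real file). By default `clean_names()` produces snake case, and names that start with a digit get an `x` prefix, so `17` becomes `x17`. This assumes the names are already valid strings in the session's encoding.

```r
library(janitor)

# Hypothetical columns resembling the question's: a name with a space and a numeric name
df <- data.frame(`Submit Date` = "2019-08-27", `17` = 1, check.names = FALSE)

cleaned <- clean_names(df)
names(cleaned)
```

After this, `rename(cleaned, test = x17)` no longer needs backticks.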

user2363777

The easiest way is to replace the column names using a character vector like this:

names(burkina) <- c("de", "draft_date", "submit_date", ...)
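One upside of positional replacement is that it never has to read the old names, so it should work even when they are badly encoded. Below is a minimal sketch with a hypothetical three-column data frame. It adds a length check, because a vector that is too short would otherwise leave the remaining columns with `NA` names. `setNames()` does the same job as `names<-` but returns the data frame, so it also fits in a pipe.

```r
# Hypothetical stand-in for `burkina` with awkward original names
df <- data.frame(De = 1, `Draft Date` = "2019-08-01", `Submit Date` = "2019-08-02",
                 check.names = FALSE)

new_names <- c("de", "draft_date", "submit_date")

# Fail early if the vector does not cover every column
stopifnot(length(new_names) == ncol(df))

# setNames() returns the renamed data frame
df <- setNames(df, new_names)
names(df)
```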

Alternatively, you can use a function to convert the names into something friendlier. I use this one:

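library(readr)   # read_csv()
library(stringr) # str_* functions; stringr also re-exports the %>% pipe

# function to simplify vector of names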
ensnakeify <- function(x) {
  x %>%
    iconv(to="ASCII//TRANSLIT") %>% # remove accents
    str_replace_na() %>% # convert NA to string
    str_to_lower() %>% # convert to lower case
    str_replace_all(pattern="%", replacement="pc") %>% # convert % to pc
    str_replace_all(pattern="[^[:alnum:]]", replacement=" ") %>% # convert remaining non-alphanumeric to space
    str_trim() %>% # trim leading and trailing spaces
    str_replace_all(pattern="\\s+", replacement="_") # convert remaining spaces to underscore
}

# function to simplify df column names
autosnake <- function(df){ # to use in pipe
  names(df) <- ensnakeify(names(df))
  df
}

burkina <- read_csv("Filename") %>% autosnake
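For readers who would rather avoid the stringr/magrittr dependency, here is a base-R sketch of the same steps. Two additions are assumptions of this sketch, not part of the function above: names that start with a digit get an `x` prefix (similar to janitor), and `make.unique()` disambiguates names that collapse to the same string. `snake_names()` is a hypothetical name, and `ASCII//TRANSLIT` output for accented characters can differ between platforms.

```r
# Hypothetical base-R variant of ensnakeify()
snake_names <- function(x) {
  x <- iconv(x, to = "ASCII//TRANSLIT", sub = "") # transliterate accents; drop unconvertible bytes
  x[is.na(x)] <- "NA"                             # keep NA names as a string
  x <- tolower(x)                                 # lower case
  x <- gsub("%", "pc", x, fixed = TRUE)           # % -> pc
  x <- gsub("[^[:alnum:]]+", " ", x)              # other non-alphanumerics -> space
  x <- trimws(x)                                  # trim leading/trailing spaces
  x <- gsub("\\s+", "_", x)                       # runs of spaces -> underscore
  x <- sub("^([0-9])", "x\\1", x)                 # assumption: "17" -> "x17"
  make.unique(x, sep = "_")                       # assumption: de-duplicate clashes
}

snake_names(c("Submit Date", "17", "Growth (%)", NA, "Submit date"))
```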
Simon Woodward