Change name of dataframe based on a string vector

Question

I am reading several Excel files from a directory, and I want each data frame that is read to be named dynamically according to a vector of strings.

I have a string vector that holds country names: cnts <- c("de", "ar", "fr")

Then I read an Excel file whose path is already stored in a vector (file): df <- read.xlsx(file[1], 1). Now I want to rename df to the first element of the countries vector, so I do cnts[1] <- df

But this does not work and gives me an error:

In cnts[2] <- df number of items to replace is not a multiple of replacement length

I want df to be renamed to de. I know the problem: it is trying to write the whole df into the string vector at position 1. But how can I dynamically rename data frames?

I don't understand what you need. Do you want a variable named "de" that contains the data frame df (and a second variable named "ar" that contains the second data frame, etc.)? – gdevaux, Jun 26 '19 at 13:01

score 2 · Accepted Answer · Clemsang · 2019-06-26T13:04:41.217

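cnts[1] <- df tries to store a whole data frame in a single element of the character vector cnts (the slot that currently holds "de"). It does not create a variable named de.

You can use assign, but you should first read up on why using assign is generally considered bad practice.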

cnts <- c("de", "ar", "fr")

df <- data.frame(a=1:5)

# Create a variable whose name is the string in cnts[1] (here: de) and bind df to it.
assign(cnts[1], df)
de

A better practice would be to use a list with one element per entry in cnts and assign each data frame to the matching element of the list.
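As a rough illustration of the difference, the sketch below builds the same three data frames both ways. The in-memory data frames and the separate environment `env` are assumptions made for this example so that it runs without any Excel files and without touching the global workspace.

```r
cnts <- c("de", "ar", "fr")

# Approach 1: assign() creates one variable per name. A separate
# environment is used here (an assumption of this sketch) instead of
# the global workspace.
env <- new.env()
for (nm in cnts) {
  assign(nm, data.frame(a = 1:5), envir = env)
}
# Reading the objects back requires get() with the name as a string.
rows_assign <- sapply(cnts, function(nm) nrow(get(nm, envir = env)))

# Approach 2: a named list keeps all data frames in one object.
df_list <- setNames(lapply(cnts, function(nm) data.frame(a = 1:5)), cnts)
rows_list <- sapply(df_list, nrow)
```

With the list, any function can be applied to every data frame directly, whereas the assign version needs a get() call for each lookup.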

edited Jun 26 '19 at 13:04

answered Jun 26 '19 at 13:03



Totally in line with you. OP should look into `lists` or other data structures. – boski Jun 26 '19 at 13:04

score 2 · Answer 2 · 2019-06-26T13:35:11.333

With cnts[1] <- df you are telling R to put a data frame into the first element of the character vector cnts, which isn't possible. You can use assign to achieve what you want, but the general consensus is that assign should be avoided, particularly when programmatically importing multiple files. It might be a bit counterintuitive at first, but it often makes more sense to put your data frames in a named list, e.g.:

cnts <- c("de", "ar", "fr")

# Create an empty list with names from `cnts`.
df_list <- vector(mode = "list", length = length(cnts))
names(df_list) <- cnts

# Read in the XLSX and add to appropriate list element.
df_list[[cnts[1]]] <- read.xlsx(file[1])

Instead of df_list <- vector(mode = "list", length = length(cnts)) you could also just use df_list <- list(), but the former is more efficient, particularly as your lists get longer. You can use either in your case, but it's never too early to learn good habits that will spare you some frustration down the road.
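To make the comparison concrete, here is a minimal sketch that fills a preallocated list and a grown list side by side. The small data frames are stand-ins for the result of read.xlsx and are an assumption of this example.

```r
cnts <- c("de", "ar", "fr")

# Preallocated: the list already has one NULL slot per country.
pre <- vector(mode = "list", length = length(cnts))
names(pre) <- cnts

# Grown: starts empty and gains an element on each assignment.
grown <- list()

for (nm in cnts) {
  pre[[nm]] <- data.frame(one = 1:2, two = 3:4)
  grown[[nm]] <- data.frame(one = 1:2, two = 3:4)
}

# Both lists end up with the same names and the same number of elements.
names(pre)
names(grown)
```

For three files the difference is negligible; preallocating mainly matters once the loop runs over many elements.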

You'll end up with something like the following object:

$de
  one two
1   1   3
2   2   4

$ar
  one two
1   1   3
2   2   4

$fr
  one two
1   1   3
2   2   4

If you want to be super efficient you can also do something like this, assuming the names in cnts and the file names in file match positionally:

df_list <- lapply(file, read.xlsx)
names(df_list) <- cnts
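The lapply approach can be tried without any Excel files. The sketch below is only an illustration: it writes small CSV files to a temporary directory and uses read.csv as a stand-in for read.xlsx, and the file names and contents are made up for the example.

```r
cnts <- c("de", "ar", "fr")

# Write three small CSV files to a temporary directory as stand-ins
# for the Excel files (hypothetical data, for illustration only).
file <- file.path(tempdir(), paste0(cnts, ".csv"))
for (f in file) {
  write.csv(data.frame(one = 1:2, two = 3:4), f, row.names = FALSE)
}

# Read every file, then name the list elements positionally.
df_list <- lapply(file, read.csv)
names(df_list) <- cnts

# Alternative: derive the names from the file names themselves, which
# does not rely on the two vectors being in the same order.
names_from_files <- tools::file_path_sans_ext(basename(file))
```

Deriving the names from the file names only helps if the files are actually named after the countries, which is an assumption here and not something stated in the question.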

I like your advice, but for someone new to lists, `\`names<-\`(vector(mode = "list", length = length(cnts)), cnts)` would be extremely confusing--and muddies the fact that generally the way to initialize a list would be `df_list <- list()` (or use `lapply`). Why not just name the list after filling it, with a nicely standard `names(df_list) <- cnts`? — Gregor Thomas, Jun 26 '19 at 13:19
@Gregor good point. Fixed the part with `names<-()`, but I disagree about using `vector` as it's not great practice to grow vectors. I did, however, expand on the part with `lapply`. — , Jun 26 '19 at 13:24

score 0 · Answer 3 · answered Jun 26 '19 at 13:20

Another option is to read all the datasets into a list, set the names of the list elements from cnts (assuming they are in the same order as the files), and then use list2env to create one object per list element, at the cost of polluting the global environment with lots of objects.

list2env(setNames(lapply(file, read.xlsx), cnts), .GlobalEnv)
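A variation that avoids cluttering the global environment is to unpack the list into a dedicated environment. This is a sketch under assumptions: the data frames are created in memory instead of being read with read.xlsx, and `country_env` is a name invented for this example.

```r
cnts <- c("de", "ar", "fr")

# Stand-in data frames instead of files read with read.xlsx().
df_list <- setNames(
  lapply(cnts, function(nm) data.frame(one = 1:2, two = 3:4)),
  cnts
)

# Copy each list element into a new environment as its own object,
# rather than into .GlobalEnv.
country_env <- list2env(df_list, envir = new.env())

# The objects are now reachable by name inside that environment.
sort(ls(country_env))
country_env$de
```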
