I am trying to write an R function, called via pmap, that names the nested data frames (or tibbles) it creates using an argument passed to pmap from a list. This is best explained with a reproducible toy example. The one below assumes Windows and an existing, currently empty directory C:\temp\, although you can set the paths to any directory of your choosing:

library(purrr)  # provides pmap(), set_names(), and the %>% pipe

#create some toy sample input data files
write.csv(x=data.frame(var1=c(42,43),var2=c(43,45)), file="C:\\temp\\AL.csv")
write.csv(x=data.frame(var1=c(22,43),var2=c(43,45)), file="C:\\temp\\AK.csv")
write.csv(x=data.frame(var1=c(90,98),var2=c(97,96)), file="C:\\temp\\AZ.csv")
write.csv(x=data.frame(var1=c(43,55),var2=c(85,43)), file="C:\\temp\\PossiblyUnknownName.csv")

#Get list of files in c:\temp directory - assumes only files to be read in exist there
pathnames<-list.files(path = "C:\\temp\\", full.names=TRUE)
ListIdNumber<-c("ID3413241", "ID3413242", "ID3413243", "ID3413244")

#Create a named list.  In reality, my problem is more complex, but this gets at the root of the issue
mylistnames<-list(pathnames_in=pathnames, ListIdNumber_in=ListIdNumber)

#Functions that I've tried, where I'm passing the name ListIdNumber_in into the function so
#the resulting data frames are named.

#Attempt 1
get_data_files1<-function(pathnames_in, ListIdNumber_in){
  tempdf <- read.csv(pathnames_in) %>% set_names(nm=ListIdNumber_in)
}

#Attempt 2
get_data_files2<-function(pathnames_in, ListIdNumber_in){
  tempdf <- read.csv(pathnames_in) 
  names(tempdf)<-ListIdNumber_in
  tempdf
}

#Attempt 3
get_data_files3<-function(pathnames_in, ListIdNumber_in){
  tempdf <- read.csv(pathnames_in) 
  tempdf
}

#Fails
pmap(mylistnames, get_data_files1)->myoutput1

#Almost, but doesn't name the tibbles it creates and instead creates a variable named ListIdNumber_in
pmap(mylistnames, get_data_files2)->myoutput2

#This gets me the end result that I want, but I want to set the names inside the function
pmap(mylistnames, get_data_files3) %>% set_names(nm=mylistnames$ListIdNumber_in)->myoutput3

So when I run pmap I'd like to get the following result, except that the naming of the nested data frames/tibbles should be done inside the function. (I also don't need the 'X' variable, which I think is being created erroneously.)

$ID3413241
  X var1 var2
1 1   22   43
2 2   43   45

$ID3413242
  X var1 var2
1 1   42   43
2 2   43   45

$ID3413243
  X var1 var2
1 1   90   97
2 2   98   96

$ID3413244
  X var1 var2
1 1   43   85
2 2   55   43

Any ideas how this can be accomplished?

Thanks!

StatsStudent
  • Does this answer your question? [Use input of purrr's map function to create a named list as output in R](https://stackoverflow.com/questions/43935160/use-input-of-purrrs-map-function-to-create-a-named-list-as-output-in-r) – andrew_reece Sep 24 '20 at 03:47
  • @andrew_reece, unfortunately, no. All the solutions there name the resulting output AFTER the function call rather than within it. Thank you though. – StatsStudent Sep 24 '20 at 03:59
  • There are two solutions in that link (one in a comment, one an actual solution) that name the output within or before the function call. – andrew_reece Sep 24 '20 at 04:03
  • @andrew_reece, thanks. I see this now. This is a pretty good solution, but not exactly what I was looking for as it requires me to reference the ID names outside of the function call still by using something of the sort `mylistnames %>% { set_names(pmap(., get_data_files), mylistnames$ListIdNumber_in) }` unless I'm missing something. Essentially, I'm trying to eliminate having to re-specify `mylistnames$ListIdNumber_in` here. – StatsStudent Sep 24 '20 at 04:21
  • The issue is that the function call itself is not conscious of the fact that its own output will become an element which gets collated into a list, at the end of a set of `map` iterations - it doesn't have visibility into its eventual status as a list element that could be named. That's why `set_names()` must occur either before or after the call to `map`. The inside of `map` is ignorant of its output, you might say. (The closest is the `map_dfr()` `.id` argument, which offers a post-hoc add-on column based on the names of the input.) – andrew_reece Sep 24 '20 at 04:51
  • Thanks, @andrew_reece. That's actually a good idea. I might be able to restructure the program to make use of `map_dfr()` with the `.id` argument as you suggested. Many thanks for the helpful direction! – StatsStudent Sep 24 '20 at 04:54
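
The `map_dfr()` idea from the comments above could look roughly like the sketch below. It is only an illustration under stated assumptions: the scratch directory `demo_dir`, the two-file subset, and the column name `ListIdNumber` are made up for the example, and `map_dfr()` needs the dplyr package installed to bind the rows.

```r
library(purrr)

# Hypothetical scratch directory so the sketch does not depend on C:\temp
demo_dir <- file.path(tempdir(), "map_dfr_demo")
dir.create(demo_dir, showWarnings = FALSE)
write.csv(data.frame(var1 = c(22, 43), var2 = c(43, 45)),
          file.path(demo_dir, "AK.csv"), row.names = FALSE)
write.csv(data.frame(var1 = c(42, 43), var2 = c(43, 45)),
          file.path(demo_dir, "AL.csv"), row.names = FALSE)

paths <- list.files(demo_dir, pattern = "\\.csv$", full.names = TRUE)
ids <- c("ID3413241", "ID3413242")

# Name the input first; map_dfr() then copies those names into the .id column
# and row-binds the per-file data frames into one data frame.
combined <- map_dfr(set_names(paths, ids), read.csv, .id = "ListIdNumber")
combined
```

The result is a single data frame rather than a named list, so this only fits if the downstream code can work with one stacked table keyed by the ID column.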

2 Answers

  • Use map here
  • There is no need to create a named list: you cannot attach top-level names while reading the CSV files, so add the names separately.
library(purrr)
map(pathnames, read.csv) %>% set_names(ListIdNumber)

#$ID3413241
#  var1 var2
#1   22   43
#2   43   45

#$ID3413242
#  var1 var2
#1   42   43
#2   43   45

#$ID3413243
#  var1 var2
#1   90   97
#2   98   96

#$ID3413244
#  var1 var2
#1   43   85
#2   55   43

In base R, this can be done as:

setNames(lapply(pathnames, read.csv), ListIdNumber)

You get the additional X column because write.csv writes the row names to the file by default. Pass row.names = FALSE when writing and that column will not appear.

write.csv(x=data.frame(var1=c(42,43),var2=c(43,45)), 
          file="C:\\temp\\AL.csv", row.names = FALSE)
Ronak Shah
  • Alternately: `pathnames %>% set_names(ListIdNumber) %>% map(read.csv)` – andrew_reece Sep 24 '20 at 03:41
  • But this is just a toy example and in reality, I need to perform a number of different computations within my 'get_data_files' using some additional arguments that are passed to the function from my list which includes pathnames and other objects that are to be used within the function call. I'd like this all to be self-contained including the renaming if possible. – StatsStudent Sep 24 '20 at 03:48
  • @StatsStudent I don't think you can name the files as you want from inside the function. If there are additional things you want to do with `ListIdNumber` or other parameters in `mylistnames`, yes, you can do that. For example, adding a column with the `id`: `pmap(mylistnames, ~read.csv(..1) %>% mutate(colname = ..2))`, and similarly many other things. – Ronak Shah Sep 24 '20 at 03:54
  • I was afraid that might be the case. Thanks, @RonakShah. I'll leave the question open for a few days and upvote your comment. If nothing better comes along in a few days to accomplish this, I'll accept the answer as this is pretty close to what I need. Thank you for your help! – StatsStudent Sep 24 '20 at 03:56
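
Building on the "name the input first" suggestion in the comments above, one possible middle ground is purrr's `imap()`, which hands each element's name to the function as a second argument while the output list keeps the input's names. The sketch below is an assumption-laden illustration, not the asker's actual code: `demo_dir`, `read_with_id`, and the added `ListIdNumber` column are hypothetical.

```r
library(purrr)

# Hypothetical scratch directory so the sketch does not depend on C:\temp
demo_dir <- file.path(tempdir(), "imap_demo")
dir.create(demo_dir, showWarnings = FALSE)
write.csv(data.frame(var1 = c(22, 43), var2 = c(43, 45)),
          file.path(demo_dir, "AK.csv"), row.names = FALSE)
write.csv(data.frame(var1 = c(42, 43), var2 = c(43, 45)),
          file.path(demo_dir, "AL.csv"), row.names = FALSE)

paths <- list.files(demo_dir, pattern = "\\.csv$", full.names = TRUE)
ids <- c("ID3413241", "ID3413242")

# Hypothetical worker: receives the path and the ID (the element's name)
read_with_id <- function(path, id) {
  df <- read.csv(path)
  df$ListIdNumber <- id  # example of using the ID inside the function
  df
}

# The names set on the input carry over to the output list
out <- imap(set_names(paths, ids), read_with_id)
names(out)
```

The naming still happens on the input rather than inside the function, but the ID is specified only once and is available to the function body for any further computation.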

How about creating your own pmap for this purpose?

# assume that your names are always stored in `ListIdNumber_in`
named_pmap <- function(.l, .f, ...) set_names(pmap(.l, .f, ...), .l$ListIdNumber_in)

Then you can directly call named_pmap(mylistnames, get_data_files3). Except for the naming part, this named_pmap is basically the same as pmap.
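
If the names are not always stored in `ListIdNumber_in`, the same wrapper idea could take the element name as a parameter. This is a minimal sketch only: `named_pmap_by`, its `.names_from` argument, and the toy `describe` function are hypothetical names introduced for illustration and are not part of purrr.

```r
library(purrr)

# Hypothetical variant of named_pmap(): `.names_from` says which element
# of `.l` supplies the names for the output list.
named_pmap_by <- function(.l, .f, ..., .names_from = "ListIdNumber_in") {
  stopifnot(.names_from %in% names(.l))
  set_names(pmap(.l, .f, ...), .l[[.names_from]])
}

# Toy inputs and function (no file I/O) just to show the naming behaviour
args <- list(pathnames_in = c("a.csv", "b.csv"),
             ListIdNumber_in = c("ID1", "ID2"))
describe <- function(pathnames_in, ListIdNumber_in) {
  paste(ListIdNumber_in, "<-", pathnames_in)
}

res <- named_pmap_by(args, describe)
names(res)
```

Because `.names_from` comes after `...`, it has to be passed by name when overriding the default.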

ekoam