-1

I have several dataframes with firm-advisor relationships, one for each year of interest.

For instance, the 2015 dataframe looks like this. Let's call it advisors2015 (then I have also advisors2014, advisors2013, advisors2012, and so on):

> advisors2015
       [,1]   [,2]                    [,3]           [,4]                        
colnam "Mark" "Company.name"          "Company.ID"   "Advisor.Name"              
row1   "1"    "VOLKSWAGEN AG"         "DE2070000543" "PRICEWATERHOUSECOOPERS"    
row2   " "    "VOLKSWAGEN AG"         "DE2070000543" "PWC DEUTSCHE REVISION"     
row3   " "    "VOLKSWAGEN AG"         "DE2070000543" "C&L TREUARBEIT REVISION"   
row4   "2"    "ROYAL DUTCH SHELL PLC" "GB04366849"   "LLOYDS TSB REGISTRARS"     
row4   "2"    "ROYAL DUTCH SHELL PLC" "GB04366849"   "LLOYDS TSB REGISTRARS"     
row5   " "    "ROYAL DUTCH SHELL PLC" "GB04366849"   "PRICEWATERHOUSECOOPERS"
row6   " "    "ROYAL DUTCH SHELL PLC" "GB04366849"   "KPMG ACCOUNTANTS NV"       
row7   " "    "ROYAL DUTCH SHELL PLC" "GB04366849"   "ERNST & YOUNG"             
row8   "3"    "BP PLC"                "GB00102498"   "CAPITA ASSET SERVICES"     

And this is for 2014:

> advisors2014
         [,1]   [,2]                    [,3]           [,4]                        
colnam "Mark" "Company.name"          "Company.ID"   "Advisor.Name"              
row1   "1"    "VOLKSWAGEN AG"         "DE2070000543" "PRICEWATERHOUSECOOPERS"    
row2   " "    "VOLKSWAGEN AG"         "DE2070000543" "PWC DEUTSCHE REVISION"     
row3   " "    "VOLKSWAGEN AG"         "DE2070000543" "C&L TREUARBEIT REVISION"   
row4   "2"    "ROYAL DUTCH SHELL PLC" "GB04366849"   "LLOYDS TSB REGISTRARS"     
row5   " "    "ROYAL DUTCH SHELL PLC" "GB04366849"   "PRICEWATERHOUSECOOPERS"
row6   "3"    "BP PLC"                "GB00102498"   "CAPITA ASSET SERVICES"     
row7   "4"    "COCACOLA"              "GB111222333"  " "                         

As you can see, each company may have one ore more advisors. Of course they may also change over time: this year (that means in this dataframe) VOLKSWAGEN has 3 advisors but next year it may have just one, or substitute some of them with some others.

In order to keep track of all these changes, I would like to have a dataframe where for each company/year observation I save the list of advisors.

I know that we can do this using the nest function, but as far as I understand it is for creating lists from columns in the same dataframe, while I have multiple dataframes, say 10, like the ones above.

Could anyone help me to manage this issue? Many thanks.

Russ Cam
  • 124,184
  • 33
  • 204
  • 266
Macrina
  • 25
  • 8
  • You could split according to company:advisor combination and construct whatever result you want from that. If you provide a reproducible example I can show you. – Roman Luštrik Jul 01 '18 at 11:35
  • Yes, thank you. What do you mean reproducible example? I am not very practical of this website. – Macrina Jul 01 '18 at 12:17
  • 1
    In order to ask a better question please read [How to ask a good question](https://stackoverflow.com/help/how-to-ask) and [Minimal, Complete, and Verifiable Example](https://stackoverflow.com/help/mcve) and [How to make a great R reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). – Rui Barradas Jul 01 '18 at 18:40

1 Answers1

0

If you are looking for a single data frame in which the columns are year, Company.name and a column containing a list each of whose elements is a data frame containing the rows for that year and Company.name then:

library(dplyr)
library(purrr)
library(tidyr)

ls(pattern = "^advisors\\d{4}$", envir = .GlobalEnv) %>%
  mget(envir = .GlobalEnv) %>%
  map_dfr(as.data.frame.matrix, .id = "year") %>%
  mutate(year = sub("advisors", "", year) %>% as.numeric) %>%
  nest(-c(year, Company.name))

giving:

# A tibble: 6 x 3
   year Company.name          data                
  <dbl> <fct>                 <list>              
1 2015. VOLKSWAGEN AG         <data.frame [3 x 3]>
2 2015. ROYAL DUTCH SHELL PLC <data.frame [4 x 3]>
3 2015. BP PLC                <data.frame [1 x 3]>
4 2016. VOLKSWAGEN AG         <data.frame [3 x 3]>
5 2016. ROYAL DUTCH SHELL PLC <data.frame [4 x 3]>
6 2016. BP PLC                <data.frame [1 x 3]>

or if you just want a long form data frame then omit the nest line.

Note

We assume the input is:

advisors2015 <- 
structure(list(Mark = c(1L, NA, NA, 2L, NA, NA, NA, 3L), 
Company.name = structure(c(3L, 
3L, 3L, 2L, 2L, 2L, 2L, 1L), .Label = c("BP PLC", "ROYAL DUTCH SHELL PLC", 
"VOLKSWAGEN AG"), class = "factor"), Company.ID = structure(c(1L, 
1L, 1L, 3L, 3L, 3L, 3L, 2L), .Label = c("DE2070000543", "GB00102498", 
"GB04366849"), class = "factor"), Advisor.Name = structure(c(6L, 
8L, 1L, 5L, 7L, 4L, 3L, 2L), .Label = c("C&L TREUARBEIT REVISION", 
"CAPITA ASSET SERVICES", "ERNST & YOUNG", "KPMG ACCOUNTANTS NV", 
"LLOYDS TSB REGISTRARS", "PRICEWATERHOUSECOOPERS", "PRICEWATERHOUSECOOPERS LLP", 
"PWC DEUTSCHE REVISION"), class = "factor")), 
class = "data.frame", row.names = c(NA, -8L))

advisors2015 <- advisors2016 <- as.table(as.matrix(advisors2015))
G. Grothendieck
  • 254,981
  • 17
  • 203
  • 341
  • Thank you for your suggestion. Well, when i run your command it returns me the error "Error in bind_rows_(x, .id) : Argument 2 must be length 36, not 4". However the structure of the data is provided above now. – Macrina Jul 01 '18 at 13:02
  • If you paste the input from the Note into R and then paste the code you will see that there is no error. If your own data gives an error then there is some difference that you haven't told us about. If the difference is that your data is matrices then convert them to data frames first. – G. Grothendieck Jul 01 '18 at 13:14
  • Ok, they should be tibbles. – Macrina Jul 01 '18 at 13:22
  • Have changed Note and code to make input tables. The output is a data frame. – G. Grothendieck Jul 01 '18 at 13:36
  • Thanks a lot! Now I try to apply it to my data. – Macrina Jul 01 '18 at 13:40
  • I did it. May you suggest me some references about how to deal with data with lists as variables? – Macrina Jul 01 '18 at 19:31
  • `nest` is part of tidyr package and `map_dfr` is in purrr so look at the documentation for those. – G. Grothendieck Jul 01 '18 at 20:41