I have a function that extracts a data frame from a file, and I'm trying to apply it in a for loop over multiple files and combine the results into a single data frame.

From what I've read, this seemed like the best way to approach it, but I get an empty list back when I was hoping for a list of data frames that could be combined using bind_rows.

This is the code I'm using:


combined_functions <- function(file_name) {
  # combines the get_dfm_df and getcorp functions: get the dfm tibble straight from the file name
  data_frame_returned <- get_dfm_df(getcorp(file_name))
  data_frame_returned
}


list_of_dataframes <- list()

file.list <- dir(pattern ="DOCX$")
for (file in file.list) {
  dataframe_of_file <- combined_functions(file)
  append(list_of_dataframes,dataframe_of_file)
  
}

bind_rows(list_of_dataframes, .id = "column_label")  #https://stackoverflow.com/questions/2851327/convert-a-list-of-data-frames-into-one-data-frame

It creates an empty list and gets a list of the file names. The function combined_functions turns each file into a data frame, which should, to my understanding, be appended to the list. After all the matching files in the directory have been processed, bind_rows should combine the list into one overall data frame, but it only returns an empty tibble. list_of_dataframes is also empty.
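For reference, one likely cause given the loop as posted: append() returns a new list rather than modifying its first argument, so the result is lost unless it is assigned back. The sketch below shows the loop with that assignment. It uses a hypothetical stand-in, fake_combined_functions, and made-up file names in place of the real combined_functions and DOCX files, so it only illustrates the list handling, not the LexisNexis processing.

```r
# Hypothetical stand-in for combined_functions(): one small data frame per file name.
fake_combined_functions <- function(file_name) {
  data.frame(doc_id = paste0(file_name, "_", 1:2),
             negative = c(1, 2),
             positive = c(3, 4))
}

file.list <- c("01_01_20.DOCX", "02_01_20.DOCX")  # made-up file names

list_of_dataframes <- list()
for (file in file.list) {
  dataframe_of_file <- fake_combined_functions(file)
  # append() returns a new list, so the result has to be assigned back.
  # Wrapping the data frame in list() keeps it as a single element;
  # without it, each column would be appended as a separate element.
  list_of_dataframes <- append(list_of_dataframes, list(dataframe_of_file))
}

length(list_of_dataframes)  # 2, one element per file

# Base R row-bind of the collected data frames.
combined <- do.call(rbind, list_of_dataframes)
nrow(combined)              # 4
```

With dplyr loaded, passing list_of_dataframes to bind_rows as in the original code would take the place of the do.call(rbind, ...) step.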

I've tried the solution in this answer but it didn't help: Append a data frame to a list

https://www.dropbox.com/sh/z8vh50b370gcb1j/AAAcbnfAUOM6-y8uWn4-lUWLa?dl=0

This is a link to the raw files I am using in this case, but I think the problem is a general one.

Appendix:

These are the functions combined_functions refers to. They work on individual files, so I'm confident they are not the cause of the problem, but I've included them for completeness anyway.

rm(list = ls())
library(quanteda)
library(quanteda.corpora)
library(readtext)
library(LexisNexisTools)
library(tidyverse)
library(tools)

getcorp<- function(file_name){      
  #function to take the lexis word document, convert it into quanteda corpus object, returns duplicate df and date from filename in list
  
  LNToutput <- lnt_read(file_name)
  duplicates_df <- lnt_similarity(LNToutput = LNToutput,
                                  threshold = 0.99)
  duplicates_df <- duplicates_df[duplicates_df$Similarity > 0.99]   #https://github.com/JBGruber/LexisNexisTools  creates dataframe of duplicate articles 
  LNToutput <- LNToutput[!LNToutput@meta$ID %in% duplicates_df$ID_duplicate, ]  #removes these duplicates from the  main dataframe 
  corp <- lnt_convert(LNToutput, to = "quanteda") #to return multiple values from the r function, must be placed in a list 
  corp_date_from_file_name <- basename(file_name)
  file_date <- as.Date(corp_date_from_file_name, format ="%d_%m_%y")
  
  list_of_returns <-list(duplicates_df, corp,file_date)   #list returns has duplicate df in first position, corpus in second and the file date in third 
  
  
  list_of_returns
}

get_dfm_df <-  function(corp_list){
  
  # takes the corp from getcorp, applies lexicoder dictionary, adds the neg_pos etc to their equivalent  columns, 
  # calculates the percentage each category is of the total number of sentiment bearing words, adds the date specified from  the file name 
  
  corpus_we_want <- corp_list[[2]]
  sentiment_df <- dfm(corpus_we_want, dictionary = data_dictionary_LSD2015) %>% #applies the dictionary 
    
    convert("data.frame") %>%
    cbind(docvars(corpus_we_want)) %>%   #https://stackoverflow.com/questions/60419692/how-to-convert-dfm-into-dataframe-but-keeping-docvars
    as_tibble() %>%
    mutate(combined_negative = negative + neg_positive, combined_positive = positive  + neg_negative) %>% 
    mutate(pos_percentage = combined_positive/(combined_positive + combined_negative ), neg_percentage =combined_negative/(combined_positive + combined_negative ) ) %>% 
    mutate(date = corp_list[[3]]) 
  
  sentiment_df
}

jolene
  • Have you tried the `map_dfc` function from the `purrr` package? – Daniel R Jul 07 '20 at 18:17
  • Ahh, this seems to work! For anyone else, `dfc <- map(file.list, combined_functions)` does what I was intending. I had to make sure the character vector of file names was converted to a list with `as.list` so that every file went into a separate list element. – jolene Jul 11 '20 at 15:18
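
The purrr approach from the comments can be sketched roughly as follows. This is a minimal illustration, not the exact code from the thread: fake_combined_functions and the file names are hypothetical stand-ins for combined_functions and the real DOCX files, and set_names is added here only so that bind_rows can label rows by file.

```r
library(purrr)
library(dplyr)

# Hypothetical stand-in for combined_functions(): one small data frame per file name.
fake_combined_functions <- function(file_name) {
  data.frame(doc_id = paste0(file_name, "_", 1:2),
             negative = c(1, 2),
             positive = c(3, 4))
}

file.list <- c("01_01_20.DOCX", "02_01_20.DOCX")  # made-up file names

# map() calls the function once per file name and returns a list of data frames.
# set_names() names each element after its file so .id can record the source.
list_of_dataframes <- map(set_names(file.list), fake_combined_functions)

# Row-bind the list; "column_label" holds the file name each row came from.
combined <- bind_rows(list_of_dataframes, .id = "column_label")
```

In this sketch map() iterates over the character vector directly, one element per file name.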
