I have a lot of repetitive code that I think I could make more efficient with a for loop, but I have been struggling with how to create the objects programmatically in R.
A folder called input has 10 files named "2010.txt", "2011.txt", ... "2019.txt".
LOOP ONE
library(readr)  # read_file()
files <- list.files("../input")
# Read each year's file
y2010 <- read_file(file.path("../input", files[1]))
y2011 <- read_file(file.path("../input", files[2]))
...
y2019 <- read_file(file.path("../input", files[10]))
From this I would like to do the following:
## Combine each year's text (rbind() on strings gives a 10 x 1 character matrix)
all_text <- rbind(y2010,y2011,y2012,y2013,y2014,y2015,y2016,y2017,y2018,y2019)
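For what it's worth, the reading step could probably be collapsed into a single lapply() call that returns a named list instead of ten separate objects. This is only a minimal sketch: read_years() is a hypothetical helper name, and it assumes every file in the folder ends in ".txt" and is named after its year.

```r
library(readr)  # read_file()

# Hypothetical helper: read every .txt file in `dir` into a named list,
# one element per file, named after the file without its extension (e.g. "2010").
read_years <- function(dir) {
  paths <- list.files(dir, pattern = "\\.txt$", full.names = TRUE)
  texts <- lapply(paths, read_file)
  names(texts) <- tools::file_path_sans_ext(basename(paths))
  texts
}

# Usage, assuming the folder layout described above:
# years    <- read_years("../input")
# years[["2010"]]                     # same content as y2010
# all_text <- do.call(rbind, years)   # character matrix, one row per year
```

Keeping the years in one list means later steps can iterate over it directly rather than relying on assign() or get().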
LOOP TWO Now I would like to take each year and make new "tok201x" objects.
library(dplyr)     # tibble(), %>%
library(tidytext)  # unnest_tokens()
### Each year (tibble() replaces the deprecated data_frame())
tok2010 <- tibble(text = y2010) %>%
  unnest_tokens(word, text)
tok2011 <- tibble(text = y2011) %>%
  unnest_tokens(word, text)
...
tok2019 <- tibble(text = y2019) %>%
  unnest_tokens(word, text)
LOOP THREE Lastly, take the "tok201x" and feed them in to the sentiment code.
library(tidyr)  # spread()
# 2010
nrc2010 <- tok2010 %>%
  inner_join(get_sentiments("nrc"), by = "word") %>%  # pull out only sentiment words
  count(sentiment) %>%                                # count each
  spread(sentiment, n, fill = 0)                      # make data wide rather than long
# 2011
nrc2011 <- tok2011 %>%
  inner_join(get_sentiments("nrc"), by = "word") %>%  # pull out only sentiment words
  count(sentiment) %>%                                # count each
  spread(sentiment, n, fill = 0)                      # make data wide rather than long
...
# 2019
nrc2019 <- tok2019 %>%
  inner_join(get_sentiments("nrc"), by = "word") %>%  # pull out only sentiment words
  count(sentiment) %>%                                # count each
  spread(sentiment, n, fill = 0)                      # make data wide rather than long
And have these all stored in a list.
I was playing around with assign() but it was not working out the way I hoped.
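To make the goal concrete, here is a rough sketch of what I imagine LOOP TWO and LOOP THREE would look like as one function applied over a list, rather than via assign(). sentiment_counts() is a hypothetical name, and the lexicon argument is my own addition so the function can be tried with a small hand-made lexicon; by default it uses get_sentiments("nrc") as in the code above.

```r
library(dplyr)     # tibble(), inner_join(), count(), %>%
library(tidyr)     # spread()
library(tidytext)  # unnest_tokens(), get_sentiments()

# Hypothetical helper: one row of sentiment counts for a single text string.
# `lexicon` must have columns `word` and `sentiment`.
sentiment_counts <- function(text, lexicon = get_sentiments("nrc")) {
  tibble(text = text) %>%
    unnest_tokens(word, text) %>%            # one lower-cased word per row
    inner_join(lexicon, by = "word") %>%     # keep only sentiment words
    count(sentiment) %>%                     # count each sentiment
    spread(sentiment, n, fill = 0)           # wide: one column per sentiment
}

# Usage, assuming the folder layout described above:
# paths <- list.files("../input", full.names = TRUE)
# names(paths) <- tools::file_path_sans_ext(basename(paths))
# nrc_list <- lapply(paths, function(p) sentiment_counts(readr::read_file(p)))
# nrc_list[["2010"]]   # what nrc2010 holds above
```

Because lapply() keeps the names of its input, each element of the result would be retrievable by year.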
EDIT: Using @desval's code with lapply(), I split the function into two steps so that I can combine the list produced at each step into one data frame. How do I accomplish this?
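# Step 1: read one file into a one-row tibble
custom.function1 <- function(x) {
  # debug: x <- files[1]
  tmp <- read_file(x)
  tmp <- tibble(text = tmp)
  return(tmp)
}
# Step 2: tokenise one tibble and count its sentiment words
custom.function2 <- function(x) {
  tmp <- x %>%  # use the argument x; tmp is not defined inside this function
    unnest_tokens(word, text) %>%
    inner_join(get_sentiments("nrc"), by = "word") %>%  # pull out only sentiment words
    count(sentiment) %>%                                # count each
    spread(sentiment, n, fill = 0)
  return(tmp)
}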
out1 <- lapply(files, custom.function1)
## Take all year data and combine into one data frame, previously...
outYEAR <- matrix(unlist(out1), ncol = 10, byrow = TRUE)
outYEAR <- outYEAR %>%
  pivot_longer(everything(), names_to = 'year', values_to = 'text')
## This does not work....
out2 <- lapply(out1, custom.function2)
## Again, combine into one data frame, previously...
out2YEAR <- matrix(unlist(out2), ncol = 10, byrow = TRUE)
out2YEAR <- out2YEAR %>%
  pivot_longer(everything(), names_to = 'year', values_to = 'text')
# THIS DOES NOT WORK.
The combined data frames need to be of class "matrix", not "tbl_df".
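One direction I am considering for the combining step, in case it helps frame the question: pivot_longer() expects a data frame, so calling it on the result of matrix() may be part of the problem. The sketch below stacks the list with bind_rows() first and converts to a matrix only at the end. counts_to_matrix() is a hypothetical helper name, and it assumes every list element is a one-row table of numeric counts.

```r
library(dplyr)  # bind_rows(), select()

# Hypothetical helper: stack a list of one-row count tables into a numeric matrix.
counts_to_matrix <- function(count_list) {
  combined <- bind_rows(count_list, .id = "year")  # tbl_df/data frame with a `year` column
  m <- as.matrix(select(combined, -year))          # numeric matrix, one column per sentiment
  m[is.na(m)] <- 0                                 # sentiments missing in a year become 0
  rownames(m) <- combined$year                     # list names (or positions) as row names
  m
}

# Usage, assuming out2 is the list of per-year sentiment counts:
# out2YEAR <- counts_to_matrix(out2)
# class(out2YEAR)   # "matrix" "array"
```

If the list has no names, bind_rows() fills the year column with the element positions instead, so naming the list by year beforehand would be needed to get meaningful row names.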