I have seven dataframes (one per political party), each holding the cosine similarities between a set of documents. The cosine similarity is already calculated; I just need to pull out the right variable from each dataframe and save it. For one dataframe, the code looks like this:
# first I convert the model (the cosine similarities) into a dataframe. The model is a "Formal class textstat_simil" object from quanteda
y_V_2 <- as.data.frame(as.matrix(y_V_2))
# then I add the reference documents (the row names) as a column called "id"
y_V_2 <- cbind(id = rownames(y_V_2), y_V_2)
# because my variables of interest are columns ("DF", "EL", etc. are all political parties),
# I convert the dataframe from wide to long
y_V_2 <- y_V_2 %>%
  pivot_longer(c("DF", "EL", "KF", "RV", "S", "SF", "V"), names_to = "party", values_to = "cos_sim")
# lastly, I filter the correct cosine similarity and save it in the final dataframe called: cos_sim_V_2
cos_sim_V_2 <- y_V_2 %>%
  filter(party == "V")
I hope this makes sense. The bottom line is that I do this for seven different dataframes (each one represents a political party). The full code looks like this:
y_V_2 <- as.data.frame(as.matrix(y_V_2))
y_S_2 <- as.data.frame(as.matrix(y_S_2))
y_EL_2 <- as.data.frame(as.matrix(y_EL_2))
y_SF_2 <- as.data.frame(as.matrix(y_SF_2))
y_DF_2 <- as.data.frame(as.matrix(y_DF_2))
y_KF_2 <- as.data.frame(as.matrix(y_KF_2))
y_RV_2 <- as.data.frame(as.matrix(y_RV_2))
y_V_2 <- cbind(id = rownames(y_V_2), y_V_2)
y_S_2 <- cbind(id = rownames(y_S_2), y_S_2)
y_EL_2 <- cbind(id = rownames(y_EL_2), y_EL_2)
y_SF_2 <- cbind(id = rownames(y_SF_2), y_SF_2)
y_DF_2 <- cbind(id = rownames(y_DF_2), y_DF_2)
y_KF_2 <- cbind(id = rownames(y_KF_2), y_KF_2)
y_RV_2 <- cbind(id = rownames(y_RV_2), y_RV_2)
y_V_2 <- y_V_2 %>%
  pivot_longer(c("DF", "EL", "KF", "RV", "S", "SF", "V"), names_to = "party", values_to = "cos_sim")
y_S_2 <- y_S_2 %>%
  pivot_longer(c("DF", "EL", "KF", "RV", "S", "SF", "V"), names_to = "party", values_to = "cos_sim")
y_EL_2 <- y_EL_2 %>%
  pivot_longer(c("DF", "EL", "KF", "RV", "S", "SF", "V"), names_to = "party", values_to = "cos_sim")
y_SF_2 <- y_SF_2 %>%
  pivot_longer(c("DF", "EL", "KF", "RV", "S", "SF", "V"), names_to = "party", values_to = "cos_sim")
y_DF_2 <- y_DF_2 %>%
  pivot_longer(c("DF", "EL", "KF", "RV", "S", "SF", "V"), names_to = "party", values_to = "cos_sim")
y_KF_2 <- y_KF_2 %>%
  pivot_longer(c("DF", "EL", "KF", "RV", "S", "SF", "V"), names_to = "party", values_to = "cos_sim")
y_RV_2 <- y_RV_2 %>%
  pivot_longer(c("DF", "EL", "KF", "RV", "S", "SF", "V"), names_to = "party", values_to = "cos_sim")
cos_sim_V_2 <- y_V_2 %>%
  filter(party == "V")
cos_sim_S_2 <- y_S_2 %>%
  filter(party == "S")
cos_sim_EL_2 <- y_EL_2 %>%
  filter(party == "EL")
cos_sim_SF_2 <- y_SF_2 %>%
  filter(party == "SF")
cos_sim_DF_2 <- y_DF_2 %>%
  filter(party == "DF")
cos_sim_KF_2 <- y_KF_2 %>%
  filter(party == "KF")
cos_sim_RV_2 <- y_RV_2 %>%
  filter(party == "RV")
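The four repeated steps (convert, add "id", pivot, filter) could be wrapped in a single function so they are written only once. The following is a minimal sketch, not a tested solution for the real data: the helper name `extract_cos_sim` is hypothetical, and a small toy matrix stands in for the quanteda similarity object, on the assumption that `as.matrix()` works on the real objects as it does in the code above.

```r
library(dplyr)
library(tidyr)

parties <- c("DF", "EL", "KF", "RV", "S", "SF", "V")

# Hypothetical helper: runs the four steps for one similarity object
extract_cos_sim <- function(sim, party_name) {
  df <- as.data.frame(as.matrix(sim))   # model -> dataframe
  df <- cbind(id = rownames(df), df)    # row names -> "id" column
  df %>%
    pivot_longer(all_of(parties), names_to = "party", values_to = "cos_sim") %>%
    filter(party == party_name)         # keep only the party of interest
}

# Toy stand-in for a real object such as y_V_2 (2 documents x 7 parties)
toy_sim <- matrix((1:14) / 100, nrow = 2,
                  dimnames = list(c("doc1", "doc2"), parties))

cos_sim_toy <- extract_cos_sim(toy_sim, "V")
```

With the real data, a call would presumably look like `cos_sim_V_2 <- extract_cos_sim(y_V_2, "V")`.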
Now, what I actually want to do is the following: these seven dataframes are for year "2" (hence the 2 at the end of each name). I actually have 22 years of interest. Therefore, I need to do this entire thing 22 times for each of the seven parties (for the first party: y_V_2, y_V_3, y_V_4, etc.). Is there any way I can loop through this?
I have tried the following:
time <- 2:22
for (i in time){
  y_V_[[i]] <- as.data.frame(as.matrix(y_V_[[i]]))
  y_V_[[i]] <- cbind(id = rownames(y_V_[[i]]), y_V_[[i]])
  y_V_[[i]] <- y_V_[[i]] %>%
    pivot_longer(c("DF", "EL", "KF", "RV", "S", "SF", "V"), names_to = "party", values_to = "cos_sim")
  y_V_[[i]] <- y_V_[[i]] %>%
    filter(party == "V")
}
But it does not work. What is the correct way of doing this?
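A likely reason the attempt fails is that R reads `y_V_[[i]]` as "element i of an object called `y_V_`", not as the name `y_V_2`, `y_V_3`, and so on. One possible direction, sketched here under assumptions, is to build each name with `paste0()`, fetch the object with `get()`, and collect the results in a list. The list name `cos_sim_V` and the toy matrices are illustrative only, and only years 2 and 3 are created so the sketch runs on its own.

```r
library(dplyr)
library(tidyr)

parties <- c("DF", "EL", "KF", "RV", "S", "SF", "V")

# Toy stand-ins for the real similarity objects (assumption: years 2 and 3 only)
y_V_2 <- matrix((1:14) / 100, nrow = 2, dimnames = list(c("doc1", "doc2"), parties))
y_V_3 <- matrix((1:14) / 100, nrow = 2, dimnames = list(c("doc1", "doc2"), parties))

years <- 2:3          # would be 2:22 with the real data
cos_sim_V <- list()   # one element per year

for (i in years) {
  sim <- get(paste0("y_V_", i))         # look up y_V_2, y_V_3, ... by name
  df <- as.data.frame(as.matrix(sim))
  df <- cbind(id = rownames(df), df)
  cos_sim_V[[as.character(i)]] <- df %>%
    pivot_longer(all_of(parties), names_to = "party", values_to = "cos_sim") %>%
    filter(party == "V")
}
```

The result for year 2 is then available as `cos_sim_V[["2"]]`, instead of living in a separate object per year.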
If it helps, this is the structure of the dataframe after I convert the "Formal class textstat_simil" object to a dataframe with y_V_2 <- as.data.frame(as.matrix(y_V_2)):
dput(head(y_V_2))
structure(list(DF = c(0.23499916674957, 0.16697708727056, 0.26998882552819,
0.11989777626359, 0.28145930377199, 0.15959668959184), EL = c(0.23595981215221,
0.18359709428329, 0.28810481269376, 0.13263861987521, 0.25331537435773,
0.18167733395369), KF = c(0.20936950007655, 0.18252467175417,
0.26042704505428, 0.14266913827392, 0.20023284784432, 0.18992935664409
), RV = c(0.2046697473122, 0.24951432279883, 0.24766480258903,
0.11242986749057, 0.23958714529124, 0.16084468614859), S = c(0.24270069472492,
0.18741729570808, 0.29014329186024, 0.14733535217516, 0.27150818619494,
0.18979023415197), SF = c(0.23561869890038, 0.17927679461636,
0.29403349472473, 0.15269893065285, 0.2559026802251, 0.17742356519735
), V = c(0.31302795687125, 0.2765158096593, 0.41588664999413,
0.21090507950169, 0.34787076982177, 0.2583219375177)), row.names = c("Anders Fogh Rasmussen",
"Anders Mølgaard", "Birthe Rønn Hornbech", "Bodil Thrane", "Charlotte Antonsen",
"Christian Mejdahl"), class = "data.frame")
Extra question (not absolutely necessary): can I combine looping through the 22 years with looping through the seven parties, so that I only have to write the original 6 lines of code once? The loop would then run over the parties (V, S, EL, SF, DF, KF, RV) as well as the 22 years for each party.
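For the combined case, one possible sketch is to enumerate every party/year pair with `expand.grid()` and apply the same steps to each pair with `Map()`, binding everything into one long dataframe with a `year` column. This assumes objects named `y_<party>_<year>` exist for every combination; the toy objects, the name `cos_sim_all`, and the two-year range are illustrative assumptions.

```r
library(dplyr)
library(tidyr)

parties <- c("DF", "EL", "KF", "RV", "S", "SF", "V")
years <- 2:3   # would be 2:22 with the real data

# Toy stand-ins: create y_<party>_<year> for every combination
for (p in parties) {
  for (yr in years) {
    assign(paste0("y_", p, "_", yr),
           matrix((1:14) / 100, nrow = 2,
                  dimnames = list(c("doc1", "doc2"), parties)))
  }
}

# Every party/year pair, one row each
grid <- expand.grid(party = parties, year = years, stringsAsFactors = FALSE)

cos_sim_all <- Map(function(p, yr) {
  df <- as.data.frame(as.matrix(get(paste0("y_", p, "_", yr))))
  df <- cbind(id = rownames(df), df)
  df %>%
    pivot_longer(all_of(parties), names_to = "party", values_to = "cos_sim") %>%
    filter(party == p) %>%   # keep the party the object belongs to
    mutate(year = yr)        # record which year the rows come from
}, grid$party, grid$year) %>%
  bind_rows()
```

A single year and party can then be pulled out with, for example, `filter(cos_sim_all, party == "V", year == 2)`.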