
I have 6 different dataframes, each holding the cosine similarities between a set of documents. The cosine similarities are already calculated; I just need to pull out the right variable from each of the six and save it. The code to do this looks like this:

# first I have to convert the model (calculating cosine similarity) into a dataframe. The model is a "Formal class textstat_simil" from quanteda
y_V_2 <- as.data.frame(as.matrix(y_V_2))

# then I label the reference document column "id"
y_V_2 <- cbind(id = rownames(y_V_2), y_V_2)

# because my variables of interest are columns ("DF", "EL", etc. are all political parties), I convert the dataframe from wide to long
y_V_2 <- y_V_2 %>% 
  pivot_longer(c("DF", "EL", "KF", "RV", "S", "SF", "V"), names_to = "party", values_to = "cos_sim")

# lastly, I filter the correct cosine similarity and save it in the final dataframe called: cos_sim_V_2
cos_sim_V_2 <- y_V_2 %>% 
  filter(party == "V")

Not sure if this makes sense. The bottom line is that I do this with six different dataframes (each one represents a political party). The full code looks like this:

y_V_2 <- as.data.frame(as.matrix(y_V_2))
y_S_2 <- as.data.frame(as.matrix(y_S_2))
y_EL_2 <- as.data.frame(as.matrix(y_EL_2))
y_SF_2 <- as.data.frame(as.matrix(y_SF_2))
y_DF_2 <- as.data.frame(as.matrix(y_DF_2))
y_KF_2 <- as.data.frame(as.matrix(y_KF_2))
y_RV_2 <- as.data.frame(as.matrix(y_RV_2))


y_V_2 <- cbind(id = rownames(y_V_2), y_V_2)
y_S_2 <- cbind(id = rownames(y_S_2), y_S_2)
y_EL_2 <- cbind(id = rownames(y_EL_2), y_EL_2)
y_SF_2 <- cbind(id = rownames(y_SF_2), y_SF_2)
y_DF_2 <- cbind(id = rownames(y_DF_2), y_DF_2)
y_KF_2 <- cbind(id = rownames(y_KF_2), y_KF_2)
y_RV_2 <- cbind(id = rownames(y_RV_2), y_RV_2)

y_V_2 <- y_V_2 %>% 
  pivot_longer(c("DF", "EL", "KF", "RV", "S", "SF", "V"), names_to = "party", values_to = "cos_sim")
y_S_2 <- y_S_2 %>% 
  pivot_longer(c("DF", "EL", "KF", "RV", "S", "SF", "V"), names_to = "party", values_to = "cos_sim")
y_EL_2 <- y_EL_2 %>% 
  pivot_longer(c("DF", "EL", "KF", "RV", "S", "SF", "V"), names_to = "party", values_to = "cos_sim")
y_SF_2 <- y_SF_2 %>% 
  pivot_longer(c("DF", "EL", "KF", "RV", "S", "SF", "V"), names_to = "party", values_to = "cos_sim")
y_DF_2 <- y_DF_2 %>% 
  pivot_longer(c("DF", "EL", "KF", "RV", "S", "SF", "V"), names_to = "party", values_to = "cos_sim")
y_KF_2 <- y_KF_2 %>% 
  pivot_longer(c("DF", "EL", "KF", "RV", "S", "SF", "V"), names_to = "party", values_to = "cos_sim")
y_RV_2 <- y_RV_2 %>% 
  pivot_longer(c("DF", "EL", "KF", "RV", "S", "SF", "V"), names_to = "party", values_to = "cos_sim")

cos_sim_V_2 <- y_V_2 %>% 
  filter(party == "V")
cos_sim_S_2 <- y_S_2 %>% 
  filter(party == "S")
cos_sim_EL_2 <- y_EL_2 %>% 
  filter(party == "EL")
cos_sim_SF_2 <- y_SF_2 %>% 
  filter(party == "SF")
cos_sim_DF_2 <- y_DF_2 %>% 
  filter(party == "DF")
cos_sim_KF_2 <- y_KF_2 %>% 
  filter(party == "KF")
cos_sim_RV_2 <- y_RV_2 %>% 
  filter(party == "RV")

NOW, what I actually want to do is the following: these six dataframes are for year "2" (hence the 2 at the end of each name). I actually have 22 years of interest. Therefore, I need to do this entire thing 22 times for 6 parties (for party 1: y_V_2, y_V_3, y_V_4, etc.). Is there any way I can loop through this?

I have tried the following:

time <- 2:22

for (i in time){
  
  y_V_[[i]] <- as.data.frame(as.matrix(y_V_[[i]]))
  
  
  
  y_V_[[i]] <- cbind(id = rownames(y_V_[[i]]), y_V_[[i]])
  
  
  y_V_[[i]] <- y_V_[[i]] %>% 
    pivot_longer(c("DF", "EL", "KF", "RV", "S", "SF", "V"), names_to = "party", values_to = "cos_sim")
  
  y_V_[[i]] <- y_V_[[i]] %>% 
    filter(party == "V")
  
}

But it does not work. What is the correct way of doing this?

If it helps, this is the structure of the dataframe once I convert the "Formal class textstat_simil" object to a dataframe: y_V_2 <- as.data.frame(as.matrix(y_V_2))

dput(head(y_V_2))

structure(list(DF = c(0.23499916674957, 0.16697708727056, 0.26998882552819, 
0.11989777626359, 0.28145930377199, 0.15959668959184), EL = c(0.23595981215221, 
0.18359709428329, 0.28810481269376, 0.13263861987521, 0.25331537435773, 
0.18167733395369), KF = c(0.20936950007655, 0.18252467175417, 
0.26042704505428, 0.14266913827392, 0.20023284784432, 0.18992935664409
), RV = c(0.2046697473122, 0.24951432279883, 0.24766480258903, 
0.11242986749057, 0.23958714529124, 0.16084468614859), S = c(0.24270069472492, 
0.18741729570808, 0.29014329186024, 0.14733535217516, 0.27150818619494, 
0.18979023415197), SF = c(0.23561869890038, 0.17927679461636, 
0.29403349472473, 0.15269893065285, 0.2559026802251, 0.17742356519735
), V = c(0.31302795687125, 0.2765158096593, 0.41588664999413, 
0.21090507950169, 0.34787076982177, 0.2583219375177)), row.names = c("Anders Fogh Rasmussen", 
"Anders Mølgaard", "Birthe Rønn Hornbech", "Bodil Thrane", "Charlotte Antonsen", 
"Christian Mejdahl"), class = "data.frame")

Extra question (not absolutely necessary): can I combine looping through the 22 years with looping through the six different parties, so that I only have to write the original 6 lines of code? The loop would then run over the parties (V, S, EL, SF, DF, KF, RV) as well as the 22 years for each party.

Andy
  • Put your data.frames into a list when you create them! It's very easy to iterate over a list. – Roland Dec 02 '21 at 13:49

1 Answer


You can use get(object_name) to fetch an object by its name, given as a string:

for (i in time) {
  df <- get(paste0("y_V_", i))
}

This fetches the data frame y_V_{i}, where i is the time index. You can loop over the party letter as well:

for (i in time) {
  for (l in letter_vector) {
    df <- get(paste0("y_", l, "_", i))
  }
}

This assigns y_{l}_{i} to df on each iteration, provided that every such object exists. Making sure of that is up to you.
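
If some combinations might be missing, one option is to check with exists() before calling get(). The following is only a rough sketch on made-up toy objects (the two small data frames and the `found` vector are assumptions for illustration, not part of the question's data):

```r
# Hypothetical toy objects: only two of the four combinations are defined
y_V_2 <- data.frame(V = c(0.3, 0.2))
y_S_3 <- data.frame(S = c(0.1, 0.4))

time <- 2:3
letter_vector <- c("V", "S")
found <- character(0)  # names of the objects that were actually fetched

for (i in time) {
  for (l in letter_vector) {
    object_name <- paste0("y_", l, "_", i)
    # Skip combinations that were never created instead of erroring in get()
    if (!exists(object_name)) next
    df <- get(object_name)
    found <- c(found, object_name)
  }
}
found
```

Skipping silently can hide a typo in an object name, so it may be worth printing or logging the names that were skipped.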


Edit: use assign to write to an object whose name is built with paste0:

for (i in time) {
  for (l in letter_vector) {
    df <- get(paste0("y_", l, "_", i))
    assign(paste0("df_", l, "_", i), df)
  }
}
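
Note that a pasted name cannot be put on the left-hand side of `<-`; the name has to be passed to assign() as a string. A small sketch with a made-up long-format data frame (the values and the `target_name` variable are assumptions for illustration):

```r
# Hypothetical long-format data, shaped like the pivot_longer() output in the question
df <- data.frame(
  id      = c("doc_a", "doc_a"),
  party   = c("V", "S"),
  cos_sim = c(0.31, 0.24),
  stringsAsFactors = FALSE
)

l <- "V"
i <- 2

# paste0("cos_sim_", l, "_", i) <- ... would fail: the left side is a function call,
# not a variable name. Build the name as a string and hand it to assign() instead.
target_name <- paste0("cos_sim_", l, "_", i)
assign(target_name, df[df$party == l, ])

cos_sim_V_2
```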

Second edit. You can write the dataframes to a list:

# first initialize the list
list_with_dfs <- list()

for (i in time) {
  for (l in letter_vector) {
    df <- get(paste0("y_", l, "_", i))
    assign(paste0("df_", l, "_", i), df)

    # Then write to the list
    list_with_dfs[[length(list_with_dfs) +  1]] <- get(paste0("df_", l, "_", i))

    # Or just use the df directly (use one of the two, not both,
    # otherwise every data frame is appended twice)
    # list_with_dfs[[length(list_with_dfs) + 1]] <- df
  }
}
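
Putting the pieces together, here is a rough end-to-end sketch in base R. It assumes the goal is the one described in the question: for every party and year, keep only the similarity to the party's own column, then stack everything into one long data frame. The toy matrices, the helper `extract_own_party`, and the extra `year` column are assumptions for illustration; selecting the party's column directly should give the same rows as pivot_longer() followed by filter(party == l).

```r
# Hypothetical toy stand-ins for the similarity objects: 2 parties x 2 years
letter_vector <- c("V", "S")
time <- 2:3
for (i in time) {
  for (l in letter_vector) {
    m <- matrix(c(0.1, 0.2, 0.3, 0.4), nrow = 2,
                dimnames = list(c("doc_a", "doc_b"), letter_vector))
    assign(paste0("y_", l, "_", i), m)
  }
}

# Hypothetical helper: one row per reference document, keeping only the
# similarity to the party the object belongs to
extract_own_party <- function(x, party, year) {
  m <- as.matrix(x)
  data.frame(id = rownames(m), party = party, year = year,
             cos_sim = unname(m[, party]), stringsAsFactors = FALSE)
}

list_with_dfs <- list()
for (i in time) {
  for (l in letter_vector) {
    obj <- get(paste0("y_", l, "_", i))
    # Store each result under a descriptive name instead of a separate variable
    list_with_dfs[[paste0("cos_sim_", l, "_", i)]] <- extract_own_party(obj, l, i)
  }
}

# Stack all pieces into one long data frame
cos_sim_all <- do.call(rbind, list_with_dfs)
rownames(cos_sim_all) <- NULL
cos_sim_all
```

A single piece is still available by name, e.g. list_with_dfs[["cos_sim_V_2"]], so no separate assign() call is needed for each result.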
tavdp
  • This can work I think. But how do I (in the end) save the 22*6 instances of df into its own unique df? so e.g. with the label cos_sim_l_i? I tried to end it with: paste0("cos_sim_", l, "_", i) <- df %>% filter(party == l) but it does not work. Essentially how do I save each iteration in its own unique dataframe, that has the character (from "letter_vector") and the index (from "time")? – Andy Dec 02 '21 at 14:46
  • use `assign(pasted_object_name, data_to_write)` – tavdp Dec 02 '21 at 14:55
  • This is actually really good! I have a last follow up question (sorry): it worked with assign, but I just realized that I now have around a 100 dataframes. Do you know how I can save it into a list instead, so that I in the end can rbind them all and instead get a long dataframe? right now I have tried: First I created a list: listofdfs <- list() and then ended the loop with listofdfs[[j]] <- assign(paste0("cos_sim_", l, "_", i), df ). However, it yields an error. (j is a vector of 1:132) – Andy Dec 02 '21 at 15:45
  • Sure! Just realize that `assign` relates values to a variable, where the variable name can be dynamic. So for that, you need to `get` the object again. You are on the right track, but just need to use `get` instead of `assign`. Or you could just use the "temporary" df as in the example. I'll edit the answer once more – tavdp Dec 02 '21 at 16:00