I have a data set that looks like this:
library(tidyverse)
data <- tibble(id = 1:10,
               vectors = list(rnorm(25)))
# A tibble: 10 x 2
id vectors
<int> <list>
1 1 <dbl [25]>
2 2 <dbl [25]>
3 3 <dbl [25]>
4 4 <dbl [25]>
5 5 <dbl [25]>
6 6 <dbl [25]>
7 7 <dbl [25]>
8 8 <dbl [25]>
9 9 <dbl [25]>
10 10 <dbl [25]>
I'd like to use this data set to compute cosine similarity, where each row represents a document. The cosine function from the lsa package seems like a good/easy way to do this; however, it needs each document represented as a column. I'd like to simply do data %>% t() to get my desired result, but that doesn't work. I've also tried "spreading" the list column first using unnest and spread, and I've tried flatten, all to no avail. The first line of my desired output would look something like:
1 2 3 4 5 6 7 8 9 10
0.1 0.3 0.7 0.3 0.1 0.1 0.3 0.7 0.3 0.1
If there's a function from another package that handles data in this format, I would by all means just use that instead, though at this point I would also like to figure this out out of curiosity. I've looked at R - list to data frame, but I'm not sure how to apply that to this situation.
The background to this is that I've performed doc2vec in Python with gensim, but due to our environment at work, if I want to build something interactive for a client it would need to be in R.
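For reference, here is a minimal sketch of one way the layout described above (one column per document) could be built, assuming every element of vectors has the same length. The names mat, norms and sim are made up for illustration, and the example data uses replicate() rather than list(rnorm(25)) so that the rows are not all identical. The similarity is computed in base R here; passing the same matrix to the cosine function from lsa is expected to give an equivalent result, but that call is left as a comment because it is not exercised in this sketch.

```r
library(tibble)

set.seed(1)
# Assumption: one distinct length-25 vector per document
data <- tibble(id = 1:10,
               vectors = replicate(10, rnorm(25), simplify = FALSE))

# Bind the list elements column-wise: 25 rows, one column per document
mat <- do.call(cbind, data$vectors)
colnames(mat) <- data$id

# Pairwise cosine similarity between columns, in base R
norms <- sqrt(colSums(mat^2))
sim <- crossprod(mat) / outer(norms, norms)

# With the lsa package installed, lsa::cosine(mat) should be equivalent
dim(sim)
```

The do.call(cbind, ...) step is what replaces the t() attempt: a list column cannot be transposed directly, but its elements can be bound together into an ordinary numeric matrix.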