
I'm using the following function to calculate the percentage similarity between two inputs:

myfun <- function(x, y) { RecordLinkage::levenshteinSim(x, y) }

So far I have been storing the results in a matrix:

sim_matrix <- outer(matext_vec, matext_, FUN = myfun)

But because my vector is so long, the matrix gets too big to fit in memory.
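A quick back-of-the-envelope check shows why: `outer()` returns a dense numeric (double) matrix, so every pair of inputs costs 8 bytes whether or not it is interesting. The lengths below are hypothetical, since the question does not give the real sizes of `matext_vec` and `matext_`.

```r
# Hypothetical lengths: the question does not state the real sizes.
n_x <- 50000  # stand-in for length(matext_vec)
n_y <- 50000  # stand-in for length(matext_)

# outer() returns a dense double-precision matrix: 8 bytes per cell.
bytes_needed <- n_x * n_y * 8
gib_needed <- bytes_needed / 1024^3
gib_needed  # about 18.6 GiB
```

The cost grows with the product of the two lengths, so doubling both vectors quadruples the memory needed.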

Could I set a threshold so that, only when the value is greater than, e.g., 0.9, an entry is added to a list containing the value together with the two inputs that produced it?
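One way to get this without ever building the full matrix is to compare the inputs in blocks and keep only the pairs above the threshold, as a long-format data frame. The sketch below is an assumption-laden illustration, not a tested solution for the asker's data: `lev_sim_matrix()`, `similar_pairs()` and `chunk_size` are hypothetical names, and `lev_sim_matrix()` is a base-R stand-in built on `utils::adist()` that, as we understand it, mirrors RecordLinkage's normalization (1 minus the edit distance divided by the length of the longer string). With the real package installed you could swap in `outer(x[idx], y, FUN = myfun)`.

```r
# Hypothetical base-R stand-in for RecordLinkage::levenshteinSim (assumed
# normalization): similarity = 1 - edit distance / length of longer string.
lev_sim_matrix <- function(x, y) {
  d <- utils::adist(x, y)  # length(x) x length(y) Levenshtein distances
  1 - d / outer(nchar(x), nchar(y), pmax)
}

# Compare x against y in blocks of `chunk_size` elements of x and keep only
# pairs whose similarity exceeds `threshold`. Only one block of size
# chunk_size x length(y) is held in memory at a time.
similar_pairs <- function(x, y, threshold = 0.9, chunk_size = 1000) {
  chunks <- split(seq_along(x), ceiling(seq_along(x) / chunk_size))
  pieces <- lapply(chunks, function(idx) {
    sims <- lev_sim_matrix(x[idx], y)
    hits <- which(sims > threshold, arr.ind = TRUE)  # (row, col) of matches
    data.frame(x = x[idx][hits[, "row"]],
               y = y[hits[, "col"]],
               similarity = sims[hits],
               stringsAsFactors = FALSE)
  })
  do.call(rbind, unname(pieces))
}

res <- similar_pairs(c("apple", "banana", "cherry"),
                     c("apple", "appel", "bananas", "grape"),
                     threshold = 0.8, chunk_size = 2)
res
```

The output only grows with the number of matches, so a strict threshold keeps it small; `chunk_size` trades speed (fewer, larger calls) against peak memory.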

Asked by Sebastian; edited by Sotos.
  • I don't understand your question. If the output is too big for your memory, how will augmenting the output with the input help? – John Coleman Jan 11 '18 at 13:33
  • Sorry, I must have been pretty out of it. I meant to say that I want to use a threshold so that the list contains only the values that meet it, which reduces the number of entries compared with a full matrix. – Sebastian Jan 11 '18 at 13:42
  • Welcome to StackOverflow! Please read the info about [how to ask a good question](http://stackoverflow.com/help/how-to-ask) and how to give a [reproducible example](http://stackoverflow.com/questions/5963269). This will make it much easier for others to help you. – Sotos Jan 11 '18 at 13:43
  • By working column by column you don't need to hold the entire matrix in memory at once. – John Coleman Jan 11 '18 at 13:46
  • How would I do that? – Sebastian Jan 11 '18 at 13:47
  • As a preliminary step, you could write a function `f(x,v,t)` which returns all elements `y` in the vector `v` where the similarity between `x` and `y` exceeds the threshold `t`. Once that works, fix `v` to be `matext_` and let `x` vary over `matext_vec`. If you follow the recommendation of @Sotos and give a reproducible example I could try to flesh this out some more. – John Coleman Jan 11 '18 at 13:55
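The `f(x, v, t)` idea from the last comment can be sketched roughly as follows. This is a hypothetical illustration under stated assumptions: `lev_sim()` is a base-R stand-in built on `utils::adist()` that we assume matches RecordLinkage's normalization (1 minus the edit distance over the longer string's length), and `all_matches()` is an invented helper name for the loop that fixes `v` and lets `x` vary.

```r
# Hypothetical stand-in for RecordLinkage::levenshteinSim: similarity of one
# string x against every element of v (assumed normalization).
lev_sim <- function(x, v) {
  d <- as.vector(utils::adist(x, v))
  1 - d / pmax(nchar(x), nchar(v))
}

# f(x, v, t): the elements of v whose similarity to x exceeds t,
# returned together with x and the similarity value.
f <- function(x, v, t) {
  sims <- lev_sim(x, v)
  keep <- which(sims > t)  # which() also drops NA/NaN comparisons
  data.frame(x = rep(x, length(keep)),
             y = v[keep],
             similarity = sims[keep],
             stringsAsFactors = FALSE)
}

# Fix v (e.g. matext_) and let x vary over the other vector
# (e.g. matext_vec); only one row of similarities exists at a time.
all_matches <- function(xs, v, t) {
  do.call(rbind, lapply(xs, f, v = v, t = t))
}

out <- all_matches(c("apple", "banana", "cherry"),
                   c("apple", "appel", "bananas", "grape"),
                   t = 0.8)
out
```

Working one element at a time is the extreme case of the column-by-column suggestion: memory use is tiny, but there is one `adist()` call per element, so for very long vectors processing a few hundred elements per call is likely faster.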
