1

I have two problems with a combinatory/permutation issue in R.

I am running a combination for letters. I want to validate all resulting combinations to a german dictionary.

letters <- c("a","g","t")
sum_3 <- expand.grid(letters, letters, letters)
  1. Ideally, this would indicate that the word "tag" (english = day) will be found as it is existent in a German dictionary. How to connect/load/validate the results with a German dictionary?

  2. If I increase the number of letters and the expand.grid to a number n, then at some point, this becomes to memory intensive. Is there any way to slice the expand.grid part into multiple segments in order to avoid memory breakdown?

Phil
  • 7,287
  • 3
  • 36
  • 66
TomTe
  • 193
  • 9
  • Do you have a list of German dictionary words you want to use? I'd suggest finding one you can download. I'd search for it for you, but my guess is it will be easier to find one searching in German. – Gregor Thomas Feb 05 '23 at 16:14
  • As for (2) - what result are you trying to get here? [This FAQ](https://stackoverflow.com/q/22569176/903061) presents a nice comprehensive overview of perm/combination implementations in R, many of which are much more performant than `expand.grid`. If we understood your goal, we may be able to help more. – Gregor Thomas Feb 05 '23 at 16:17
  • Hi Gregor, thanks for your Input. I want to run all possible combinations from a set of 9 letters (9! I guess). afterwards, I want to match the results with a dictionary, to see how many "real/existing" words the combinations yield. Does this make more sense now ? – TomTe Feb 05 '23 at 20:54
  • I sadly don’t think that R will be the right tool for such a problem. With factorial(9) = 362.880 possible querys and against ~5.3 million words in german this will take a very long time. With 9^9 (387.420.489) combinations oof.. You will have to look at optimized algorithms for this kind of problem maybe a search method like iterative BST, maybe possibly De-Bruijn. Most likely this will be in C++ for performance ( use Rcpp if you want to wrap) – user12256545 Feb 05 '23 at 23:38
  • @TomTe since you can calculate how many possibilities there are you don't actually need to generate them. For example, if I have the letters D, A, S, G, T and my dictionary is "das", "der", "tag", I can calculate that there are 5^3 = 125 possible 3 letter words, and I can check my dictionary for words that contain only those letters and see that there are 2 matches. I don't need to write out all 125 words to make the calculation 2/125. – Gregor Thomas Feb 06 '23 at 14:11
  • 1
    You can use simple regex patterns to find words that are composed only of a set of letters, in this case the pattern would be `"^[dasgt]{3}$"` for 3-letter words using only the letters d, a, s, g, t. You can put other numbers or ranges in the `{}` to make it more flexible. – Gregor Thomas Feb 06 '23 at 14:12

0 Answers0