I want to remove words from a dictionary given certain conditions. I would like each following iteration to work on the updated dictionary, with that last item already removed, so the item is not calculated again.

    // sample data
    $dict = ['aaa', 'aaan', 'aba', 'abat', 'ime', 'iso', 'nime', 'tiso'];
    $unique = ['abatiso', 'aaanime'];

    // could use while to further optimize unset (and remove on the fly) http://php.net/manual/en/control-structures.foreach.php#88578
    while (list($key_word, $word) = each($unique)) { // $key is unused, just for the optimization that the while provides
        foreach ($dict as $key_other => $other) {

            // ... conditions calculations

            unset($unique[$key_word]);
        }
    }
    echo "n compounds: " . count($compounds) . NL;

If I write the inner loop as a while, like the outer one, instead of a foreach, I get 0 results; it terminates immediately.

For now, I'm getting duplicate results like:

    // Removed: abatiso => wc: aba + tiso = abatiso
    // Removed: abatiso => wc: abat + iso = abatiso
    // Removed: abatiso => wc: abati + so = abatiso
    // Removed: abatiso => wc: abatis + o = abatiso

How can I make it remove the word and not process it again on the next iteration?

Some test data:

Removed: aaaaaah => wc: aaaa + aah = aaaaaah
Removed: aaaaaah => wc: aaaaaa + h = aaaaaah
Removed: aaaaargh => wc: aaa + aargh = aaaaargh
Removed: aaaalead => wc: aaaa + lead = aaaalead
Removed: aaabbbccc => wc: aaab + bbccc = aaabbbccc
Removed: aaacomix => wc: aaa + comix = aaacomix
Removed: aaagak => wc: aaa + gak = aaagak
Removed: aaahh => wc: aaa + hh = aaahh
Removed: aaainc => wc: aaa + inc = aaainc
Removed: aaainc => wc: aaai + nc = aaainc
Removed: aaanet => wc: aaa + net = aaanet
Removed: aaanet => wc: aaan + et = aaanet
Removed: aaanime => wc: aaa + nime = aaanime
Removed: aaanime => wc: aaan + ime = aaanime
Removed: aaaron => wc: aaa + ron = aaaron
Removed: aabbcc => wc: aab + bcc = aabbcc
Removed: aabmup => wc: aab + mup = aabmup
Removed: aabre => wc: aab + re = aabre
Removed: aabybro => wc: aaby + bro = aabybro
Removed: aacap => wc: aac + ap = aacap
Removed: aacap => wc: aaca + p = aacap
Removed: aaccording => wc: aac + cording = aaccording
Removed: aacd => wc: aac + d = aacd
Removed: aachener => wc: aach + ener = aachener
Removed: aachener => wc: aachen + er = aachener
Removed: aacisuan => wc: aaci + suan = aacisuan
Removed: aacisuan => wc: aacis + uan = aacisuan
Removed: aacult => wc: aac + ult = aacult

I'm not using a break inside the inner loop because I also have to do calculations there.

Cristo
  • Please be more specific on your question and code. – Eiko Jun 13 '16 at 08:35
  • @Eiko What don't you understand? – Cristo Jun 13 '16 at 09:31
  • I'm currently thinking something like a ``for ($i=0; $i – Cristo Jun 13 '16 at 09:58
  • You're not showing much of your code at all. You do show some random output, which we cannot relate to anything - we just cannot make any sense of it. Please construct a minimal example with input, output, and your code. – Eiko Jun 13 '16 at 10:22
  • The comparisons inside don't matter here. The question is how to iterate and remove on the fly. If you read it in detail you can get it. I added simplified array examples – Cristo Jun 13 '16 at 10:45

1 Answer

There is an error in your code: $key is assigned in two places with two different meanings. It is first assigned in the list(...) statement and then again in the foreach loop, where it holds the keys of $dict.

As a rule of thumb, avoid unsetting elements from a list while you are iterating over that list. It is better to record the items you have already processed and skip them if they come up again. If you want, you can remove those items afterwards, once the loop over $unique has finished.

If I understand your question correctly, this would be a way to go:

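$toUnset = []; // words already handled, to be removed from $unique later
foreach ($unique as $key => $word) {
    // skip words that were already handled
    if (!in_array($word, $toUnset)) {
        foreach ($dict as $other) {

            // do your processing
            $toUnset[$key] = $word; // keyed by $key, so each word is recorded only once
        }
    }
}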
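
The rule of thumb above (record matches during the loop, remove them afterwards) could be sketched as follows. The prefix test is only a placeholder for the real conditions, which the question does not show, and 'zzz' is an extra word added here so that one entry survives; `$matches` is a hypothetical name.

```php
<?php
// Sample data from the question; 'zzz' is added for illustration only.
$dict   = ['aaa', 'aaan', 'aba', 'abat', 'ime', 'iso', 'nime', 'tiso'];
$unique = ['abatiso', 'aaanime', 'zzz'];

$toUnset = []; // keys of $unique scheduled for removal
$matches = []; // word => list of dictionary words that matched it

foreach ($unique as $key => $word) {
    foreach ($dict as $other) {
        // Placeholder condition (an assumption): $other is a proper prefix of $word.
        if ($other !== $word && strpos($word, $other) === 0) {
            $matches[$word][] = $other;
            // Writing to the same key repeatedly keeps a single entry per word.
            $toUnset[$key] = true;
        }
    }
}

// Remove the recorded entries only after the iteration has finished.
foreach (array_keys($toUnset) as $key) {
    unset($unique[$key]);
}

echo count($unique), " word(s) left in \$unique\n";
```

Because the inner loop never breaks, every calculation still runs for each dictionary word, but each matched word is scheduled for removal exactly once.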
cb0
  • Sorry, I'm not using $key; the code is a bit messy after so many ongoing changes and attempts, I will correct it. I'd prefer to do that on the fly so it runs faster, because it currently takes almost 24h to run :D – Cristo Jun 13 '16 at 09:31
  • How many items does your dict have? 24 hours sounds like a serious runtime problem. – cb0 Jun 13 '16 at 09:49
  • Only 300k and 110k. I think that some strlen and strpos calls and a custom binary_search (instead of the builtin PHP in_array/search, because the dicts are sorted) inside the loop are making it too slow. I'm trying the isset vs strlen hack (http://stackoverflow.com/questions/6955913/isset-vs-strlen-a-fast-clear-string-length-calculation), looks promising – Cristo Jun 13 '16 at 09:53
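
On the lookup cost raised in the last comment, one possible alternative to in_array or a hand-written binary search is to flip the dictionary into array keys and test membership with isset(). The sketch below assumes the condition is "the word splits into two dictionary words", which is only a guess from the sample output; `$dictSet` is a hypothetical name.

```php
<?php
// Sample data from the question.
$dict   = ['aaa', 'aaan', 'aba', 'abat', 'ime', 'iso', 'nime', 'tiso'];
$unique = ['abatiso', 'aaanime'];

// Dictionary words become keys, so membership is a hash lookup.
$dictSet = array_flip($dict);

$compounds = []; // word => first split found
foreach ($unique as $word) {
    $len = strlen($word);
    // Try every split point instead of scanning the whole dictionary.
    for ($i = 1; $i < $len; $i++) {
        $head = substr($word, 0, $i);
        $tail = substr($word, $i);
        // Assumed condition: both halves are dictionary words.
        if (isset($dictSet[$head]) && isset($dictSet[$tail])) {
            $compounds[$word] = "$head + $tail";
            break; // drop this break to collect every split of the word
        }
    }
}

echo "n compounds: " . count($compounds) . "\n";
```

With this shape, the work per word depends on the word's length rather than on the size of the dictionary, and a word is never revisited once its split has been recorded.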