5

It seems Smalltalk implementations misses an algorithm which return all the indices of a substring in a String. The most similar ones returns only one index of an element, for example : firstIndexesOf:in: , findSubstring:, findAnySubstring: variants.

There are implementations in Ruby but the first one relies on a Ruby hack, the second one does not work ignoring overlapping Strings and the last one uses an Enumerator class which I don't know how to translate to Smalltalk. I wonder if this Python implementation is the best path to start since considers both cases, overlapping or not and does not uses regular expressions.

My goal is to find a package or method which provides the following behavior:

'ABDCDEFBDAC' indicesOf: 'BD'. "#(2 8)"

When overlapping is considered:

'nnnn' indicesOf: 'nn' overlapping: true. "#(0 2)"

When overlapping is not considered:

'nnnn' indicesOf 'nn' overlapping: false. "#(0 1 2)"

In Pharo, when a text is selected in a Playground, a scanner detects the substring and highlights matches. However I couldn't find a String implementation of this.

My best effort so far results in this implementation in String (Pharo 6):

indicesOfSubstring: subString
  | indices i |

  indices := OrderedCollection new: self size.
  i := 0.
  [ (i := self findString: subString startingAt: i + 1) > 0 ] whileTrue: [
    indices addLast: i ].
  ^ indices
user1000565
  • 927
  • 4
  • 12
  • 2
    May I encourage you to (try to) implement the method yourself? The SO rules establish that questions should exhibit some coding effort; show us yours. – Leandro Caniglia Jul 04 '18 at 21:48
  • Your `overlappling: true` versus `overlapping: false` example results seem to be reversed? – lurker Jul 05 '18 at 02:33
  • please don't forget to mark your questions answered when you are happy with an answer! For more see https://stackoverflow.com/help/someone-answers – tukan Jul 09 '18 at 09:01

2 Answers2

5

Let me firstly clarify that Smalltalk collections are 1-based, not 0-based. Therefore your examples should read

'nnnn' indexesOf: 'nn' overlapping: false. "#(1 3)"
'nnnn' indexesOf: 'nn' overlapping: true. "#(1 2 3)"

Note that I've also taken notice of @lurker's observation (and have tweaked the selector too).

Now, starting from your code I would change it as follows:

indexesOfSubstring: subString overlapping: aBoolean
  | n indexes i |
  n := subString size.
  indexes := OrderedCollection new.                            "removed the size"
  i := 1.                                                      "1-based"
  [
    i := self findString: subString startingAt: i.             "split condition"
    i > 0]
  whileTrue: [
    indexes add: i.                                            "add: = addLast:"
    i := aBoolean ifTrue: [i + 1] ifFalse: [i + n]].           "new!"
  ^indexes

Make sure you write some few unit tests (and don't forget to exercise the border cases!)

Leandro Caniglia
  • 14,495
  • 4
  • 29
  • 51
  • 1
    Thank you for your notes. Isn't more efficient in Smalltalk to do `[ (var := exp) > 0 ] ...` than `[ var := exp. var > 0 ]` – user1000565 Jul 06 '18 at 04:24
  • 3
    @user1000565 Please open another question for this interesting issue. – Leandro Caniglia Jul 06 '18 at 05:29
  • it just occurred to me the assignment `i := aBoolean ...` is superfluous. Isn't that so? Just `aBoolean ifTrue: [i := i + 1] ifFalse: [i := i + n]` should be enough. – tukan Jul 13 '18 at 14:07
  • 1
    @tunkan, you are right. I usually move the assignment outside of the blocks. I'm changing the code... Thanks! – Leandro Caniglia Jul 13 '18 at 14:42
1

Edited

It would also be nice if you would tell us what you need to achieve in the "greater picture". Sometimes Smalltalk offers different approaches.

Leandro beat me to the the code (and his code is more efficient), but I have already written it so I'll share it too. Heed his advice on Smalltalk being 1-based => rewritten example.

I have used Smalltalk/X and Pharo 6.1 for the example.

The code would be:

indexesOfSubstring: substringToFind overlapping: aBoolean

    | substringPositions aPosition currentPosition |

    substringPositions := OrderedSet new. "with overlap on you could get multiple same 
              positions in the result when there is more to find in the source string"

    substringToFindSize := substringToFind size. "speed up for large strings"
    aPosition := 1.

    [ self size > aPosition ] whileTrue: [
        currentPosition := self findString: substringToFind startingAt: aPosition.
        (currentPosition = 0) ifTrue: [ aPosition := self size + 1 ] "ends the loop substringToFind is not found"
                             ifFalse: [
                                 substringPositions add: currentPosition.
                                 aBoolean ifTrue: [ aPosition := aPosition + 1 ] "overlapping is on"
                                         ifFalse: [ aPosition := currentPosition + substringToFindSize ] "overlapping is off"
                             ]
    ].

    ^ substringPositions

I have fixed some issues that occured to me. Don't forget to test it as much as you can!

tukan
  • 17,050
  • 1
  • 20
  • 48
  • I think that `currentPosition` cannot be `nil` because `findString:startingAt:` answers with `0` when the substring isn't found. Anyway, it's been a good idea providing your version of the method. – Leandro Caniglia Jul 05 '18 at 22:02
  • @LeandroCaniglia thank you for correcting typo :) in startingAt:. It sucks when copy paste does not work correctly with SO. You are, of course, right about the `0` when nothing is found. I have adjusted the code. (I had the code at workspace and had to transform it to a selector). – tukan Jul 06 '18 at 06:21