I have built a blog platform in VB.NET whose audience is very young and, for some reason, likes to express commitment by repeating sequences of characters in comments.

Examples:

Hi!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! <3<3<3<3<3<3<3<3<3<3<3<3<3<3<3<3<3<3<3<3<3<3<3<3<3<3<3 LOLOLOLOLOLOLOLOLOLOLOLOLLOLOLOLOLOLOLOLOLOLOLOLOL

...and so on.

I don't want to filter this out completely; however, I would like to shorten it to a maximum of 5 repeating characters or sequences in a row. I have no problem writing a function to handle a single repeating character. But what is the most effective way to shorten a repeating sequence as well?

This is what I used earlier for single repeating characters:

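Private Shared Function RemoveSequence(ByVal str As String) As String
    ' Longest run of a single character to keep; the text above asks for 5.
    Const MaxRun As Integer = 5
    Dim sb As New System.Text.StringBuilder(str.Length)
    ' Nothing is the default Char; String.Empty is not a valid Char initializer.
    Dim prev As Char = Nothing
    Dim runLength As Integer = 0

    For Each c As Char In str
        If runLength > 0 AndAlso c = prev Then
            runLength += 1
        Else
            ' A different character starts a new run.
            runLength = 1
        End If
        ' Append only while the current run is within the limit.
        If runLength <= MaxRun Then
            sb.Append(c)
        End If
        prev = c
    Next

    Return sb.ToString()
End Function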

Any help would be greatly appreciated.

Magnus Engdal
  • Note that you would be killing any ASCII art with your approach. And for whatever reason, ASCII art still seems to be quite popular. – Franci Penov Jun 30 '10 at 09:38
  • I agree, but for this project and the target audience it doesn't really matter. I just want it to maintain itself as much as possible. – Magnus Engdal Jun 30 '10 at 11:59
  • [What algorithm can you use to find duplicate phrases in a string?](http://stackoverflow.com/questions/88615/what-algorithm-can-you-use-to-find-duplicate-phrases-in-a-string) – Sjoerd Jun 30 '10 at 09:36

1 Answer

You should be able to solve this by applying the 'longest repeated substring' problem recursively. On the first pass you will get two matching substrings and will need to check whether they are contiguous. Then repeat the step on one of the substrings. Stop when the substrings are not contiguous, or when the substring length drops below a certain number of characters. Finally, keep the last match and discard the rest. You will need to dig around for an implementation :(

Also have a look at this previously asked question: finding long repeated substrings in a massive string
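
For comparison, here is a minimal sketch of a simpler alternative that does not use the longest-repeated-substring approach at all: a regular expression with a backreference. It assumes a limit of 5 copies (taken from the question) and only considers repeating units of up to 10 characters, which is an arbitrary bound chosen here to keep backtracking in check. The names `CommentCleaner`, `CollapseRepeats`, `MaxRepeats` and `MaxUnitLength` are made up for illustration.

```vbnet
Imports System.Text.RegularExpressions

Public Module CommentCleaner
    ' Assumed limits (hypothetical names): keep at most 5 copies of a unit,
    ' and only look for units of up to 10 characters.
    Private Const MaxRepeats As Integer = 5
    Private Const MaxUnitLength As Integer = 10

    Private ReadOnly RepeatPattern As Regex = BuildPattern()

    ' Builds the pattern ((.{1,10}?)\2{4})\2+
    '   group 2: the shortest repeating unit (lazy quantifier)
    '   group 1: the first MaxRepeats copies of that unit
    '   \2+    : one or more surplus copies, which get dropped
    Private Function BuildPattern() As Regex
        Dim unit As String = "(.{1," & MaxUnitLength.ToString() & "}?)"
        Dim kept As String = "(" & unit & "\2{" & (MaxRepeats - 1).ToString() & "})"
        Return New Regex(kept & "\2+")
    End Function

    ' Shortens every run of more than MaxRepeats copies down to MaxRepeats.
    Public Function CollapseRepeats(ByVal input As String) As String
        If String.IsNullOrEmpty(input) Then Return input
        ' "$1" keeps only group 1, i.e. the first MaxRepeats copies.
        Return RepeatPattern.Replace(input, "$1")
    End Function
End Module
```

With these assumptions, `CollapseRepeats("Hi!!!!!!!!!!")` gives `Hi!!!!!` and `CollapseRepeats("<3<3<3<3<3<3<3<3")` gives `<3<3<3<3<3`. Because `.` does not match a newline by default, runs that span several lines are left alone.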

tathagata