You can use regular expressions (Regex). The code below excludes text inside parentheses and braces and also removes the exclamation mark. Feel free to expand the CleanUp method to filter out other punctuation symbols:
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Module Module1
    Sub Main()
        Dim re As New Regex("\(.*\)|{.*}") 'anything inside parentheses OR braces
        Dim input As String = "Hello (this is) me and {that is} him!"
        'remove the bracketed parts first
        Dim inputParsed As String = re.Replace(input, String.Empty)
        Dim reSplit As New Regex("\b") 'split by word boundary
        'the split also yields whitespace and punctuation fragments; CleanUp drops them
        Dim output() As String = CleanUp(reSplit.Split(inputParsed))
        'output = {"Hello", "me", "and", "him"}
    End Sub

    Private Function CleanUp(output As String()) As String()
        Dim outputFiltered As New List(Of String)
        For Each v As String In output
            If String.IsNullOrWhiteSpace(v) Then Continue For 'remove empty entries and spaces
            If v = "!" Then Continue For 'remove punctuation, feel free to expand
            outputFiltered.Add(v)
        Next
        Return outputFiltered.ToArray()
    End Function
End Module
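As one possible way to expand CleanUp beyond the single "!" check, the sketch below keeps only tokens that contain at least one letter or digit, so any punctuation-only or whitespace-only fragment is dropped. The names CleanUpSketch and CleanUpPunctuation are hypothetical and not part of the code above; treat this as an illustration under that assumption rather than the only way to do it.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Linq
Imports System.Text.RegularExpressions

Module CleanUpSketch
    ' Hypothetical variant of CleanUp: keeps a token only if it contains
    ' at least one letter or digit, so "!", ", ", "?" and blanks are dropped.
    Public Function CleanUpPunctuation(tokens As String()) As String()
        Dim kept As New List(Of String)
        For Each token As String In tokens
            If token.Any(Function(c) Char.IsLetterOrDigit(c)) Then kept.Add(token)
        Next
        Return kept.ToArray()
    End Function

    Sub Main()
        Dim input As String = "Hello, world! How are you?"
        ' Same word-boundary split as in the answer
        Dim tokens As String() = Regex.Split(input, "\b")
        Dim words As String() = CleanUpPunctuation(tokens)
        Console.WriteLine(String.Join("|", words)) ' prints Hello|world|How|are|you
    End Sub
End Module
```

This avoids listing every punctuation symbol by hand, at the cost of also dropping tokens such as a lone underscore that Regex treats as a word character.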
To explain the regular expression I used, \(.*\)|{.*}:
\( is just a literal (. The parenthesis is a special symbol in Regex, so it needs to be escaped with a \.
.* means anything, i.e. any sequence of characters (by default, . does not match a newline).
| is a logical OR, so the expression matches either its left or its right side.
{ does not need escaping here, so it goes in as is.
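Overall, you can read this as "find anything inside parentheses or braces"; the code then replaces each match with an empty string, i.e. removes all occurrences. One concept worth understanding here is greedy vs. lazy matching. For this particular input the greedy default works well, because there is only one pair of parentheses and one pair of braces. With several parenthesized groups in the same line, greedy .* would match from the first ( to the last ) and remove everything in between, whereas the lazy form .*? stops at the nearest closing delimiter.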
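To make the greedy vs. lazy difference concrete, here is a minimal sketch that runs both forms of the pattern on an input with two parenthesized groups. The module and function names (GreedyVsLazy, RemoveGreedy, RemoveLazy) and the sample input are made up for illustration; only the greedy pattern comes from the answer itself.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module GreedyVsLazy
    ' Greedy pattern from the answer: .* grabs as much as possible.
    Public Function RemoveGreedy(input As String) As String
        Return Regex.Replace(input, "\(.*\)|{.*}", String.Empty)
    End Function

    ' Lazy variant: .*? grabs as little as possible, so each
    ' bracketed group is matched (and removed) separately.
    Public Function RemoveLazy(input As String) As String
        Return Regex.Replace(input, "\(.*?\)|{.*?}", String.Empty)
    End Function

    Sub Main()
        Dim input As String = "Hello (one) me and (two) him"
        Console.WriteLine(RemoveGreedy(input)) ' prints "Hello  him"
        Console.WriteLine(RemoveLazy(input))   ' prints "Hello  me and  him"
    End Sub
End Module
```

Note that the greedy version silently drops "me and", which sits between the two groups. If the input may contain more than one group per line, the lazy form is probably the safer choice.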
Useful resources for working with Regex: