
Split the sentence into tokens. This can be useful, for example, for a search engine.

There are several rules:

Several words enclosed in quotation marks must be kept together in the same token:

This "huge test" is pointless => This,huge test,is,pointless

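Hyphenated words (words joined by a single hyphen) are also kept in one token. Where words are separated by several consecutive hyphens (dashes), or a word has a hyphen at its beginning or end, the hyphens act as separators: the words go into separate tokens and the hyphens are dropped.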

Suzie Smith-Hopper test--hyphens => Suzie,Smith-Hopper,test,hyphens.
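One way to meet both rules is to match the tokens themselves instead of splitting on delimiters. Below is a minimal C# sketch using Regex.Matches; the names Tokenizer and Tokenize are hypothetical, and it assumes straight double quotes (") and treats a "word" as a run of regex word characters (\w), so any other punctuation is simply dropped.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class Tokenizer
{
    // Alternative 1: a phrase in straight double quotes; group 1 captures the text inside.
    // Alternative 2: a run of word characters optionally joined by SINGLE hyphens, so
    // "Smith-Hopper" stays whole, while "test--hyphens", "-x" and "x-" lose their hyphens.
    private static readonly Regex TokenPattern =
        new Regex(@"""([^""]*)""|\w+(?:-\w+)*");

    public static List<string> Tokenize(string input)
    {
        var tokens = new List<string>();
        foreach (Match m in TokenPattern.Matches(input))
        {
            // Quoted phrase: keep only the text between the quotes (trimmed).
            // Otherwise: the whole match is the token.
            string token = m.Groups[1].Success ? m.Groups[1].Value.Trim() : m.Value;

            // Skip empty quotes such as "" or "   ".
            if (token.Length > 0)
            {
                tokens.Add(token);
            }
        }
        return tokens;
    }
}
```

With this pattern, Tokenize("Suzie Smith-Hopper test--hyphens") should return Suzie, Smith-Hopper, test, hyphens. An unmatched quote is ignored rather than treated as an error, and a contraction such as don't would come out as two tokens, because the apostrophe is not a word character.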

My try:

label.Text = "";
string s = "I like-it 'very very'";
string[] arr = Regex.Split(s, @"(\s)|(')");

foreach (var item in arr)
{
    label.Text += item + ", ";
}

But it doesn't work for me.
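A likely reason the attempt misbehaves: when the pattern passed to Regex.Split contains capturing parentheses, .NET copies the captured text into the result array, so the spaces and quotes come back as items of their own. Adjacent delimiters (the space before the opening quote) and a delimiter at the very end of the string also produce empty strings, and splitting on every space cannot keep a quoted phrase together in any case. The sketch below compares the original pattern with a non-capturing one; SplitDemo and its methods are hypothetical names, and Show only brackets each item so empty and whitespace items are visible.

```csharp
using System.Linq;
using System.Text.RegularExpressions;

public static class SplitDemo
{
    // The pattern from the question: each alternative is a capturing group, so
    // Regex.Split copies every matched space or quote into the result array.
    public static string[] OriginalAttempt(string s) => Regex.Split(s, @"(\s)|(')");

    // A character class has no capturing group, so the delimiters are dropped.
    // Adjacent delimiters (space, then quote) and a delimiter at the very end still
    // leave empty strings, and the quoted phrase is still split at its inner space.
    public static string[] WithoutCaptures(string s) => Regex.Split(s, @"[\s']");

    // Brackets make empty strings and whitespace-only items visible when printed.
    public static string Show(string[] parts) =>
        string.Join(" ", parts.Select(p => "[" + p + "]"));
}
```

Even without captures, WithoutCaptures("I like-it 'very very'") yields I, like-it, an empty string, very, very and a final empty string, so the quoted phrase is still broken apart. Keeping it together needs either matching tokens instead of splitting, or pre-processing the string before the split.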

Alexander

1 Answer


The following certainly isn't efficient, but it would work:

Step 1. Parse the file (or input string) and replace every space inside a phrase surrounded by quotes with a placeholder character, such as '+'. Replace runs of multiple hyphens with a space (' ').

Step 2. Split on spaces (' '). Every item in the resulting array should then be a token, I think.

Step 3. Go back through the array and replace each placeholder character (such as the '+' above) with a space (' '). Each item in the array then represents one token.
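The three steps above can be sketched in C# as follows. This is a minimal illustration, not the answerer's own code: PlaceholderTokenizer is a hypothetical name, '+' is the placeholder suggested in Step 1 (so the input is assumed never to contain '+' itself), and two details the answer leaves open are filled in as assumptions: the quote characters are dropped, and a hyphen at the start or end of a word is replaced with a space just like a run of hyphens, so that the second rule holds.

```csharp
using System;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;

public static class PlaceholderTokenizer
{
    // Placeholder from Step 1; assumes the input never contains '+' itself.
    private const char Placeholder = '+';

    public static string[] Tokenize(string input)
    {
        // Step 1a: walk the string, replacing spaces inside "..." with the placeholder.
        // Assumption: the quote characters themselves are dropped.
        var sb = new StringBuilder(input.Length);
        bool inQuotes = false;
        foreach (char c in input)
        {
            if (c == '"') { inQuotes = !inQuotes; continue; }
            sb.Append(inQuotes && c == ' ' ? Placeholder : c);
        }

        // Step 1b: runs of two or more hyphens become a space.
        // Assumption: a hyphen at the start or end of a word is treated the same way.
        string prepared = Regex.Replace(sb.ToString(), @"-{2,}|(?<!\w)-|-(?!\w)", " ");

        // Step 2: split on spaces; RemoveEmptyEntries drops empties from repeated spaces.
        string[] parts = prepared.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);

        // Step 3: turn each placeholder back into a space inside its token.
        return parts.Select(p => p.Replace(Placeholder, ' ')).ToArray();
    }
}
```

For the sample inputs in the question this should give This, huge test, is, pointless and Suzie, Smith-Hopper, test, hyphens. The quote handling is a plain character walk to keep the inQuotes state explicit; note that an unmatched opening quote turns every later space into the placeholder.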