3

I have a string as listed below.

string sample = " class0 .calss1 .class2 .class3.class4 .class5 class6 .class7";

I need to create a list of WORDS from this sample string.

A WORD is a string that starts with a period and ends with:

  1. a space or
  2. another period or
  3. end of string

Note: The key point here is that the splitting is based on two criteria: a period and a blank space.

I have the following program, and it works fine. However, is there a simpler, more efficient, or more concise approach using LINQ or regular expressions?

CODE

        List<string> wordsCollection = new List<string>();
        string sample = " class0 .calss1 .class2 .class3.class4  .class5 class6 .class7";

        string word = null;

        int stringLength = sample.Length;
        int currentCount = 0;

        if (stringLength > 0)
        {
            foreach (Char c in sample)
            {

                currentCount++;
                if (String.IsNullOrEmpty(word))
                {
                    if (c == '.')
                    {
                        word = Convert.ToString(c);
                    }
                }
                else
                {

                    if (c == ' ')
                    {
                        //End Criteria Reached
                        word = word + Convert.ToString(c);
                        wordsCollection.Add(word);
                        word = String.Empty;
                    }
                    else if (c == '.')
                    {
                        //End Criteria Reached
                        wordsCollection.Add(word);
                        word = Convert.ToString(c);
                    }
                    else
                    {
                        word = word + Convert.ToString(c);
                        if (stringLength == currentCount)
                        {
                            wordsCollection.Add(word);
                        }
                    }
                }

            }
        }

RESULT

        foreach (string wordItem in wordsCollection)
        {
            Console.WriteLine(wordItem);

        }

LCJ

4 Answers

5

You can do this with a regular expression.

Code

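// Requires: using System.Linq; using System.Text.RegularExpressions;
Regex regex = new Regex(@"\.[^ .]+");
var matches = regex.Matches(sample);
// MatchCollection is non-generic, so Cast<Match>() is needed before using LINQ.
string[] result = matches.Cast<Match>().Select(x => x.Value).ToArray();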

See it working online: ideone

Result

.calss1
.class2
.class3
.class4
.class5
.class7

Explanation of Regular Expression

\.      Matches a literal dot
[^ .]+  Negated character class: one or more characters that are neither a space nor a dot
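
For reference, here is a self-contained sketch of the approach above that returns a List<string>, as the question asks. The helper name ExtractWords is hypothetical (it is not part of the answer), and the top-level statement layout assumes C# 9 or later; on older compilers the same statements would go inside a Main method.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper: returns every dot-prefixed word found in the input.
static List<string> ExtractWords(string input)
{
    // \.      a literal dot
    // [^ .]+  one or more characters that are neither a space nor a dot
    return Regex.Matches(input, @"\.[^ .]+")
                .Cast<Match>()
                .Select(m => m.Value)
                .ToList();
}

string sample = " class0 .calss1 .class2 .class3.class4  .class5 class6 .class7";
List<string> words = ExtractWords(sample);

// Print one word per line, as in the RESULT loop of the question.
foreach (string word in words)
{
    Console.WriteLine(word);
}
```

Because the pattern requires at least one character after the dot, a lone "." in the input would not be reported as a word.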

Mark Byers
  • 2
    +1 I usually frown on regular expressions for splitting strings, but this is a case where it's the best way to do it. – juharr Dec 21 '12 at 15:25
  • Thanks. Can you explain what you mean by "as few as possible"? – LCJ Dec 21 '12 at 15:31
  • 1
    @Lijo: Regular expressions are greedy by default. The `?` modifier makes the `*` [lazy](http://www.regular-expressions.info/repeat.html#lazy). – Mark Byers Dec 21 '12 at 15:32
  • Thanks for the reference too: http://www.regular-expressions.info/repeat.html#lazy. It would be great if you could update the answer with these details and the link. – LCJ Dec 21 '12 at 15:35
  • @Lijo: Thought about it a bit and updated to a simpler regular expression. – Mark Byers Dec 21 '12 at 16:35
2
string sample = " class0 .calss1 .class2 .class3.class4  .class5 class6 .class7";

string[] words = sample.Split(new char[] {'.'}).Skip(1).Select(x=> 
            "." + x.Split(new char[] {' '})[0].Trim()).ToArray();

EDIT: I missed the list part:

List<string> words = sample.Split(new char[] {'.'}).Skip(1).Select(x=> 
            "." + x.Split(new char[] {' '})[0].Trim()).ToList();
Steve
0

Do you need to keep the . and the space?

If not, you can use:

sample.Split(new char[]{' ', '.'}).ToList();

This will give you a list of strings.
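
As a rough sketch of what such a split yields for the sample string: the version below adds StringSplitOptions.RemoveEmptyEntries, which is an assumption on top of the line above (without it, adjacent separators such as " ." produce empty strings). Note that the result also includes class0 and class6, which have no leading period in the input. The top-level statement layout assumes C# 9 or later.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

string sample = " class0 .calss1 .class2 .class3.class4 .class5 class6 .class7";

// Split on both separators and drop the empty entries created by adjacent separators.
List<string> tokens = sample
    .Split(new char[] { ' ', '.' }, StringSplitOptions.RemoveEmptyEntries)
    .ToList();

// prints class0,calss1,class2,class3,class4,class5,class6,class7
Console.WriteLine(string.Join(",", tokens));
```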

Gaz Winter
0
string sample = " class0 .calss1 .class2 .class3.class4 .class5 class6 .class7";
sample = Regex.Replace(sample, " ", String.Empty);
string[] arr = sample.Split(new char[] { '.' });
VladL