10

I am attempting to build a string extension method that trims a string to a certain length without breaking a word. I wanted to check whether there is anything built into the framework, or a more clever method than mine. Here's mine so far (not thoroughly tested):

public static string SmartTrim(this string s, int length)
        {
            StringBuilder result = new StringBuilder();

            if (length >= 0)
            {
                if (s.IndexOf(' ') > 0)
                {
                    string[] words = s.Split(' ');
                    int index = 0;

                    while (index < words.Length - 1 && result.Length + words[index + 1].Length <= length)
                    {
                        result.Append(words[index]);
                        result.Append(" ");
                        index++;
                    }

                    if (result.Length > 0)
                    {
                        result.Remove(result.Length - 1, 1);
                    }
                }
                else
                {
                    result.Append(s.Substring(0, length));
                }
            }
            else
            {
                throw new ArgumentOutOfRangeException("length", "Value cannot be negative.");
            }

            return result.ToString();
        }
Mike Cole
  • 1
    i would not split. i would loop over the string searching for the next word break. stop if the position of the found break is after the given length. otherwise add the word before it to the string builder. to find the word before the found break you will need to store the position of the previously found break (or zero). makes sense? – akonsu Aug 17 '10 at 17:03
  • 1
    You may not care for your application, but keep in mind that the built-in `Trim` functions are actually checking for `char.IsWhiteSpace`, not just `space`. – Marc Aug 17 '10 at 17:11
  • @Marc - good note. I was questioning my wording while typing it. – Mike Cole Aug 17 '10 at 18:25
  • See also http://stackoverflow.com/questions/1613896/truncate-string-on-whole-words-in-net-c – dthrasher Nov 22 '10 at 16:02
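The scanning approach suggested in the comments above (walk the string, remember the last word break that still fits, and test with `char.IsWhiteSpace` rather than a literal space) might look roughly like the sketch below. The class and method names are hypothetical, and returning an empty string when no break fits is an assumption, not something the comments specify.

```csharp
using System;

public static class StringScanExtensions // hypothetical name
{
    // Scans forward and remembers the last whitespace position that still fits.
    public static string TrimAtWordBreak(this string s, int length)
    {
        if (s == null) throw new ArgumentNullException(nameof(s));
        if (length < 0) throw new ArgumentOutOfRangeException(nameof(length));
        if (s.Length <= length) return s;

        int lastBreak = 0; // position of the previously found break (or zero)

        // Index `length` is included: a break exactly there keeps `length` characters.
        // s.Length > length here, so s[length] is a valid index.
        for (int i = 0; i <= length; i++)
        {
            if (char.IsWhiteSpace(s[i]))
            {
                lastBreak = i;
            }
        }

        // Assumption: no break found means nothing fits, so the result is empty.
        // TrimEnd drops whitespace left over from runs of consecutive breaks.
        return s.Substring(0, lastBreak).TrimEnd();
    }
}
```

With this sketch, `"foo bar baz".TrimAtWordBreak(7)` gives `"foo bar"`, and a tab counts as a break just like a space does.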

7 Answers

14

I'd use string.LastIndexOf - at least if we only care about spaces. Then there's no need to create any intermediate strings...

As yet untested:

public static string SmartTrim(this string text, int length)
{
    if (text == null)
    {
        throw new ArgumentNullException("text");
    }
    if (length < 0)
    {
        throw new ArgumentOutOfRangeException();
    }
    if (text.Length <= length)
    {
        return text;
    }
    int lastSpaceBeforeMax = text.LastIndexOf(' ', length);
    if (lastSpaceBeforeMax == -1)
    {
        // Perhaps define a strategy here? Could return empty string,
        // or the original
        throw new ArgumentException("Unable to trim word");
    }
    return text.Substring(0, lastSpaceBeforeMax);        
}

Test code:

public class Test
{
    static void Main()
    {
        Console.WriteLine("'{0}'", "foo bar baz".SmartTrim(20));
        Console.WriteLine("'{0}'", "foo bar baz".SmartTrim(3));
        Console.WriteLine("'{0}'", "foo bar baz".SmartTrim(4));
        Console.WriteLine("'{0}'", "foo bar baz".SmartTrim(5));
        Console.WriteLine("'{0}'", "foo bar baz".SmartTrim(7));
    }
}

Results:

'foo bar baz'
'foo'
'foo'
'foo'
'foo bar'
Jon Skeet
  • So how do you refactor if the requirement is any word break, not just a space? Specifically the most common (where a word could break, but the character not have white-space around it) is the hyphen... Just curious. – AllenG Aug 17 '10 at 17:18
  • 1
    @AllenG: If it's still in a small set, `text.LastIndexOfAny(Delimiters)` would be the best option. – Jon Skeet Aug 17 '10 at 17:31
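As a rough illustration of the `LastIndexOfAny` suggestion, the sketch below swaps the single-space search for a small delimiter set. The `Delimiters` contents, the `SmartTrimAny` name and the empty-string fallback are all assumptions chosen for the example; the answer above throws in the no-break case instead.

```csharp
using System;

public static class DelimiterTrimExtensions // hypothetical name
{
    // Assumed delimiter set; adjust to the word breaks that matter to you.
    private static readonly char[] Delimiters = { ' ', '-', '\t' };

    public static string SmartTrimAny(this string text, int length)
    {
        if (text == null) throw new ArgumentNullException(nameof(text));
        if (length < 0) throw new ArgumentOutOfRangeException(nameof(length));
        if (text.Length <= length) return text;

        // Search backwards from index `length`, which is valid because
        // text.Length > length at this point.
        int lastBreak = text.LastIndexOfAny(Delimiters, length);

        // Assumption: return an empty string when no delimiter is found.
        return lastBreak == -1 ? string.Empty : text.Substring(0, lastBreak);
    }
}
```

Note that the delimiter itself is dropped, so `"well-known fact".SmartTrimAny(6)` yields `"well"` rather than `"well-"`.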
2

How about a Regex-based solution? You will probably want to test it some more and add some bounds checking, but this is what springs to mind:

using System;
using System.Text.RegularExpressions;

namespace Stackoverflow.Test
{
    static class Test
    {
        private static readonly Regex regWords = new Regex("\\w+", RegexOptions.Compiled);

        static void Main()
        {
            Console.WriteLine("The quick brown fox jumped over the lazy dog".SmartTrim(8));
            Console.WriteLine("The quick brown fox jumped over the lazy dog".SmartTrim(20));
            Console.WriteLine("Hello, I am attempting to build a string extension method to trim a string to a certain length but with not breaking a word. I wanted to check to see if there was anything built into the framework or a more clever method than mine".SmartTrim(100));
        }

        public static string SmartTrim(this string s, int length)
        {
            var matches = regWords.Matches(s);
            foreach (Match match in matches)
            {
                if (match.Index + match.Length > length)
                {
                    int ln = match.Index + match.Length > s.Length ? s.Length : match.Index + match.Length;
                    return s.Substring(0, ln);
                }
            }
            return s;
        }
    }
}
driis
2

Try this out. It's null-safe, won't break if length is longer than the string, and involves less string manipulation.

Edit: Per recommendations, I've removed the intermediate string. I'll leave the answer up as it could be useful in cases where exceptions are not wanted.

public static string SmartTrim(this string s, int length)
{
    if(s == null || length < 0 || s.Length <= length)
        return s;

    // Edit a' la Jon Skeet. Removes unnecessary intermediate string. Thanks!
    // string temp = s.Length > length + 1 ? s.Remove(length+1) : s;
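    // Search backwards from index `length`: a space there keeps exactly `length` characters,
    // and the index is always valid because s.Length > length at this point.
    int lastSpace = s.LastIndexOf(' ', length);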
    return lastSpace < 0 ? string.Empty : s.Remove(lastSpace);
}
kbrimington
1

Use it like this:

var substring = source.GetSubstring(50, new string[] { " ", "." });

This method returns a substring that ends at the first of one or more separator strings found at or after the given length.

public static string GetSubstring(this string source, int length, params string[] options)
    {
        if (string.IsNullOrWhiteSpace(source))
        {
            return string.Empty;
        }

        if (source.Length <= length)
        {
            return source;
        }

        var indices =
            options.Select(
                separator => source.IndexOf(separator, length, StringComparison.CurrentCultureIgnoreCase))
                .Where(index => index >= 0)
                .ToList();

        if (indices.Count > 0)
        {
            return source.Substring(0, indices.Min());
        }

        return source;
    }
hazjack
1
string strTemp = "How are you doing today";
int nLength = 12;
strTemp = strTemp.Substring(0, strTemp.Substring(0, nLength).LastIndexOf(' '));

I think that should do it. When I ran that, it ended up with "How are you".

So your function would be:

public static string SmartTrim(this string s, int length) 
{  
    return s.Substring(0, s.Substring(0, length).LastIndexOf(' '));
} 

I would definitely add some exception handling though, such as making sure the integer length is no greater than the string length and not less than 0.

XstreamINsanity
  • 1
    This will fail in various cases, e.g. if the length is longer than you need, or is one word of exactly the right length, or can't be successfully trimmed. – Jon Skeet Aug 17 '10 at 17:09
  • Yeah, you put that comment as I was making the edit. :) I figured I would leave the exception handling to him. – XstreamINsanity Aug 17 '10 at 17:12
1

Obligatory LINQ one-liner, if you only care about whitespace as a word boundary:

return new String(s.TakeWhile((ch,idx) => (idx < length) || (idx >= length && !Char.IsWhiteSpace(ch))).ToArray());
driis
0

I'll toss in some LINQ goodness even though others have answered this adequately:

public string TrimString(string s, int maxLength)
{
    var pos = s.Select((c, idx) => new { Char = c, Pos = idx })
        .Where(item => char.IsWhiteSpace(item.Char) && item.Pos <= maxLength)
        .Select(item => item.Pos)
        .LastOrDefault(); // last whitespace at or before maxLength (0 if none)

    return pos > 0 ? s.Substring(0, pos) : s;
}

I left out the parameter checking that others have merely to accentuate the important code...

joshperry