I'm trying to use Regex.Replace in order to insert the \n\t characters at every nth position in the string. The problem is I don't want it inserted in the middle of a word.

What I have now:

Regex.Replace(inputString, "(.{85})", "$&\n\t")

I would also like to only insert a tab if there is already a newline present in the group of 85 characters (inserting the tab directly after the already present newline).

  • Are you constrained to using regex? This sounds like you're trying to do a few things that would be fairly easy to manage in a for loop. – Jimmy Dec 22 '15 at 20:03
  • Not necessarily but it seems much cleaner to use a regex one-liner than have to iterate through the string using a for loop....although it may come to that lol –  Dec 22 '15 at 20:44
  • Could you clarify *if there is already a newline present in the group of 85 characters*? If the 85th character is a newline, add a tab only, or if the 85th symbol is not a newline, add both a newline and a tab? Have a look at **[this demo](http://ideone.com/TFzQL1)**. – Wiktor Stribiżew Dec 22 '15 at 21:12
  • This is _almost_ word wrap. I can give you a regex for Notepad style word wrap. You just set the column width. Btw, word wrap is not too easy. –  Dec 22 '15 at 23:09
  • @stribizhev there may or may not be a newline present in the group of 85 characters –  Dec 28 '15 at 16:07
  • @sln that is basically what I am trying to go for...a word wrap but with every newline also inserting a tab.... –  Dec 28 '15 at 16:08

3 Answers

I assume you want to add only a tab if the 85th character is a newline, and both a newline and a tab if it is not.

Then you can use the @"(?s).{0,85}" regex. It matches from 0 to 85 characters of any kind (the (?s) flag lets the dot match newlines too), as many as possible because the quantifier is greedy. In the match evaluator, check whether the last character of the match is a newline and append accordingly:

var str = "This is a really long test string is this is this is this is this is this is this is thisit is this this has to be way more than 85 characters ssssssssssss";
Console.WriteLine(Regex.Replace(str, @"(?s).{0,85}", 
        m => m.Value.EndsWith("\n") ? m.Value + "\t" : m.Value + "\n\t"));

Result of the demo:

This is a really long test string is this is this is this is this is this is this is 
    thisit is this this has to be way more than 85 characters ssssssssssss

If you need to add only a tab whenever the 85-character match contains a newline anywhere, replace the .EndsWith("\n") with .Contains("\n") in the above code.
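To make the difference between the two checks concrete, here is a minimal sketch that is not part of the original demo. It assumes a width of 10 instead of 85 so the result is easy to follow, and it uses {1,10} rather than {0,85} so the pattern cannot match the empty string at the end of the input. The names NewlineCheckSketch and Wrap are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class NewlineCheckSketch
{
    // Hypothetical helper: appends "\t" when the predicate reports that the
    // chunk already has a newline, otherwise appends "\n\t".
    public static string Wrap(string text, int width, Func<string, bool> hasNewline)
    {
        string pattern = @"(?s).{1," + width + "}";
        return Regex.Replace(text, pattern,
            m => hasNewline(m.Value) ? m.Value + "\t" : m.Value + "\n\t");
    }

    public static void Main()
    {
        string input = "ab\ncdefghijklmnop";

        // EndsWith: only a newline in the last position of the chunk counts.
        string byEnd = Wrap(input, 10, s => s.EndsWith("\n", StringComparison.Ordinal));
        // Contains: a newline anywhere in the chunk counts.
        string byAny = Wrap(input, 10, s => s.Contains("\n"));

        // Print with visible escapes so the inserted characters can be seen.
        Console.WriteLine(byEnd.Replace("\n", "\\n").Replace("\t", "\\t"));
        Console.WriteLine(byAny.Replace("\n", "\\n").Replace("\t", "\\t"));
    }
}
```

In both cases the tab is appended after the whole chunk, not directly after the newline inside it.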

To avoid splitting in the middle of a word, add a word boundary: @"(?s).{0,85}\b". Or, if it is not always a word character at the end, use @"(?s).{0,85}(?!\w)". Another possible scenario is when you want to ensure at least 85 characters (or a bit more if the word boundary is not found), use @"(?s).{85,}?(?!\w)".
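As a rough sketch of the word-aware variant (not from the original demo), the helper below applies the (?!\w) lookahead with a configurable width. The names WordWrapSketch and Wrap are hypothetical, and the lower bound is 1 instead of 0, which is my assumption to keep the pattern from matching the empty string at the end of the input.

```csharp
using System;
using System.Text.RegularExpressions;

public static class WordWrapSketch
{
    public static string Wrap(string text, int width)
    {
        // Up to `width` characters, as many as possible, but the match must
        // not be followed by a word character, so it cannot end inside a word.
        string pattern = @"(?s).{1," + width + @"}(?!\w)";
        return Regex.Replace(text, pattern,
            m => m.Value.EndsWith("\n", StringComparison.Ordinal)
                ? m.Value + "\t"
                : m.Value + "\n\t");
    }

    public static void Main()
    {
        // Width of 10 instead of 85 so the effect is visible on a short string.
        string wrapped = Wrap("The quick brown fox jumps over the lazy dog", 10);
        Console.WriteLine(wrapped);
    }
}
```

Each chunk stops at the end of a word, so the space that follows it begins the next line right after the tab, and a newline plus tab is also appended after the last chunk. Some trimming may be needed depending on the output you want.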

Wiktor Stribiżew
  • I think this is pretty much exactly what I am going for...my only concern is if it will add a newline in the middle of a word...I still want to keep a word wrap type of functionality –  Dec 28 '15 at 16:12
  • To avoid splitting in the middle of a word, add a word boundary: `@"(?s).{0,85}\b"`. Or, if it is not always a word character at the end, use `@"(?s).{0,85}(?!\w)"`. Another possible scenario is when you want to ensure at least 85 characters (or a bit more if the word boundary is not found), use `@"(?s).{85,}?(?!\w)"`. – Wiktor Stribiżew Dec 28 '15 at 16:12
  • I guess the problem with the above solution is that the tab or newline will always be after the match. If you can provide an example string with expected output (in the question, as newlines in comments are not supported), I could help more. – Wiktor Stribiżew Dec 28 '15 at 17:13
  • i believe the word boundary addition works the way i need, thanks! –  Dec 28 '15 at 21:44
  • How to preserve newlines but reset the count back to zero after encountering one (i.e. just wrap long lines on word boundaries)? – crokusek Feb 06 '20 at 19:16
  • @crokusek Sorry, let's go to a [chat](https://chat.stackoverflow.com/rooms/207377/room-for-wiktor-stribizew-and-crokusek), I do not understand what your problem is. – Wiktor Stribiżew Feb 06 '20 at 19:27

With some chat help from @Wiktor Stribiżew, here is an alternative:

  • preserves pre-existing newlines
  • if a line is longer than N chars, it breaks the line at whitespace and indents the next line(s) until the long line is consumed.
  • configurable max line length
  • configurable how/whether the broken lines should be indented.

public static string AddLineBreaks(this string text, int maxLineLength, string indent = "\t")
{        
    // Strip off any whitespace (including \r) before each pre-existing end of line character.
    //
    text = Regex.Replace(text, @"\s+\n", "\n", RegexOptions.Multiline);

    // Matches that are too long include a trailing whitespace character (excluding newline)
    // which is then used to sense that an indent should occur
    // Regex to match whitespace except newline: https://stackoverflow.com/a/3469155/538763
    //
    string regex = @"(\n)|([^\n]{0," + maxLineLength + @"}(?!\S)[^\S\n]?)";

    return Regex.Replace(text, regex, m => m.Value +
        (m.Value.Length > 1 && Char.IsWhiteSpace(m.Value[m.Value.Length - 1]) ? ("\n" + indent) : ""));
}
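A usage sketch for the method above, with the method reproduced so the example compiles on its own. The class names StringExtensions and Program and the sample text are assumptions for illustration; a width of 10 keeps the output short.

```csharp
using System;
using System.Text.RegularExpressions;

public static class StringExtensions
{
    // Same logic as the AddLineBreaks method in this answer.
    public static string AddLineBreaks(this string text, int maxLineLength, string indent = "\t")
    {
        // Strip whitespace (including \r) before each existing newline.
        text = Regex.Replace(text, @"\s+\n", "\n", RegexOptions.Multiline);

        // Either a lone newline, or up to maxLineLength non-newline characters
        // that stop before whitespace (or the end), plus one trailing
        // non-newline whitespace character when the line continues.
        string regex = @"(\n)|([^\n]{0," + maxLineLength + @"}(?!\S)[^\S\n]?)";

        // A match that ends in whitespace marks a broken line: add newline + indent.
        return Regex.Replace(text, regex, m => m.Value +
            (m.Value.Length > 1 && Char.IsWhiteSpace(m.Value[m.Value.Length - 1]) ? ("\n" + indent) : ""));
    }
}

public static class Program
{
    public static void Main()
    {
        string text = "short\nThe quick brown fox\nend";
        string wrapped = text.AddLineBreaks(10);

        // Show the indent as a visible marker.
        Console.WriteLine(wrapped.Replace("\t", "<TAB>"));
    }
}
```

The lines that already fit are left alone, while the long line is broken after "quick " (the trailing space stays on the first part) and its continuation is indented. Note that \s+\n in the first step also matches runs of consecutive newlines, so blank lines in the input are collapsed.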

crokusek

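Try a backtick instead of \ (the backtick is the escape character in PowerShell, so this applies when the call is made from PowerShell rather than C#):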

Regex.Replace(inputString, "(.{85})", "$&`n`t")

Sometimes the $ also needs a backtick.

Eddie