I want to convert ANSI escape sequences to IRC color sequences.

So I wrote regular expression 1, \e\[([\d;]+)?m. However, shell_output_string.replaceFirst ("\\e\\[([\\d;]+)?m", "$1") returns the captured group together with all the non-matched parts of the string.

Then I wrote regular expression 2, .*\e\[([\d;]+)?m.*, hoping it would match the whole string and replace it with the captured group. However, replaceFirst (".*\\e\\[([\\d;]+)?m.*", "$1") returns an empty string, even though matches (".*\\e\\[([\\d;]+)?m.*") is true. What's wrong with this regular expression?

This question is very similar to: Pattern/Matcher group() to obtain substring in Java?

Sample code

import java.util.regex.*;
public class AnsiEscapeToIrcEscape
{
    public static void main (String[] args)
    {
        // # grep --color=always bot /etc/passwd
        //
        // bot:x:1000:1000:bot:/home/bot:/bin/bash
        byte[] shell_output_array = {
            0x1B, 0x5B, 0x30, 0x31, 0x3B, 0x33, 0x31, 0x6D, 0x1B, 0x5B, 0x4B, // ^[[01;31m^[[K  (#1 - #11)
            0x62, 0x6F, 0x74,   // bot  (#12 - #14)
            0x1B, 0x5B, 0x6D, 0x1B, 0x5B, 0x4B, // ^[[m^[[K (#15 - #20)
            0x3A, 0x78, 0x3A, 0x31, 0x30, 0x30, 0x30, 0x3A, 0x31, 0x30, 0x30, 0x30, 0x3A,   // :x:1000:1000:    (#21 - #33)
            0x1B, 0x5B, 0x30, 0x31, 0x3B, 0x33, 0x31, 0x6D, 0x1B, 0x5B, 0x4B, // ^[[01;31m^[[K  (#34 - #44)
            0x62, 0x6F, 0x74,   // bot  (#45 - #47)
            0x1B, 0x5B, 0x6D, 0x1B, 0x5B, 0x4B, // ^[[m^[[K (#48 - #53)
            0x3A, 0x2F, 0x68, 0x6F, 0x6D, 0x65, 0x2F,   // :/home/  (#54 - #60)
            0x1B, 0x5B, 0x30, 0x31, 0x3B, 0x33, 0x31, 0x6D, 0x1B, 0x5B, 0x4B, // ^[[01;31m^[[K  (#61 - #71)
            0x62, 0x6F, 0x74,   // bot  (#72 - #74)
            0x1B, 0x5B, 0x6D, 0x1B, 0x5B, 0x4B, // ^[[m^[[K (#75 - #80)
            0x3A, 0x2F, 0x62, 0x69, 0x6E, 0x2F, 0x62, 0x61, 0x73, 0x68, // :/bin/bash   (#81 - #90)
        };
        String shell_output = new String (shell_output_array);
        System.out.println (shell_output);
        System.out.println ("total " + shell_output_array.length + " bytes");

        final String CSI_REGEXP = "\\e\\[";
        final String CSI_SGR_REGEXP_First = CSI_REGEXP + "([\\d;]+)?m";
        final String CSI_SGR_REGEXP = ".*" + CSI_SGR_REGEXP_First + ".*";

        System.out.println (shell_output.replaceFirst(CSI_SGR_REGEXP_First, "$1"));
        System.out.println (shell_output.replaceFirst(CSI_SGR_REGEXP, "$1"));
    }
}
LiuYan 刘研

2 Answers

Regex quantifiers are greedy by default: each one tries to match as much of the input as it can.

This means that when a pattern starts with .*, that part of the pattern tries to cover as much of the input text as it can, effectively forcing the rest of the pattern to look for a match starting from the end of the input string and working towards the front.

So, what's the first match for the rest of the pattern from the end of the string (or, if you prefer, what's the last substring that matches)? It's on the penultimate line of your input array, and consists of just ^[[m

That matches because the whole ([\d;]+) part of the pattern is made optional by the following ?.

In turn, this means that, since that final sequence has no digits or ;, group $1 is empty, and because the pattern covers the whole string, the whole string is replaced by it. Hence you get an empty string as output.
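To make the greedy behaviour concrete, here is a minimal sketch that runs the question's pattern against a shortened version of the sample input, once with a greedy leading .* and once with a reluctant .*? instead. The class and method names are made up for illustration, and the reluctant variant is only one possible fix, not necessarily the best one for the full conversion task. It also assumes the input contains no line terminators, since . does not match them by default.

```java
public class GreedyVsReluctant
{
    // Same SGR pattern as in the question: ESC [ params m, with params optional.
    static final String SGR = "\\e\\[([\\d;]+)?m";

    // Greedy: the leading .* swallows as much as possible, so the SGR part
    // ends up matching the LAST sequence in the input (the bare ESC[m).
    static String greedy (String s)
    {
        return s.replaceFirst (".*" + SGR + ".*", "$1");
    }

    // Reluctant: the leading .*? expands only as far as needed, so the SGR
    // part matches the FIRST sequence in the input.
    static String reluctant (String s)
    {
        return s.replaceFirst (".*?" + SGR + ".*", "$1");
    }

    public static void main (String[] args)
    {
        // Shortened sample: ^[[01;31m^[[Kbot^[[m^[[K:x:1000
        String s = "\u001b[01;31m\u001b[Kbot\u001b[m\u001b[K:x:1000";
        System.out.println ("greedy:    [" + greedy (s) + "]");
        System.out.println ("reluctant: [" + reluctant (s) + "]");
    }
}
```

With the greedy version, group 1 does not take part in the match, so $1 contributes nothing and the brackets print empty. With the reluctant version, group 1 captures 01;31 from the first sequence.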

At least, that's what I reckon without being near a Java machine to test it. Hope it helps.

racraman
    Ah, thanks for the detailed explanation. When I wrote this test code, I also tried a simple string, `\u001b[01;31m`, and got the right result with the regular expression above. Both that "right" result and the "wrong" result above are expected according to your explanation. Thank you very much! – LiuYan 刘研 Oct 21 '13 at 10:45
The API documentation for String's replaceFirst says:


     replaceFirst

    public String replaceFirst(String regex,
                               String replacement)

        Replaces the first substring of this string that matches the given regular expression with the given replacement.

        An invocation of this method of the form str.replaceFirst(regex, repl) yields exactly the same result as the expression

            Pattern.compile(regex).matcher(str).replaceFirst(repl)

        Note that backslashes (\) and dollar signs ($) in the replacement string may cause the results to be different than if it were being treated as a literal replacement string; see Matcher.replaceFirst(java.lang.String). Use Matcher.quoteReplacement(java.lang.String) to suppress the special meaning of these characters, if desired.

        Parameters:
            regex - the regular expression to which this string is to be matched
            replacement - the string to be substituted for the first match 
        Returns:
            The resulting String 
        Throws:
            PatternSyntaxException - if the regular expression's syntax is invalid
        Since:
            1.4
        See Also:
            Pattern



Please read the note, which says that \ and $ in the replacement string may cause the result to differ from a literal replacement.
You can use Pattern and Matcher directly instead.

Example

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexMatches
{
    public static void main( String args[] ){

      // String to be scanned to find the pattern.
     // String line = "This order was placed for QT3000! OK?";
     // String pattern = "(.*)(\\d+)(.*)";

      byte[] shell_output_array = {
              0x1B, 0x5B, 0x30, 0x31, 0x3B, 0x33, 0x31, 0x6D, 0x1B, 0x5B, 0x4B, // ^[[01;31m^[[K  (#1 - #11)
              0x62, 0x6F, 0x74,   // bot  (#12 - #14)
              0x1B, 0x5B, 0x6D, 0x1B, 0x5B, 0x4B, // ^[[m^[[K (#15 - #20)
              0x3A, 0x78, 0x3A, 0x31, 0x30, 0x30, 0x30, 0x3A, 0x31, 0x30, 0x30, 0x30, 0x3A,   // :x:1000:1000:    (#21 - #33)
              0x1B, 0x5B, 0x30, 0x31, 0x3B, 0x33, 0x31, 0x6D, 0x1B, 0x5B, 0x4B, // ^[[01;31m^[[K  (#34 - #44)
              0x62, 0x6F, 0x74,   // bot  (#45 - #47)
              0x1B, 0x5B, 0x6D, 0x1B, 0x5B, 0x4B, // ^[[m^[[K (#48 - #53)
              0x3A, 0x2F, 0x68, 0x6F, 0x6D, 0x65, 0x2F,   // :/home/  (#54 - #60)
              0x1B, 0x5B, 0x30, 0x31, 0x3B, 0x33, 0x31, 0x6D, 0x1B, 0x5B, 0x4B, // ^[[01;31m^[[K  (#61 - #71)
              0x62, 0x6F, 0x74,   // bot  (#72 - #74)
              0x1B, 0x5B, 0x6D, 0x1B, 0x5B, 0x4B, // ^[[m^[[K (#75 - #80)
              0x3A, 0x2F, 0x62, 0x69, 0x6E, 0x2F, 0x62, 0x61, 0x73, 0x68, // :/bin/bash   (#81 - #90)
              };
      String line = new String (shell_output_array);
      //String pattern = "(.*)(\\d+)(.*)";
      final String CSI_REGEXP = "\\e\\[";
      final String CSI_SGR_REGEXP_First = CSI_REGEXP + "([\\d;]+)?m";
      final String CSI_SGR_REGEXP = ".*" + CSI_SGR_REGEXP_First + ".*";

      // Create a Pattern object
      Pattern r = Pattern.compile(CSI_SGR_REGEXP);

      // Now create matcher object.
      Matcher m = r.matcher(line);
      while (m.find()) {
         System.out.println(m.start() + "  " + m.end());
         System.out.println("Found value: " + m.group());
      } 
   }
}
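
As a complement to the example above, the following sketch uses the pattern without the surrounding .* so that find() visits each SGR sequence in turn rather than matching the whole string once. The class name SgrFinder and the method sgrParameters are hypothetical, chosen only for this illustration. Treating a bare ESC[m as an empty parameter string is an assumption about what the caller wants.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SgrFinder
{
    // ESC [ params m, with params optional (same pattern as in the question).
    private static final Pattern SGR = Pattern.compile ("\\e\\[([\\d;]+)?m");

    // Collects the parameter string of every SGR sequence, in input order.
    // A bare ESC[m leaves group 1 unmatched (null); it is recorded as "".
    static List<String> sgrParameters (String s)
    {
        List<String> result = new ArrayList<> ();
        Matcher m = SGR.matcher (s);
        while (m.find ())
        {
            String params = m.group (1);
            result.add (params == null ? "" : params);
        }
        return result;
    }

    public static void main (String[] args)
    {
        // Shortened sample: ^[[01;31m^[[Kbot^[[m^[[K:x:
        String s = "\u001b[01;31m\u001b[Kbot\u001b[m\u001b[K:x:";
        for (String params : sgrParameters (s))
        {
            System.out.println ("SGR parameters: [" + params + "]");
        }
    }
}
```

Each captured parameter string could then be mapped to whatever IRC control codes are wanted; that mapping is left out here because it is not part of the question's code.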
Nishant Lakhara