
I have a string x that looks like this. The lines with a plus in front are color coded.

diff --git js/js.tests/test/org/jetbrains/kotlin/js/test/JsLineNumberTestGenerated.java js/js.tests/test/org/jetbrains/kotlin/js/test/JsLineNumberTestGenerated.java
index 55597bf..9115830 100644
--- js/js.tests/test/org/jetbrains/kotlin/js/test/JsLineNumberTestGenerated.java
+++ js/js.tests/test/org/jetbrains/kotlin/js/test/JsLineNumberTestGenerated.java
@@ -38,0 +39,6 @@ public class JsLineNumberTestGenerated extends AbstractJsLineNumberTest {
+    @TestMetadata("chainedCall.kt")
+    public void testChainedCall() throws Exception {
+        String fileName = KotlinTestUtils.navigationMetadata("js/js.translator/testData/lineNumbers/chainedCall.kt");
+        doTest(fileName);
+    }
+
@@ -92,0 +99,6 @@ public class JsLineNumberTestGenerated extends AbstractJsLineNumberTest {
+    @TestMetadata("longLiteral.kt")
+    public void testLongLiteral() throws Exception {
+        String fileName = KotlinTestUtils.navigationMetadata("js/js.translator/testData/lineNumbers/longLiteral.kt");
+        doTest(fileName);
+    }
+

I want to extract the green lines, so that in the end I have two strings (an array of strings) like this:

    @TestMetadata("chainedCall.kt")
    public void testChainedCall() throws Exception {
        String fileName = KotlinTestUtils.navigationMetadata("js/js.translator/testData/lineNumbers/chainedCall.kt");
        doTest(fileName);
    }

and

    @TestMetadata("longLiteral.kt")
    public void testLongLiteral() throws Exception {
        String fileName = KotlinTestUtils.navigationMetadata("js/js.translator/testData/lineNumbers/longLiteral.kt");
        doTest(fileName);
    }

The raw output of git diff is this (where you can also see the color code):

'\x1b[1mdiff --git js/js.tests/test/org/jetbrains/kotlin/js/test/JsLineNumberTestGenerated.java js/js.tests/test/org/jetbrains/kotlin/js/test/JsLineNumberTestGenerated.java\x1b[m\n\x1b[1mindex 55597bf..9115830 100644\x1b[m\n\x1b[1m--- js/js.tests/test/org/jetbrains/kotlin/js/test/JsLineNumberTestGenerated.java\x1b[m\n\x1b[1m+++ js/js.tests/test/org/jetbrains/kotlin/js/test/JsLineNumberTestGenerated.java\x1b[m\n\x1b[36m@@ -38,0 +39,6 @@\x1b[m \x1b[mpublic class JsLineNumberTestGenerated extends AbstractJsLineNumberTest {\x1b[m\n\x1b[32m+\x1b[m\x1b[32m    @TestMetadata("chainedCall.kt")\x1b[m\n\x1b[32m+\x1b[m\x1b[32m    public void testChainedCall() throws Exception {\x1b[m\n\x1b[32m+\x1b[m\x1b[32m        String fileName = KotlinTestUtils.navigationMetadata("js/js.translator/testData/lineNumbers/chainedCall.kt");\x1b[m\n\x1b[32m+\x1b[m\x1b[32m        doTest(fileName);\x1b[m\n\x1b[32m+\x1b[m\x1b[32m    }\x1b[m\n\x1b[32m+\x1b[m\n\x1b[36m@@ -92,0 +99,6 @@\x1b[m \x1b[mpublic class JsLineNumberTestGenerated extends AbstractJsLineNumberTest {\x1b[m\n\x1b[32m+\x1b[m\x1b[32m    @TestMetadata("longLiteral.kt")\x1b[m\n\x1b[32m+\x1b[m\x1b[32m    public void testLongLiteral() throws Exception {\x1b[m\n\x1b[32m+\x1b[m\x1b[32m        String fileName = KotlinTestUtils.navigationMetadata("js/js.translator/testData/lineNumbers/longLiteral.kt");\x1b[m\n\x1b[32m+\x1b[m\x1b[32m        doTest(fileName);\x1b[m\n\x1b[32m+\x1b[m\x1b[32m    }\x1b[m\n\x1b[32m+\x1b[m'

In a related question I found a regex that matches the green color code, but I am having trouble applying it to my concrete problem:

/^\e\[32m\+\e\[m\e\[32m(.*)\e\[m$/

ScientiaEtVeritas
  • Did you consider grepping for the lines starting with a `+` instead? Colors are discouraged when processing the ASCII output of Unix commands; they only make sense on a screen. That is why many programs stop producing color codes as soon as they notice that their output is being passed to another command. – Alfe Jun 02 '17 at 09:22
  • The accepted answer in the linked question goes for the color code, because there could be other lines with a plus in front. In addition I'm working in Python with GitPython, so there is no grep available. – ScientiaEtVeritas Jun 02 '17 at 09:24
  • I didn't mean the Unix command `grep`, I was referring to the act of grepping (*g*etting something which matches a *re*gular ex*p*ression). – Alfe Jun 02 '17 at 09:26

1 Answer

You can just test if the sequence is in the line:

for line in x.split('\n'):
    if '\x1b[32m' in line:  # ANSI escape sequence git used for green
        print(line)

If you really need to find groups of lines which all contain a green-esc-sequence, you can do it like this:

import re

# Each match is a run of consecutive lines that all contain the green escape sequence.
for chunk in re.findall(r'((?:[^\n]*\x1b\[32m[^\n]*\n)+)', x):
    print(chunk)
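
If you also want the chunks without the color codes and the leading plus, as in the desired output of the question, something along these lines might work. It is only a sketch: the pattern is the regex from the question with `\e` written as `\x1b` (Python's `re` does not know `\e`), it assumes exactly the escape sequences shown in the raw output above, and `GREEN_LINE` and `green_chunks` are names made up for this example.

```python
import re

# One green "added" line as colored in the question's raw output:
# ESC[32m + ESC[m, optionally followed by ESC[32m <text> ESC[m.
# Assumption: git emits exactly these sequences; other setups may differ.
GREEN_LINE = re.compile(r'^\x1b\[32m\+\x1b\[m(?:\x1b\[32m(.*?)\x1b\[m)?$')


def green_chunks(colored_diff):
    """Return one string per run of consecutive green lines,
    with the color codes and the leading '+' removed."""
    chunks = []
    current = []
    for line in colored_diff.split('\n'):
        match = GREEN_LINE.match(line)
        if match:
            # An empty added line has no text group, so fall back to ''.
            current.append(match.group(1) or '')
        elif current:
            # A non-green line ends the current run.
            chunks.append('\n'.join(current))
            current = []
    if current:
        chunks.append('\n'.join(current))
    return chunks
```

Calling `green_chunks(x)` on the string from the question should give a list of two strings, one per block of added lines. An empty added line at the end of a block shows up as a trailing newline in that chunk.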

But I think searching for the escape sequence is a hack. Git decides how to render something in green based on the terminal in use, so on a different terminal you might end up with different escape sequences. Git might also decide not to print color codes at all when it notices that its output is going to another process or a file rather than a terminal.

I found no simple alternative, though: no way to make git print added lines in a special format or similar. So the only truly clean way would be to parse the git output completely (as patch would do), i.e. take the line numbers it reports into account and ignore the formatting. But that's not possible with mere pattern matching.
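
To illustrate what such parsing could look like, here is a rough sketch, not a full patch parser. It assumes the diff was produced without color (for example with git's `--no-color` option) and is in plain unified format. It uses the line counts from each `@@` header to know where a hunk ends, so the `+++` file header is never mistaken for an added line. The names `HUNK_HEADER` and `added_lines_per_hunk` are made up for this example.

```python
import re

# Unified diff hunk header, e.g. "@@ -38,0 +39,6 @@ public class ...".
# The two captured groups are the old and new line counts (default 1 if omitted).
HUNK_HEADER = re.compile(r'^@@ -\d+(?:,(\d+))? \+\d+(?:,(\d+))? @@')


def added_lines_per_hunk(plain_diff):
    """Return one string per hunk holding its added lines without the '+'.

    Assumption: plain_diff is uncolored unified diff text.
    """
    hunks = []
    old_left = new_left = 0  # lines still expected in the current hunk
    for line in plain_diff.split('\n'):
        if old_left <= 0 and new_left <= 0:
            # Outside a hunk: only a hunk header is of interest here.
            header = HUNK_HEADER.match(line)
            if header:
                old_count, new_count = header.groups()
                old_left = 1 if old_count is None else int(old_count)
                new_left = 1 if new_count is None else int(new_count)
                hunks.append([])
            continue
        if line.startswith('+'):
            hunks[-1].append(line[1:])
            new_left -= 1
        elif line.startswith('-'):
            old_left -= 1
        elif line.startswith('\\'):
            pass  # "\ No newline at end of file" marker, not a content line
        else:
            # Context line: present in both the old and the new file.
            old_left -= 1
            new_left -= 1
    return ['\n'.join(hunk) for hunk in hunks]
```

For the diff in the question this should return two strings, one per `@@` section. A hunk that only removes lines yields an empty string, which the caller can filter out if it is not wanted.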

Alfe
  • Let me note that one can force git to output color codes with the parameter `--color=always`. – ScientiaEtVeritas Jun 02 '17 at 10:36
  • But it is not documented which color codes `git` will use. They might depend on the current terminal. Version 2.7.4 on Linux, however, seems to be terminal-agnostic and always uses the same color codes. I wouldn't bet that a later version or a Windows version (or something even more bizarre) would always do this. Color codes are simply not meant to be searched for. Doing it nevertheless is a hack. – Alfe Jun 02 '17 at 11:01