
I have some lines of code from which I am trying to remove some leading text. They look like this:

Line 1: myApp.name;
Line 2: myApp.version
Line 3: myApp.defaults, myApp.numbers;

I am trying to find a regex that will remove everything up to (but not including) the first myApp on each line.

I have tried various regular expressions, but they all seem to fail when it comes to line 3 (because myApp appears twice).

The closest I have come so far is:

.*?myApp

Pretty simple, but it matches both occurrences of myApp in Line 3, whereas I'd like it to match only the first.
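To make the failure concrete, here is a minimal sketch using Python's re module. The question doesn't name a tool, so Python is an assumption here. re.sub replaces every non-overlapping match, which behaves like a global replace, so the unanchored pattern matches a second time after the first myApp:

```python
import re

line = "Line 3: myApp.defaults, myApp.numbers;"

# re.sub replaces every non-overlapping match (like a /g replace).
# With no anchor, the lazy .*? starts a new match right after the first
# myApp and runs up to the second one, so both prefixes are consumed.
result = re.sub(r".*?myApp", "myApp", line)
print(result)  # myAppmyApp.numbers;
```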

There are a few hundred lines; otherwise I'd have deleted them all manually by now.

Can somebody help me? Thanks.

keldar

2 Answers


You need to add an anchor, ^, which matches the start of a line:

^.*?(myApp)


Use the above regex and replace the matched characters with $1 or \1 (depending on your tool), so that the string myApp is kept in the final result after the replacement.

Pattern explanation:

  • ^ Start of a line.
  • .*?(myApp) Shortest possible match up to the first myApp. The string myApp is captured and stored in group 1.
  • All matched characters are replaced with the characters captured in group 1.
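A minimal sketch of the same anchored pattern in Python, assuming the whole input is held in one string. Two Python-specific details: re.MULTILINE is needed so that ^ matches at the start of every line rather than only at the start of the string, and Python's replacement syntax uses \1 (or \g<1>) rather than $1:

```python
import re

text = """Line 1: myApp.name;
Line 2: myApp.version
Line 3: myApp.defaults, myApp.numbers;"""

# ^ pins each match to the start of a line; with re.MULTILINE it matches
# after every newline. Since . does not match a newline, .*? cannot run
# past the end of the line, and ^ cannot match again mid-line, so only
# the prefix before the first myApp is replaced.
# Group 1 captures "myApp" and \1 writes it back into the result.
cleaned = re.sub(r"^.*?(myApp)", r"\1", text, flags=re.MULTILINE)
print(cleaned)
```

Lines that contain no myApp at all have no match and are left unchanged.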
Avinash Raj
  • Thanks! Can you explain why the caret makes all the difference? I need to brush up on my regex skills. – keldar Sep 13 '14 at 17:22
  • I understand it signifies the beginning of a line, but cannot understand how it helps. – keldar Sep 13 '14 at 17:23

Your regular expression works in Perl if you add the ^ to ensure that you only match the beginnings of lines:

cat /tmp/test.txt  | perl -pe 's/^.*?myApp/myApp/g'
myApp.name;
myApp.version
myApp.defaults, myApp.numbers;

If you wanted to get fancy, you could use a lookahead, (?=myApp), which checks that myApp comes next without including it in the match. That way it doesn't have to be put back in the replacement.

cat /tmp/test.txt  | perl -pe 's/^.*?(?=myApp)//g'
myApp.name;
myApp.version
myApp.defaults, myApp.numbers;
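The same lookahead approach, sketched in Python rather than Perl (the choice of Python and the variable names are illustrative assumptions). Because (?=myApp) is zero-width, the match covers only the prefix, so the replacement string is empty:

```python
import re

text = """Line 1: myApp.name;
Line 2: myApp.version
Line 3: myApp.defaults, myApp.numbers;"""

# (?=myApp) requires myApp to follow but does not consume it, so the
# match is just the text before the first myApp on each line.
# re.MULTILINE lets ^ match at the start of every line.
cleaned = re.sub(r"^.*?(?=myApp)", "", text, flags=re.MULTILINE)
print(cleaned)
```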
Stephen Ostermiller
  • Thanks - why does the caret make all the difference? That was all that was missing from mine and I am struggling to see how it helps. – keldar Sep 13 '14 at 17:29
  • It means to match from the beginning of the string. Without it, there are two matches in the third line: `Line 3: myApp` and `.defaults, myApp`, both of which get removed. Only one of them starts the line, so the caret helps. – Stephen Ostermiller Sep 13 '14 at 17:52
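One way to check the explanation in the last comment is to list the matches themselves. This is a small sketch using Python's re.finditer, an assumed stand-in for Perl's matching, applied to the third line on its own:

```python
import re

line = "Line 3: myApp.defaults, myApp.numbers;"

# Without ^, matching resumes after the first myApp and finds a second match.
unanchored = [m.group(0) for m in re.finditer(r".*?myApp", line)]
# With ^, a match can only begin at the start of the string, so there is one.
anchored = [m.group(0) for m in re.finditer(r"^.*?myApp", line)]

print(unanchored)  # ['Line 3: myApp', '.defaults, myApp']
print(anchored)    # ['Line 3: myApp']
```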