How to use grep/sed/awk, to remove a pattern from beginning of a text file

Question

I have a text file with the following pattern written to it:

TIME[32.468ms]  -(3)-............."TEXT I WANT TO KEEP"

I would like to discard the first part of each line containing

TIME[32.468ms]  -(3)-.............

To test the regular expression I've tried the following:

cat myfile.txt | egrep "^TIME\[.*\]\s\s\-\(3\)\-\.+"

This identifies correctly the lines I want. Now, to delete the pattern I've tried:

cat myfile.txt | sed s/"^TIME\[.*\]\s\s\-\(3\)\-\.+"//

but it just seems to be doing the cat, since it shows the content of the complete file and no substitution happens.

What am I doing wrong?

OS: CentOS 7

Tangentially, [the `cat`s are useless.](https://stackoverflow.com/questions/11710552/useless-use-of-cat) – tripleee, Jul 01 '21 at 08:45

RavinderSingh13 · Accepted Answer · 2021-07-01T09:04:32.570

2

With your shown samples, please try the following grep command. It was written and tested with GNU grep.

grep -oP '^TIME\[\d+\.\d+ms\]\s+-\(\d+\)-\.+\K.*' Input_file

Explanation: Adding detailed explanation for above code.

^TIME\[          ##Match the string TIME followed by a literal [ at the start of the line.
\d+\.\d+ms\]     ##Match digits (1 or more), a dot, digits (1 or more), then ms and a literal ].
\s+-\(\d+\)-\.+  ##Match whitespace (1 or more), then - ( digits (1 or more) ) - and 1 or more dots.
\K               ##\K (available with GNU grep -P) requires the preceding part to match but drops it from the output, so only what follows is printed.
.*               ##Match the rest of the line.

2nd solution: Adding awk program here.

awk 'match($0,/^TIME\[[0-9]+\.[0-9]+ms\][[:space:]]+-\([0-9]+\)-\.+/){print substr($0,RSTART+RLENGTH)}' Input_file

Explanation: awk's match function is used with the regex ^TIME\[[0-9]+\.[0-9]+ms\][[:space:]]+-\([0-9]+\)-\.+, which matches the text we want to remove from each line. When it matches, substr prints the rest of the line starting right after the matched part (RSTART+RLENGTH), which is what the OP needs.
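To illustrate the match() approach above, here is a minimal sketch run on two lines. The first is the sample from the question; the second, starting with TIM, is invented for illustration (the actual file contents are not shown in this thread). It assumes an awk that understands POSIX character classes such as [[:space:]].

```shell
# Hypothetical input: the question's sample line plus an invented "TIM" line.
sample='TIME[32.468ms]  -(3)-............."TEXT I WANT TO KEEP"
TIM[1.000ms]  -(3)-....."SOME OTHER TEXT"'

# match() succeeds only when the anchored TIME[...] prefix is present,
# so the second line produces no output at all.
kept=$(printf '%s\n' "$sample" |
  awk 'match($0,/^TIME\[[0-9]+\.[0-9]+ms\][[:space:]]+-\([0-9]+\)-\.+/){print substr($0,RSTART+RLENGTH)}')

printf '%s\n' "$kept"
```

Only the remainder of the TIME line should be printed; lines where the prefix does not match are skipped rather than passed through.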

edited Jul 01 '21 at 09:04

answered Jul 01 '21 at 08:38

RavinderSingh13


`grep -P` is commonly available on Linux, but not standard. The OP has not revealed which OS they are on. – tripleee Jul 01 '21 at 08:53

@tripleee, yeah, I had mentioned its GNU `grep` and added an `awk` solution as well in my answer. – RavinderSingh13 Jul 01 '21 at 08:59
Unfortunately, it is still not working. If I have TIME as a first word but, as well, TIM it will get TIM instead. ex TIME and the rest TIM and the rest – Pedro Cardoso Jul 01 '21 at 14:17
@PedroCardoso, both of my commands work with your shown samples. Is your actual file the same as the shown samples? – RavinderSingh13 Jul 01 '21 at 14:19

Carlos Pascual · Answer 2 · 2021-07-01T11:07:27.970

1

This awk command uses its sub() function:

awk 'sub(/^TIME[[][^]]*].*\.+/,"")' file
"TEXT I WANT TO KEEP"

If a replacement is made, sub() returns a true (nonzero) value, so awk prints the modified line.
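A small sketch of what that return value means in practice. The second input line is made up for illustration. Used as a bare pattern, sub() prints only the lines it changed; moving it into an action and adding the always-true pattern 1 prints every line instead.

```shell
# Hypothetical input: the question's sample line plus an invented line
# that does not carry the TIME[...] prefix.
sample='TIME[32.468ms]  -(3)-............."TEXT I WANT TO KEEP"
a line without the prefix'

# sub() as the pattern: lines with no substitution (return value 0) are dropped.
changed=$(printf '%s\n' "$sample" | awk 'sub(/^TIME[[][^]]*].*\.+/,"")')

# sub() inside an action, followed by the pattern 1: every line is printed,
# modified or not.
everything=$(printf '%s\n' "$sample" | awk '{sub(/^TIME[[][^]]*].*\.+/,"")} 1')

printf 'changed lines only:\n%s\n' "$changed"
printf 'all lines:\n%s\n' "$everything"
```

Which form is preferable depends on whether the non-matching lines should survive in the output.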

edited Jul 01 '21 at 11:07

answered Jul 01 '21 at 10:44

Carlos Pascual


score 1 · Answer 3 · answered Jul 01 '21 at 12:44


$ cut -d'"' -f2 file
TEXT I WANT TO KEEP

answered Jul 01 '21 at 12:44

Ed Morton


score 0 · Answer 4 · answered Jul 01 '21 at 08:43


You may use:

s='TIME[32.468ms]  -(3)-............."TEXT I WANT TO KEEP"'
sed -E 's/^TIME\[[^]]*].*\.+//' <<< "$s"

"TEXT I WANT TO KEEP"

answered Jul 01 '21 at 08:43

anubhava


tripleee · Answer 5 · 2021-07-01T09:02:44.247

The \s regex extension may not be supported by your sed.

In BRE syntax (which is what sed speaks out of the box) you do not backslash round parentheses - doing that turns them into regex metacharacters which do not match themselves, somewhat unintuitively. Also, + is just a regular character in BRE, not a repetition operator (though you can turn it into one by similarly backslashing it: \+).

You can try adding an -E option to switch from BRE syntax to the perhaps more familiar ERE syntax, but that still won't enable Perl regex extensions, which are not part of ERE syntax, either.

sed 's/^TIME\[[^][]*\][[:space:]][[:space:]]-(3)-\.*//' myfile.txt

should work on any reasonably POSIX-compliant sed. Notice that the minus character does not need to be backslash-escaped, though doing so is harmless per se. I also tightened up the regex for the square brackets, to prevent the "match anything" .* you had from running past the closing square bracket. In more detail, [^][] is a negated character class which matches any character which isn't (a newline or) ] or [; they have to be specified in exactly this order to avoid ambiguity in the character class definition. Finally, the entire sed script should normally be quoted in single quotes, unless you have specific reasons to use different quoting.

If you have sed -E or sed -r you can use + instead of * but then this complicates the overall regex, so I won't suggest that here.
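To make the BRE/ERE difference concrete, here is a rough sketch that runs three variants on the sample line from the question. The -E variant is my own adaptation of the command above, not something taken from the original answer, and it assumes a sed that accepts -E.

```shell
line='TIME[32.468ms]  -(3)-............."TEXT I WANT TO KEEP"'

# The command from the question, in default BRE mode: \(3\) is a group
# matching a bare "3" and + is a literal plus sign, so nothing is replaced.
unchanged=$(printf '%s\n' "$line" | sed 's/^TIME\[.*\]\s\s\-\(3\)\-\.+//')

# BRE with unescaped (literal) parentheses and * for repetition.
bre=$(printf '%s\n' "$line" | sed 's/^TIME\[[^][]*\][[:space:]][[:space:]]-(3)-\.*//')

# ERE via -E: literal parentheses must be escaped, and + is a repetition operator.
ere=$(printf '%s\n' "$line" | sed -E 's/^TIME\[[^][]*\][[:space:]]{2}-\(3\)-\.+//')

printf '%s\n' "$unchanged" "$bre" "$ere"
```

The first line of output should be the untouched input, which matches the "it just seems to be doing the cat" symptom described in the question.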

score 0 · Answer 6 · answered Jul 01 '21 at 13:50


A simpler one for sed:

sed 's/^[^"]*//' myfile.txt
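One thing to be aware of with this shorter pattern, sketched below on invented input: on a line that contains no double quote at all, [^"]* matches the whole line, so the line is emptied. Restricting the substitution with a /^TIME/ address is one possible way around that; this is my assumption about what is wanted, since the question does not say what other lines the file contains.

```shell
# Hypothetical input: an invented line without quotes, then the question's sample line.
sample='a line without any quote
TIME[32.468ms]  -(3)-............."TEXT I WANT TO KEEP"'

# Without an address, the first line is reduced to an empty line.
stripped=$(printf '%s\n' "$sample" | sed 's/^[^"]*//')

# With a /^TIME/ address, only lines starting with TIME are edited.
guarded=$(printf '%s\n' "$sample" | sed '/^TIME/s/^[^"]*//')

printf 'no address:\n%s\n' "$stripped"
printf 'with address:\n%s\n' "$guarded"
```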

answered Jul 01 '21 at 13:50

Darkman


score 0 · Answer 7 · answered Jul 02 '21 at 05:42

If the "text you want to keep" is always surrounded by quotes like this, and the lines starting with "TIME..." contain no other quotes, then:

sed -n '/^TIME/p' file | awk -F'"' '{print $2}'

should get the line starting with "TIME..." and print the text within the quotes.
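A quick sketch of that pipeline on two lines, the second of which is invented to show the /^TIME/ filter at work. The single-awk form at the end is an assumption on my part that the two stages can be merged; it is not part of the answer above.

```shell
# Hypothetical input: the question's sample line plus an invented quoted line
# that does not start with TIME.
sample='TIME[32.468ms]  -(3)-............."TEXT I WANT TO KEEP"
NOTE "quoted text on another line"'

# sed keeps only the TIME lines; awk splits on the double quote, so $2 is the
# text between the first pair of quotes.
quoted=$(printf '%s\n' "$sample" | sed -n '/^TIME/p' | awk -F'"' '{print $2}')

# The same filter and split done in awk alone.
single=$(printf '%s\n' "$sample" | awk -F'"' '/^TIME/{print $2}')

printf '%s\n' "$quoted" "$single"
```

Note that, unlike the sed and grep answers, this drops the surrounding quotes from the result.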

Pedro Cardoso · Answer 8 · 2021-07-02T11:29:37.333

0

Thanks all for your help. In the end, I found a way to make it work:

echo 'TIME[32.468ms]  -(3)-.............TEXT I WANT TO KEEP' | grep TIME | sed -r 's/^TIME\[[0-9]+\.[0-9]+ms\]\s\s-\(3\)-\.+//'

More generally,

grep TIME myfile.txt | sed -r 's/^TIME\[[0-9]+\.[0-9]+ms\]\s\s-\(3\)-\.+//'

Cheers, Pedro

edited Jul 02 '21 at 11:29

answered Jul 02 '21 at 09:09

Pedro Cardoso

