Grepping (using a regular expression) for a newline character within square brackets [ ]

Question

I want to find all instances of a minus sign followed by a one-digit number in a line of text. The numbers are separated by commas but can also be at the end of the line.

text.txt contains

hx,-7,u,-9,u,-8

(There is a newline at the end)

I'm running this command

egrep -o ',[\-][0-9][\n,]' text.txt

And get

,-7,
,-9,

But I want to get

,-7,
,-9,
,-8

Edit: Something like

hx,-7,u/-9,u,-8

Should still produce

,-7,
,-9,
,-8

And using

egrep -o ',[\-](1|2)?[0-9XY][\,$]' text.txt

Doesn't work for it

In `grep` it is sufficient to use the `$` anchor for end of line. Text files in *nix use `\n` to terminate a line. Try `egrep -o ',[\-][0-9](,|$)' text.txt`. — alvits, Jul 28 '16 at 00:21
@alvits thanks, this isn't in my question but there are actually more options than just the "," or the \n: the number can also be followed by a "/" character. So I need something like egrep -o ',[\-](1|2)?[0-9XY][\/,$]' text.txt but this doesn't work with the "$" — Sam, Jul 28 '16 at 19:25
You have to be comprehensive with your post. We give solutions for what you ask, but we can't predict or read what you have in mind. For your new requirement, which I hope is complete, you can use `egrep -o ',[\-][0-9](\/|,|$)' text.txt`. I hope you will not come back and say the number can also be followed by something else. The last part of the regex, `(\/|,|$)`, means any one of these three is expected after the number: `/`, `,`, or `$`. If you have more, simply add them to the list. `|` is the `or`, meaning exactly one of them. — alvits, Jul 28 '16 at 20:09
If you have more than a handful of characters following the number you can use `([\/,]|$)` for the last regex. It means one of these characters enclosed in `[]` or `|` the end of line `$` should follow the number. — alvits, Jul 28 '16 at 20:16
@alvits I didn't think it was going to be an issue. I took the other stuff out to make my example more to the point so people wouldn't get confused — Sam, Jul 28 '16 at 20:18
@alvits Hey, one more question: when I run the command on the line 35,yt,-2,-3,wd,-7,-12,-13,-14,-15,-16,-17,-19,-20 it doesn't catch 3, 12, 14, 16 and 19 (every second one gets skipped if another one precedes it) — Sam, Jul 28 '16 at 20:25
Your original regex which I inherited is expecting a single digit. You have a bigger problem. What's the rule why 13, 15, 17, and 20 shouldn't make it to the output? Is it based on the order and some order should be ignored? I suggest you write a comprehensive post regarding your need instead of coming back and forth. — alvits, Jul 28 '16 at 20:40
@alvits OK, I posted a new question, http://stackoverflow.com/questions/38646372/grepping-for-overlapping-pattern-matches, because it seemed like a new concept. Thank you — Sam, Jul 28 '16 at 20:44
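
The suggestion from the comments above, `([\/,]|$)`, can be sketched as follows. This is only an illustration: the sample line is hypothetical (a variant of the question's input in which one number is followed by `/`), `grep -E` is used as the equivalent of `egrep`, and the escapes before `-` and `/` are dropped because neither character is special in that position.

```shell
# Hypothetical sample line: one number is followed by "/" and the last one ends the line.
line='hx,-7,u,-9/u,-8'

# "[/,]" lists the characters allowed after the digit; "$" stays outside the
# brackets so that it acts as the end-of-line anchor rather than a literal "$".
result=$(printf '%s\n' "$line" | grep -Eo ',-[0-9]([/,]|$)')

printf '%s\n' "$result"
```

With GNU grep this should print `,-7,`, `,-9/` and `,-8`, one per line. More trailing characters can be added inside the brackets without touching the anchor.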

score 1 · Answer 1 · edited May 23 '17 at 11:44

grep works on a per-line basis and new-line characters are not matched against since they are treated as the delimiter for each line.
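
A minimal sketch of why the bracketed attempts in the question cannot match the end of a line, assuming POSIX bracket-expression rules as implemented by GNU grep: inside `[...]` a backslash and `$` are ordinary characters, so `[\n,$]` matches a literal `\`, `n`, `,` or `$`, never a line break. The three sample lines are made up for the demonstration.

```shell
# Inside the brackets, "\", "n" and "$" are taken literally, so only the lines
# ending in a literal "n" or "$" match; the bare ",-9" line does not.
literal=$(printf '%s\n' ',-7n' ',-8$' ',-9' | grep -Eo ',-[0-9][\n,$]')

# Outside the brackets, "$" is the end-of-line anchor, so only ",-9" matches.
anchored=$(printf '%s\n' ',-7n' ',-8$' ',-9' | grep -Eo ',-[0-9]$')

printf 'literal:\n%s\nanchored:\n%s\n' "$literal" "$anchored"
```

This is why moving `$` out of the brackets, as in `(,|$)`, changes the result.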

The question "How to give a pattern for new line in grep?" indicates that pcregrep can perform multiline grep operations.

Alternatively, you can use tr to translate the \n characters to commas, e.g.:

cat text.txt | tr '\n' ',' | egrep -o ',[\-][0-9][\n,]'

yields:

,-7,
,-9,
,-8,

answered Jul 28 '16 at 00:09 by theorifice · edited May 23 '17 at 11:44 by Community

The OP isn't looking for multiline matching with `grep`, though the question could easily be misread that way. However, your solution is a bad idea. To begin with, you are concatenating all lines of a file. Imagine if the input file is a log. `grep`, as you pointed out, is line oriented. Imagine the long line it has to parse after all the lines have been concatenated into a single line. – alvits Jul 28 '16 at 00:34
Agreed. The original problem statement just indicated that the text file contained one line. Your solution with anchoring is much preferred. – theorifice Jul 28 '16 at 00:36
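
As a rough illustration of the line-oriented point in these comments, the sketch below compares the anchored per-line regex with the `tr` pipeline on a hypothetical two-line input (not taken from the question). The `-n` flag is added only to show which line grep attributes each match to.

```shell
# Hypothetical two-line input.
input=$(printf '%s\n' 'hx,-7,u,-9' 'q,-3')

# Per-line matching: "$" covers a number at the end of a line, and -n reports
# the line each match came from.
per_line=$(printf '%s\n' "$input" | grep -Eno ',-[0-9](,|$)')

# After tr joins the lines, grep sees a single long line, so every match is
# reported as coming from line 1.
joined=$(printf '%s\n' "$input" | tr '\n' ',' | grep -Eno ',-[0-9],')

printf 'per line:\n%s\njoined:\n%s\n' "$per_line" "$joined"
```

The anchored version keeps grep working line by line, while the joined version loses the original line boundaries and makes grep scan the whole file as one line.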
