
I want to find all instances of a minus sign followed by a one-digit number in a line of text. The numbers are separated by commas, but a number can also be at the end of the line.

text.txt contains

hx,-7,u,-9,u,-8

(There is a newline at the end)

I'm running this command

egrep -o ',[\-][0-9][\n,]' text.txt 

And get

,-7,
,-9,

But I want to get

,-7,
,-9,
,-8

Edit: Something like

hx,-7,u/-9,u,-8

Should still produce

,-7,
,-9,
,-8

And using

egrep -o ',[\-](1|2)?[0-9XY][\,$]' text.txt

Doesn't work for it

Sam
  • You could change your last character class to `[^0-9]` – rgoliveira Jul 28 '16 at 00:14
  • In `grep` it is sufficient to use the `$` anchor for end of line. Text files in *nix use `\n` to terminate a line. Try `egrep -o ',[\-][0-9](,|$)' text.txt`. – alvits Jul 28 '16 at 00:21
  • @alvits thanks. This isn't in my question, but there are actually more options than just the "," or the \n: the number can also be followed by a "/" character. So I need something like `egrep -o ',[\-](1|2)?[0-9XY][\/,$]' text.txt`, but this doesn't work with the "$". – Sam Jul 28 '16 at 19:25
  • You have to be comprehensive with your post. We give solutions for what you ask, but we certainly can't predict or read what you have in mind. For your new requirement, which I hope is complete, you can use `egrep -o ',[\-][0-9](\/|,|$)' text.txt`. I hope you will not come back and say the number can also be followed by something else. The last regex `(\/|,|$)` means any of these 3, `/`, `,` or `$`, is expected after the number. If you have more, simply add them to the list. `|` is the `or`, meaning exactly one of them. – alvits Jul 28 '16 at 20:09
  • If you have more than a handful of characters following the number, you can use `([\/,]|$)` for the last regex. It means one of these characters enclosed in `[]` or `|` the end of line `$` should follow the number. – alvits Jul 28 '16 at 20:16
  • @alvits I didn't think it was going to be an issue. I took the other stuff out to make my example more to the point so people wouldn't get confused. – Sam Jul 28 '16 at 20:18
  • @alvits Hey, one more question: when I run the command on the line `35,yt,-2,-3,wd,-7,-12,-13,-14,-15,-16,-17,-19,-20` it doesn't catch 3, 12, 14, 16 and 19 (every second one is skipped when another one precedes it). – Sam Jul 28 '16 at 20:25
  • Your original regex, which I inherited, expects a single digit. You have a bigger problem. What's the rule for why 13, 15, 17, and 20 shouldn't make it to the output? Is it based on the order, and should some order be ignored? I suggest you write a comprehensive post regarding your need instead of going back and forth. – alvits Jul 28 '16 at 20:40
  • @alvits OK, I posted a new question, http://stackoverflow.com/questions/38646372/grepping-for-overlapping-pattern-matches, because it seemed like a new concept. Thank you. – Sam Jul 28 '16 at 20:44
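A minimal sketch of the fix suggested in the comments, assuming a POSIX shell; the function names are hypothetical. Because `$` sits outside the brackets in `([/,]|$)`, it acts as the end-of-line anchor rather than a literal dollar sign. The second function targets the skipped matches from the later comment: `grep -o` never reports overlapping matches, so a comma that ends one match cannot also begin the next. A lookahead checks the delimiter without consuming it. This part assumes GNU grep built with PCRE support (`-P`), keeps the single-digit pattern, and does not print the delimiter. The line `hx,-7,u,-9/u,-8` is an illustrative input, not one from the question.

```shell
#!/bin/sh
# Hypothetical helper: a comma, a minus sign and one digit, followed by
# either "/" or "," (bracket expression) or the end of the line ($ anchor).
neg_digit_matches() {
  grep -Eo ',-[0-9]([/,]|$)' "$@"
}

# Hypothetical helper (assumes GNU grep with -P): the lookahead (?=...)
# requires the same delimiter but does not consume it, so the comma after
# one number is still available as the leading comma of the next match.
neg_digit_matches_overlapping() {
  grep -oP ',-[0-9](?=[/,]|$)' "$@"
}

printf 'hx,-7,u,-9/u,-8\n' | neg_digit_matches
# ,-7,
# ,-9/
# ,-8

printf '35,yt,-2,-3,wd,-7\n' | neg_digit_matches_overlapping
# ,-2
# ,-3
# ,-7
```

On `35,yt,-2,-3,wd,-7`, the non-lookahead `neg_digit_matches` prints only `,-2,` and `,-7`, which is the skipping described in the comment.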

1 Answer

grep works on a per-line basis: newline characters are never matched, because grep treats them as the delimiter between lines and matches against each line without its newline.
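To see why `[\n,]` can never match the end of the line, it also helps to know that inside a bracket expression `\`, `n` and `$` are ordinary characters. The sketch below wraps the two patterns from the question in hypothetical shell functions (`first_attempt`, `second_attempt`) and assumes a POSIX shell with a POSIX-conforming grep such as GNU grep.

```shell
#!/bin/sh
# Hypothetical wrappers around the two patterns from the question.
# Inside a bracket expression \, n and $ are ordinary characters:
#   [\n,]  matches a backslash, the letter n, or a comma
#   [\,$]  matches a backslash, a comma, or a dollar sign
first_attempt()  { grep -Eo ',[\-][0-9][\n,]'; }
second_attempt() { grep -Eo ',[\-](1|2)?[0-9XY][\,$]'; }

# The letter n after the digit is accepted as if it were a delimiter.
printf 'x,-7n\n' | first_attempt                 # prints ,-7n

# A literal dollar sign satisfies [\,$] ...
printf 'x,-8$\n' | second_attempt                # prints ,-8$

# ... but the end of the line does not, so grep finds nothing here.
printf 'x,-8\n' | second_attempt || echo 'no match'
```

Moving `$` outside the brackets, as in `(,|$)`, turns it back into the end-of-line anchor.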

The question "How to give a pattern for new line in grep?" indicates that pcregrep can perform multiline grep operations.

Alternatively, you can use tr to translate the \n characters to commas:

tr '\n' ',' < text.txt | egrep -o ',-[0-9],'

yields:

,-7,
,-9,
,-8,
theorifice
  • The OP isn't looking for multiline matching `grep`, but the question could easily be misread that way. However, your solution is a bad idea. To begin with, you are concatenating all lines of a file. Imagine if the input file is a log. `grep`, as you pointed out, is line oriented. Imagine the long line it has to parse after all the lines have been concatenated into a single line. – alvits Jul 28 '16 at 00:34
  • Agreed. The original problem statement just indicated that the text file contained one line. Your solution with anchoring is much preferred. – theorifice Jul 28 '16 at 00:36
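If you like the delimiter-appending idea from this answer but want to keep grep line-oriented, as the comment above asks, one option is to append a comma to each line instead of joining every line into one. A minimal sketch, assuming a POSIX shell with sed; the function name is hypothetical. The `$` anchor from the comments on the question avoids rewriting the input at all, so this variant is mainly useful if you want every match to end with a comma.

```shell
#!/bin/sh
# Hypothetical helper: sed adds a comma at the end of every line, so the last
# number on a line is comma-terminated like the others. Lines are never
# joined, so grep still processes the input one line at a time.
neg_digits_per_line() {
  sed 's/$/,/' "$@" | grep -Eo ',-[0-9],'
}

printf 'hx,-7,u,-9,u,-8\n' | neg_digits_per_line
# ,-7,
# ,-9,
# ,-8,
```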