
I have a file that looks like this:

5.3.236.113681.2225191122.986.3705653211.104    4
5.3.236.113681.2225191122.986.3705653211.104.3402  45
5.3.236.0.1.20549687.20.93.9.2.234266672113.4455  2
5.3.236.113681.5829104.986.3705653211.119    8
5.3.236.2.01107.50.01.24.48685.30000018053113560818700000112 172

A basic grep returns these results, including an additional match that I do not want:

$ grep 5.3.236.113681.2225191122.986.3705653211.104 test.txt
5.3.236.113681.2225191122.986.3705653211.104    4
5.3.236.113681.2225191122.986.3705653211.104.3402  45

I tried grepping for a fixed string (`-F`); it also returns the additional match that I do not want:

$ grep -F 5.3.236.113681.2225191122.986.3705653211.104 test.txt
5.3.236.113681.2225191122.986.3705653211.104    4
5.3.236.113681.2225191122.986.3705653211.104.3402  45

I tried matching whole words only (`-w`); it also returns the additional match that I do not want:

$ grep -w 5.3.236.113681.2225191122.986.3705653211.104 test.txt
5.3.236.113681.2225191122.986.3705653211.104    4
5.3.236.113681.2225191122.986.3705653211.104.3402  45
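For reference, `-w` cannot help with this data: GNU grep treats only letters, digits, and the underscore as word characters, so the `.` that follows `104` on the second line already counts as a word boundary and the match is accepted. A minimal reproduction, recreating just the first two lines of `test.txt`:

```shell
# Recreate the first two lines of test.txt from the question
printf '%s\n' \
  '5.3.236.113681.2225191122.986.3705653211.104    4' \
  '5.3.236.113681.2225191122.986.3705653211.104.3402  45' > test.txt

# -w still matches both lines: "." is not a word character,
# so the "104" in front of ".3402" already ends a "word"
grep -w -F '5.3.236.113681.2225191122.986.3705653211.104' test.txt
```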

This works, but it is technically grepping for the string I want plus a trailing whitespace character, which seems more like a workaround than actually targeting what I want:

$ grep "5.3.236.113681.2225191122.986.3705653211.104[[:space:]]" test.txt
5.3.236.113681.2225191122.986.3705653211.104    4

The problem with the command that worked is that the desired string may not be followed by whitespace; the whitespace may come before it instead, like this:

4   5.3.236.113681.2225191122.986.3705653211.104
45  5.3.236.113681.2225191122.986.3705653211.104.3402

The command that worked previously won't work on a list formatted slightly differently.

I could simply write grep "[[:space:]]5.3.236.113681.2225191122.986.3705653211.104", but I don't want to have to rewrite the grep for every small difference like that.

I would like to be able to grep for that string and show the whole line, regardless of how or where the string appears in the line.

anubhava
Shak
  • `[.]` is a character class containing only `.` That said, it doesn't sound to me like your problem is really all that specific to periods -- you're worried about the space at the end, not about `.` being a one-character wildcard. So why does the title talk about periods at all? – Charles Duffy Apr 02 '21 at 15:39
  • Searching for `([[:space:]]|^)` at the beginning and `([[:space:]]|$)` at the end is not really all that unusual as a practice. The bigger problem you have is one you haven't asked about at all -- the fact that `1.2.3` also matches `10243`, on account of `.` being a wildcard. – Charles Duffy Apr 02 '21 at 15:43
  • BTW, this shouldn't be tagged `bash` -- grep is not part of bash, it's an external command that can be used from any shell or with no shell at all. – Charles Duffy Apr 02 '21 at 15:44
  • @CharlesDuffy I don't know how the title got that way, it's not the title I initially used. Edited. I went to remove the bash tag but it was already gone. Sorry, didn't know. – Shak Apr 02 '21 at 16:23
  • The new title isn't great because "unwanted results" doesn't describe enough about _why_ something is unwanted for it to be understood what it means without clicking through and reading the body. I'll edit it further. – Charles Duffy Apr 02 '21 at 16:29
  • To be clear, the title I objected to was the _original_ one, *grep for variable whose string contains periods*. By the time I commented, I had already fixed it to be something more clear. Similarly, I fixed the bash tag myself, and was commenting only to describe _why_ that change had been made. – Charles Duffy Apr 02 '21 at 16:30
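The boundary approach suggested in the comments can be sketched with POSIX ERE (`grep -E`): `(^|[[:space:]])` and `([[:space:]]|$)` require whitespace or a line edge on each side, and every `.` is escaped as `\.` so it no longer acts as a one-character wildcard. The file name `both.txt` is made up for this sketch; its lines come from the question.

```shell
# The ID to find; each "." is escaped so it matches only a literal dot
id='5\.3\.236\.113681\.2225191122\.986\.3705653211\.104'

# Sample covering both layouts from the question (ID first or ID last)
printf '%s\n' \
  '5.3.236.113681.2225191122.986.3705653211.104    4' \
  '5.3.236.113681.2225191122.986.3705653211.104.3402  45' \
  '4   5.3.236.113681.2225191122.986.3705653211.104' \
  '45  5.3.236.113681.2225191122.986.3705653211.104.3402' > both.txt

# Start of line or whitespace before the ID, whitespace or end of line after it
grep -E "(^|[[:space:]])${id}([[:space:]]|\$)" both.txt

# Unescaped, "." is a wildcard: this pattern also matches "10243"
printf '10243\n' | grep -c '1.2.3'    # prints 1
```

Only the first and third sample lines are printed; the `.3402` lines fail because a `.` follows `104` instead of whitespace or the end of the line.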

1 Answer


Assuming this is your input file:

cat file

5.3.236.113681.2225191122.986.3705653211.104    4
5.3.236.113681.2225191122.986.3705653211.104.3402  45
5.3.236.0.1.20549687.20.93.9.2.234266672113.4455  2
5.3.236.113681.5829104.986.3705653211.119    8
5.3.236.2.01107.50.01.24.48685.30000018053113560818700000112 172
4   5.3.236.113681.2225191122.986.3705653211.104
45  5.3.236.113681.2225191122.986.3705653211.104.3402

If you have GNU grep, you can use this PCRE regex with lookarounds:

grep -P '(?<!\S)5\.3\.236\.113681\.2225191122\.986\.3705653211\.104(?!\S)' file

5.3.236.113681.2225191122.986.3705653211.104    4
4   5.3.236.113681.2225191122.986.3705653211.104

Here:

  • (?<!\S): a negative lookbehind asserting that the character before the current position is not a non-whitespace character, i.e. the match is preceded by whitespace or by the start of the line
  • (?!\S): a negative lookahead asserting that the character after the current position is not a non-whitespace character, i.e. the match is followed by whitespace or by the end of the line

Here is a POSIX-compliant awk solution:

awk -v s='5.3.236.113681.2225191122.986.3705653211.104' '{
    for (i = 1; i <= NF; ++i)
        if ($i == s) { print; next }
}' file

5.3.236.113681.2225191122.986.3705653211.104    4
4   5.3.236.113681.2225191122.986.3705653211.104
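One caveat when adapting the awk version to other data, based on POSIX awk semantics: fields and `-v` values that look like numbers are "numeric strings", so `$i == s` may compare them numerically and treat `1` as equal to `1.0`. Dotted IDs like the one here never look numeric, so the command above is unaffected. A minimal sketch of forcing a plain string comparison by appending `""` to both operands (the `nums.txt` sample is made up for illustration):

```shell
# Hypothetical sample: "1" and "1.0" look numeric, so a plain == may treat them as equal
printf '%s\n' '1 a' '1.0 b' > nums.txt

# Concatenating "" turns both operands into strings, forcing an exact text comparison
awk -v s='1.0' '{
    for (i = 1; i <= NF; ++i)
        if ($i "" == s "") { print; next }
}' nums.txt
```

With the forced string comparison only the `1.0 b` line is printed.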
anubhava
  • The POSIX-compliant awk solution works best for me right now because I can use a variable with that string like `awk -v s="$FUID" '{for (i=1; i<=NF; ++i) if ($i == s) {print; next}}' file`. Can a variable with that string be used with the `gnu-grep` solution? – Shak Apr 02 '21 at 17:15
  • awk would be the best bet, because in the `grep` regex we would have to escape all regex metacharacters like `.`, `(`, `)`, `[`, `]` etc. If you want to use a shell variable with `gnu-grep`, then escape all these characters to make it a regex-safe string. – anubhava Apr 02 '21 at 17:17
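The escaping described in the last comment can be sketched with `sed`: backslash-escape every ERE metacharacter in the variable, then use the result in a `grep -E` pattern with the same whitespace-or-line-edge boundaries as the lookarounds. The variable name `FUID` comes from the comment above; the sample file `ids.txt` and the escaped copy `esc` are hypothetical names for this sketch.

```shell
# The ID held in a shell variable
FUID='5.3.236.113681.2225191122.986.3705653211.104'

# Backslash-escape every ERE metacharacter so the ID is matched literally
esc=$(printf '%s\n' "$FUID" | sed 's/[][\.*^$+?(){}|]/\\&/g')

# Sample input mixing both layouts from the question
printf '%s\n' \
  '5.3.236.113681.2225191122.986.3705653211.104    4' \
  '5.3.236.113681.2225191122.986.3705653211.104.3402  45' \
  '4   5.3.236.113681.2225191122.986.3705653211.104' > ids.txt

# Whitespace or line edge on both sides, like (?<!\S) and (?!\S)
grep -E "(^|[[:space:]])${esc}([[:space:]]|\$)" ids.txt
```

With GNU grep built with PCRE support, an alternative should be PCRE's `\Q...\E` literal quoting, e.g. `grep -P "(?<!\S)\Q$FUID\E(?!\S)" file`, as long as the value never contains `\E` itself.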