
I want to do strict matching on a text file so that it returns only the lines made up of the patterns I have ANDed together, and nothing else. So for example in a file:

xyz
xy
yx
zyx

I want to run a command similar to:

awk '/x/ && /y/' filename.txt

and I would like it to return only the lines:

yx
xy

and ignore the others because, although they do contain an x and a y, they also contain a z.

Is this possible in awk?

Rick Dearman
  • What about lines `x` or `y` or `xx` or `xyx` etc.? – Sundeep Nov 12 '22 at 14:00
  • Yes, match those also. – Rick Dearman Nov 12 '22 at 14:12
  • If you want to match lines that contain only x or only y, add those to your example. Your subject says `awk match ONLY X and Y` but your comment makes it sound like you actually want `awk match ONLY X or Y`. Also, if x and/or y could be multi-character strings then that would make this a completely different (and probably much harder to solve) problem requiring a potentially different solution, so make sure to use multi-char strings in your question if that's what you really have. – Ed Morton Nov 12 '22 at 15:21
  • Don't use the word `pattern` to describe your pattern matching requirements as it's ambiguous. Use the words character or string or regexp - whichever you mean. See [how-do-i-find-the-text-that-matches-a-pattern](https://stackoverflow.com/questions/65621325/how-do-i-find-the-text-that-matches-a-pattern) for more info on that. Only using the word `pattern` in a pattern matching question is like not telling the salesman if you want a car, motorbike, truck, or van when you visit the dealership. – Ed Morton Nov 12 '22 at 15:23
  • Also clarify if you're looking for a solution for ONLY 2 characters/strings or a general one for N of them as the latter would rule out some solutions and require others. None of the currently posted solutions would scale well for N strings, and most wouldn't even work for 2 strings. – Ed Morton Nov 12 '22 at 15:26

5 Answers


The condition /x/ && /y/ matches any line in which both an x and a y are present, whatever else the line contains.

Edit:

To allow only those characters across the whole line, you can use a repeated character class and assert the start and end of the string:

awk '/^[xy]+$/' file

If you also want to allow matching spaces, uppercase X and Y and do not want to match empty lines:

awk '/^[[:space:]]*[xyXY][[:space:]xyXY]*$/' file

The pattern matches:

  • ^ Start of string
  • [[:space:]]* Match optional spaces
  • [xyXY] Match a single char x y X Y
  • [[:space:]xyXY]* Match optional spaces or x y X Y
  • $ End of string
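
As a quick check of the two regexes above, the following sketch runs a handful of made-up sample lines through each one. The sample data and the variable names (`sample`, `strict`, `relaxed`) are assumptions for illustration and are not part of the question's file.

```shell
# Hypothetical sample: plain, mixed-case, spaced, empty and non-matching lines.
sample=$(printf '%s\n' 'xy' 'xyz' 'X y' '  yx' '' 'zyx')

# Strict version: only lowercase x and y, at least one character.
strict=$(printf '%s\n' "$sample" | awk '/^[xy]+$/')

# Relaxed version: also allows white space and uppercase X/Y,
# but still requires at least one x/y/X/Y on the line.
relaxed=$(printf '%s\n' "$sample" | awk '/^[[:space:]]*[xyXY][[:space:]xyXY]*$/')

printf 'strict:\n%s\n\nrelaxed:\n%s\n' "$strict" "$relaxed"
```

With this sample, the strict regex should keep only `xy`, while the relaxed one should also keep `X y` and the indented `  yx`; the empty line is rejected by both.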
The fourth bird
  • That doesn't do what I need. I want a match when the arguments are present, but not anything else. The x and y are simply examples; it could be /b/ && /a/ && /t/. I will not know in advance what order they are in, just that the line needs to contain ONLY those characters, in any order. – Rick Dearman Nov 12 '22 at 14:01
  • @RickDearman You only have x and y in the examples and in the title of the post, so that is the only thing we have to go on. If they are single characters, use `awk '/^[xy]+$/' file`, or likewise `awk '/^[bat]+$/' file` – The fourth bird Nov 12 '22 at 14:04
  • @RickDearman Do you also want a match for just `a` or `ttttt`? – The fourth bird Nov 12 '22 at 14:06
  • Sorry my example wasn't the best. Yes, I would like to match multiple instances of the same character. – Rick Dearman Nov 12 '22 at 14:11
  • @RickDearman Ok, I have added an update. – The fourth bird Nov 12 '22 at 14:13
  • Can also use `grep -xE '[xy]+'` – Sundeep Nov 12 '22 at 15:05

I'd just keep it clear and simple. Which of these you need depends on whether lines that contain only x or only y should match, which your example doesn't cover. To accept any line made up solely of x and/or y:

$ awk '/^[xy]+$/' file
xy
yx

or, to also require at least one x and at least one y:

$ awk '/x/ && /y/ && !/[^xy]/' file
xy
yx
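
The two commands give the same result on the question's sample, so the difference only shows up on lines made of a single repeated character. A small sketch, assuming two extra lines (`x` and `yy`) that are not in the original file; the variable names are illustrative only:

```shell
# Hypothetical input: the question's four lines plus two single-letter-only lines.
input=$(printf '%s\n' 'xyz' 'xy' 'yx' 'zyx' 'x' 'yy')

# Lines made up solely of x and/or y (so "x" and "yy" match too).
only_xy=$(printf '%s\n' "$input" | awk '/^[xy]+$/')

# Lines with at least one x AND at least one y, and no other character.
both_xy=$(printf '%s\n' "$input" | awk '/x/ && /y/ && !/[^xy]/')

printf 'only x/y characters:\n%s\n\nboth x and y required:\n%s\n' "$only_xy" "$both_xy"
```

Here the first command should print `xy`, `yx`, `x` and `yy`, and the second only `xy` and `yx`.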
Ed Morton

This awk solution puts the condition /x/&&/y/ on the main block so that only lines containing both 'x' and 'y' are processed.

Inside the action block the record $0 is assigned to a variable named temp which then has the 'x' and 'y' occurrences removed using gsub(/[xy]/, "",temp). A conditional block then determines the length of temp after the substitution: if the length is 0, the line could only have contained 'x' and 'y' characters, so the line is printed.

awk '/x/ && /y/ { temp = $0; gsub(/[xy]/, "", temp); if (length(temp) == 0) print $0 }' input.txt

tested with input.txt file:

xyz
xy
yx
zyx
y
x
xxy
yyx

result:

xy
yx
xxy
yyx
Dave Pritlove

Assumptions:

  • user provides a list of characters to match on (x and y in the provided example)
  • lines of interest are those that contain only said characters (plus white space)
  • matches should be case insensitive, ie, x will match on both x and X
  • blank/empty lines, and lines with only white space, are to be ignored

Adding more lines to the sample input:

$ cat filename.txt
xyz
xy
yx
zyx
---------
xxx
abc def xy
Xy xY XY
z x yy z
x; y; X; Y:
xxyYxy  XXyxyy  yx        # tab delimited
                          # 1 space
                          # blank/empty line

NOTE: comments added for clarification; file does not contain any comments

One awk idea:

awk -v chars='xY' '                                   # provide list of characters (in the form of a string) to match on
BEGIN { regex="[" tolower(chars) "]" }                # build regex of lowercase characters, eg: "[xy]"
      { line=tolower($0)                              # make copy of all lowercase line
        gsub(/[[:space:]]/,"",line)                   # remove all white space
        if (length(line) == 0)                        # if length of line==0 (blank/empty lines, lines with only white space) then ...
           next                                       # skip to next line of input
        gsub(regex,"",line)                           # remove all characters matching regex
        if (length(line) == 0)                        # if length of line == 0 (ie, no other characters) then ...
           print $0                                   # print current line to stdout
      }
' filename.txt

This generates:

xy
yx
xxx
Xy xY XY
xxyYxy  XXyxyy  yx

NOTE: the last 2 input lines (1 space, blank/empty) are ignored
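
One caveat with building a bracket expression from user input is that characters such as `^` or `-` have a special meaning inside `[...]`, so a list like `^x` would turn into the negated class `[^x]`. A possible alternative, sketched here under the same assumptions as the script above, tests each character with `index()` instead of a dynamic regex. The function name `only_chars` and its argument order are hypothetical, and note that `awk -v` still interprets backslash escapes in the character list.

```shell
# only_chars CHARS FILE: print lines of FILE made up solely of CHARS
# (case-insensitive, white space ignored, blank lines skipped).
# Hypothetical helper; it mirrors the behaviour of the awk script above.
only_chars() {
    awk -v chars="$1" '
    BEGIN { chars = tolower(chars) }
    {
        line = tolower($0)
        gsub(/[[:space:]]/, "", line)          # drop all white space
        if (line == "") next                   # skip blank/white-space-only lines
        for (i = 1; i <= length(line); i++)    # check every remaining character
            if (index(chars, substr(line, i, 1)) == 0) next
        print                                  # every character was in chars
    }' "$2"
}
```

It could be called as `only_chars 'xY' filename.txt`; because no regex is built, a list such as `'x^y'` is treated as three literal characters.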

markp-fuso

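You can treat each string as a set of characters and test the two sets for equality. Because this is equality rather than a subset test, a line such as xxx does not match the set xy, and only the first field ($1) of each line is compared.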

awk -v set='xy' '

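function cmp(s1, s2,    a1, a2, tmp, i, e) {
    # turns s1 and s2 into associative arrays to do a set equality comparison
    # cmp("xy", "xyxyxyxy") returns 1; cmp("xy", "xyz") returns 0
    # a1, a2, tmp, i and e are extra parameters, which awk treats as locals
    # note: split() with an empty separator (one element per character)
    # is a widely supported extension rather than guaranteed behavior
    split("", a1); split("", a2)    # make sure both arrays start empty
    split(s1, tmp, ""); for (i in tmp) a1[tmp[i]]
    split(s2, tmp, ""); for (i in tmp) a2[tmp[i]]
    if (length(a1) != length(a2)) return 0
    for (e in a1) if (!(e in a2)) return 0

    return 1
    }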

cmp(set, $1)' file

Prints:

xy
yx
dawg