
I'm trying to print lines from a file if, and only if, every character in a line meets a certain regex condition.

The problem is that any line containing at least one character that matches the regex evaluates to true and gets printed, even if it also contains characters outside the allowed range.

I'd prefer to use awk, as I already have additional conditions in place that I would like the evaluated line to meet. I would also prefer the solution to use basic regex, so I can apply different matching conditions to future files. The grep solution shown here focuses on identifying non-ASCII characters and seems to require --perl-regexp; my focus is on a given regex condition being met across an entire line.

In the example, uppercase letters fall outside the regex condition and therefore the whole line where they appear should be ignored.

file.txt:
abc123
123abc
123ABC
AbCdEf

When I try...

awk '$0 ~ /[a-z]/ || $0 ~ /[0-9]/' < file.txt

...every line is printed, since the regex condition is met at least once in each line:

abc123
123abc
123ABC
AbCdEf
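To see why every line passes, here is a small diagnostic sketch (added for illustration, not part of the original attempt). It prints each sample line followed by 1 or 0 for whether each of the two unanchored patterns matches somewhere in that line. It assumes a POSIX shell and sets LC_ALL=C so that [a-z] covers only the ASCII lowercase letters.

```shell
#!/bin/sh
# Assumption: C locale, so [a-z] and [0-9] are plain ASCII ranges.
LC_ALL=C
export LC_ALL

# For each line, print: the line, then 1/0 for "does /[a-z]/ match anywhere?",
# then 1/0 for "does /[0-9]/ match anywhere?".
out=$(printf 'abc123\n123abc\n123ABC\nAbCdEf\n' |
  awk '{ print $0, ($0 ~ /[a-z]/), ($0 ~ /[0-9]/) }')
printf '%s\n' "$out"
```

Every line gets at least one 1, so the || condition is true for all four lines, which is why all of them are printed.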

What I want is to not print a line if any character outside the [a-z] and [0-9] ranges is present, so the desired output here would be:

abc123
123abc

The closest hits I could find when researching this are here and here, but I don't want to search and replace anything on the line; I just want to ignore the line and move on to the next one if any unwanted characters are present.

personguy

1 Answer


For the given sample input/output, try these:

$ awk '!/[^a-z0-9]/' ip.txt
abc123
123abc

$ grep -v '[^a-z0-9]' ip.txt
abc123
123abc
  • [^set] means match any character except s or e or t - in other words, ^ at the beginning of the class inverts the characters to be matched
  • ! and -v are used to print lines that do not match the given condition
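A minimal self-checking sketch of the two commands above, assuming a POSIX shell with awk and grep on the PATH. It feeds the sample input through a pipe instead of ip.txt, and sets LC_ALL=C (an added assumption) so that [a-z] means only the ASCII lowercase letters regardless of the user's locale.

```shell
#!/bin/sh
# Assumption: C locale, so [a-z] and [0-9] are plain ASCII ranges.
LC_ALL=C
export LC_ALL

# Sample input from the question.
sample() { printf 'abc123\n123abc\n123ABC\nAbCdEf\n'; }

# /[^a-z0-9]/ is true when the line has at least one disallowed character;
# awk's ! and grep's -v keep only the lines where that test is false.
awk_out=$(sample | awk '!/[^a-z0-9]/')
grep_out=$(sample | grep -v '[^a-z0-9]')

printf '%s\n' "$awk_out"
if [ "$awk_out" = "$grep_out" ]; then
  echo "awk and grep agree"
fi
```

Because the condition is an ordinary awk pattern, it can be combined with other conditions using && in the same awk program.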

The above solutions will also print empty lines, since an empty line contains no character outside the allowed set. To avoid that, you can use:

awk '/^[a-z0-9]+$/' ip.txt
grep -xE '[a-z0-9]+' ip.txt
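A small sketch contrasting the negated-class form with the anchored forms, assuming a POSIX shell and the C locale. The sample input is the question's data with an empty second line added for illustration (that line is hypothetical, not from the question).

```shell
#!/bin/sh
# Assumption: C locale, so [a-z] and [0-9] are plain ASCII ranges.
LC_ALL=C
export LC_ALL

# Hypothetical input: note the empty second line.
sample() { printf 'abc123\n\n123ABC\n123abc\n'; }

# Negated class: an empty line has no disallowed character, so it is kept.
neg_out=$(sample | awk '!/[^a-z0-9]/')
# Anchored with +: at least one allowed character is required, so it is dropped.
anch_out=$(sample | awk '/^[a-z0-9]+$/')
# grep -x requires the whole line to match the ERE.
grepx_out=$(sample | grep -xE '[a-z0-9]+')

printf 'negated class:\n%s\n---\nanchored:\n%s\n' "$neg_out" "$anch_out"
```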
Sundeep
  • That works perfectly, thank you! For my real-world example (only lines whose characters all fall in the range from space to ~), I can use `awk '!/[^ -~]/'`. Could you explain what the `!` and `^` are doing to accomplish the goal? I thought `^` means "match any character except", so does the `!` turn it into a double negative (inversion)? – personguy Dec 11 '22 at 08:21
  • Yeah, I think you got that right. `/[^a-z0-9]/` will match a line if it contains any character other than lowercase letters or digits, for example `A`, `;` or a space. But such lines should not be part of the output, so `!` and `-v` are used to invert the condition. – Sundeep Dec 11 '22 at 08:39
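A sketch of the printable-ASCII filter from the comments, assuming a POSIX shell and the C locale, where [ -~] is the byte range 0x20 (space) to 0x7E (~). The test lines are hypothetical: one contains a tab and one contains the UTF-8 bytes for "é", and both should be dropped.

```shell
#!/bin/sh
# Assumption: C locale, so [ -~] is the byte range 0x20..0x7E
# (the printable ASCII characters, space included).
LC_ALL=C
export LC_ALL

sample() {
  printf 'hello world!\n'   # printable ASCII only: kept
  printf 'tab\there\n'      # contains a tab (0x09): dropped
  printf 'caf\303\251\n'    # UTF-8 "café" (bytes 0xC3 0xA9): dropped
  printf 'a~b{c}\n'         # printable ASCII punctuation: kept
}

# Keep lines with no character outside the space..~ range.
out=$(sample | awk '!/[^ -~]/')
printf '%s\n' "$out"
```

Outside the C locale, what a range like [ -~] matches may depend on the locale and on the awk implementation, which is why the sketch pins LC_ALL=C.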