I was writing a small wrapper for nullmailer when I noticed what is, IMHO, unwanted behavior in grep. In particular, I noticed something strange with strings containing @.
grep seems to split strings containing @ and produces wrong output.
TL;DR
E-mail addresses have rules to follow (e.g. RFC 2822), so I will use a deliberately simplified, incorrect regular expression for them, just to keep things a bit shorter. Note that this does not change the problem I'm asking about.
I am using e-mail addresses in this post, but the problem obviously applies to any string containing at least one @.
I wrote a small script to help me explain what I "found":
#!/bin/bash

# One @; the regex matches the whole address.
funct1() {
    arr=(local1@domain.tld local2@domain.tld)
    regex="[[:alnum:]]*@[[:alpha:]]*\.[[:alpha:]]\{2,\}"
    for dest in "${arr[@]}"; do
        printf "%s\n" "$dest" | grep -o -e "$regex"
    done
}

# One @; the first part of the regex does not allow the digit.
funct2() {
    arr=(local1@domain.tld local2@domain.tld)
    regex="[[:alpha:]]*@[[:alpha:]]*\.[[:alpha:]]\{2,\}"
    for dest in "${arr[@]}"; do
        printf "%s\n" "$dest" | grep -o -e "$regex"
    done
}

# Two @; the first and second parts of the regex do not allow the digits.
funct3() {
    arr=(local1@dom1@ain.tld local2@dom2@ain.tld)
    regex="[[:alpha:]]*@[[:alpha:]]*@[[:alpha:]]*\.[[:alpha:]]\{2,\}"
    for dest in "${arr[@]}"; do
        printf "%s\n" "$dest" | grep -o -e "$regex"
    done
}

# Two @; only the first part of the regex does not allow the digit.
funct4() {
    arr=(local1@dom1@ain.tld local2@dom2@ain.tld)
    regex="[[:alpha:]]*@[[:alnum:]]*@[[:alpha:]]*\.[[:alpha:]]\{2,\}"
    for dest in "${arr[@]}"; do
        printf "%s\n" "$dest" | grep -o -e "$regex"
    done
}

printf "One @, all parts of regex right:\n"
funct1
printf "One @, first part of regex wrong:\n"
funct2
printf "Two @, first and second part of regex wrong:\n"
funct3
printf "Two @, first part of regex wrong:\n"
funct4
exit 0
To better understand the problem, I used two types of strings, local1@domain.tld and local1@dom1@ain.tld, and it seems to me that grep does not behave correctly with strings containing at least one @.
The output is:
One @, all parts of regex right:
local1@domain.tld
local2@domain.tld
One @, first part of regex wrong:
@domain.tld
@domain.tld
Two @, first and second part of regex wrong:
Two @, first part of regex wrong:
@dom1@ain.tld
@dom2@ain.tld
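The funct2 case can also be reproduced without the loop. The following is only a minimal sketch using the same sample address and the same expression as funct2; the helper name funct2_match is hypothetical and not part of the script above.

```shell
#!/bin/bash
# Minimal reproduction of the funct2 case.
# funct2_match is a hypothetical helper name used only for this sketch.
funct2_match() {
    # Same expression as funct2: letters only before the @.
    local regex='[[:alpha:]]*@[[:alpha:]]*\.[[:alpha:]]\{2,\}'
    # -o prints only the part of the line that the expression matched.
    printf '%s\n' "$1" | grep -o -e "$regex"
}

funct2_match 'local1@domain.tld'
# prints: @domain.tld
```

The local part is dropped from the output, while everything from the @ onward is still printed.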
funct1 has a regular expression that matches the entire strings, so no problem: all of them are printed.
funct2 has a regular expression that matches the strings only from the @ to the end, so I would expect no output, because the expression is wrong; instead, I get the second part of the strings...
That is why I decided to add a second @ to the strings and run some more tests.
funct3 matches the strings only from the second @ to the end, so I would expect no output at all because of the mistake in the regex. OK, no output.
funct4, instead, has a regular expression that matches the strings only from the first @ to the end, so I would expect it to show nothing; instead, I get the output from the first @ onward, just as in funct2.
Except for funct1, I shouldn't get any output at all, am I right?
Why does grep break the result at the first @?
I consider it unwanted behavior because this way the result consists of strings that don't match my expression entirely.
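To make "match my expression entirely" concrete, here is a minimal sketch of the behavior I expected. It assumes that requiring a whole-line match with grep's -x option is an acceptable way to express it; whole_match is a hypothetical helper name, and local@domain.tld is an extra sample address that is not in the script above.

```shell
#!/bin/bash
# Sketch of the expected behavior: print an address only when the
# whole line matches the expression.
# whole_match is a hypothetical helper name used only for this sketch.
whole_match() {
    local regex='[[:alpha:]]*@[[:alpha:]]*\.[[:alpha:]]\{2,\}'
    # -x selects a line only if the expression matches all of it.
    printf '%s\n' "$1" | grep -x -e "$regex"
}

# "local1" contains a digit, so nothing is selected and grep returns non-zero.
whole_match 'local1@domain.tld' || printf 'no match\n'
# A letters-only local part matches the whole line and is printed.
whole_match 'local@domain.tld'
```

With -x the address containing a digit produces no grep output at all, which is what I expected from funct2 and funct4.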
Am I missing something?
EDIT: deleted the undefined-behavior tag.