
In Bash, I have an array names that contains the string values:

Dr. Praveen Hishnadas
Dr. Vij Pamy
John Smitherson,Dr.,Service Account
John Dinkleberg,Dr.,Service Account

I want to capture only the names

Praveen Hishnadas
Vij Pamy
John Smitherson
John Dinkleberg

and store them back into the original array, overwriting their unsanitized versions.

I have the following snippet of code (note that I'm running the regex in Perl-compatible mode with -P):

for i in "${names[@]}"
do
        echo $i|grep -P  '(?:Dr\.)?\w+ \w+|$' -o | head -1

done

This yields the output:

Dr. Praveen Hishnadas
Dr. Vij Pamy
John Smitherson
John Dinkleberg

Questions:

1) Am I using the look-around command ?: incorrectly? I'm trying to optionally match "Dr." while not capturing it

2) How would I store the result of that echo back into the array names? Inside the loop, I have tried each of these:

i=echo $i|grep -P  '(?:Dr\.)?\w+ \w+|$' -o | head -1

i=$(echo $i|grep -P  '(?:Dr\.)?\w+ \w+|$' -o | head -1)

i=`echo $i|grep -P  '(?:Dr\.)?\w+ \w+|$' -o | head -1`

but to no avail. I only started learning Bash two days ago, and I feel like my syntax is slightly off. Any help is appreciated.

CyberStems

2 Answers


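Your non-capturing group (?:Dr\.)? says "include Dr. if it's there": ?: only stops the group from creating a capture; the text it matches is still part of the match. You probably want a negative lookahead like (?!Dr\.)\w+ \w+. I'll throw in a leading \b anchor as a bonus.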

names=('Dr. Praveen Hishnadas' 'Dr. Vij Pamy' 'John Smitherson,Dr.,Service Account' 'John Dinkleberg,Dr.,Service Account')

for i in "${names[@]}"
do
        grep -P  '\b(?!Dr\.)\w+ \w+' -o <<<"$i" |
        head -n 1
done
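A small side-by-side sketch of that difference, assuming GNU grep built with PCRE support (-P). The input is one of the question's names; the space inside the group, (?:Dr\. ), is added only for this demo so that an optional "Dr. " can be followed directly by the name:

```shell
#!/usr/bin/env bash
s='Dr. Praveen Hishnadas'

# (?:...) only suppresses the capture group; the text it matches is still part
# of the overall match, so grep -o prints it. (The space inside the group is
# added for this demo so that "Dr. " can be followed directly by the name.)
with_group=$(grep -oP '(?:Dr\. )?\w+ \w+' <<<"$s" | head -n 1)

# (?!Dr\.) is zero-width: it only rejects starting positions where "Dr."
# follows and never adds characters to the match.
with_lookahead=$(grep -oP '\b(?!Dr\.)\w+ \w+' <<<"$s" | head -n 1)

printf 'group:     %s\nlookahead: %s\n' "$with_group" "$with_lookahead"
```

The first line keeps "Dr. " in the output, the second does not.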

Quoting $i doesn't matter for the examples you provided, but you should basically always quote your variables. See "When to wrap quotes around a shell variable?"
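A minimal illustration of what goes wrong without quotes. The multi-space input is made up for the demo; the names in the question contain only single spaces, which is why the unquoted version happened to work there:

```shell
#!/usr/bin/env bash
# Made-up input with runs of spaces.
s='Dr.   Praveen    Hishnadas'

unquoted=$(echo $s)    # word splitting: echo gets 3 arguments, joined by single spaces
quoted=$(echo "$s")    # one argument, spacing preserved

printf '[%s]\n[%s]\n' "$unquoted" "$quoted"
```

Only the quoted form hands grep the string exactly as it is stored in the array.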

Maybe also google "falsehoods programmers believe about names".

To update your array, loop over the array indices and assign back into the array.

for((i=0;i<${#names[@]};++i)); do
    names[$i]=$(grep -P  '\b(?!Dr\.)\w+ \w+|$' -o <<<"${names[i]}" | head -n 1)
done
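A sketch of the same update that iterates over the existing indices with "${!names[@]}" instead of counting up to ${#names[@]}. The counting loop assumes the indices run 0..n-1 without gaps, which holds for the array in the question but not after elements have been unset. It again assumes grep -P is available:

```shell
#!/usr/bin/env bash
names=('Dr. Praveen Hishnadas' 'Dr. Vij Pamy' 'John Smitherson,Dr.,Service Account' 'John Dinkleberg,Dr.,Service Account')

# "${!names[@]}" expands to the indices that actually exist, so this also
# works if the array is sparse (e.g. after unset 'names[1]').
for i in "${!names[@]}"; do
    # Assign through the index. Assigning to the loop variable in
    # `for n in "${names[@]}"` would only change the copy held in n.
    names[$i]=$(grep -oP '\b(?!Dr\.)\w+ \w+' <<<"${names[$i]}" | head -n 1)
done

printf '%s\n' "${names[@]}"
```

The comment inside the loop is also why i=$(...) in the question's loop never touched the array: i only holds a copy of each element.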
tripleee
  • Thank you for all your help; this works great! One thing that confuses me, though: if I have the input "Dr Alex Jones" and modify the regex to make the period optional, e.g. (?!Dr\.*), it fails to recognize the pattern and outputs "Dr Alex" instead – CyberStems Oct 31 '19 at 19:10
  • Try changing it to `(?!Dr\W)` or something like that. Regex begins to crumble when you have the full range of creative human obfuscation to deal with, but this should still be easy. – tripleee Oct 31 '19 at 19:26
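A quick check of the (?!Dr\W) suggestion, again assuming grep -P. The inputs come from the question and the comment, plus "Drew Smith", a made-up name included to show that a first name merely starting with "Dr" is still accepted:

```shell
#!/usr/bin/env bash
# Inputs from the question and the comment, plus 'Drew Smith' (made up).
inputs=('Dr. Praveen Hishnadas' 'Dr Alex Jones' 'John Smitherson,Dr.,Service Account' 'Drew Smith')

results=()
for s in "${inputs[@]}"; do
    # (?!Dr\W) rejects a start position where "Dr" is followed by a non-word
    # character ('.', ' ', ','), so both "Dr." and "Dr " are skipped,
    # while "Drew" (Dr followed by a word character) is still accepted.
    results+=("$(grep -oP '\b(?!Dr\W)\w+ \w+' <<<"$s" | head -n 1)")
done

printf '%s\n' "${results[@]}"
```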

How about something like this for the regex?

(?:^|\.\s)(\w+)\s+(\w+)


(?:             # Non-capturing group
   ^|\.\s       # Start the match at the start of the line or after a dot followed by whitespace
)
(\w+)           # Group 1 captures the first name
\s+             # Match one or more whitespace characters between first and last name (drop the + to match exactly one)
(\w+)           # Group 2 captures surname.
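With grep -o, the whole match is printed and the capture groups are ignored, so the ". " consumed by (?:^|\.\s) ends up in the output. A minimal sketch that keeps the same idea but uses PCRE's \K (so it assumes grep -P) to drop the prefix from the reported match:

```shell
#!/usr/bin/env bash
names=('Dr. Praveen Hishnadas' 'Dr. Vij Pamy' 'John Smitherson,Dr.,Service Account' 'John Dinkleberg,Dr.,Service Account')

for i in "${!names[@]}"; do
    # grep -o prints the whole match and ignores capture groups, so the ". "
    # consumed by (?:^|\.\s) would be printed too. \K (PCRE, so it needs -P)
    # drops everything matched so far from the reported match.
    names[$i]=$(grep -oP '(?:^|\.\s)\K\w+\s+\w+' <<<"${names[$i]}" | head -n 1)
done

printf '%s\n' "${names[@]}"
```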
vs97
  • Unfortunately this yields: . Praveen Hishnadas, . Vij Pamy, John Smitherson, John Dinkleberg. I believe my problem may be executing it in Perl mode – CyberStems Oct 31 '19 at 17:53