I would like a generalized utility for creating regexp patterns on demand, based on a command-line parameter.

The following script is intended only to demonstrate the method of splitting and conversion; it is not the final form in which it will be used. NOTE: Yes, I did intend to use [[alphanum]] in the sed command, not [[alpha]].

My problem is that the output of the script (included below) for the command

scriptname abc

is

[Aa][BCbc]

when I want it to be

[Aa][Bb][Cc]

I am looking for the correct sed syntax to make the command repeat the substitution for each character individually, not only for the first one.

Script:

#!/bin/bash

makePatternMatch()
{
    echo "${charList}" | awk 'BEGIN{
        regExp="" ;
    }
    {
        if( $0 != "" ){
            for( i=1 ; i <= NF ; i++ ){
                regExp=sprintf("%s[%s%s]", regExp, toupper($i), tolower($i) ) ;
            } ;
        } ;
    }END{
        printf("%s\n", regExp ) ;
    }'
}

explodeString()
{
    charList=$(echo "${pattern}" | sed 's+[[alphanum]]*+&\ +g' )
}


for pattern in $@
do
    explodeString
    makePatternMatch
done
Eric Marceau
  • For my version of sed, **sed (GNU sed) 4.7**, the ":" chars are not accepted. It must be used without. – Eric Marceau Feb 27 '23 at 04:02
  • Weird. `[:alpha:]` etc. have been around for 20+ years, and in all implementations I've used the `:` is required. I just checked: my `sed` is v4.8. If it's not working with the `:`s, I think there is some other problem. Also, I said do it in awk, and d'oh! I see you are doing it in awk. I have to reread your question, as I think I was rushing too fast. Good luck. – shellter Feb 27 '23 at 04:08
  • Here's my version. `echo "abc1" | awk '{n=split($0,arr,""); for (i=1;i<=(n);i++){ if (arr[i] ~ /[[:alpha:]]/) {printf "[" toupper(arr[i]) tolower(arr[i])"]"} else printf arr[i] }}END{printf"\n"}'` ... output `[Aa][Bb][Cc]1` You can wrap it in other scripts, shell functions/and/or `awk` functions. Good luck. – shellter Feb 27 '23 at 04:13
  • @shellter, thank you. I will use that. But I would still want to know how to use sed to simply turn a word into a character list. – Eric Marceau Feb 27 '23 at 04:15
  • `echo "abc1" | sed 's/[[:alpha:]]/[\U&\L&]/g'` ... output=`[Aa][Bb][Cc]1` (I lucked out!) (With my version of `sed`). If my small sample works on your cmd-line, then something about your `charList=$(....)` is not right. I don't see you using `$charList` anywhere, but you said at the top the overall code wasn't complete, so if it is still not working then add an [mcve] and maybe we can figure it out. Good luck. – shellter Feb 27 '23 at 04:23
  • It is `alnum`, not `alphanum`. For example: `echo 'abcd123' | sed 's/[[:alnum:]]/{&}/g'` – Sundeep Feb 27 '23 at 04:27
  • To make upper- and lower-case characters from a string, you don't care what happens to the numbers or punctuation, so you don't want to process them. (Actually, how is mine working?!) Back later. – shellter Feb 27 '23 at 04:29
  • Ah yes, `[[:alpha:]]` is like the single-character specifier `.` (dot). A character class matches only ONE character and relies on `*` (and possibly other syntax) to extend its "range". But the global substitution from `/g` at the end operates on each of the single characters that DO match the pattern and leaves the others untouched. That is how the `1` is printed in the output and not deleted. – shellter Feb 27 '23 at 04:57
  • Note what happens when we use `[[:alnum:]]`: `echo "abc1" | sed 's/[[:alnum:]]/[\U&\L&]/g'` gives output `[Aa][Bb][Cc][11]`. The `1` gets processed as a match and is wrapped to make a char class. (This is OK, but it will drive a maintainer nuts when they notice it: "How did that get there!?") Good luck to all. – shellter Feb 27 '23 at 05:00
  • Numbers don't have case, so `[[:alnum:]]` makes no sense; use `[[:alpha:]]` – tripleee Feb 27 '23 at 08:43
  • Thank you, one and all! You've each provided useful input that has led me to the sed-based solution I was looking for. – Eric Marceau Feb 28 '23 at 19:48
  • @tripleee, you are quite correct about `[[:alnum:]]` not making sense for my purposes. – Eric Marceau Feb 28 '23 at 20:42
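
The points raised in these comments can be checked with a small sketch. It assumes GNU sed; the helper name `mark_matches` and the sample inputs are hypothetical and only for illustration. Without the inner colons, `[[alphanum]` is read as a set of the literal characters `[`, `a`, `l`, `p`, `h`, `n`, `u`, `m`, and the trailing `]*` matches zero or more literal `]` characters, so only the `a` of `abc` is matched. That is consistent with the question's `charList` becoming `a bc`, which awk then splits into the two fields `a` and `bc`.

```shell
#!/bin/bash
# mark_matches: hypothetical helper that wraps every match of a regex in braces,
# in the style of the {&} example above, so each match is visible.
# $1 = regex (must not contain "/"), $2 = input string
mark_matches() {
    printf '%s\n' "$2" | sed "s/$1/{&}/g"
}

mark_matches '[[alphanum]]*' 'abc'    # {a}bc          (b and c are not in the literal set)
mark_matches '[[:alnum:]]'   'abc1'   # {a}{b}{c}{1}   (letters and digits, one at a time)
mark_matches '[[:alpha:]]'   'abc1'   # {a}{b}{c}1     (letters only, digit untouched)
```

The last two calls show that a bracket expression by itself matches exactly one character, and the `g` flag is what repeats the substitution for each one.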

2 Answers

This might work for you (GNU sed):

sed -E 's/([[:alpha:]])|[[:digit:]]/[\u\1\l&]/g' file

Each letter is replaced by a bracket expression containing its uppercase and lowercase forms, and each digit by a bracket expression containing only that digit; all other characters are left unchanged.

Alternative:

sed 's/[[:alnum:]]/[\u&\l&]/g' file

However, this doubles every digit inside its brackets (for example, 1 becomes [11]), which does not change what the pattern matches.
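
As a rough check of the two approaches, the sketch below runs them on a sample string read from standard input instead of a file. It assumes GNU sed, since `\u` and `\l` are GNU extensions; the function names `wrap_alpha_digit` and `wrap_alnum` are hypothetical, and the `[[:alnum:]]` variant is written here with the surrounding brackets in the replacement.

```shell
#!/bin/bash
# Requires GNU sed: \u and \l convert the case of the next character only.

# Letters get both cases; for a digit the group \1 is empty, so only & is emitted.
wrap_alpha_digit() {
    sed -E 's/([[:alpha:]])|[[:digit:]]/[\u\1\l&]/g'
}

# Single character class: every letter or digit is emitted twice inside brackets.
wrap_alnum() {
    sed 's/[[:alnum:]]/[\u&\l&]/g'
}

printf '%s\n' 'ab1c' | wrap_alpha_digit    # [Aa][Bb][1][Cc]
printf '%s\n' 'ab1c' | wrap_alnum          # [Aa][Bb][11][Cc]
```

Both results match the same strings, because `[11]` and `[1]` describe the same one-character set.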

potong

Combining input from everyone who replied, I was able to formulate the sed-based one-liner command that I was looking for. Namely:

for pattern in "ab*c1" "d2e?f"
do  echo "${pattern}" |
    sed 's/[[:alpha:]]/[\U&\L&]/g'
done

giving me

[Aa][Bb]*[Cc]1
[Dd]2[Ee]?[Ff]

But getting back to the script in my original post, the modified logic would have been:

#!/bin/bash

makePatternMatch()
{
    echo "${charList}" | awk 'BEGIN{
        regExp="" ;
    }
    {
        if( $0 != "" ){
            for( i=1 ; i <= NF ; i++ ){
                if( $i ~ /[[:alpha:]]/ ){
                    regExp=sprintf("%s[%s%s]", regExp, toupper($i), tolower($i) ) ;
                }else{
                    regExp=sprintf("%s%s", regExp, $i ) ;
                } ;
            } ;
        } ;
    }END{
        printf("%s\n", regExp ) ;
    }'
}

explodeString()
{
    charList=$(echo "${pattern}" | sed 's+.+&\ +g' )
}


for pattern in "$@"
do
    explodeString
    makePatternMatch
done
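
For comparison only, the same per-character logic can be sketched without sed or awk. This is not the method used above but a swapped-in alternative that relies on bash 4+ case-conversion expansions; the function name `make_pattern` is hypothetical.

```shell
#!/bin/bash
# Requires bash 4+ for the ${var^^} (uppercase) and ${var,,} (lowercase) expansions.

# make_pattern: hypothetical helper; prints a case-insensitive pattern for one word.
make_pattern() {
    local word=$1 out='' c i
    for (( i = 0; i < ${#word}; i++ )); do
        c=${word:i:1}                      # take one character at a time
        if [[ $c == [[:alpha:]] ]]; then
            out+="[${c^^}${c,,}]"          # letter: offer both cases
        else
            out+=$c                        # anything else: copy unchanged
        fi
    done
    printf '%s\n' "$out"
}

for pattern in "$@"; do
    make_pattern "$pattern"
done
```

Called with the arguments `"ab*c1"` and `"d2e?f"`, this should print the same two lines as the sed one-liner, `[Aa][Bb]*[Cc]1` and `[Dd]2[Ee]?[Ff]`.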

Thanks again to all who provided various examples and insights!

Eric Marceau
  • You basically always want double quotes around `"$@"`, otherwise you break on arguments with quoted spaces etc. See also [When to wrap quotes around a shell variable?](https://stackoverflow.com/questions/10067266/when-to-wrap-quotes-around-a-shell-variable) – tripleee Mar 01 '23 at 20:05
  • @tripleee, thank you for that insight. Obviously, critical to know that. :-) – Eric Marceau Mar 01 '23 at 20:11
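
The quoting point can be seen in a minimal sketch. The helper `count_args` and the sample arguments are hypothetical; `set --` is used only to simulate a command line whose first argument contains a space.

```shell
#!/bin/bash
# count_args: hypothetical helper that reports how many arguments it received.
count_args() { echo "$#"; }

# Simulate a command line whose first argument contains a space.
set -- "ab c" "d2e"

count_args $@      # 3: unquoted, word splitting breaks "ab c" into two words
count_args "$@"    # 2: quoted, each original argument stays intact
```

Unquoted `$@` is also subject to filename expansion, so an argument such as `ab*c1` could be replaced by matching file names before the loop ever sees it, which matters for a script whose inputs are patterns.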