I need to transform this string:

my name is user from here not there.

to:

My Name is User From Here not There

Specifically, I need to uppercase the first character of every word with more than 3 characters. That's all. I have been trying, without success, with this command:

echo $FOO | tr '[:upper:]' '[:lower:]' | sed -e "s/\b\(.\)/\u\1/g"

Everything else should be lowercase.

tripleee
Sunfloro
  • 3
    Looking at your profile, I noticed that you have never accepted an answer. Please give it some time, and once you have some answers, try to accept one of them as correct. – RavinderSingh13 Jan 17 '19 at 02:28
  • 1
    Your question title says more than 2, but the body asks about more than 3. Please [edit] to make this consistent, or clarify. – tripleee Jan 17 '19 at 05:37
  • Thank you for the feedback. My question was poorly formulated; I think it is clearer now. About accepting answers, RavinderSingh13, you are right, but on my old question I was still waiting for more answers because none of them solved the problem. I guess nobody is going to answer now. – Sunfloro Jan 18 '19 at 21:05

4 Answers

2

Using GNU sed (and bash):

F="my name is user from here not there."
sed -E 's/^./\u&/;s/([[:space:]])([[:alpha:]]{4})/\1\u\2/g' \
    <<< "${F,,}"

or:

sed -E 's/^./\u&/;s/(\s)(\w{4})/\1\u\2/g' <<< "${F,,}"

Output:

My Name is User From Here not There.

Notes:

"${F,,}" is a bash (4.0 and later) case-modification parameter expansion; it returns a lower-case version of $F, which becomes the input for sed.

GNU sed offers some useful synonyms and abbreviations for common regex character classes. The character class [a-zA-Z0-9_] can be abbreviated as [[:alnum:]_], or simpler yet \w.

Even though \u looks like a regex abbreviation, it's not. It's a "special sequence" used only in substitute command replacement text -- \u means "turn the next character to uppercase".

& refers to the entire text matched by the regexp in the substitute command. Compare the following:

sed 's/./&/'          <<< foo  # outputs "foo"
sed 's/./&/g'         <<< foo  # outputs "foo"
sed 's/./&&&&/g'      <<< foo  # outputs "ffffoooooooo"
sed 's/./\u&&&\u&/g'  <<< foo  # outputs "FffFOooOOooO"
sed 's/.*/&&&&/'      <<< foo  # outputs "foofoofoofoo"

See the GNU sed info pages for more details.
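
As a quick check of the notes above, here is a minimal sketch, assuming GNU sed (`\w` and `\u` are GNU extensions). The function names `wrap_w`, `wrap_alnum` and `cap_first` are hypothetical and exist only for this illustration.

```shell
# Assumes GNU sed. All three function names are hypothetical.

# Wrap every run of word characters in <>, using the \w shorthand.
wrap_w()     { printf '%s\n' "$1" | sed -E 's/\w+/<&>/g'; }

# The same, spelled out with the POSIX class plus underscore.
wrap_alnum() { printf '%s\n' "$1" | sed -E 's/[[:alnum:]_]+/<&>/g'; }

# \u upper-cases only the next character of the replacement; & is the whole match.
cap_first()  { printf '%s\n' "$1" | sed -E 's/.*/\u&/'; }

wrap_w     'foo_1 bar-2'   # prints "<foo_1> <bar>-<2>"
wrap_alnum 'foo_1 bar-2'   # prints the same line
cap_first  'foo bar'       # prints "Foo bar"
```

Both wrappers should agree on any ASCII input, which is the sense in which `\w` abbreviates `[[:alnum:]_]`.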

agc
  • Perfect! I had to add `tr '[:upper:]' '[:lower:]'` first and then use your solution; otherwise it would not be as general as I need. I did search the sed documentation, but I could not understand how those pieces work: (\w{4}), \b, \u, \s. Why is there a backslash and a number after \u, for example, and what is &? Does \w mean any word? Is it like a regexp? Could you explain a little? – Sunfloro Jan 18 '19 at 20:48
  • 1
    @Otavio, See revised more general answer -- the `tr` addition should no longer be necessary. – agc Jan 19 '19 at 05:06
  • Thank you! Using what you said in the Notes, I reduced the command to `echo $SENTENCE | sed -E 's/./\l&/g;s/^./\u&/;s/(\b)(\w{4})/\1\u\2/g'`, without using tr. I would like to use Perl for portability, as tripleee said, but had no success. – Sunfloro Jan 19 '19 at 14:10
2

This might work for you (GNU sed):

sed -E 's/^\w+|\b\w{4,}\b/\u&/g' file

Upper-case the first character of the word at the start of the line, and of any word 4 or more characters long.
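
For mixed-case input such as `My NAME IS NOT sMITH`, the line has to be lower-cased first. A minimal sketch, assuming GNU sed, does this inside sed itself with the `\L` replacement escape (lower-case the rest of the replacement) before applying the same substitution. The function name `titlecase_sed` is hypothetical.

```shell
# Assumes GNU sed (\L, \u, \w and \b are GNU extensions).
# titlecase_sed is a hypothetical wrapper name.
titlecase_sed() {
    # 1st command: lower-case the whole line.
    # 2nd command: capitalise the first word and every word of 4+ characters.
    printf '%s\n' "$1" | sed -E 's/.*/\L&/; s/^\w+|\b\w{4,}\b/\u&/g'
}

titlecase_sed 'my NAME IS NOT sMITH'   # prints "My Name is not Smith"
```

Keeping both steps in one sed invocation avoids the separate `tr` stage, at the cost of tying the command even more firmly to GNU sed.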

potong
  • At first glance this didn't seem much different from [my `sed` answer](https://stackoverflow.com/a/54229424/6136214), but the `|` and single replacement are clear improvements. A tweak -- replace `^\w+` with `^.` and it still works. – agc Jan 17 '19 at 17:28
  • Works really well. As I said on agc's answer, I had to lowercase everything before applying your sed filter, or sentences like My NAME IS NOT sMITH would not work. Sorry for my ignorance. I understood everything up to the last \b, but how does \u& work? I tried a lot of different combinations without success. – Sunfloro Jan 18 '19 at 20:55
  • 1
    @Otavio `\u&` makes the first character of the string that matched in the LHS of the substitute command, upper-case. – potong Jan 18 '19 at 22:03
1

Could you please try the following:

echo "my name is user from here not there." |
awk '{for(i=1;i<=NF;i++)
    if(length($i)>3){$i=toupper(substr($i,1,1)) substr($i,2)}}
    1'

Result:

my Name is User From Here not There.
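
The result above leaves the first word in lower case and does not touch existing capitals. A possible variant, sketched here with only POSIX awk features, lower-cases the line first and always capitalises field 1. The function name `titlecase_awk` is hypothetical.

```shell
# Uses only POSIX awk features (tolower, toupper, substr, length).
# titlecase_awk is a hypothetical wrapper name.
titlecase_awk() {
    printf '%s\n' "$1" | awk '{
        $0 = tolower($0)                  # lower-case the line; fields are re-split
        for (i = 1; i <= NF; i++)
            if (i == 1 || length($i) > 3) # first word, or more than 3 characters
                $i = toupper(substr($i, 1, 1)) substr($i, 2)
        print
    }'
}

titlecase_awk 'my NAME IS NOT sMITH.'   # prints "My Name is not Smith."
```

Two caveats: fields are split on whitespace, so attached punctuation counts toward the length (`not.` would be treated as 4 characters), and assigning to a field makes awk rebuild the line with single spaces between words.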
tripleee
RavinderSingh13
1

tr is not really the right tool for this job; it does not know about context at all.

Some variants of sed have Perl or vi regex extensions, but this cannot really be portably solved with sed, either.

Perl to the rescue:

bash$ foo="my name is user from here not there."

bash$ echo "$foo" | perl -pe 's/\w{4,}/\u$&/g'
my Name is User From Here not There.

This does what you are actually asking, but not what you want. Perhaps add a condition to upcase the first word of the input separately ... or switch to a library like Lingua::EN::Titlecase.

Notice also how we do not use upper case for our private variables (because uppercase variables are reserved for system use) and always quote our shell strings.

tripleee
  • Works fine, except for the first word, as you said. Perl seems a good option to replace sed, since in the future I want to use these scripts on macOS. Thanks for your answer! – Sunfloro Jan 18 '19 at 20:57
  • 1
    Changing the regex to `^\w+|\w{4,}` takes care of capitalizing any initial letter too if you want that. – tripleee Jan 19 '19 at 11:08