How to uppercase the first char of a substring only if it has more than 3 chars length

Question

I need to transform this string:

my name is user from here not there.

to:

My Name is User From Here not There

The details: I need to uppercase the first character of any word with more than 3 characters. That's all. I'm trying, without success, with this command:

echo $FOO | tr '[:upper:]' '[:lower:]' | sed -e "s/\b\(.\)/\u\1/g"

Everything else should be lowercase.

Looking at your profile, I see you have never accepted an answer. Kindly give it some time, and once you have some answers, try to accept one of them as the correct one. — RavinderSingh13, Jan 17 '19 at 02:28
Your question title says more than 2, but the body asks about more than 3. Please [edit] to make this consistent, or clarify. — tripleee, Jan 17 '19 at 05:37
Thank you for the feedback. My question was poorly formulated; I think it is clearer now. About accepting an answer, RavinderSingh13, you are right, but on my old question I was still waiting for more answers because none of them solved the problem. I guess nobody is going to answer now. — Sunfloro, Jan 18 '19 at 21:05

agc · Answer 1 · 2019-01-19T17:02:48.330

Using GNU sed, (and bash):

F="my name is user from here not there."
sed -E 's/^./\u&/;s/([[:space:]])([[:alpha:]]{4})/\1\u\2/g' \
    <<< "${F,,}"

or:

sed -E 's/^./\u&/;s/(\s)(\w{4})/\1\u\2/g' <<< "${F,,}"

Output:

My Name is User From Here not There.

Notes:

"${F,,}" is a bash case modification parameter expansion, it returns a lower-case version of $F, which becomes the input for sed.

GNU sed offers some useful synonyms and abbreviations for common regex character classes. The character class [a-zA-Z0-9_] can be abbreviated as [[:alnum:]_], or simpler yet \w.
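
A quick check of how the bracket expression `[[:alnum:]_]` and the `\w` shorthand behave (a sketch, assuming GNU sed; the sample string is arbitrary): both commands replace every word character with `x` and should print the same line.

```shell
# Replace every "word" character with x, once with the POSIX
# bracket expression and once with the GNU \w shorthand.
input='ab_9 c-d!'

with_bracket=$(printf '%s\n' "$input" | sed -E 's/[[:alnum:]_]/x/g')
with_w=$(printf '%s\n' "$input" | sed -E 's/\w/x/g')

# Letters, digits and the underscore are replaced; the space,
# hyphen and exclamation mark are left alone.
printf '%s\n' "$with_bracket" "$with_w"
```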

Even though \u looks like a regex abbreviation, it's not. It's a "special sequence" used only in substitute command replacement text -- \u means "turn the next character to uppercase".
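
GNU sed has a few related sequences: `\l` lower-cases the next character, while `\U` and `\L` convert everything up to `\E` or the end of the replacement. A small sketch, assuming GNU sed (the variable names are only for illustration):

```shell
# Case-conversion sequences in s/// replacement text (GNU sed only).

# \u: upper-case only the next character of the replacement.
first_upper=$(printf '%s\n' 'hello world' | sed -E 's/^./\u&/')

# \U ... \E: upper-case everything between the two markers.
word_upper=$(printf '%s\n' 'hello world' | sed -E 's/(\w+) (\w+)/\U\1\E \2/')

# \L: lower-case the rest of the replacement.
all_lower=$(printf '%s\n' 'HeLLo WORLD' | sed -E 's/.*/\L&/')

printf '%s\n' "$first_upper" "$word_upper" "$all_lower"
```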

& refers to whatever the first regexp in the substitute command matched. Compare the following:

sed 's/./&/'          <<< foo  # outputs "foo" (the first "f" is replaced by itself)
sed 's/./&/g'         <<< foo  # outputs "foo"
sed 's/./&&&&/g'      <<< foo  # outputs "ffffoooooooo"
sed 's/./\u&&&\u&/g'  <<< foo  # outputs "FffFOooOOooO"
sed 's/.*/&&&&/'      <<< foo  # outputs "foofoofoofoo"

See the GNU sed info pages for more details.

Perfect! I had to add `tr '[:upper:]' '[:lower:]'` first and then use your solution, or it would not be as general as I need. I searched the sed documentation but could not understand how these pieces work: (\w{4}), \b, \u, \s. Why is there a backslash and a number after \u, for example, and what is &? Does \w mean any word? Is it like a regexp? Could you explain a little? — Sunfloro, Jan 18 '19 at 20:48
@Otavio, See revised more general answer -- the `tr` addition should no longer be necessary. — agc, Jan 19 '19 at 05:06
Thank you! Using what you said in the Notes, I condensed the command to `echo $SENTENCE | sed -E 's/./\l&/g;s/^./\u&/;s/(\b)(\w{4})/\1\u\2/g'`, without using tr. I would like to use Perl for portability, as tripleee said, but had no success. — Sunfloro, Jan 19 '19 at 14:10

score 2 · Accepted Answer · answered Jan 17 '19 at 11:34


This might work for you (GNU sed):

sed -E 's/^\w+|\b\w{4,}\b/\u&/g' file

Upper-case the first character of the word that starts the line, and of any word that is 4 or more characters long.
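
The command above leaves existing capitals alone, so mixed-case input needs lower-casing first. One way to fold that into the same sed call, as a sketch assuming GNU sed (`\L`, `\u`, `\w` and `\b` are GNU extensions; `title_long_words` is a hypothetical helper name):

```shell
# Lower-case the whole line first, then capitalize the first word
# and every word of four or more characters (GNU sed only).
title_long_words() {
    sed -E 's/.*/\L&/; s/^\w+|\b\w{4,}\b/\u&/g'
}

printf '%s\n' 'My NAME IS NOT sMITH' | title_long_words
# prints: My Name is not Smith

printf '%s\n' 'my name is user from here not there.' | title_long_words
# prints: My Name is User From Here not There.
```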

potong

At first glance this didn't seem much different from [my `sed` answer](https://stackoverflow.com/a/54229424/6136214), but the `|` and single replacement are clear improvements. A tweak -- replace `^\w+` with `^.` and it still works. – agc Jan 17 '19 at 17:28
Works really well. As I said under agc's answer, I had to lowercase everything before applying your sed filter, or sentences like My NAME IS NOT sMITH would not work. Sorry about my ignorance. I understood everything up to the last \b, but how does \u& work? I tried a lot of different combinations without success. – Sunfloro Jan 18 '19 at 20:55

@Otavio `\u&` upper-cases the first character of the string matched by the LHS of the substitute command. – potong Jan 18 '19 at 22:03

score 1 · Answer 3 · edited Jan 17 '19 at 05:35


Could you please try the following.

echo "my name is user from here not there." |
awk '{for(i=1;i<=NF;i++)
    if(length($i)>3){$i=toupper(substr($i,1,1)) substr($i,2)}}
    1'

Result:

my Name is User From Here not There.
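
To also lower-case the input and capitalize the first word, the loop can be extended slightly. This is only a sketch (`awk_title` is a hypothetical helper name). Note that fields are split on whitespace, so trailing punctuation counts toward a word's length, and rebuilding the line collapses runs of whitespace into single spaces.

```shell
# awk variant: lower-case the line, then capitalize field 1 and any
# field longer than three characters.
awk_title() {
    awk '{
        $0 = tolower($0)    # assigning $0 re-splits the fields
        for (i = 1; i <= NF; i++)
            if (i == 1 || length($i) > 3)
                $i = toupper(substr($i, 1, 1)) substr($i, 2)
        print
    }'
}

printf '%s\n' 'my NAME is user from here not there.' | awk_title
# prints: My Name is User From Here not There.
```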

edited Jan 17 '19 at 05:35

tripleee

answered Jan 17 '19 at 02:20

RavinderSingh13

Thank you @tripleee for editing it, I forgot to add one-liner form of solution. – RavinderSingh13 Jan 17 '19 at 05:40

Thank you! Works fine except for the first word that should have the first char in uppercase. – Sunfloro Jan 18 '19 at 20:59

score 1 · Answer 4 · answered Jan 17 '19 at 05:33

tr is not really the right tool for this job; it does not know about context at all.

Some variants of sed have Perl or vi regex extensions, but this cannot really be portably solved with sed, either.

Perl to the rescue:

bash$ foo="my name is user from here not there."

bash$ echo "$foo" | perl -pe 's/\w{4,}/\u$&/g'
my Name is User From Here not There.

This does what you are actually asking, but not what you want. Perhaps add a condition to upcase the first word of the input separately ... or switch to a library like Lingua::EN::Titlecase.

Notice also how we do not use upper case for our private variables (because uppercase variables are reserved for system use) and always quote our shell strings.
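
A short illustration of why the quoting matters (a sketch; `phrase` is just a sample variable): an unquoted expansion is split into words by the shell before `echo` sees it, so the original spacing is lost.

```shell
# Unquoted expansions undergo word splitting (and globbing);
# quoted expansions are passed through unchanged.
phrase='my   name  is'

unquoted=$(echo $phrase)      # whitespace runs collapse to single spaces
quoted=$(echo "$phrase")      # original spacing preserved

printf '[%s]\n' "$unquoted" "$quoted"
# prints: [my name is]
#         [my   name  is]
```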

Works fine, except for the first word, as you said. Perl seems like a good option to replace sed, since I want to use these scripts on macOS in the future. Thanks for your answer! — Sunfloro, Jan 18 '19 at 20:57
Changing the regex to `^\w+|\w{4,}` takes care of capitalizing any initial letter too if you want that. — tripleee, Jan 19 '19 at 11:08
