Length of string in bash

Question

How do you get the length of a string stored in a variable and assign that to another variable?

myvar="some string"
echo ${#myvar}  
# 11

How do you set another variable to the output 11?

fedorqui · Answer 1 · 2017-01-30T07:52:38.013

646

To get the length of a string stored in a variable, say:

myvar="some string"
size=${#myvar}

To confirm it was properly saved, echo it:

$ echo "$size"
11
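
Because the result is an ordinary integer, it can go straight into arithmetic or a test. A minimal sketch, assuming a hypothetical limit of 8 characters (the `max` variable is only an illustration, not part of the question):

```bash
myvar="some string"
size=${#myvar}
max=8   # hypothetical limit, chosen only for this example

if (( size > max )); then
    echo "too long by $(( size - max )) characters"
fi
```

With the string above, `size` is 11, so the message reports 3 characters too many.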

edited Jan 30 '17 at 07:52

answered Jun 28 '13 at 15:15

fedorqui


16

With UTF-8 strings, you could have a string length **and** a byte length. [see my answer](http://stackoverflow.com/a/31009961/1765658) – F. Hauri - Give Up GitHub Jun 23 '15 at 17:59
1

You can also use it directly in other parameter expansions - for example in this test I check that `$rulename` starts with the `$RULE_PREFIX` prefix: `[ "${rulename:0:${#RULE_PREFIX}}" == "$RULE_PREFIX" ]` – Thomas Guyot-Sionnest Jul 21 '15 at 14:13
2

Could you please explain a bit the expressions of `#myvar` and `{#myvar}`? – Lerner Zhang Sep 19 '16 at 06:03
2

@lerneradams see [Bash reference manual →3.5.3 Shell Parameter Expansion](https://www.gnu.org/software/bash/manual/bashref.html#Shell-Parameter-Expansion) on `${#parameter}`: _The length in characters of the expanded value of parameter is substituted_. – fedorqui Oct 21 '16 at 14:31

F. Hauri - Give Up GitHub · Accepted Answer · 2023-02-13T15:07:20.777

370

Edit 2023-02-13: Use of `printf %n` instead of locales...

UTF-8 string length

In addition to fedorqui's correct answer, I would like to show the difference between string length and byte length:

myvar='Généralités'
chrlen=${#myvar}
oLang=$LANG oLcAll=$LC_ALL
LANG=C LC_ALL=C
bytlen=${#myvar}
LANG=$oLang LC_ALL=$oLcAll
printf "%s is %d char len, but %d bytes len.\n" "${myvar}" $chrlen $bytlen

will render:

Généralités is 11 char len, but 14 bytes len.

you could even have a look at stored chars:

myvar='Généralités'
chrlen=${#myvar}
oLang=$LANG oLcAll=$LC_ALL
LANG=C LC_ALL=C
bytlen=${#myvar}
printf -v myreal "%q" "$myvar"
LANG=$oLang LC_ALL=$oLcAll
printf "%s has %d chars, %d bytes: (%s).\n" "${myvar}" $chrlen $bytlen "$myreal"

will answer:

Généralités has 11 chars, 14 bytes: ($'G\303\251n\303\251ralit\303\251s').

Nota: According to Isabell Cowan's comment, I've added setting to $LC_ALL along with $LANG.
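
One way to avoid saving and restoring the locale by hand is to scope the override with `local` inside a function. This is a sketch under that assumption: the name `byte_length` is hypothetical, and only LC_ALL is set because it takes precedence over LANG.

```bash
# Hypothetical helper: print the byte length of its first argument.
byte_length() {
    # "local" limits the locale override to this function call.
    local LC_ALL=C
    printf '%d\n' "${#1}"
}

byte_length 'Généralités'   # 14, the byte count shown above
```

The command substitution needed to capture the output costs a fork, so this form trades speed for readability.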

Same, but without having to play with locales

I recently learned about the %n format of the printf command (builtin):

myvar='Généralités'
chrlen=${#myvar}
printf -v _ %s%n "$myvar" bytlen
printf "%s is %d char len, but %d bytes len.\n" "${myvar}" $chrlen $bytlen
Généralités is 11 char len, but 14 bytes len.

The syntax is a little counter-intuitive, but this is very efficient! (The strU8DiffLen function further below is about 2 times quicker using printf than the previous version, which used local LANG=C.)

Length of an argument, working sample

Arguments work the same as regular variables:

showStrLen() {
    local -i chrlen=${#1} bytlen
    # %n stores the number of bytes written so far into bytlen
    printf -v _ %s%n "$1" bytlen
    printf "String '%s' is %d bytes, but %d chars len: %q.\n" "$1" $bytlen $chrlen "$1"
}

will work as

showStrLen théorème
String 'théorème' is 10 bytes, but 8 chars len: $'th\303\251or\303\250me'

Useful `printf` correction tool:

If you run:

for string in Généralités Language Théorème Février  "Left: ←" "Yin Yang ☯";do
    printf " - %-14s is %2d char length\n" "'$string'"  ${#string}
done

 - 'Généralités' is 11 char length
 - 'Language'     is  8 char length
 - 'Théorème'   is  8 char length
 - 'Février'     is  7 char length
 - 'Left: ←'    is  7 char length
 - 'Yin Yang ☯' is 10 char length

Not really pretty output!

For this, here is a little function:

strU8DiffLen() {
    local -i bytlen
    printf -v _ %s%n "$1" bytlen
    return $(( bytlen - ${#1} ))
}

or written in one line:

strU8DiffLen() { local -i _bl;printf -v _ %s%n "$1" _bl;return $((_bl-${#1}));}

Now:

for string in Généralités Language Théorème Février  "Left: ←" "Yin Yang ☯";do
    strU8DiffLen "$string"
    printf " - %-$((14+$?))s is %2d chars length, but uses %2d bytes\n" \
        "'$string'" ${#string} $((${#string}+$?))
done

 - 'Généralités'  is 11 chars length, but uses 14 bytes
 - 'Language'     is  8 chars length, but uses  8 bytes
 - 'Théorème'     is  8 chars length, but uses 10 bytes
 - 'Février'      is  7 chars length, but uses  8 bytes
 - 'Left: ←'      is  7 chars length, but uses  9 bytes
 - 'Yin Yang ☯'   is 10 chars length, but uses 12 bytes

Unfortunately, this is not perfect!

Some UTF-8 characters still behave strangely: double-width characters, zero-width characters, characters that reverse the writing direction, and others that cannot be handled this simply...

Have a look at diffU8test.sh or diffU8test.sh.txt for more limitations.
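
The exit status of a function is limited to the range 0 to 255, so `strU8DiffLen` cannot report a larger difference. A possible variant, following the `printf -v` idea raised in the comments on this answer, stores the difference in a variable instead. The function name `strU8DiffLenVar` and the default variable name `OFF` are assumptions made for this sketch:

```bash
# Hypothetical variant of strU8DiffLen: store (bytes - chars) in the variable
# named by the second argument (default: OFF) instead of the exit status.
strU8DiffLenVar() {
    local -i _bl
    # %n stores the number of bytes written so far into _bl
    printf -v _ '%s%n' "$1" _bl
    printf -v "${2:-OFF}" '%d' "$(( _bl - ${#1} ))"
}

strU8DiffLenVar 'Language' diff
echo "$diff"    # 0: ASCII only, so bytes and characters match
```

Like the original, this avoids a fork; the caller reads the named variable rather than `$?`.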

edited Feb 13 '23 at 15:07

answered Jun 23 '15 at 17:50

F. Hauri - Give Up GitHub


I appreciate this answer, as file systems impose name limitations in bytes and not characters. – Gid Nov 14 '16 at 18:33
2

You may also need to set LC_ALL=C and perhaps others. – Isabell Cowan Dec 29 '16 at 01:49
@IsabellCowan In which case? I think not! You may prefer to use `LC_ALL`, but if it is not set, this is not *needed*. And no other variable has to be set. – F. Hauri - Give Up GitHub Dec 29 '16 at 07:22
@F.Hauri try this code: /usr/bin/env -i LC_ALL=en_US.utf8 LANG=C bash -c 'v=€; echo ${#v}' LC_ALL might be unset by default on your system, but it is not on mine. – Isabell Cowan Dec 30 '16 at 20:18
@IsabellCowan Yes, see `man 7 locale`: `LC_ALL` has precedence over all others. That's the reason I follow the *Debian* rules, having `LC_ALL=` somewhere and changing only `LANG` by default (it can be very useful to be able to change just `LC_CTIME` or `LC_NUMERIC`). – F. Hauri - Give Up GitHub Dec 31 '16 at 00:23
1

@F.Hauri But, it none the less follows that on some systems your solution will not work, because it leaves LC_ALL alone. It might work fine on default installs of Debian and it's derivatives, but on others (like Arch Linux) it will fail to give the correct byte length of the string. – Isabell Cowan Jan 03 '17 at 18:49
It didn't work for me and I couldn't find out why. I succeeded using `iconv` like this: `STR=$(printf "$1" | iconv -f UTF-8 -t ISO-8859-15)`, and then `${#STR}` worked well – Cinn Apr 20 '17 at 11:52
@F.Hauri `GNU bash, version 4.1.2(1)-release (x86_64-redhat-linux-gnu)` I don't have the admin rights on the server, i tried the examples you gave and i always got the byte length. I'm trying this from a `.sh` file encoding in UTF-8.. – Cinn Apr 20 '17 at 13:45
2

thanks for taking something simple and convoluting it :) – thistleknot Nov 06 '18 at 16:45
Just to note that UTF8 is a variable width encoding from 1 to 6 bytes cf. other encodings i.e. UTF16 which is a fixed width 2 byte per character. – masseyb Nov 06 '18 at 18:22
A UTF8 encoded Oracle DB instance allows nvarchar2(4000) data types (4000 bytes, each character stored on 1 to 6 bytes) whereas a UTF16 encoded instance only allows for nvarchar2(2000) data types (4000 bytes, 2 bytes per character). Ex. UTF8 string truncation depends on the number of bytes required to store the data which is not necessarily (and most often not the case when dealing with internationalised software) equal to the number of characters. – masseyb Nov 06 '18 at 18:37
@masseyb Yes `☯` and `←` will require 3 bytes, where `é` and `ô` require 2 bytes and `a` or `z` only 1 byte... – F. Hauri - Give Up GitHub Nov 06 '18 at 20:39
3

@thistleknot I'm sorry, [對不起](https://translate.google.ch/#auto/fr/%E5%B0%8D%E4%B8%8D%E8%B5%B7) Sometime ***simple*** is just an idea. – F. Hauri - Give Up GitHub Nov 06 '18 at 20:43
@F.Hauri correct for UTF8. Encoded in UTF16 each character ("☯", "←", "é", "ô", "a" and "z") is encoded with a fixed 2 bytes. If assuming that all text is ASCII then any mention of UTF8 is "good to know" but not necessary for say as it's 8-bit ASCII and the code points are identical in UTF8. Having taken the time to delve into encodings then it's worth while imho to note that the byte count is encoding dependent and there exists a plethora of different encodings. – masseyb Nov 06 '18 at 22:05
@thistleknot previous chinese post warn me about another problem. see posted test script about limitation (bug) of this: [diffU8test.sh.txt](https://f-hauri.ch/vrac/diffU8test.sh.txt) or [diffU8test.sh](https://f-hauri.ch/vrac/diffU8test.sh) – F. Hauri - Give Up GitHub Apr 03 '19 at 14:52
You can't necessarily guarantee that the default locale is UTF-8. To make sure you get character length rather than byte length, you may want to set `LC_ALL=C.UTF-8` and `LANG=C.UTF-8`. – alexia Aug 21 '20 at 15:18
@nyuszika7h You're right. Anyway, my `strU8DiffLen` will mostly return the correct difference. If the current session uses a Latin encoding, `strU8DiffLen` will always return `0`, which is correct too. – F. Hauri - Give Up GitHub Aug 21 '20 at 20:44
It's worth to mention that the function `strU8DiffLen` will fail if `$(( bytlen - ${#1} ))` is greater than **255**. Why not just `printf` the result and call the function inside a `sub-shell`? Related: https://www.gnu.org/software/bash/manual/html_node/Exit-Status.html – Artfaith Mar 07 '21 at 19:27
1

@F8ER In order to prevent ***forks***. For sample: Trying to replace `return` by `echo`, adding `OFF=$(strU8DiffLen....)` and replacing `?` by `OFF` in last sample take 10ms in my host, where published proposition do the jobs in 1ms. (10x faster!) – F. Hauri - Give Up GitHub Mar 07 '21 at 19:38
@F8ER If you mind using `return`, you could replace them by `printf -v ${2:-OFF} %d $(( bytlen - ${#1} ))`, then use `$OFF` or any other variable by specifying his name as second argument. – F. Hauri - Give Up GitHub Mar 07 '21 at 19:43

dmatej · Answer 3 · 2017-10-11T19:14:31.037

44

I wanted the simplest case; in the end, this is the result:

echo -n 'Tell me the length of this sentence.' | wc -m;
36

edited Oct 11 '17 at 19:14

answered Oct 11 '17 at 08:52

dmatej


7

sorry mate :( This is bash... the cursed hammer that sees everything as a nail, particularly your thumb. 'Tell me the length of this sentence.' contains 36 characters. `echo '' | wc -m` => `1`. You'd need to use `-n`: `echo -n '' | wc -m` => `0`... in which case it's a good solution :) – AJP Oct 11 '17 at 15:06
2

Thanks for the correction! Manual page says: `-n do not output the trailing newline` – dmatej Oct 11 '17 at 19:11

score 27 · Answer 4 · edited Dec 26 '16 at 21:23

27

You can use:

MYSTRING="abc123"
MYLENGTH=$(printf "%s" "$MYSTRING" | wc -c)

wc -c or wc --bytes counts bytes: a non-ASCII character encoded in UTF-8 counts as 2, 3 or more.
wc -m or wc --chars counts characters: each character counts once, however many bytes it uses (this requires a UTF-8 locale).
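
A short sketch that captures both counts for the same string. Wrapping the command substitution in `$(( ... ))` is an addition here, not part of the answer: some `wc` implementations pad their output with spaces, and the arithmetic expansion strips that padding.

```bash
MYSTRING="abc123"

# printf '%s' avoids the trailing newline that echo would add to the count.
bytes=$(( $(printf '%s' "$MYSTRING" | wc -c) ))
chars=$(( $(printf '%s' "$MYSTRING" | wc -m) ))

echo "$bytes bytes, $chars characters"
```

For an ASCII-only string such as this one the two numbers are equal; they differ only when the string contains multibyte characters and the locale is UTF-8.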

edited Dec 26 '16 at 21:23

admirabilis


answered May 09 '15 at 03:27

atesin


4

-c is for bytes. -m is for chars. https://www.gnu.org/software/coreutils/manual/html_node/wc-invocation.html http://pubs.opengroup.org/onlinepubs/009604499/utilities/wc.html – LLFourn Jul 22 '16 at 03:31
3

Seriously? a pipe, a subshell and an external command for something that trivial? – gniourf_gniourf Dec 26 '16 at 21:26
this handles something like `mylen=$(printf "%s" "$HOME/.ssh" | wc -c)` whereas the accepted solution fails and you need to `myvar=$HOME/.ssh` first. – JL Peyret Feb 13 '20 at 21:44
This isn’t any better than `${#var}`. You still need `LC_ALL` / `LANG` set to an UTF-8 locale, otherwise `-m` will return byte count. – alexia Aug 19 '20 at 11:16

score 23 · Answer 5 · edited Dec 26 '17 at 23:07

In response to the post starting:

If you want to use this with command line or function arguments...

with the code:

size=${#1}

There might be the case where you just want to check for a zero length argument and have no need to store a variable. I believe you can use this sort of syntax:

if [ -z "$1" ]; then
    #zero length argument 
else
    #non-zero length
fi

See GNU and wooledge for a more complete list of Bash conditional expressions.
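
The same test combines naturally with `${#1}` when the length itself matters. A sketch with a hypothetical function `check_arg` and an arbitrary limit of 8 characters:

```bash
# Hypothetical helper: classify its first argument by length.
check_arg() {
    if [ -z "$1" ]; then
        echo "empty"
    elif (( ${#1} > 8 )); then
        echo "too long"
    else
        echo "ok"
    fi
}

check_arg ""            # empty
check_arg "abc"         # ok
check_arg "abcdefghij"  # too long
```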

score 19 · Answer 6 · edited Aug 19 '16 at 00:13

19

If you want to use this with command line or function arguments, make sure you use size=${#1} instead of size=${#$1}. The second one may be more instinctual but is incorrect syntax.
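
The same form works inside a function, both for a single positional parameter and for each argument in a loop. A short sketch; the function name `arg_lengths` is made up for illustration:

```bash
# Hypothetical helper: report the length of every argument it receives.
arg_lengths() {
    local arg
    echo "first argument: ${#1} characters"
    for arg in "$@"; do
        printf '%s -> %d\n' "$arg" "${#arg}"
    done
    echo "number of arguments: $#"
}

arg_lengths foo "two words"
```

Note that `$#` (the argument count) is unrelated to `${#1}` (the length of the first argument), even though both use `#`.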

edited Aug 19 '16 at 00:13

Zane


answered Jun 05 '14 at 20:11

Dick Guertin


14

Part of the problem with "you can't do " is that, that syntax being invalid, it's unclear what a reader should interpret it to mean. `size=${#1}` is certainly valid. – Charles Duffy Jun 05 '14 at 20:18
Well, that's unexpected. I didn't know that #1 was a substitute for $1 in this case. – Dick Guertin Jun 07 '14 at 00:08
16

It isn't. `#` isn't replacing the `$` -- the `$` outside the braces is still the expansion operator. The `#` is the length operator, as always. – Charles Duffy Jun 07 '14 at 01:25
I've fixed this answer since it is a useful tip but not an exception to the rule - it follows the rule exactly, as pointed out by @CharlesDuffy – Zane Aug 19 '16 at 20:51

thistleknot · Answer 7 · 2018-11-06T21:37:15.637

18

Using your example provided

#KISS (Keep it simple stupid)
size=${#myvar}
echo $size

edited Nov 06 '18 at 21:37

answered Nov 06 '18 at 16:46

thistleknot


@Angel The question was about setting a variable to the output of the length command, and this question answers that. – Astitva Srivastava Sep 29 '21 at 07:30

score 15 · Answer 8 · answered Oct 05 '17 at 18:20

Here are a couple of ways to calculate the length of a variable:

echo ${#VAR}
echo -n "$VAR" | wc -m
echo -n "$VAR" | wc -c
printf '%s' "$VAR" | wc -m
expr length "$VAR"
expr "$VAR" : '.*'

To store the result in another variable, assign the output of one of the commands above using backquotes (command substitution):

otherVar=`echo -n "$VAR" | wc -m`
echo $otherVar

http://techopsbook.blogspot.in/2017/09/how-to-find-length-of-string-variable.html

Troublemaker-DV · Answer 9 · 2021-08-27T04:58:26.720

I know this Q&A is old, but today I faced this task for the first time. I usually use the ${#var} construct, but it fails with Unicode: most of the text I process with bash is in Cyrillic... Based on @atesin's answer, I wrote a short function (which could be shortened further) that may be useful for scripting. The task that led me to this question was to show a message of variable length in a pseudo-graphics box. So, here it is:

$ cat draw_border.sh
#!/bin/sh
#based on https://stackoverflow.com/questions/17368067/length-of-string-in-bash
border()
{
local BPAR="$1"
local BPLEN=`echo $BPAR|wc -m`
local OUTLINE=\|\ "$1"\ \|
# line below based on https://www.cyberciti.biz/faq/repeat-a-character-in-bash-script-under-linux-unix/
# comment of Bit Twiddler Jun 5, 2021 @ 8:47
local OUTBORDER=\+`head -c $(($BPLEN+1))</dev/zero|tr '\0' '-'`\+
echo $OUTBORDER
echo $OUTLINE
echo $OUTBORDER
}
border "Généralités"
border 'А вот еще одна '$LESSCLOSE' '
border "pure ENGLISH"

And what this sample produces:

$ draw_border.sh
+-------------+
| Généralités |
+-------------+
+----------------------------------+
| А вот еще одна /usr/bin/lesspipe |
+----------------------------------+
+--------------+
| pure ENGLISH |
+--------------+

The first example (in French?) was taken from someone's example above. The second one combines Cyrillic and the value of a variable. The third one is self-explanatory: it uses only characters from the first half of the ASCII table.

I used echo $BPAR|wc -m instead of printf ... so as not to depend on whether printf is built in.

Above I saw a discussion of the trailing newline and the -n option of echo. I did not use it, so I add only one to $BPLEN. Had I used -n, I would have to add 2.

To show the difference between wc -m and wc -c, here is the output of the same script with one minor change: -m replaced with -c

$ draw_border.sh
+----------------+
| Généralités |
+----------------+
+---------------------------------------------+
| А вот еще одна /usr/bin/lesspipe |
+---------------------------------------------+
+--------------+
| pure ENGLISH |
+--------------+

Accented Latin characters and most Cyrillic characters take two bytes, so the horizontal lines are drawn longer than the real length of the message. I hope this saves someone some time :-)

p.s. Russian text says "here is one more"

p.p.s. Working "two-liner"

#!/bin/sh
#based on https://stackoverflow.com/questions/17368067/length-of-string-in-bash
border()
{
# line below based on https://www.cyberciti.biz/faq/repeat-a-character-in-bash-script-under-linux-unix/
# comment of Bit Twiddler Jun 5, 2021 @ 8:47
local OUTBORDER=\+`head -c $(( $(echo "$1"|wc -m) +1))</dev/zero|tr '\0' '-'`\+
echo $OUTBORDER"\n"\|\ "$1"\ \|"\n"$OUTBORDER
}
border "Généralités"
border 'А вот еще одна '$LESSCLOSE' '
border "pure ENGLISH"

To avoid repeating the code that draws OUTBORDER, I build OUTBORDER in a separate command.
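
For comparison, the same box can be drawn without external commands when the script runs under bash in a UTF-8 locale, where `${#1}` counts characters. This is a sketch under those assumptions, not a replacement for the /bin/sh version above; the name `border_bash` is hypothetical:

```bash
# Hypothetical bash-only variant of border().
border_bash() {
    local text=$1 line
    # Build a run of spaces two longer than the text, then turn it into dashes.
    printf -v line '%*s' "$(( ${#text} + 2 ))" ''
    line=${line// /-}
    printf '+%s+\n| %s |\n+%s+\n' "$line" "$text" "$line"
}

border_bash "pure ENGLISH"
```

Characters that occupy two terminal columns or none would still misalign the box, since the character count is not the display width.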

score 1 · Answer 10 · edited Aug 11 '22 at 19:54

1

Maybe just use wc -c, which counts bytes (the same as the number of characters for ASCII-only text):

myvar="Hello, I am a string."
echo -n "$myvar" | wc -c

Result:

21

edited Aug 11 '22 at 19:54

SRG


answered Mar 29 '22 at 11:01

刘千强


bl3ssedc0de · Answer 11 · 2022-08-21T00:43:25.863

0

Length of string in bash

str="Welcome to Stackoveflow"  
length=`expr length "$str"`  
  
echo "Length of '$str' is $length"

OUTPUT

Length of 'Welcome to Stackoveflow' is 23
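
The `length` keyword is not specified by POSIX, so it is not available in every `expr`. A more portable sketch uses the pattern-match operator. The leading `x` (compensated by `- 1`) is a common guard, added here as an assumption about the input, so that a value such as `length` or `-n` is not parsed as an operator:

```bash
str="Welcome to Stackoveflow"

# ':' returns the number of characters matched by the pattern '.*'.
# Note: expr exits with status 1 when the result is 0 (empty string).
length=$(expr "x$str" : '.*' - 1)

echo "Length of '$str' is $length"
```

In bash itself `${#str}` remains the simpler choice; `expr` is mainly useful in shells that lack that expansion.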

edited Aug 21 '22 at 00:43

answered Aug 20 '22 at 07:46

bl3ssedc0de

