How can I extract the first two characters of a string in shell scripting?

Question

For example, given:

USCAGoleta9311734.5021-120.1287855805

I want to extract just:

US

Thanks everyone. I ended up using 'cut -c1-2', honestly I didn't even know 'cut' was there. I'd like to say I'm pretty experienced at command line - but apparently I have a lot to learn. — Greg, Sep 10 '09 at 14:39
@Greg, just be aware that cut is run as a separate process - it will be slower than the internal-bash solution I posted alongside it in my answer. That won't make any difference unless you're processing huge data sets but you need to keep it in mind. — paxdiablo, Sep 10 '09 at 14:43
[Edit] Actually, I think this line of code will probably be executed about 50,000 times per report. So I might just go with the internal Bash method - which as you said will save some much needed resources. — Greg, Sep 10 '09 at 19:02

paxdiablo · Answer 1 · 2017-06-29T01:38:44.060

Probably the most efficient method, if you're using the bash shell (and you appear to be, based on your comments), is to use the sub-string variant of parameter expansion:

pax> long="USCAGol.blah.blah.blah"
pax> short="${long:0:2}" ; echo "${short}"
US

This will set short to be the first two characters of long. If long is shorter than two characters, short will be identical to it.
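To make the short-string behavior concrete, here is a minimal bash sketch. The helper name first_two is made up for illustration and is not part of the answer itself.

```shell
# Hypothetical helper (the name first_two is an assumption): print the
# first two characters of its argument using bash substring expansion.
first_two() {
    local long=$1
    printf '%s\n' "${long:0:2}"
}

first_two "USCAGol.blah.blah.blah"   # prints: US
first_two "A"                        # prints: A (shorter input comes back unchanged)
first_two ""                         # prints an empty line
```

Because the expansion happens inside the shell, calling this in a loop does not start any extra process.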

This in-shell method is usually better if you're going to be doing it a lot (like 50,000 times per report as you mention) since there's no process creation overhead. All solutions which use external programs will suffer from that overhead.

If you also wanted to ensure a minimum length, you could pad it out beforehand with something like:

pax> long="A"
pax> tmpstr="${long}.."
pax> short="${tmpstr:0:2}" ; echo "${short}"
A.

This would ensure that anything less than two characters in length was padded on the right with periods (or something else, just by changing the character used when creating tmpstr). It's not clear that you need this but I thought I'd put it in for completeness.
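The padding idea can be wrapped in a small bash function. This is only a sketch: the name first_n_padded and its argument order are assumptions, and it relies on the bash-specific printf -v.

```shell
# Hypothetical helper: first N characters of STRING, padded on the right
# with PAD (default ".") when STRING is shorter than N.
# Usage: first_n_padded STRING N [PAD]
first_n_padded() {
    local str=$1 n=$2 pad=${3:-.}
    local padding tmpstr
    printf -v padding '%*s' "$n" ''      # N spaces
    padding=${padding// /"$pad"}         # turn each space into the pad character
    tmpstr="${str}${padding}"
    printf '%s\n' "${tmpstr:0:$n}"
}

first_n_padded "A" 2        # prints: A.
first_n_padded "USCA" 2     # prints: US
first_n_padded "" 2 _       # prints: __
```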

Having said that, there are any number of ways to do this with external programs (such as if you don't have bash available to you), some of which are:

short=$(echo "${long}" | cut -c1-2)
short=$(echo "${long}" | head -c2)
short=$(echo "${long}" | awk '{print substr ($0, 1, 2)}')
short=$(echo "${long}" | sed 's/^\(..\).*/\1/')

The first two (cut and head) are identical for a single-line string - they basically both just give you back the first two characters. They differ in that cut will give you the first two characters of each line and head will give you the first two characters of the entire input.

The third one uses the awk sub-string function to extract the first two characters and the fourth uses sed capture groups (using \(\) and \1) to capture the first two characters and replace the entire line with them. They're both similar to cut - they deliver the first two characters of each line in the input.

None of that matters if you are sure your input is a single line; in that case they all have an identical effect.
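A quick way to see the per-line versus whole-input difference is to feed both tools a two-line string. The sample input is made up, and the $'...' quoting is a bash feature.

```shell
# Two-line sample input (made up for this illustration).
input=$'USCAGoleta\nGBLondon'

per_line=$(printf '%s\n' "$input" | cut -c1-2)   # first two characters of each line
whole=$(printf '%s\n' "$input" | head -c2)       # first two bytes of the whole input

printf '%s\n' "$per_line"   # prints US, then GB on a second line
printf '%s\n' "$whole"      # prints US
```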

I would rather use `printf '%s'` instead of `echo` in case there are weird chars in the string: https://stackoverflow.com/a/40423558/895245 For the POSIX obsessed: `head -c` is not POSIX, `cut -c` and `awk substr` are, `sed \1` not sure. — Ciro Santilli OurBigBook.com, Aug 07 '18 at 06:29
@CiroSantilli新疆改造中心996ICU六四事件 using printf, you don't even need an additional program. See [my answer](https://stackoverflow.com/a/56585879/1054423). — bschlueter, Jun 13 '19 at 18:09

score 78 · Answer 2 · edited Nov 07 '21 at 01:16

The easiest way is:

${string:position:length}

This extracts a substring of $length characters from $string, starting at offset $position (counted from 0).

This is a Bash builtin, so awk or sed is not required.
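Applied to the string from the question, a complete bash example might look like the following. The variable names country, state and city are guesses at what the fields mean, not something the question states.

```shell
string='USCAGoleta9311734.5021-120.1287855805'

country="${string:0:2}"   # US      (2 characters starting at offset 0)
state="${string:2:2}"     # CA      (2 characters starting at offset 2)
city="${string:4:6}"      # Goleta  (6 characters starting at offset 4)
rest="${string:2}"        # everything after the first two characters

echo "$country $state $city"   # prints: US CA Goleta
```

Omitting the length, as in the last assignment, takes everything from the offset to the end of the string.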

answered Sep 10 '09 at 14:31 · ennuikiller
edited Nov 07 '21 at 01:16 · Peter Mortensen

This is the short, sweet and easiest way to get the substring. – ani627 Feb 03 '16 at 17:50
I would like to try this but it's missing key details on how it's run. Please add the entire command using the OP. – John May 14 '22 at 23:37

Dennis Williamson · Answer 3 · 2022-06-28T23:18:00.683

36

You've gotten several good answers and I'd go with the Bash builtin myself, but since you asked about sed and awk and (almost) no one else offered solutions based on them, I offer you these:

echo "USCAGoleta9311734.5021-120.1287855805" | awk '{print substr($0,1,2)}'

and

echo "USCAGoleta9311734.5021-120.1287855805" | sed 's/\(^..\).*/\1/'

The awk one ought to be fairly obvious, but here's an explanation of the sed one:

substitute "s/"
the group "()" of two of any characters ".." starting at the beginning of the line "^" and followed by any character "." repeated zero or more times "*" (the backslashes are needed to escape some of the special characters)
by "/" the contents of the first (and only, in this case) group (here the backslash is a special escape referring to a matching sub-expression)
done "/"
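The same substitution can be generalized to the first N characters with an interval expression. This is a sketch under assumptions: first_n_sed is a made-up name, and N is passed in unchecked, so it should be a plain number.

```shell
# Hypothetical helper: keep the first N characters of every input line.
# \{N\} repeats the preceding "." exactly N times (POSIX basic regex).
first_n_sed() {
    sed "s/^\(.\{$1\}\).*/\1/"
}

echo "USCAGoleta9311734.5021-120.1287855805" | first_n_sed 2   # prints: US
echo "A" | first_n_sed 2                                        # prints: A
```

A line shorter than N does not match the pattern at all, so sed passes it through unchanged.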

answered Sep 10 '09 at 15:40 · Dennis Williamson

In awk strings start at index 1, so you should use `substr($0,1,2)`. – Feb 02 '20 at 20:52
Interestingly, strings are both 0-indexed and 1-indexed (tested in Gawk 5.1.0 and MacOS awk 20070501). But it's better to use 1 for consistency with `index()`. Answer updated. – Dennis Williamson Jun 28 '22 at 23:17

score 11 · Answer 4 · answered Jan 02 '17 at 18:33

Just grep:

echo 'abcdef' | grep -Po "^.."        # ab

answered Jan 02 '17 at 18:33 · Amir Mehler

Fits my needs. You can remove the `-P` option to make it shorter. All regexs will understand that pattern. – datashaman Mar 27 '19 at 05:12

score 10 · Answer 5 · answered Mar 25 '18 at 22:42

If you want to use shell scripting without relying on non-POSIX extensions (so-called bashisms), there are techniques that do not require forking external tools such as grep, sed, cut, or awk, which would make your script less efficient. Maybe efficiency and POSIX portability are not important in your use case. But in case they are (or just as a good habit), you can use the following parameter expansion method to extract the first two characters of a shell variable:

$ sh -c 'var=abcde; echo "${var%${var#??}}"'
ab

This uses "smallest prefix" parameter expansion to remove the first two characters (this is the ${var#??} part), then "smallest suffix" parameter expansion (the ${var% part) to remove that all-but-the-first-two-characters string from the original value.
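One caveat worth sketching: when the value is shorter than two characters, ${var#??} matches nothing and returns the whole value, so the outer expansion strips everything and yields an empty string. A small wrapper can guard against that. The function name first_two is an assumption; the quoting of the inner expansion follows ShellCheck's SC2295 advice.

```shell
# Hypothetical POSIX-sh helper: first two characters of $1, or $1 itself
# when it is shorter than two characters.
first_two() {
    case $1 in
        ??*) printf '%s\n' "${1%"${1#??}"}" ;;   # at least two characters
        *)   printf '%s\n' "$1" ;;               # zero or one character
    esac
}

first_two abcde    # prints: ab
first_two a        # prints: a
first_two 'a*cd'   # prints: a*
```

Quoting the inner expansion keeps characters such as * in the value from being treated as a pattern.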

This method was previously described in this answer to the "Shell = Check if variable begins with #" question. That answer also describes a couple of similar parameter expansion methods that can be used in a slightly different context than the one that applies to the original question here.

Best answer, should be on top. no forks, no bashisms. works even with small shells such as dash. — exore, May 06 '20 at 07:58
Nice answer. It will be even better if the inner parameter expansion is quoted as `echo "${var%"${var#??}"}"`. (ref - https://www.shellcheck.net/wiki/SC2295 ) — midnite, Apr 17 '23 at 10:11

score 9 · Answer 6 · answered Sep 10 '09 at 16:35

If you're in bash, you can say:

bash-3.2$ var=abcd
bash-3.2$ echo ${var:0:2}
ab

This may be just what you need…

answered Sep 10 '09 at 16:35 · Dominic Mitchell

the easiest and most simple answer! worked like a charm – aloha Dec 07 '16 at 23:59

bschlueter · Answer 7 · 2020-05-16T00:34:17.640

8

You can use printf:

$ original='USCAGoleta9311734.5021-120.1287855805'
$ printf '%-.2s' "$original"
US
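To store the result in a variable rather than print it, bash's printf accepts -v. A minimal sketch, assuming bash (printf -v is not available in plain POSIX sh); the - flag from the format above is dropped because it only affects padding when a field width is given.

```shell
original='USCAGoleta9311734.5021-120.1287855805'

# %.2s truncates the argument to a precision of 2; -v assigns the
# formatted result to the variable "short" without a subshell.
printf -v short '%.2s' "$original"

echo "$short"   # prints: US
```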

answered Jun 13 '19 at 17:37 · bschlueter
edited May 16 '20 at 00:34

score 6 · Answer 8 · answered Sep 10 '09 at 15:44

colrm — remove columns from a file

To keep only the first two characters of each line, remove everything from column 3 onward:

colrm 3 < file

answered Sep 10 '09 at 15:44 · Ian Yang

score 4 · Answer 9 · edited Nov 07 '21 at 01:27

Use:

sed 's/.//3g'

Or

awk NF=1 FPAT=..

Or

perl -pe '$_=unpack a2'

answered Apr 19 '13 at 01:27 · Zombo
edited Nov 07 '21 at 01:27 · Peter Mortensen

score 2 · Answer 10 · answered May 16 '20 at 01:23

Just for the sake of fun, I'll add a few that, although overcomplicated and useless, were not mentioned:

head -c 2 <( echo 'USCAGoleta9311734.5021-120.1287855805')

echo 'USCAGoleta9311734.5021-120.1287855805' | dd bs=2 count=1 status=none

sed -e 's/^\(.\{2\}\).*/\1/;' <( echo 'USCAGoleta9311734.5021-120.1287855805')

cut -c 1-2 <( echo 'USCAGoleta9311734.5021-120.1287855805')

python -c "print(r'USCAGoleta9311734.5021-120.1287855805'[0:2])"

ruby -e 'puts "USCAGoleta9311734.5021-120.1287855805"[0..1]'

score 1 · Answer 11 · answered Jan 23 '17 at 20:43

If your system is using a different shell (not bash), but your system has bash, then you can still use the inherent string manipulation of bash by invoking bash with a variable:

strEcho='echo ${str:0:2}' # '${str:2}' if you want to skip the first two characters and keep the rest
bash -c "str=\"$strFull\";$strEcho;"
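A variation on the same idea, sketched here as an alternative rather than as the answer's own method, passes the string to bash as a positional argument instead of splicing it into the command text. That way quotes or dollar signs inside the value are never re-parsed as shell code. The value of strFull below is just the question's sample string.

```shell
strFull='USCAGoleta9311734.5021-120.1287855805'

# Inside the single-quoted script, $1 is the string passed after the
# placeholder "_" (which only fills $0).
short=$(bash -c 'printf "%s\n" "${1:0:2}"' _ "$strFull")

echo "$short"   # prints: US
```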

answered Jan 23 '17 at 20:43 · palswim

This uses the same method as [the main answer](http://stackoverflow.com/a/1405641/393280), only invoking `bash` if you aren't already using it. – palswim Jan 23 '17 at 20:44
Unfortunately, this comes with all of the overhead of invoking another process, but sometimes that overhead doesn't matter as much as simplicity and familiarity. – palswim Jan 23 '17 at 20:47

score 1 · Answer 12 · answered Jul 11 '21 at 10:50

How to consider Unicode + UTF-8

Let's do a quick test for those interested in Unicode characters rather than just bytes. Each character of áéíóú (acute accented vowels) is made up of two bytes in UTF-8. With:

printf 'áéíóú' | LC_CTYPE=en_US.UTF-8 awk '{print substr($0,1,3);exit}'
printf 'áéíóú' | LC_CTYPE=C awk '{print substr($0,1,3);exit}'
printf 'áéíóú' | LC_CTYPE=en_US.UTF-8 head -c3
echo
printf 'áéíóú' | LC_CTYPE=C head -c3

we get:

áéí
á
á
á

so we see that only awk + LC_CTYPE=en_US.UTF-8 considered the UTF-8 characters. The other approaches took only three bytes. We can confirm that with:

printf 'áéíóú' | LC_CTYPE=C head -c3 | hd

which gives:

00000000  c3 a1 c3                                          |...|
00000003

and the c3 by itself is trash, and does not show up on the terminal, so we saw only á.

awk + LC_CTYPE=en_US.UTF-8 actually returns 6 bytes however.

We could also have equivalently tested with:

printf '\xc3\xa1\xc3\xa9\xc3\xad\xc3\xb3\xc3\xba' | LC_CTYPE=en_US.UTF-8 awk '{print substr($0,1,3);exit}'

and if you want a general parameter:

n=3
printf 'áéíóú' | LC_CTYPE=en_US.UTF-8 awk "{print substr(\$0,1,$n);exit}"
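The byte counts behind these results can be checked from bash itself. This is a sketch under assumptions: byte_len is a made-up helper, and it relies on bash re-reading the locale when LC_ALL is assigned inside the function.

```shell
s=$'\xc3\xa1\xc3\xa9\xc3\xad\xc3\xb3\xc3\xba'   # áéíóú encoded as UTF-8

# Hypothetical helper: length of a string in bytes, whatever the caller's locale.
byte_len() {
    local LC_ALL=C
    printf '%s\n' "${#1}"
}

byte_len "$s"                          # prints: 10
printf '%s' "$s" | head -c3 | wc -c    # 3 bytes, i.e. one and a half characters
```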

Question more specific about Unicode + UTF-8: https://superuser.com/questions/450303/unix-tool-to-output-first-n-characters-in-an-utf-8-encoded-file

Related: https://unix.stackexchange.com/questions/3454/grabbing-the-first-x-characters-for-a-string-from-a-pipe

Tested on Ubuntu 21.04.

score 0 · Answer 13 · edited Nov 07 '21 at 01:18

This may be what you're after:

my $string = 'USCAGoleta9311734.5021-120.1287855805';

my $first_two_chars = substr $string, 0, 2;

Reference: substr

answered Sep 10 '09 at 14:32 · draegtun
edited Nov 07 '21 at 01:18 · Peter Mortensen

given that he/she is likely to be calling this from the shell, a better form would be `perl -e 'print substr $ARGV[0], 0, 2' 'USCAGoleta9311734.5021-120.1287855805'` – Chas. Owens Sep 10 '09 at 14:35

score -1 · Answer 14 · edited Nov 07 '21 at 01:20

The code

awk 'BEGIN { mystring = "USCAGoleta9311734.5021-120.1287855805"; print substr(mystring, 1, 2) }'

would print US.

Where 1 is the start position (awk counts from 1) and 2 is how many characters to read.

answered Sep 10 '09 at 14:33 · Jambobond
edited Nov 07 '21 at 01:20 · Peter Mortensen

Say...isn't that GW-BASIC? Oh, wait, that's `awk`. Sorry, I couldn't tell at first. – Dennis Williamson Sep 10 '09 at 15:40

score -1 · Answer 15 · answered Sep 10 '09 at 14:44

perl -ple 's/^(..).*/$1/'

answered Sep 10 '09 at 14:44 · dsm

You forgot to echo the string into that. – Chas. Owens Sep 10 '09 at 15:28
