How to extract numbers from a string?

Question

I have a string that contains a path:

string="toto.titi.12.tata.2.abc.def"

I want to extract only the numbers from this string.

To extract the first number:

tmp="${string#toto.titi.}"
num1="${tmp%.tata*}"

To extract the second number:

tmp="${string#toto.titi.*.tata.}"
num2="${tmp%.abc.def}"

So extracting each number takes two steps. How can I extract a number in a single step?

This question has been sitting around for a while now. If none of the answers provide what you're looking for, then could you update your question to clarify your requirements a little more? — ghoti, Jun 03 '16 at 16:58
I think `echo ${string} | grep -o -E "[0-9]+"` is the most concise and the easiest to understand (almost everyone knows grep). From: https://stackoverflow.com/a/52947167/52074 – Trevor Boyd Smith, Oct 28 '20 at 15:51

score 30 · Answer 1 · answered Jul 26 '13 at 15:02

You can use tr to delete all of the non-digit characters, like so:

echo toto.titi.12.tata.2.abc.def | tr -d -c 0-9

mti2935

The output of this appears to mash all the numbers together, making `122` in your example. How might they be separated? – ghoti Jun 03 '16 at 17:03
To store the result in a variable, use `PARAM=$(echo toto.titi.12.tata.2.abc.def | tr -d -c 0-9)`. – Adir Dayan Jul 14 '20 at 15:03
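
One way to keep the numbers apart, as the first comment asks, is to replace each run of non-digits with a single space instead of deleting it. This is only a sketch, assuming a POSIX `tr`; the names `numbers`, `first` and `second` are illustrative and not part of the answer.

```shell
string="toto.titi.12.tata.2.abc.def"

# -c complements the set (everything that is not a digit); -s squeezes each
# run of replaced characters into one space, giving " 12 2 ".
numbers=$(printf '%s' "$string" | tr -cs '0-9' ' ')

# Unquoted on purpose: word splitting discards the surrounding spaces.
set -- $numbers
first=$1
second=$2

echo "$first $second"
```

Because the split relies on the default `IFS`, this form only suits strings where the separators are never digits themselves.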

score 15 · Answer 2 · edited Jan 06 '17 at 06:07

To extract all the individual numbers and print one number per line, pipe the input through:

tr '\n' ' ' | sed -e 's/[^0-9]/ /g' -e 's/^ *//g' -e 's/ *$//g' | tr -s ' ' | sed 's/ /\n/g'

Breakdown:

Replace all line breaks with spaces: tr '\n' ' '
Replace all non-digit characters with spaces: sed -e 's/[^0-9]/ /g'
Remove leading white space: -e 's/^ *//g'
Remove trailing white space: -e 's/ *$//g'
Squeeze runs of spaces into one space: tr -s ' '
Replace the remaining space separators with line breaks: sed 's/ /\n/g'

Example:

echo -e " this 20 is 2sen\nten324ce 2 sort of" | tr '\n' ' ' | sed -e 's/[^0-9]/ /g' -e 's/^ *//g' -e 's/ *$//g' | tr -s ' ' | sed 's/ /\n/g'

With GNU sed, this prints:

20
2
324
2

score 12 · Answer 3 · answered Oct 23 '18 at 10:49

Here is a short one:

string="toto.titi.12.tata.2.abc.def"
id=$(echo "$string" | grep -o -E '[0-9]+')

echo $id   # => output: 12 2

grep -o prints one match per line; the unquoted $id in echo collapses the line breaks, so the numbers come out separated by a space.
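
If each number is needed on its own, the matches can be read into an array instead of a single variable. A sketch, assuming bash 4 or later (for `mapfile`); the array name `nums` is made up for this example.

```shell
#!/usr/bin/env bash
string="toto.titi.12.tata.2.abc.def"

# grep -o prints each match on its own line; mapfile -t stores one line per
# array element and strips the trailing newline.
mapfile -t nums < <(printf '%s\n' "$string" | grep -o -E '[0-9]+')

echo "${nums[0]}"
echo "${nums[1]}"
```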

Adi Azarya

score 9 · Answer 4 · edited Mar 10 '15 at 15:49

Parameter expansion would seem to be the order of the day.

$ string="toto.titi.12.tata.2.abc.def"
$ read num1 num2 <<<${string//[^0-9]/ }
$ echo "$num1 / $num2"
12 / 2

This of course depends on the format of $string. But at least for the example you've provided, it seems to work.

This may be superior to anubhava's awk solution which requires a subshell. I also like chepner's solution, but regular expressions are "heavier" than parameter expansion (though obviously way more precise). (Note that in the expression above, [^0-9] may look like a regex atom, but it is not.)

You can read about this form of parameter expansion in the bash man page. Note that ${string//this/that} and <<< are both bashisms and are not compatible with traditional Bourne or POSIX shells.

edited Mar 10 '15 at 15:49

answered Mar 10 '15 at 15:38

ghoti


What exactly do you mean that it depends on the format of `$string`? I can't think of any example that would break it. – PesaThe Aug 09 '18 at 11:18
Heh, this is an old question. :) The only thing I can think of at this point is that if there are additional numbers, say `aa12aa34aa56`, and you only read two variables, the trailing numbers get added to the last variable, separated by spaces. If this was a concern, then a better solution might be to read the string into an array: `read -a nums <<<"${string//[^0-9]/ }"`. – ghoti Aug 09 '18 at 14:28
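
The array form mentioned in the comment above could look like the following. It is a sketch that assumes bash; the sample string is the one from the comment, and `-r` is added here only as a defensive habit.

```shell
#!/usr/bin/env bash
string="aa12aa34aa56"

# Replace every non-digit with a space, then let read split the result on
# whitespace into the array nums, however many numbers there are.
read -r -a nums <<<"${string//[^0-9]/ }"

echo "${#nums[@]}"   # how many numbers were found
echo "${nums[2]}"    # the third number
```

With an array, extra numbers become extra elements rather than being appended to the last variable.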

score 4 · Answer 5 · answered Sep 06 '22 at 05:59

Convert your string to an array like this:

$ str="toto.titi.12.tata.2.abc.def"
$ arr=( ${str//[!0-9]/ } )
$ echo "${arr[@]}"
12 2

Ivan

score 3 · Answer 6 · answered Jul 26 '13 at 15:00

This would be easier to answer if you provided exactly the output you're looking to get. If you mean you want to get just the digits out of the string, and remove everything else, you can do this:

d@AirBox:~$ string="toto.titi.12.tata.2.abc.def"
d@AirBox:~$ echo "${string//[a-z,.]/}"
122

If you clarify a bit I may be able to help more.

drldcsta

I updated my question. I want to extract the 12 and then extract the 2, not extract both numbers at the same time. – MOHAMED Jul 26 '13 at 15:45

score 2 · Answer 7 · edited Jan 06 '17 at 06:09

You can also use sed:

echo "toto.titi.12.tata.2.abc.def" | sed 's/[0-9]*//g'

Here, sed replaces

any digits (class [0-9])
repeated any number of times (*)
with nothing (nothing between the second and third /),
and g stands for globally.

Output will be:

toto.titi..tata..abc.def

edited Jan 06 '17 at 06:09

Benjamin W.

answered Jul 29 '13 at 13:46

jderefinko


I think OP wants the digits, not the string as output. – cchamberlain May 27 '15 at 17:48
If you want the digits, use a `^` to invert the match: `echo "toto.titi.12.tata.2.abc.def" | sed 's/[^0-9]*//g'`. – Dario Seidl Sep 15 '19 at 20:36
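
Deleting the non-digits, as in the comment above, runs the numbers together (`122`). A possible variation replaces each run of non-digits with one space and trims the ends. It is a sketch that assumes a sed supporting `-E`, as current GNU and BSD versions do; the variable `numbers` is illustrative.

```shell
string="toto.titi.12.tata.2.abc.def"

# 1) collapse every run of non-digits into a single space
# 2) drop the leading space, 3) drop the trailing space
numbers=$(printf '%s\n' "$string" | sed -E 's/[^0-9]+/ /g; s/^ //; s/ $//')

echo "$numbers"
```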

score 1 · Answer 8 · answered Jul 26 '13 at 19:07

Use regular expression matching:

string="toto.titi.12.tata.2.abc.def"
[[ $string =~ toto\.titi\.([0-9]+)\.tata\.([0-9]+)\. ]]
# BASH_REMATCH[0] would be "toto.titi.12.tata.2.", the entire match
# Successive elements of the array correspond to the parenthesized
# subexpressions, in left-to-right order. (If there are nested parentheses,
# they are numbered in depth-first order.)
first_number=${BASH_REMATCH[1]}
second_number=${BASH_REMATCH[2]}
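
Since the question asks for one number at a time, the same technique can be narrowed to a single capture group and guarded with a test on the match result. This is a sketch, assuming bash; the variable `re` and the error message are illustrative, and keeping the pattern in a variable is just one way to avoid quoting surprises.

```shell
#!/usr/bin/env bash
string="toto.titi.12.tata.2.abc.def"

# Anchored pattern with one capture group for the first number.
re='^toto\.titi\.([0-9]+)\.'

if [[ $string =~ $re ]]; then
    first_number=${BASH_REMATCH[1]}
    echo "$first_number"
else
    echo "no number found" >&2
fi
```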

score 1 · Answer 9 · answered Jul 26 '13 at 19:21

Using awk:

arr=( $(echo "$string" | awk -F "." '{print $3, $5}') )
num1=${arr[0]}
num2=${arr[1]}

anubhava

score 1 · Answer 10 · answered Jul 30 '17 at 06:36

Here is yet another way to do this, using cut:

echo "$string" | cut -d'.' -f3,5 | tr '.' ' '

This gives you the following output: 12 2
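
To pull out a single number in one step, which is what the question asks for, one field can be selected at a time. A small sketch, assuming the numbers always sit in the 3rd and 5th dot-delimited fields; `num1` and `num2` follow the naming used in the question.

```shell
string="toto.titi.12.tata.2.abc.def"

# -f3 selects only the third dot-delimited field, -f5 only the fifth.
num1=$(printf '%s\n' "$string" | cut -d. -f3)
num2=$(printf '%s\n' "$string" | cut -d. -f5)

echo "$num1 $num2"
```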

Vivek-Ananth

score 0 · Answer 11 · edited Mar 01 '20 at 00:06

Fixing the newline issue on macOS, where the BSD sed does not turn \n in the replacement into a line break:

cat temp.txt | tr '\n' ' ' | sed -e 's/[^0-9]/ /g' -e 's/^ *//g' -e 's/ *$//g' | tr -s ' ' | sed $'s/ /\\\n/g'
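
An alternative that sidesteps the sed difference altogether is to let `tr` emit the line breaks itself. This is a sketch, assuming a POSIX `tr` and `grep`; the sample text stands in for the contents of temp.txt, and `input` and `result` are made-up names.

```shell
input=' this 20 is 2sen
ten324ce 2 sort of'

# tr turns every run of non-digits (including line breaks) into one newline;
# grep . then drops the empty line left by the leading non-digits.
result=$(printf '%s\n' "$input" | tr -cs '0-9' '\n' | grep .)

printf '%s\n' "$result"
```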

edited Mar 01 '20 at 00:06

Obsidian

answered Feb 29 '20 at 22:44

placidnick


score 0 · Answer 12 · answered Sep 05 '22 at 23:35

Assumptions:

there is no embedded white space
the string of text always has 7 period-delimited strings
the string always contains numbers in the 3rd and 5th period-delimited positions

One bash idea that does not require spawning any subprocesses:

$ string="toto.titi.12.tata.2.abc.def"

$ IFS=. read -r x1 x2 num1 x3 num2 rest <<< "${string}"
$ typeset -p num1 num2
declare -- num1="12"
declare -- num2="2"

In a comment, the OP stated that they want to extract only one number at a time; the same approach can still be used, e.g.:

$ string="toto.titi.12.tata.2.abc.def"

$ IFS=. read -r x1 x2 num1 rest <<< "${string}"
$ typeset -p num1
declare -- num1="12"

$ IFS=. read -r x1 x2 x3 x4 num2 rest <<< "${string}"
$ typeset -p num2
declare -- num2="2"

A variation on anubhava's answer that uses parameter expansion instead of a subprocess call to awk, and still working with the same set of initial assumptions:

$ arr=( ${string//./ } )
$ num1=${arr[2]}
$ num2=${arr[4]}
$ typeset -p num1 num2
declare -- num1="12"
declare -- num2="2"
