5

I have a string that looks like this:

"abcderwer 123123 10,200 asdfasdf iopjjop"

Now I want to extract numbers that follow the scheme xx,xxx, where each x is a digit from 0 to 9, e.g. 10,200. The number has to have five digits and has to contain a ",".

How can I do that?

Thank you

user1360250

6 Answers

11

You can use grep:

$ echo "abcderwer 123123 10,200 asdfasdf iopjjop" | grep -Eo '[0-9]{2},[0-9]{3}'
10,200
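
One caveat with an unanchored pattern is that it also matches inside longer numbers. The sketch below shows this, together with a possible stricter variant that adds -w (whole-word matching). The function names are made up for illustration, and -o and -w are not in POSIX, although GNU and BSD grep both support them. Note that a comma is not a word character, so -w would still accept the tail of something like 1,10,200.

```shell
# Hypothetical helper names; both read text on standard input.

# Plain -o: prints every match, including partial matches inside longer numbers.
extract_loose() { grep -Eo '[0-9]{2},[0-9]{3}'; }

# With -w: a match must not be directly preceded or followed by a letter, digit, or underscore.
extract_words() { grep -Eow '[0-9]{2},[0-9]{3}'; }

input="abcderwer 123123 10,200 asdfasdf 1234,5678"

printf '%s\n' "$input" | extract_loose   # prints 10,200 and 34,567
printf '%s\n' "$input" | extract_words   # prints 10,200
```

Which variant is right depends on whether 34,567 inside 1234,5678 should count as a hit for your data.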
codaddict
5

In pure Bash:

string='abcderwer 123123 10,200 asdfasdf iopjjop'
pattern='([[:digit:]]{2},[[:digit:]]{3})'
[[ $string =~ $pattern ]] && echo "${BASH_REMATCH[1]}"
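
A single =~ test only reports the first match. If the string may contain several numbers, one possible approach is to loop and cut the string after each hit. This is a minimal sketch, assuming Bash; extract_all is a hypothetical helper name, not something from the answer.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print every NN,NNN match in a string, one per line.
extract_all() {
  local rest=$1
  local pattern='([[:digit:]]{2},[[:digit:]]{3})'
  while [[ $rest =~ $pattern ]]; do
    printf '%s\n' "${BASH_REMATCH[1]}"
    # Drop everything up to and including this match, then search the remainder.
    rest=${rest#*"${BASH_REMATCH[1]}"}
  done
}

extract_all "abcderwer 123123 10,200 asdfasdf 99,001"   # prints 10,200 and 99,001
```

The quoted expansion inside ${rest#*...} makes the matched text literal, so the loop always advances past the match it just printed.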
Dennis Williamson
1

Check out pattern matching and regular expressions.

Links:

Bash regular expressions

Patterns and pattern matching

SO question

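As mentioned above, one way to use pattern matching is with grep. Other uses: the shell expands glob patterns before running a command (for example, in the arguments to echo), and find supports regular expressions through its -regex option (available in GNU and BSD find).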
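
To make the difference between the two kinds of pattern concrete, here is a small sketch, assuming Bash. The helper names matches_glob and matches_regex are invented for this example.

```shell
#!/usr/bin/env bash
# Hypothetical helpers contrasting glob patterns and regular expressions in Bash.

# Glob pattern: always matches the whole string and has no {n} repetition,
# so every digit position is spelled out.
matches_glob() {
  [[ $1 == [0-9][0-9],[0-9][0-9][0-9] ]]
}

# Extended regular expression: matches anywhere unless anchored with ^ and $.
matches_regex() {
  local regex='^[0-9]{2},[0-9]{3}$'
  [[ $1 =~ $regex ]]
}

for token in 10,200 123123 1234,5678; do
  if matches_glob "$token" && matches_regex "$token"; then
    echo "$token matches"
  else
    echo "$token does not match"
  fi
done
```

Only 10,200 is reported as matching; the other two tokens fail both tests.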

Community
keyser
1

Simple pattern matching (glob patterns) is built into the shell. Assuming you have the strings in $* (that is, they are command-line arguments to your script, or you have used set on a string you have obtained otherwise), try this:

for token; do
  case $token in
    [0-9][0-9],[0-9][0-9][0-9] ) echo "$token" ;;
  esac
done
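
If the text is in a variable rather than in the script's arguments, the set step could look like the sketch below. It wraps the same loop in a function (extract_case is a hypothetical name) so that set only changes the function's own positional parameters.

```shell
# Hypothetical helper: split a string into positional parameters, then filter with case.
extract_case() {
  set -f        # disable globbing so a token such as * is not expanded to file names
  set -- $1     # unquoted on purpose: word splitting yields one parameter per token
  set +f
  for token; do
    case $token in
      [0-9][0-9],[0-9][0-9][0-9]) echo "$token" ;;
    esac
  done
}

extract_case "abcderwer 123123 10,200 asdfasdf iopjjop"   # prints 10,200
```

Because a case pattern has to match the whole token, something like 1234,5678 is rejected without any extra anchoring.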
tripleee
  • That would be `$@`. There's a difference. – Dennis Williamson May 14 '12 at 13:24
  • When you access the command line, there certainly is a difference between `$*` and `$@`; but I specifically chose to call the variable which contains the `argv` array by its original Bourne name because that's the name many beginner-level expositions use; and in this context, when you want the values split into whitespace-separated tokens, that's what a script would use. I agree that if you have to refer to the positional parameters directly in a script, `"$@"` is almost always what you want. – tripleee May 14 '12 at 13:32
  • With, for example, `set -- 'abc def' ghi`, `for arg in "$@"; do echo "$arg"; done` does the right thing. These rarely do the right thing: `$*` or `"$*"` (rarely = only when you want to flatten the arguments). There's no point in continuing to repeat beginner-level expositions which are wrong. – Dennis Williamson May 14 '12 at 13:40
  • `for token; do` does the right thing here, regardless of how you got the things in there; what's to argue about? – tripleee May 14 '12 at 13:58
0

The following example, using your input string, solves the problem with sed. Because the leading .* is greedy, it prints only the last match on each line.

$ echo "abcderwer 123123 10,200 asdfasdf iopjjop" | sed -ne 's/^.*\([0-9]\{2\},[0-9]\{3\}\).*$/\1/p'
10,200
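
A single substitution like this yields at most one result per line. If every matching token is wanted, one option is to put each token on its own line with tr first and let sed print only the lines that match in full. This is a sketch of that different technique, not the command above; extract_sed is a hypothetical name.

```shell
# Hypothetical helper: one token per line, then print only lines that are exactly NN,NNN.
extract_sed() {
  # tr -s turns each run of spaces into a single newline.
  # sed -n suppresses default output; the p command prints lines matching the address.
  printf '%s\n' "$1" | tr -s ' ' '\n' | sed -n '/^[0-9]\{2\},[0-9]\{3\}$/p'
}

extract_sed "10,200 abc 123123 55,555"   # prints 10,200 and 55,555
```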
octopusgrabbus
bos
0

A slightly non-typical solution:

< input tr -cd '0-9, ' | tr ' ' '\012' | grep '^..,...$'

(The first tr removes everything except commas, spaces, and digits. The second tr replaces spaces with newlines, putting each "number" on a separate line, and the grep discards everything except those that match your criterion.)

William Pursell