0

I've been trying to get a regular expression for a SINGLE digit I need to extract from a file. Let's assume the file has numbers: 100, 10, 20, 35, 67, 8. I only want 8. I tried

    egrep "[0-9]{1}"

but it still returns all the numbers in the file. If I do

    egrep "[0-9]{3}"

it only returns 100. Why does it work this way?

Allan
Duck Dodgers
  • 2
    Because `[0-9]{1}` matches any single digit, and `100` is three single digits (a one, a zero and a zero), each of which meets the regex's specification. `[0-9]{3}` needs three digits in a row, so it won't match `1` (one digit) or `10` (two digits), but it matches `100` (three digits). – Ken White Jun 14 '18 at 02:56
  • 1
    Thanks for this @KenWhite – Duck Dodgers Jun 14 '18 at 02:59

4 Answers

4

Imagine you have the following two input files, with the numbers either each on its own line or all on one line:

INPUT:

$ more digits*
::::::::::::::
digits2.in
::::::::::::::
100
10
20
35
67
8
::::::::::::::
digits.in
::::::::::::::
100,10,20,35,67,8

You can run the following grep command to fetch only the single digit (this works for both files):

$ grep -o '\b[0-9]\b' digits.in
8
$ grep -o '\b[0-9]\b' digits2.in
8

Explanations:

The regex \b[0-9]\b matches a single digit with a word boundary on each side, that is, a digit that is not adjacent to another digit, letter or underscore. The -o option prints only the matched text instead of the whole line, which is the default behavior.
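As a rough illustration of what -o changes, the sketch below feeds the one-line sample to grep with and without the option. It assumes a grep that supports \b (GNU grep does); the variable names are only for illustration.

```shell
# One-line sample from digits.in, piped in so no file is needed.
line='100,10,20,35,67,8'

# Without -o, grep prints the whole line that contains a match.
whole_line=$(printf '%s\n' "$line" | grep '\b[0-9]\b')

# With -o, grep prints only the matched text, one match per output line.
only_match=$(printf '%s\n' "$line" | grep -o '\b[0-9]\b')

printf 'without -o: %s\n' "$whole_line"
printf 'with -o:    %s\n' "$only_match"
```

Only the final 8 has a non-word character (or the end of the line) on both sides, so it is the only match.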

If the input contains several single-digit numbers:

INPUT2:

$ more digits*
::::::::::::::
digits2.in
::::::::::::::
100
10
20
35
67
8
9
::::::::::::::
digits.in
::::::::::::::
100,10,20,35,67,8,9

OUTPUT:

$ grep -o '\b[0-9]\b' digits2.in
8
9

$ grep -o '\b[0-9]\b' digits.in
8
9

This will output all the numbers composed of a single digit.

Allan
1

If the numbers are separated by commas, try this:

    grep ",\d,"

(\d is the same as [0-9] in Perl-compatible regexes; with GNU grep it needs the -P option)

What that's saying is "match a comma, followed by a digit, followed by another comma". Since we only want one-digit numbers, we need to mark the start and end of the number, which the commas do. Note that a single digit at the very start or end of the line has a comma on only one side, so this pattern misses it.

Another option is:

    grep "\b\d\b"

What that's saying is "match at the beginning of a word, then a digit, then the end of a word". Word characters are [A-Za-z0-9_]. If you want to look into \b more, it's called a word boundary.
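To compare the two patterns side by side, here is a small sketch that writes them with [0-9] so they run under plain grep without -P. The sample line is the comma-separated one used elsewhere in this thread; the variable names are only for illustration, and \b is assumed to be supported (it is in GNU grep).

```shell
# Comma-separated sample with two single-digit numbers at the end.
line='100,10,20,35,67,8,9'

# Comma-delimited pattern: the digit needs a comma on BOTH sides,
# so the trailing 9 is not found and the commas are part of the match.
between_commas=$(printf '%s\n' "$line" | grep -o ',[0-9],')

# Word-boundary pattern: \b is zero-width, so only the digit is printed
# and a digit at the end of the line still matches.
with_boundaries=$(printf '%s\n' "$line" | grep -o '\b[0-9]\b')

printf 'commas:     %s\n' "$between_commas"
printf 'boundaries: %s\n' "$with_boundaries"
```

The word-boundary form is usually the more convenient of the two, since it does not depend on which separator surrounds the number.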

mcohn
  • 2
    Note that neither `grep` nor `grep -E` accepts `"\d"` for digits; you either have to write it as `"[0-9]"` or switch to the `grep -P` regex engine. – WoodrowShigeru Mar 27 '20 at 09:49
0

With [0-9]{1} you're asking to match every digit, because the regex does not define any boundaries. If your grep supports lookbehinds and lookaheads (GNU grep does with the -P option), you could use the following regex:

    (?<!\d)\d(?!\d)

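A minimal sketch of the difference, assuming GNU grep built with PCRE support so that -P is available (other greps may reject the option). The sample numbers are the ones from the question; the variable names are only for illustration.

```shell
# Sample numbers from the question, one per line.
numbers=$(printf '100\n10\n20\n35\n67\n8\n')

# No boundaries: every line contains at least one digit, so all lines match.
unbounded=$(printf '%s\n' "$numbers" | grep -E '[0-9]{1}')

# Lookarounds: a digit with no digit immediately before or after it.
# Requires -P (PCRE); -o prints only the matched digit.
bounded=$(printf '%s\n' "$numbers" | grep -oP '(?<!\d)\d(?!\d)')

printf 'unbounded:\n%s\n' "$unbounded"
printf 'bounded: %s\n' "$bounded"
```

Unlike \b, the lookaround form only looks at neighbouring digits, so it would also pick out a lone digit that touches letters, such as the 8 in a8b.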
Yassin Hajaj
-1
    grep "^[0-9]$"

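solves the problem when each number is on its own line. The key was the missing $ at the end: together, ^ and $ anchor the match to the start and end of the line, so only a line consisting of exactly one digit matches.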
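A quick sketch of where the anchored pattern helps and where it does not, using the two sample layouts from this thread (one number per line versus all numbers on one comma-separated line). The variable names are only for illustration.

```shell
# One number per line: the line "8" is exactly one digit, so it matches.
per_line=$(printf '100\n10\n20\n35\n67\n8\n' | grep '^[0-9]$')

# All numbers on one line: the whole line is not a single digit,
# so nothing matches. grep exits non-zero here, hence the "|| true".
same_line=$(printf '100,10,20,35,67,8\n' | grep '^[0-9]$' || true)

printf 'per line:  [%s]\n' "$per_line"
printf 'same line: [%s]\n' "$same_line"
```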

Duck Dodgers