0

Good evening,

I have a text file and I would like to keep only the first digit on each line; in other words, remove every digit except the first one. As a simple example, I have this file:

$ cat file
one1
2two3
45end6

And I expect this output:

one1
2two
4end

Could someone point me in the right direction?

clbx
StackRob

4 Answers

3

The simplest way to do it is to tell sed to delete the 2nd occurrence of a digit on the line, and to repeat that until no second digit is left. Each pass removes one digit, so when the loop ends only the first digit remains.

sed -i ':a;s/[0-9]//2;ta' file

-i edits the file in place (GNU sed syntax).

:a defines a label named a.

s/[0-9]//2 deletes the 2nd occurrence of a digit.

ta branches back to label a if the preceding substitution changed the line.

Sample output:

one1
2two
4end
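
To try the loop without modifying the file, here is a minimal sketch that drops -i and reads from standard input. The function name keep_first_digit is only an illustrative choice, and the script is split into separate -e expressions on the assumption that this is more portable than the semicolon form after a label.

```shell
# Hypothetical helper: reads lines on stdin and keeps only the first digit of each.
keep_first_digit() {
    # :a          define label "a"
    # s/[0-9]//2  delete the 2nd digit on the line, if there is one
    # ta          jump back to "a" whenever that substitution changed the line
    sed -e ':a' -e 's/[0-9]//2' -e 'ta'
}

printf '%s\n' one1 2two3 45end6 1foo2bar3baz | keep_first_digit
```

This should print one1, 2two, 4end and 1foobarbaz, one per line.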
alvits
0

I would argue that this is totally the wrong direction, but you could do something like:

sed -E -e 'h; s/([0-9]).*/\1/g; x; s/[^0-9]*[0-9]//; s/[0-9]*//g; H; g; s/\n//;' input
William Pursell
  • I haven’t tested this so it isn’t an answer, but wouldn’t just `sed -r 's/([0-9].*)[0-9]/\1/g'` work? – Daniel H Jul 10 '17 at 20:59
  • @DanielH That works for this input, but fails on a line like `1foo2bar3baz`, in which the `2` is retained. – William Pursell Jul 10 '17 at 21:02
  • But that does point to a nice perl solution: `perl -pe '1 while s/([0-9].*)[0-9]/\1/g'` – William Pursell Jul 10 '17 at 21:04
  • @DanielH and William Pursell, a simple `loop` inside `sed` will work on the sample input and on `1foo2bar3baz` like `sed -i ':a;s/[0-9]//2;ta' file`. – alvits Jul 10 '17 at 22:30
  • @WilliamPursell I wasn’t sure how `s///g` worked in a case like this, which is why I wasn’t sure. I’m surprised it works for `45end6` but not `1foo2bar3baz`; can you direct me to a resource that explains it better than `sed`’s man page? – Daniel H Jul 11 '17 at 14:12
  • 1
    @DanielH The trouble is that the `g` flag won't consider overlapping matches, but the loop does. There is some good info at https://stackoverflow.com/questions/4736/learning-regular-expressions and https://stackoverflow.com/questions/476714/is-there-a-good-online-interactive-regex-tutorial – William Pursell Jul 11 '17 at 17:13
  • I understand regular expressions well enough; my issue is with the details of `sed`. Neither the man page nor the info page for GNU `sed` actually specified that the `g` flag only considers non-overlapping matches; I eventually found it in [the POSIX standard](http://pubs.opengroup.org/onlinepubs/9699919799/utilities/sed.html). My suggestion (or even what I intended to suggest, where the `.*` was lazy) wouldn’t have worked on the third line of the sample input for the same reason it wouldn’t have worked on `1foo2bar3baz`. – Daniel H Jul 11 '17 at 17:43
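
The difference discussed in these comments can be seen with a small sketch. The function names single_pass and looped are made up for illustration, and -E is assumed to be available (GNU and BSD sed both accept it). The first function applies the greedy substitution once with the g flag; the second repeats the same substitution until it no longer matches.

```shell
# One pass with the g flag: the greedy match swallows everything from the first
# digit to the last one, so only the last digit is dropped.
single_pass() {
    sed -E 's/([0-9].*)[0-9]/\1/g'
}

# Same substitution, but repeated via a label and a conditional branch, so the
# line is rescanned from the start after every successful deletion.
looped() {
    sed -E -e ':a' -e 's/([0-9].*)[0-9]/\1/' -e 'ta'
}

printf '%s\n' 45end6 1foo2bar3baz | single_pass
printf '%s\n' 45end6 1foo2bar3baz | looped
```

The single pass should give 45end and 1foo2barbaz, while the looped version should give 4end and 1foobarbaz.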
0

Assuming your file contains no \r characters (if it does, delete them first), you can use \r as a marker for the digits.

sed -r 's/([0-9])/\r\1/g; s/\r//; s/\r[0-9]//g' inputfile

First you mark all digits, then remove the marker before the first one, and finally remove all digits that still have a marker.

EDIT: Replaced s/\r([0-9])/\1/; with s/\r//;
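
As a rough sketch of the marking idea, the function below (mark_and_strip is a hypothetical name) runs the three steps on standard input. It assumes GNU sed, since \r as an escape in the pattern and the replacement is a GNU extension, and it puts the g flag on the last substitution so that every digit that is still marked gets removed, not just the first one.

```shell
# Hypothetical helper, GNU sed assumed (\r escape in regex and replacement).
mark_and_strip() {
    # 1. put a \r marker in front of every digit
    # 2. remove the first marker, which un-marks the first digit
    # 3. delete every digit that still carries a marker
    sed -E 's/([0-9])/\r\1/g; s/\r//; s/\r[0-9]//g'
}

printf '%s\n' one1 2two3 45end6 | mark_and_strip
```

With the sample input this should print one1, 2two and 4end.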

Walter A
0

You can also use expr in a shell loop to do something similar (note that expr match and expr substr are GNU extensions rather than part of POSIX expr), e.g.

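while read -r line; do
    # length of a leading run of digits followed by letters (0 if the line starts with a non-digit)
    len=$(expr match "$line" '[0-9][0-9]*[A-Za-z]*')
    [ "$len" -gt '0' ] && expr substr "$line" 1 "$len" ||
    printf "%s\n" "$line"
done < file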

With your data in file, just cut and paste the above into the command line, e.g.

$ while read -r line; do
>     len=$(expr match "$line" '[0-9][0-9]*[A-Za-z]*')
>     [ "$len" -gt '0' ] && expr substr "$line" 1 "$len" ||
>     printf "%s\n" "$line"
> done < file
one1
2two
45end

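note: while this is a solution using expr match and expr substr, the sed solutions are more efficient, since every expr call here spawns a separate process. Also, as the last output line shows, this keeps the whole leading run of digits (45end rather than the requested 4end). (but it is good to know the alternatives...)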
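
For comparison, here is a sketch of a shell loop that produces the requested output. It swaps expr for POSIX parameter expansion plus tr, so it is a different technique from the one above, and the function name keep_first_digit_sh is purely illustrative. It still starts one tr process per line containing a digit, so the efficiency caveat applies here as well.

```shell
# Hypothetical helper: reads lines on stdin and keeps only the first digit of each.
keep_first_digit_sh() {
    while IFS= read -r line; do
        prefix=${line%%[0-9]*}          # everything before the first digit
        if [ "$prefix" = "$line" ]; then
            printf '%s\n' "$line"       # no digit on this line, print it unchanged
            continue
        fi
        rest=${line#"$prefix"}          # remainder, starting at the first digit
        first=${rest%"${rest#?}"}       # the first digit itself
        rest=${rest#?}                  # what follows the first digit
        # delete every remaining digit from the tail and reassemble the line
        printf '%s%s%s\n' "$prefix" "$first" "$(printf '%s' "$rest" | tr -d '0-9')"
    done
}

printf '%s\n' one1 2two3 45end6 | keep_first_digit_sh
```

With the sample input this should print one1, 2two and 4end.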

David C. Rankin