I am trying to solve a task using only sed. The task is:

Given lines of credit card numbers, mask the first 12 digits of each credit card number with an asterisk (i.e., *) and print the masked card number on a new line. Each credit card number consists of four space-separated groups of four digits. For example, the credit card number 1234 5678 9101 1234 would be masked and printed as **** **** **** 1234.

The following command works as expected and prints the desired output.

sed 's/\([0-9]\{4\}\s\)\{3\}\([0-9]\{4\}\)/**** **** **** \2/'

However, I tried another solution using \b and it does not work, and I cannot understand why. As I understand it, \b should match at the beginning of a word and at the space between words. I know the task can be solved with \s, but I want to understand what is wrong with the solution that uses only \b.

sed 's/\(\b[0-9]\{4\}\b\)\{3\}\([0-9]\{4\}\)/**** **** **** \2/'

NOTE: I already have a working solution. I just want to understand why my solution using \b does not work.

abhiarora
  • What version of sed are you using? Different versions recognize different regular expressions. – glenn jackman Nov 21 '19 at 17:54
  • You might consider: `sed -E 's/[[:digit:]]{4}([^[:digit:]])/****\1/g'` – glenn jackman Nov 21 '19 at 17:55
  • `sed (GNU sed) 4.7`. I just want to understand why \b version doesn't work. – abhiarora Nov 21 '19 at 18:06
  • I am not sure why my question has been downvoted! – abhiarora Nov 21 '19 at 18:22
  • For some reason my answer was also downvoted. I have upvoted the question to neutralize an unnecessary downvote. – anubhava Nov 21 '19 at 18:24
  • After reading the documentation, I think my question isn't a duplicate: the problem with my expression was that I didn't understand how repetition works in regular expressions and how a word boundary behaves. So the issue is how an expression behaves when repetition and a word boundary are used together. Thanks, everyone! – abhiarora Nov 21 '19 at 18:47

1 Answer

\b does work in GNU sed, but your second regex is incorrect.

You should be using:

sed 's/\b\([0-9]\{4\}\s\)\{3\}\([0-9]\{4\}\)/**** **** **** \2/' file

or with -E

sed -E 's/\b([0-9]{4}\s){3}([0-9]{4})/**** **** **** \2/' file

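Note that the second \b should be replaced with \s (whitespace), since your input text has spaces between the numbers. \b is a zero-width assertion: it matches a position between a word character and a non-word character, so it cannot consume the space itself.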
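To make the difference concrete, here is a minimal sketch that runs the original attempt and the corrected expression on the sample card number from the question. It assumes GNU sed (the asker reports 4.7), since \b and \s are GNU extensions; the variable names are only for illustration.

```shell
# Assumption: GNU sed, where \b and \s are supported as extensions.
card='1234 5678 9101 1234'

# Original attempt: \b is zero-width, so the repeated group can never
# consume the space after each digit group. The pattern does not match
# and sed prints the line unchanged.
broken=$(printf '%s\n' "$card" |
  sed 's/\(\b[0-9]\{4\}\b\)\{3\}\([0-9]\{4\}\)/**** **** **** \2/')

# Corrected: \s inside the group consumes the separating space.
fixed=$(printf '%s\n' "$card" |
  sed 's/\b\([0-9]\{4\}\s\)\{3\}\([0-9]\{4\}\)/**** **** **** \2/')

printf '%s\n' "$broken" "$fixed"
```

The first line printed is the untouched input and the second is the masked number, which shows that the original expression fails to match rather than matching incorrectly.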

Here is a good article on word boundaries: https://www.regular-expressions.info/wordboundaries.html

anubhava
  • What's wrong with it? I know I can do it with `\s`, as I have already solved the problem, but I am not sure why `\b` doesn't work inside a regex sub-expression. – abhiarora Nov 21 '19 at 18:12
  • `\b` does work, as you can see in my regex as well. But `\b` is a word boundary and it cannot match a space. Your input has `1234 5678 9101 1234`, so you need to match the space between numbers as well. – anubhava Nov 21 '19 at 18:14
  • So even `sed 's/\(\b[0-9]\{4\}\b\s\)\{3\}\([0-9]\{4\}\)/**** **** **** \2/'` will work for you, but it will be a bit inefficient because of the redundant `\b` – anubhava Nov 21 '19 at 18:15
  • Thanks for the answer. The problem is that `\b` only matches a position, not a character. Your last comment helped me understand that. Thanks – abhiarora Nov 21 '19 at 18:22
  • [Here is good read on word boundaries](https://www.regular-expressions.info/wordboundaries.html) – anubhava Nov 21 '19 at 18:26
  • Thanks. It cleared up all of my questions. You can add that link to your answer as well. – abhiarora Nov 21 '19 at 18:30
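
For comparison, the two alternatives mentioned in the comments can be checked the same way. This is a rough sketch assuming GNU sed and the sample input from the question; the variable names are hypothetical.

```shell
# Assumption: GNU sed (for \b, \s and the -E flag).
card='1234 5678 9101 1234'

# Variant from the comments: \b kept inside the group, with \s added to
# consume the space. The \b assertions are redundant but harmless.
with_b=$(printf '%s\n' "$card" |
  sed 's/\(\b[0-9]\{4\}\b\s\)\{3\}\([0-9]\{4\}\)/**** **** **** \2/')

# Suggested alternative: mask every four-digit group that is followed by
# a non-digit character. The last group ends the line, so it is kept.
per_group=$(printf '%s\n' "$card" |
  sed -E 's/[[:digit:]]{4}([^[:digit:]])/****\1/g')

printf '%s\n' "$with_b" "$per_group"
```

Both commands print **** **** **** 1234 for this input. The second one does not check that a line contains exactly four groups, so it may behave differently on other input.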