-1

Input string:

1234 5678 9101 1234
2999 5178 9101 2234
9999 5628 9201 1232
8888 3678 9101 1232

In the input above, the 1st, 2nd and 3rd lines have trailing spaces: after the last digit of each of those lines there are two or more spaces before the next line starts.

The last line ends at its last character (the digit '2') and has nothing after it.

Required match: I want to match only the first three blocks of digits in each line (the match should not include the single space between blocks).

Expected Output using sed:

**** **** **** 1234
**** **** **** 2234
**** **** **** 1232
**** **** **** 1232

My approach: I use a negative lookahead, \d{4}(?! {2,}) (I know that sed does not support lookaround assertions). In the first three lines it matches only the first three blocks of digits, but in the fourth line it matches all the blocks of digits, since the last line does not have 2 spaces after its last digit.

Fiddle: https://regex101.com/r/VzQf3D/2

HarshvardhanSharma

6 Answers

2

With Perl, I would say:

perl -pe 's/(\d{4})(?= [^ ])/****/g' file
tshiono
1

If I understand correctly, you can try

sed ':A;s/\(.*\)\([^ |\*]\)\([ |\*]*[ ][^ ][^ ]*[ ]*$\)/\1*\3/;tA' infile
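The loop above can be spelled out with one `-e` per command, which keeps the label in its own expression (generally a more portable form than a semicolon after the label). As a simplifying assumption, the bracket expressions here drop `|` and `\`, which only matter if the input could contain those characters; the wrapper name `mask_loop` is hypothetical.

```shell
# Hypothetical wrapper: masks every digit except those in the last block.
mask_loop() {
  # :A  sets the loop label.
  # s   \1 is everything before the rightmost non-space, non-'*' character
  #     that still has another block after it; that character becomes '*'.
  #     \2 keeps the rest: masked chars/spaces, one space, the last block,
  #     and any trailing spaces up to the end of the line.
  # tA  jumps back to :A as long as the substitution keeps succeeding.
  sed -e ':A' \
      -e 's/\(.*\)[^ *]\([ *]* [^ ][^ ]* *$\)/\1*\2/' \
      -e 'tA'
}

printf '1234 5678 9101 1234  \n8888 3678 9101 1232' | mask_loop
echo
```

Each pass masks one character, working from right to left, so the loop runs once per digit in the first three blocks.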
ctac_
1

With GNU sed:

sed -E 'h;s/^(([^ ]+ ){3})//;x;s/[^ ]* *$//;s/[0-9]/*/g;G;s/\n//' file

Output:

**** **** **** 1234
**** **** **** 2234
**** **** **** 1232
**** **** **** 1232

See: man sed
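A commented version of the hold-space steps, assuming GNU sed (the wrapper name `mask_hold` is hypothetical). It removes the last block together with any trailing spaces (`s/[^ ]* *$//`); with a plain `s/[^ ]*$//`, a line ending in spaces would keep its last block in the pattern space, so that block would be masked and the saved copy appended after it.

```shell
# Hypothetical wrapper around the hold-space approach (GNU sed, -E for ERE).
mask_hold() {
  sed -E '
    # Save the original line in the hold space.
    h
    # Drop the first three blocks; what remains is the last block
    # plus any trailing spaces.
    s/^(([^ ]+ ){3})//
    # Swap: pattern space = original line, hold space = last block.
    x
    # Drop the last block together with any trailing spaces.
    s/[^ ]* *$//
    # Mask every remaining digit.
    s/[0-9]/*/g
    # Append the saved last block after a newline, then remove the newline.
    G
    s/\n//
  '
}

printf '1234 5678 9101 1234  \n8888 3678 9101 1232' | mask_hold
echo
```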

Cyrus
0

I am not sure about bash, but for a regular (PCRE-style) regex I would use

^(?: *)(\d{4})(?: +)(\d{4})(?: +)(\d{4})  # with multiline flag

Explanation:

^ is the line start
(?: *) is a non-capturing group of zero or more spaces
(\d{4}) is a capturing group of 4 digits
(?: +) is a non-capturing group of one or more spaces
(\d{4}) is a capturing group of 4 digits
(?: +) is a non-capturing group of one or more spaces
(\d{4}) is a capturing group of 4 digits

Fiddle: https://regexr.com/3ike0


If you use sed for this regex, non-capturing groups are not available, according to the answer at https://stackoverflow.com/a/36546377/7505395 to "How do you specify non-capturing groups in sed?", as well as the other answers to that question. Sorry.
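A rough sed counterpart of the pattern above, as a sketch assuming a sed with `-E` (POSIX ERE) support. Since `\d` is not a digit class in sed, `[0-9]` is used instead, and plain groups are placed only around the digits: the spaces are matched but not captured, so the back-references hold only the digits. Printing the groups comma-separated and the name `first_three` are illustrative assumptions.

```shell
# Hypothetical helper: print the first three blocks of each line, comma-separated.
first_three() {
  # ^ *          optional leading spaces (matched, not captured)
  # ([0-9]{4})   a captured 4-digit block, three times, separated by " +"
  # .*           the rest of the line, discarded
  sed -E 's/^ *([0-9]{4}) +([0-9]{4}) +([0-9]{4}).*/\1,\2,\3/'
}

printf '1234 5678 9101 1234  \n8888 3678 9101 1232' | first_three
echo
```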

Patrick Artner
  • Your solution works well, but it also matches the single spaces between the blocks. My apologies, I edited the question later to include that. – HarshvardhanSharma Dec 30 '17 at 12:29
  • @HarshvardhanSharma - changed it to ignore the spaces - see fiddle – Patrick Artner Dec 30 '17 at 12:36
  • @HarshvardhanSharma and updated the explanation. You will find each number, without spaces, in a separate group of each line's regex match – Patrick Artner Dec 30 '17 at 12:41
  • Your solution captures the spaces too. It should not capture the spaces; only the blocks of digits (4 digits in each block) should be matched. – HarshvardhanSharma Dec 30 '17 at 12:52
  • @HarshvardhanSharma it does NOT capture spaces. For a regex there is a "Match" and "Groups" - the "Groups" are where the captured things land - the "Match" is the whole **including** Non-Capturing Groups - by using the values from the "Groups" you get the real captured things - without spaces. Not sure about bash - what bash command are you using with this RegEx pattern? – Patrick Artner Dec 30 '17 at 13:04
  • Currently I am not using any bash command. I test it on regex101.com, where it (the non-capturing group `(?: +)`) matches the spaces too (highlighted in blue, whereas spaces _should not be highlighted at all_). I understand your explanation of "Match" and "Group". Fiddle: https://regex101.com/r/VzQf3D/3 – HarshvardhanSharma Dec 30 '17 at 13:13
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/162220/discussion-between-harshvardhansharma-and-patrick-artner). – HarshvardhanSharma Dec 30 '17 at 13:15
0

Since you haven't shown the expected output, I am posting this solution based on your explanation only. I believe you need the first 3 columns of each line of your Input_file; if so, the following may help. If your requirement is different, please show the expected output with a few more details in code tags in your post.

awk '{print $1,$2,$3}'  Input_file

Output will be as follows.

1234 5678 9101
2999 5178 9101
9999 5628 9201
8888 3678 9101

EDIT: Having seen your edited post: in case you don't need spaces between the 3 columns in the output, the following may help.

awk '{print $1 $2 $3}' Input_file
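If the goal is the masked output from the question rather than the first three columns, the same field-based idea can be sketched in awk: mask the digits of every field except the last. This assumes a POSIX awk, and the name `mask_awk` is hypothetical. Assigning to a field rebuilds the line with single spaces (the default OFS), so trailing spaces are not preserved.

```shell
# Hypothetical helper: mask all digits in every field except the last one.
mask_awk() {
  awk '{
    # Fields 1..NF-1 get their digits replaced; field NF is left as is.
    for (i = 1; i < NF; i++) gsub(/[0-9]/, "*", $i)
    print
  }'
}

printf '1234 5678 9101 1234  \n8888 3678 9101 1232' | mask_awk
```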
RavinderSingh13
0

What about

^(?:(?:^| +)[0-9]{4})(?=[0-9]{4} $)
iBug