I want to use awk to remove the leading 1 from any field that matches the regex ^1[0-9]{10}$. I've been trying to make this work with sub or substr for a few hours now, but I can't find the correct logic. I already have a sed solution, s/^1\([0-9]\{10\}\)$/\1/, but I need to make this work with awk.

Edit: input and output example. Input:

10987654321
2310987654321
1098765432123    

(awk twisted and overcomplicated syntax)

Output:

0987654321
2310987654321
1098765432123    

Basically, the leading 1 should be removed only when it is followed by exactly ten digits. The 2nd and 3rd example lines are left unchanged: the 2nd has 23 in front of the 1, and the 3rd has a leading 1 that is followed by 12 digits instead of ten. That's what the regex specifies.

one-liner
  • It's much better if you post some data and say what you'd like to do with it. Otherwise this will only be guessing. – Jotne Aug 26 '14 at 12:45
  • but you can use the `match` function and then use the values set in RSTART and RLENGTH. See http://www.grymoire.com/Unix/Awk.html#uh-47 Good luck! – shellter Aug 26 '14 at 12:46
  • I edited my question in order to include specific examples of what I want awk to do, even if the regex is self explanatory and I also provided the sed alternative. – one-liner Aug 26 '14 at 13:46
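
For reference, the `match` suggestion from the comment above could be sketched roughly as follows. This is not from the thread: the function name `strip_with_match` is hypothetical, and the `RLENGTH == 11` check stands in for the `{10}` interval on the assumption that some awk builds do not support interval expressions.

```shell
# Hypothetical helper name; wraps the awk program so it can be reused.
strip_with_match() {
  awk '{
    for (i = 1; i <= NF; i++)
      # match() sets RSTART and RLENGTH; a length of 11 means "1" plus ten digits
      if (match($i, /^1[0-9]+$/) && RLENGTH == 11)
        $i = substr($i, RSTART + 1, RLENGTH - 1)
  } 1'
}

printf '%s\n' 10987654321 2310987654321 1098765432123 | strip_with_match
```

With the sample input from the question, only the first line should lose its leading 1; the other two are printed unchanged.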

2 Answers

If GNU awk is available to you, you could use the gensub function:

echo '10987654321'|awk '{s=gensub(/^1([0-9]{10})$/,"\\1","g");print s}'
0987654321

edit:

do it for every field:

awk '{for(i=1;i<=NF;i++)$i=gensub(/^1([0-9]{10})$/,"\\1","g", $i)}7' file

test:

kent$  echo '10987654321 10987654321'|awk '{for(i=1;i<=NF;i++)$i=gensub(/^1([0-9]{10})$/,"\\1","g", $i)}7'                                                                  
0987654321 0987654321
Kent
  • It works but not on multiple fields, I tried `echo '10987654321 10987654321'`. Is there no way of doing this with `sub`/`gsub`? `Substr` also did not work at all. – one-liner Aug 26 '14 at 13:37
  • This is the reason I wanted to use awk in the first place, to perform the substitution on each field. By default awk's field separator is one or more spaces. – one-liner Aug 26 '14 at 14:07
  • @linux_newbie I see what you meant, you need to loop over the fields: `awk '{for(i=1;i<=NF;i++)$i=gensub(/^1([0-9]{10})$/,"\\1","g", $i)}7' file` – Kent Aug 26 '14 at 14:08
  • Thank you for the help. I chose Steve's solution since I found it more straightforward but I am upping your solution as well. – one-liner Aug 27 '14 at 11:51

With sub(), you could try:

awk '/^1[0-9]{10}$/ { sub(/^1/, "") }1' file

Or with substr():

awk '/^1[0-9]{10}$/ { $0 = substr($0, 2) }1' file

If you need to test each field, try looping over them:

awk '{ for(i=1; i<=NF; i++) if ($i ~ /^1[0-9]{10}$/) sub(/^1/, "", $i) }1' file
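
To see why the loop matters when a line holds more than one field, here is a small sketch (mine, not part of the original answer). The function names `whole_record` and `per_field` are hypothetical, and the `length(...) == 11` test replaces the `{10}` interval on the assumption that your awk may lack interval support.

```shell
# Hypothetical helper names, for illustration only.

# Tests the whole record ($0): only fires when the line is a single 11-digit number.
whole_record() {
  awk 'length($0) == 11 && /^1[0-9]+$/ { sub(/^1/, "") } 1'
}

# Tests each field separately, so every matching field loses its leading 1.
per_field() {
  awk '{
    for (i = 1; i <= NF; i++)
      if (length($i) == 11 && $i ~ /^1[0-9]+$/)
        sub(/^1/, "", $i)
  } 1'
}

echo '10987654321 10987654321' | whole_record   # -> 10987654321 10987654321
echo '10987654321 10987654321' | per_field      # -> 0987654321 0987654321
```

The anchors `^` and `$` refer to the start and end of whatever string is being tested, which is why a two-field line never matches as a whole.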

https://www.gnu.org/software/gawk/manual/html_node/String-Functions.html

Steve
  • Thanks, this works: `echo '10987654321,10987654321' | awk -F ',' '{for(i=1; i<=NF; i++) if ($i ~ /^1[0-9]{10}$/) sub(/^1/, "", $i) }1'`...but I want to add another sub before the initial one: `sub(/ |-|+|\(|\)/,"")` which basically preformats the strings by removing spaces,+,-,(,) in order to be matched by the regex and also remove the leading one. I tried but I keep getting syntax error. – one-liner Aug 26 '14 at 14:45
  • Haven't tried it yet but I don't think it will work for the intended input. Consider this string being piped to awk: `+1 0(987)654-321`. This needs to be preformatted to 10987654321 so that the `^1[0-9]{10}$` regex will match and then awk will proceed with removing the leading `1` with the second sub. The `if` condition will not be met if I don't do the preliminary formatting sub first, correct? – one-liner Aug 26 '14 at 16:02
  • @linux_newbie: Is there any reason why you cannot throw a `gsub(/[ () +-]*/, "")` in front of the loop? That would be the simplest solution IMO. If you want to apply that to a subset of fields, just move it inside the loop and set a target. For example: `awk -F, '{ for(i=1;i<=NF;i++) { gsub(/[ () +-]*/, "", $i); if ($i ~ /^1[0-9]{10}$/) { sub(/^1/, "", $i) } } }1' OFS=, file` – Steve Aug 26 '14 at 23:08
  • This does exactly what I want: `awk -F ',' '{gsub(/[ () +-]*/, ""); for(i=1; i<=NF; i++) if ($i ~ /^1[0-9]{10}$/) sub(/^1/, "", $i) }1'`. Can you please explain the sub syntax from the first example you gave: `{ sub(/^1/, "") }1`. What exactly does the 1 outside the curly braces do? And can you explain the final solution loop syntax? I always had trouble understanding loop syntaxes in awk. In any case, thanks a lot for the help! – one-liner Aug 27 '14 at 11:49
  • @linux_newbie: No worries. The `1` at the end is a pattern that always evaluates to true. When a pattern is true and has no action attached, AWK performs the default action, which is to print the record (by default, a single line). You don't have to use `1` (any non-zero number would do), but using `1` for this is the common idiom. The long equivalent would be: `awk 'BEGIN { FS="," } { gsub(/[() +-]*/, ""); for (i=1; i<=NF; i++) { if ($i ~ /^1[0-9]{10}$/) { sub(/^1/, "", $i) } } print }' file`. The placement of the braces is critical. – Steve Aug 27 '14 at 12:52
  • @linux_newbie: The `for` loop used is your typical C-style loop, which in this case will loop from one to the number of fields in the row, `NF`. `$i` is therefore the actual field value, and `i` is its field position. Another common type of loop you will see regularly in `AWK` code is one that loops over the indices of an array. For example, `for (i in a) { print i, a[i] }` will print the key (`i`) followed by the key's value (`a[i]`). HTH. – Steve Aug 27 '14 at 12:57
  • Thanks for taking the time to answer my questions. One thing still bothers me because I don't understand the logic: why is the gsub with `() +-` applied to all fields, while the regex sub needs a loop to achieve this? – one-liner Aug 27 '14 at 17:16
  • Guess I spoke too soon... OFS is not printed out if the input is '453452,34545' (less than 11 digits). This is the syntax used: `awk -F ',' -v OFS='|' '{gsub(/[ ()+-/,""); for(i=1; i<=NF; i++) if ($i~/^1[0-9]{10}$/) sub(/^1/,"",$i)}1'`. The OFS is still `,`. Furthermore, if I expand the initial gsub character class, there is no more separator in the output: `awk -F ',' -v OFS='|' '{gsub(/[ ()+-\/\\\[\]\|]/,""); for(i=1; i<=NF; i++) if ($i~/^1[0-9]{10}$/) sub(/^1/,"",$i)}1'`. This is increasingly frustrating, I am wasting hours for a single syntax that's supposed to do a very simple thing. – one-liner Aug 27 '14 at 21:21
  • @linux_newbie: WRT#1: I was under the impression that you wanted to strip these characters from each line. Doing so makes it easy to then test whether the number starts with `1` and is followed by ten digits. WRT#2: If no field is assigned, AWK prints the line as it was read and never rebuilds it with the new `OFS`. This is a good thing, because it makes AWK run fast. If you want to force AWK to rebuild the line with the new field separator, the idiomatic way is to assign `$1=$1`. Try: `awk -F, -v OFS='|' '{ gsub(/[() +-]*/, ""); for (i=1;i<=NF;i++) { if ($i ~ /^1[0-9]{10}$/) { sub(/^1/, "", $i) } } $1=$1 }1' file` – Steve Aug 27 '14 at 23:04
  • @linux_newbie: WRT#3: Remember, if you're really stuck with substitutions, you can often use multiple calls to `gsub()`. Yes, it's less efficient, but it will get the job done and save some frustration. I believe the problem you're having with the regex is because you're trying to escape some characters. A better way to write that character class would be: `gsub(/[][() /|\+-]*/, "")`. – Steve Aug 27 '14 at 23:20
  • @linux_newbie: WRT#4: Only you know what your _actual_ input is and only you know what the expected output ought to be. From what I can tell, your actual input is a table of strangely formatted numbers, some of which look like phone numbers. There may be extra rows or columns in there, but I really don't know for sure. You've asked a question, but it wasn't the question you really wanted to ask. If you are still having difficulty, please [edit](https://stackoverflow.com/posts/25506106/edit) your question with some actual input and expected output. Include as many edge cases as possible. HTH. – Steve Aug 27 '14 at 23:33
  • Steve, thanks a lot for the feedback. I did try with `$1=$1`, but like this: `; $1=$1; print`. That is probably why it didn't work. I did try to specify the characters in the class without escaping them, but strange things happened, which is why I escaped them. I believe you fully answered my question given the information I provided. I actually found a fully working solution on my own using just sed, and it only took me a few minutes. But your solution still provides valuable insight and I might come back to it should my needs demand it. I highly appreciate your help and feedback on this. – one-liner Aug 28 '14 at 04:03
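
For anyone following the comment thread above, here is a rough sketch of the points discussed: the bare `1` pattern, `gsub()` acting on the whole record, forcing `OFS` with `$1=$1`, and a likely reason the separator vanished once the character class was expanded. None of this code is from the thread; all function names are hypothetical, the `length(...) == 11` test replaces the `{10}` interval in case your awk lacks interval support, and the range behaviour assumes an ASCII-ordered locale (hence `LC_ALL=C`).

```shell
# Hypothetical helper names, used only for this illustration.

# The bare pattern 1 is always true and has no action, so awk runs the
# default action (print). No field is assigned, so the commas stay.
print_only() { awk -F, -v OFS='|' '1'; }

# Assigning any field, even to itself, rebuilds the record with OFS.
rebuild_record() { awk -F, -v OFS='|' '{ $1 = $1 } 1'; }

# gsub() without a third argument edits the whole record. In a bracket
# expression "+-/" is a range from "+" to "/", which in ASCII also
# covers "," "-" and ".", so the field separator gets deleted too.
strip_range() { LC_ALL=C awk '{ gsub("[+-/]", "") } 1'; }

# With "-" placed last it is a literal, so the comma survives.
strip_literal() { LC_ALL=C awk '{ gsub("[+/-]", "") } 1'; }

# Strip formatting first, then drop the leading 1 per field, then force OFS.
normalize() {
  awk -F, -v OFS='|' '{
    gsub("[() +-]", "")          # edits the record; awk then re-splits the fields
    for (i = 1; i <= NF; i++)
      if (length($i) == 11 && $i ~ /^1[0-9]+$/)
        sub(/^1/, "", $i)
    $1 = $1                      # rebuild the record so OFS is always applied
  } 1'
}

echo '453452,34545'            | print_only       # -> 453452,34545
echo '453452,34545'            | rebuild_record   # -> 453452|34545
echo '1-2,3+4'                 | strip_range      # -> 1234
echo '1-2,3+4'                 | strip_literal    # -> 12,34
echo '+1 0(987)654-321,453452' | normalize        # -> 0987654321|453452
```

This also suggests an answer to the question of why `gsub` seemed to apply to all fields: without a target it works on the entire line, whereas the anchored `^1...$` test only makes sense against one field at a time, so it needs the loop.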