
I have code that checks the contents of a variable created from parsing a CSV file. However, the code below isn't working.

 $3 ~ ($2 == "\"[ABCDEFGUHIJKLMNOPQRSTUVWXYZ]\"" ? "^\"[[:digit:]]\"$" : "\"\"$") {
  print "15th field invalid-OFFENCE FILE"
}

Sample data below:

"ABC","A","","a" --- # This should fail because there is no data in field 3
"ABC","","","a" --- # This should pass: field 2 is empty, so the else branch (empty field 3) applies
"ABC","A","2","a" --- # This should pass because fields 2 and 3 both contain data

However, what's actually happening is that the second sample, which should pass, is failing, and I can't for the life of me work out why.

kvantour
jordanb111

2 Answers


Change it to this and see if it works:

$3 ~ ($2 ~ /"[A-Z]"/ ? "^\"[[:digit:]]\"$" : "\"\"$") {
  print "15th field invalid-OFFENCE FILE"
}

To be stricter, you can also anchor the pattern: $2 ~ /^"[A-Z]"$/.

With that print line, though, you can't tell which input lines matched.
You can change it to:

print "Line: " FNR "\t15th field invalid-OFFENCE FILE"

Then you will see the difference.

Update:
We misunderstood your meaning.
If "fail" means print and "pass" means ignore, then this is what you want:

$3 ~ ($2 ~ /"[A-Z]"/ ? "\"\"$" : "^\"[[:digit:]]\"$") {
  print "Line: " FNR "\t15th field invalid-OFFENCE FILE"
}

Swapping the two branches of the ternary is all that is needed.
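As a quick check, the pattern can be run against the three sample rows from the question. This is only a sketch: it assumes the file is split with -F, (no FPAT), that the fields keep their surrounding double quotes, and it uses the anchored form /^"[A-Z]"$/ for field 2. The variable name out is just for illustration.

```shell
# Sample rows from the question, fed to awk on standard input.
out=$(printf '%s\n' \
  '"ABC","A","","a"' \
  '"ABC","","","a"' \
  '"ABC","A","2","a"' |
awk -F, '
  # Field 2 is a quoted letter: report the row when field 3 is an empty quoted field.
  # Otherwise: report the row when field 3 is a single quoted digit.
  $3 ~ ($2 ~ /^"[A-Z]"$/ ? "\"\"$" : "^\"[[:digit:]]\"$") {
    print "Line: " FNR "\t15th field invalid-OFFENCE FILE"
  }')
printf '%s\n' "$out"
```

With these assumptions only the first row is reported, which corresponds to the "fail" case described in the question.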

Til

Your script looks a bit awkward. Let's walk through what it actually does.

Note: I assume that you called awk with awk -F, -f file.awk inputfile

  1. $3 ~ expr: This line attempts to match field 3 against the extended regular expression (ERE) represented by expr.
  2. ($2 == "\"[ABCDEFGUHIJKLMNOPQRSTUVWXYZ]\"" ? "^\"[[:digit:]]\"$" : "\"\"$"): The expression expr used above is a ternary operation:
    • $2 == "\"[ABCDEFGUHIJKLMNOPQRSTUVWXYZ]\"": if field 2 equals the literal string "[ABCDEFGUHIJKLMNOPQRSTUVWXYZ]" (a string comparison, not a regex match), then
    • "^\"[[:digit:]]\"$": match field 3 against the ERE ^"[[:digit:]]"$,
    • "\"\"$": otherwise match field 3 against the ERE ""$, i.e. a field ending in two double quotes (an empty quoted field).
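The difference between the string comparison and a regex match can be checked in isolation. The following is a minimal sketch, assuming -F, splitting and one row from the question's sample data; the names result, eq and re are only illustrative.

```shell
# Compare string equality with a regex match on a quoted one-letter field.
result=$(printf '%s\n' '"ABC","A","2","a"' | awk -F, '{
  eq = ($2 == "\"[A-Z]\"")   # string comparison: the field is not literally "[A-Z]"
  re = ($2 ~ /^"[A-Z]"$/)    # regex match: one uppercase letter in double quotes
  print "eq=" eq " re=" re
}')
echo "$result"   # prints: eq=0 re=1
```

The == operator compares the field with the bracket expression as plain text, so it is false for every real letter; only ~ treats the right-hand side as a pattern.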

The problem lies in the conditional of the ternary operator, which should use a regex match, $2 ~ /"[A-Z]"/, instead of the equality operator.

$3 ~ ($2 ~ /"[A-Z]"/ ? "^\"[[:digit:]]\"$" : "\"\"$") {
  print "15th field invalid-OFFENCE FILE"
}

This might be more readable however:

($2 ~ /"[A-Z]"/ && $3 ~ /^"[[:digit:]]"$/) || 
  ($2 !~ /"[A-Z]"/ && $3 ~ /^""$/) { 
     print "15th field invalid-OFFENCE FILE"
}

$ awk -F, '$3 ~ ($2 ~ /"[A-Z]"/ ? "^\"[[:digit:]]\"$" : "\"\"$")' file
"ABC","","","a"
"ABC","A","2","a"
kvantour
  • I was using [ABCDEFGUHIJKLMNOPQRSTUVWXYZ] to prevent an accidental match of a non-locale character, because of the multitude of matches that can occur in the UTF-8 charset. I will check whether your answer works, though :) – jordanb111 Mar 13 '19 at 10:11
  • After checking, neither option seems to have worked; it is still following the wrong regex. – jordanb111 Mar 13 '19 at 10:15
  • @jordanb111 How are you invoking your `awk` command? Do you specify `FS=","`? – kvantour Mar 13 '19 at 10:16
  • @jordanb111 I have added my output of the command to the post. I believe it is correct. – kvantour Mar 13 '19 at 10:21
  • So I'm parsing the CSV file with ` awk -F, ' BEGIN { FPAT = "([^,]+)|(\"[^\"]+\")" } NF!=17 { print "incorrect amount of fields-OFFFENCE FILE";}` – jordanb111 Mar 13 '19 at 10:30
  • @jordanb111 `awk` won't match characters other than `A` to `Z`; check [this post](https://stackoverflow.com/a/44228789/5403468) for details. Oh, you are already using `-F,`, that's the right way. However, when you're using `FPAT`, be careful to check whether each field has `"` or not. – Til Mar 13 '19 at 10:31
  • The comma separation works with my code, as the program is already successfully validating 17 other fields; it's just this if statement that's not working. I also have other ternary statements using the method shown in my post, and they work as intended. – jordanb111 Mar 13 '19 at 10:33