
I want to validate a file which contains multiple lines in this format:

alphanumeric_word_with_spaces|alphanumeric_word_with_spaces|alphanumeric_word_with_spaces|alphanumeric_word_with_spaces|alphanumeric_word_with_spaces|alphanumeric_word_with_spaces|alphanumeric_word_with_spaces

So basically, each line is pipe-delimited, and I need to check that the number of pipes equals a variable, say 10 for now. The count cannot be greater or less than 10. Some words may be empty strings as well, such as "||||". I just need to validate the pipe count; what's inside doesn't matter.

What would the regex for that be? I am doing this with shell scripting on Linux.

Also, this is just a single line. I have multiple lines in a single file (tens of thousands of records). What would be the best way to perform the validation? I have read about sed and other tools, but I am not sure which one would be faster.

Nic3500
  • Why use a regexp? Just count the number of `|` in the line. – Barmar Sep 29 '21 at 20:01
  • @Barmar the question you link doesn't seem to directly provide a way to efficiently check if /individual/ lines fail - without significant changes to the answers, you probably end up with rather inefficient code – jhnc Sep 29 '21 at 20:16
  • If you use the `awk` solution, it simply loops over the lines checking if they have 10 `|` characters. – Barmar Sep 29 '21 at 20:19
  • @Barmar not any of the ones I see - they all print out a count (for each line) that still needs to be checked. Seems simpler to do something like `awk -F'|' -v numpipes=10 'NF!=(numpipes+1){exit 1}' fileToValidate && echo ok || echo bad` – jhnc Sep 29 '21 at 20:28
  • `awk -F'|' -v numpipes=10 'NF!=(numpipes+1){exit 1}' fileToValidate && echo ok || echo bad` ... how do I write this in a .sh file so that it can be run? – Shubham Patwa Sep 29 '21 at 20:30
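
One way to put that one-liner into a `.sh` file, as a minimal sketch. The function name `validate_pipes` and the argument order (file first, then an optional count) are assumptions made for illustration; they do not come from the comments above.

```shell
#!/bin/bash
# validate_pipes FILE [COUNT]
# Succeeds (exit status 0) only if every line of FILE contains exactly
# COUNT '|' characters. COUNT defaults to 10.
validate_pipes() {
    # With '|' as the field separator, a line with N pipes has N+1 fields.
    # awk stops at the first bad line and returns status 1.
    awk -F'|' -v numpipes="${2:-10}" 'NF != numpipes + 1 { exit 1 }' "$1"
}

# Run the check only when a file name is passed to the script.
if [ "$#" -ge 1 ]; then
    if validate_pipes "$@"; then
        echo ok
    else
        echo bad
    fi
fi
```

Saved under a hypothetical name such as `check_pipes.sh`, it could be run as `bash check_pipes.sh fileToValidate 10`.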

2 Answers


Just counting pipes:

^([^|]*\|){10}[^|]*$

Enforcing that values are alphanumeric/space too (the inline `(?i)` flag needs a PCRE-style engine, e.g. `grep -P`):

(?i)^([a-z0-9 ]*\|){10}[a-z0-9 ]*$
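
As a hedged sketch of using the pipe-counting pattern from a shell script: `grep -E` accepts that pattern as written. The helper name `find_bad_lines` and the idea of building the pattern from a count argument are assumptions made for illustration.

```shell
#!/bin/bash
# find_bad_lines FILE [COUNT]
# Prints "lineno:line" for every line of FILE that does NOT contain exactly
# COUNT '|' characters (COUNT defaults to 10).
# Exit status follows grep: 0 if at least one bad line was found,
# 1 if every line is valid, 2 on error (e.g. unreadable file).
find_bad_lines() {
    # -E: extended regex, -v: print non-matching lines, -n: show line numbers
    grep -Evn "^([^|]*\|){${2:-10}}[^|]*\$" "$1"
}
```

For example, `if find_bad_lines input.txt 10; then echo bad; else echo ok; fi` prints the offending lines followed by `bad`, or just `ok` when every line passes.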
Bohemian

File input.txt:

a b c|a 1 b|a 1 b|a 1 b|a 1 b|a 1 b|a 1 b
a 1 b|a 1 b|a 1 b|a 1 b|a 1 b|a 1 b|a 1 b|a 1 b|a 1 b|a 1 b| 2 S
a 1 b|a 1 b|a 1 b|a 1 b|a 1 b|a 1 b|a 1 b|a 1 b|a 1 b|a 1 b|a 1 b|a 1 b|a 1 b|a 1 b|a 1 b

The script could be:

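#!/bin/bash
# Report, line by line, whether each record has the expected number of pipes.
inputfile="input.txt"
expected=10

if [[ ! -f "$inputfile" ]]
then
    echo "The input file does not exist."
    exit 1
fi

# IFS= keeps leading/trailing spaces; the || test also handles a last line
# that has no trailing newline.
while IFS= read -r line || [[ -n "$line" ]]
do
    echo "LINE=$line"
    # With '|' as the separator, the pipe count is the field count minus one.
    pipe_count=$(printf '%s\n' "$line" | awk -F'|' '{print NF-1}')
    if (( pipe_count == expected ))
    then
        echo "OK, $expected |"
    else
        echo "NOT OK, found $pipe_count | (expected $expected)"
    fi
    echo ""
done <"$inputfile"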
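
A shell `while read` loop handles one line at a time, which tends to be slow on tens of thousands of records. A sketch of a similar report done in a single `awk` pass follows. The function name `report_pipe_counts` and the message wording are assumptions made for illustration, not part of the script above.

```shell
#!/bin/bash
# report_pipe_counts FILE [COUNT]
# Prints one message per line that does not have exactly COUNT '|' characters
# (COUNT defaults to 10) and returns status 1 if any such line exists.
report_pipe_counts() {
    awk -F'|' -v expected="${2:-10}" '
        # An empty line has NF == 0, so treat it as zero pipes.
        { pipes = (NF > 0) ? NF - 1 : 0 }
        pipes != expected {
            printf "line %d: NOT OK, found %d |\n", NR, pipes
            bad = 1
        }
        END { exit bad + 0 }
    ' "$1"
}
```

Called as `report_pipe_counts input.txt 10`, it reads the file once and only prints the lines that fail.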
Nic3500