I have a single line of input containing n space-delimited arguments. The arguments themselves vary, and the input is read from an external file.
I want to assign specific elements to variables based on regular expressions. To do that, I was thinking of first declaring a pointer variable to keep track of where I am on the line. In addition, the assignment to variables does not follow numerical order, and depending on the input some variables may be skipped entirely.
My current method is to use
awk '{print $1}' file.txt
However, not all elements are at fixed positions, and I need to account for elements that may be absent or may have multiple entries.
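Part of the difficulty is getting a shell variable into awk: inside single quotes the shell does not expand $pointer, so awk never sees its value. A minimal sketch, assuming bash, passes the index in with awk's -v option and then uses $p as a field reference. The sample line is the first example input given later in this question, fed through a here-string instead of file.txt.

```shell
#!/usr/bin/env bash
# Sample input line (the first example from this question); in practice
# this would come from file.txt instead of a here-string.
line='RON KKND 1534Z AUTO 253985G 034SRT 134OVC 04/32'

pointer=4
# -v copies the shell variable into the awk variable p; $p is then the p-th field.
field=$(awk -v p="$pointer" '{print $p}' <<< "$line")
echo "field $pointer is $field"

# NF is the number of fields on the line, a natural upper bound for the pointer.
count=$(awk '{print NF}' <<< "$line")
echo "the line has $count fields"
```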
UPDATE: I found another method.
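# Read the (single) input line into an array of space-delimited arguments
read -ra file < /file.txt
for i in "${file[@]}"; do
    # Write one argument per line
    printf '%s\n' "$i" >> split.txt
done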
This way, instead of a single line with multiple arguments, we get multiple lines with a single argument each. As a result, we can now use var#=$(grep --regexp="[pattern]" split.txt). Now I just need to figure out how best to use regular expressions to filter this mess.
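Once split.txt holds one argument per line, grep -E with -x (the pattern must match the whole line) can pull out the arguments that have a recognisable shape. This sketch assumes GNU grep (for -m) and assumes varC entries look like 034SRT (three digits followed by three uppercase letters), as in the examples below; it builds its own split.txt in a temporary directory so it runs on its own. Note that grep discards position, so this only works for segments that can be identified by shape alone.

```shell
#!/usr/bin/env bash
export LC_ALL=C                 # make [A-Z] mean plain ASCII uppercase
tmp=$(mktemp -d)
# Split one example line into one argument per line, as the loop above does.
tr -s ' ' '\n' <<< 'RON KKND 5256Z 143623G72K 034OVC 074OVC 134SRT 145PRT 13/00' > "$tmp/split.txt"

# -x: whole-line match; -E: extended regular expressions; -m 1: stop after one match.
var7=$(grep -m 1 -xE '[^/]+/[^/]+' "$tmp/split.txt")          # first argument with a slash
varC_list=$(grep -xE '[0-9]{3}[A-Z]{3}' "$tmp/split.txt")     # every 034OVC-style argument
varC=$(grep -c -xE '[0-9]{3}[A-Z]{3}' "$tmp/split.txt")       # how many of them there are

echo "var7=$var7 varC=$varC"
echo "$varC_list"
rm -r "$tmp"
```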
Let me give an example.
My input strings are:
RON KKND 1534Z AUTO 253985G 034SRT 134OVC 04/32
RON KKND 5256Z 143623G72K 034OVC 074OVC 134SRT 145PRT 13/00
RON KKND 2234Z CON 342523G CLS 01/M12 RMK
So the variable assignment for each of the above would be:
var1=RON var2=KKND var3=1534Z var4=TRUE var5=FALSE var6=253985G varC=2 varC1=034SRT varC2=134OVC var7=04/32
var1=RON var2=KKND var3=5256Z var4=FALSE var5=FALSE var6=143623G72K varC=4 varC1=034OVC varC2=074OVC varC3=134SRT varC4=145PRT var7=13/00
var1=RON var2=KKND var3=2234Z var4=FALSE var5=TRUE var6=342523G varC=0 var7=01/M12
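As a cross-check of the ordering rules spelled out in the paragraphs that follow, here is a sketch that reads the line into a bash array once (instead of calling awk per field) and walks it with a pointer. It assumes bash, that varC entries always look like 034SRT, and that var7 is the first later argument containing a slash, which is how the three examples behave. The function name parse_line is hypothetical, and it prints the assignments in the format used above rather than setting shell variables.

```shell
#!/usr/bin/env bash
# Assumptions: varC entries are three digits + three uppercase letters, and
# var7 is the first argument after var6 that contains a slash.
parse_line() {
  local -a a
  read -ra a <<< "$1"            # a[0] is the 1st argument, a[1] the 2nd, ...
  local p=3                      # pointer: 0-based index of the 4th argument
  local out="var1=${a[0]} var2=${a[1]} var3=${a[2]}"

  if [[ ${a[p]} == AUTO ]]; then out+=" var4=TRUE"; p=$((p + 1)); else out+=" var4=FALSE"; fi
  if [[ ${a[p]} == CON ]]; then out+=" var5=TRUE"; p=$((p + 1)); else out+=" var5=FALSE"; fi

  out+=" var6=${a[p]}"           # var6 sits wherever the pointer now is
  p=$((p + 1))

  local n=0 extra=""
  # Walk forward until var7 (unquoted */* is a glob: "contains a slash") or end of line.
  while (( p < ${#a[@]} )) && [[ ${a[p]} != */* ]]; do
    if [[ ${a[p]} =~ ^[0-9]{3}[A-Z]{3}$ ]]; then
      n=$((n + 1))
      extra+=" varC$n=${a[p]}"
    fi
    p=$((p + 1))                 # non-matching arguments such as CLS are skipped
  done

  # Arguments after var7 (e.g. RMK) are never looked at.
  printf '%s\n' "$out varC=$n$extra var7=${a[p]}"
}

parse_line 'RON KKND 1534Z AUTO 253985G 034SRT 134OVC 04/32'
parse_line 'RON KKND 5256Z 143623G72K 034OVC 074OVC 134SRT 145PRT 13/00'
parse_line 'RON KKND 2234Z CON 342523G CLS 01/M12 RMK'
```

To get real variables instead of a printed summary, printf -v "varC$n" '%s' "${a[p]}" inside the loop would assign varC1, varC2, ... by name.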
So the fourth argument might be var4, var5, or var6.
The fifth argument might be var5, var6, or match another criterion.
The sixth argument may or may not be var6. The arguments between var6 and var7 can be delimited by matching each argument against */*, since var7 is the one that contains a slash.
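In bash, the */* test works as a glob inside [[ ]], but only when the pattern is left unquoted. A small sketch, assuming bash; is_var7 is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Inside [[ ]], an unquoted right-hand side of == is a glob pattern, so */*
# matches any argument containing a slash. Quoted ("*/*") it would be
# compared as the literal three-character string */* instead.
is_var7() { [[ $1 == */* ]]; }   # hypothetical helper name

for tok in 253985G 034SRT 04/32 01/M12; do
  if is_var7 "$tok"; then
    echo "$tok: looks like var7"
  else
    echo "$tok: not var7"
  fi
done

literal_match=no
if [[ 04/32 == "*/*" ]]; then literal_match=yes; fi
echo "quoted pattern matches 04/32? $literal_match"   # prints "no"
```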
To boil this down further: the positions of var1, var2 and var3 in the input are fixed, but after that I need to compare, order, and assign. In addition, the arguments themselves can vary in character length. The position of each section relative to its neighbors is fixed; for example, var7 will never come before var6 in the input, and if var4 and var5 are both true, the 4th and 5th arguments will always be 'AUTO CON'. Some segments are always a single argument, while others can span more than one. As for the patterns, some have a specific character at a specific location, while others have no marker at all apart from their position in the sequence.
So I need awk to recognize a pointer variable, because every argument has to be checked until a specific match is found:
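# Requires bash ([[ ]], =~, printf -v).
# Helper (my own name): print field number $1 of the single-line file.txt.
# -v passes the shell value into awk; inside single quotes $pointer is not expanded.
field() { awk -v p="$1" '{print $p}' file.txt; }
# Number of fields on the line, so the loop below cannot run forever
nf=$(awk '{print NF}' file.txt)

#Check to see if var4 or var5 exists. if so, flag and increment pointer
pointer=4
if [ "$(field "$pointer")" = "AUTO" ]; then
    var4="TRUE"
    pointer=$((pointer + 1))
else
    var4="FALSE"
fi
if [ "$(field "$pointer")" = "CON" ]; then
    var5="TRUE"
    pointer=$((pointer + 1))
else
    var5="FALSE"
fi
#position of var6 is fixed once var4 and var5 are determined
var6=$(field "$pointer")
pointer=$((pointer + 1))
#Count the arguments between var6 and var7 (there may be up to ten)
#and separate each to decode later. varC[0-9] is always three digits
#followed by three uppercase letters (e.g. 034SRT). Use this counter later when decoding.
varC=0
# In [[ ]], an unquoted */* is a glob: true for any field containing a slash (var7)
until [[ $(field "$pointer") == */* ]] || (( pointer > nf )); do
    arg=$(field "$pointer")
    # Only count varC-shaped fields, so e.g. CLS is skipped (varC=0 in the third example)
    if [[ $arg =~ ^[0-9]{3}[A-Z]{3}$ ]]; then
        varC=$((varC + 1))
        printf -v "varC$varC" '%s' "$arg"   # sets varC1, varC2, ...
    fi
    pointer=$((pointer + 1))
done
#position of var7 is fixed after all arguments of varC are handled
var7=$(field "$pointer")
pointer=$((pointer + 1))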
My question is how to get the syntax of this pointer-based approach right.
var7 is not always at the end of the input line; however, arguments after var7 do not need to be processed.
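Because everything after var7 can be ignored, a single awk call can scan the fields left to right and stop at the first one containing a slash. A minimal sketch, assuming bash (for the here-string and process substitution) and any POSIX awk; the sample line is the third example from this question, where RMK follows var7.

```shell
#!/usr/bin/env bash
line='RON KKND 2234Z CON 342523G CLS 01/M12 RMK'
# Scan fields in order; print the index and value of the first field containing
# "/", then exit, so fields after var7 (RMK here) are never examined.
read -r var7_pos var7 < <(awk '{ for (i = 1; i <= NF; i++) if ($i ~ /\//) { print i, $i; exit } }' <<< "$line")
echo "var7 is field $var7_pos: $var7"
```

The returned index can also serve as the stopping point for a pointer loop, so the loop never needs to look past var7.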
I haven't gotten to actually interpreting the patterns yet. I intend to handle that with case statements that compare the variables against regular expressions. I don't want to use awk to interpret the patterns directly, as that would get very messy. I have considered using for n in $string, but that would mean comparing every argument directly against every possible combination (and there are multiple segments, each with multiple patterns), which is impractical. I'm trying to make this a two-step process.
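For the second step, a case statement can check that an already-extracted variable has the expected shape and split it. A sketch, assuming bash. Note that case matches shell glob patterns, not regular expressions, so the pattern below uses [[:digit:]] and [[:upper:]] classes; when a true regular expression is needed, [[ =~ ]] with BASH_REMATCH does the same job. The function names and the number/code labels are hypothetical, since the question does not say what the two halves of an entry such as 034SRT mean.

```shell
#!/usr/bin/env bash
# Glob-based version: case patterns, not regular expressions.
decode_varC() {
  local v=$1
  case $v in
    [[:digit:]][[:digit:]][[:digit:]][[:upper:]][[:upper:]][[:upper:]])
      # First three characters are digits, last three are letters.
      printf 'number=%s code=%s\n' "${v:0:3}" "${v:3:3}" ;;
    *)
      printf 'unrecognised varC entry: %s\n' "$v" >&2
      return 1 ;;
  esac
}

# Regex-based version: [[ =~ ]] fills BASH_REMATCH with the capture groups.
decode_varC_regex() {
  if [[ $1 =~ ^([0-9]{3})([A-Z]{3})$ ]]; then
    printf 'number=%s code=%s\n' "${BASH_REMATCH[1]}" "${BASH_REMATCH[2]}"
  else
    return 1
  fi
}

varC1=034SRT
decode_varC "$varC1"
decode_varC_regex "$varC1"
```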