I have a fixed-width file as follows:
aaaaa003aaaaaaaaaaaaaaa
bbbbb002aaaaaaaaaa
ccccc004cccccccccccccccccccc
I need to get it into this form:
aaaaa003aaaaa
aaaaa003aaaaa
aaaaa003aaaaa
bbbbb002aaaaa
bbbbb002aaaaa
ccccc004ccccc
ccccc004ccccc
ccccc004ccccc
ccccc004ccccc
My current script is inefficient for 11 million lines. How can I optimise it?
#!/bin/bash
# My first script
echo "Unbulking"
IN=$1
OUT=$2
while IFS= read -r line; do
    HEAD=${line:0:8}
    # Strip the 8-character header, leaving the body
    BODY=$(echo "$line" | sed -r 's/.{8}//')
    # Split the body into 5-character chunks, one per line
    BODYVAR=$(echo "$BODY" | fold -w 5)
    for i in ${BODYVAR}
    do
        echo "$HEAD$i" >> "$OUT"
    done
done < "$IN"
echo "Completed"
My logic needs to be along these lines:
# take the first 8 characters of a line and assign them to str1
# take the last 3 characters of str1, cast them to an integer and assign to num1
# multiply num1 by 5 and assign to num2
# take the num2 characters that follow the first 8 characters and assign them to str2
# cut str2 into chunks of 5 characters and assign them to an array arr1
# prepend str1 to each element of arr1
# return arr1 as a set of new lines
# repeat for every line in the file
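The steps above could be sketched in a single awk pass, which avoids starting sed and fold once per input line. This is only a sketch: the function name unbulk is made up for illustration, and it assumes every record has the layout shown in the samples (5-character id, 3-digit count, then count*5 characters of body).

```shell
#!/bin/sh
# Sketch only: "unbulk" is a hypothetical name, not part of the original script.
# Assumed record layout: 5-char id + 3-digit count + (count * 5) chars of body.
unbulk() {
    awk '{
        head = substr($0, 1, 8)         # str1: first 8 characters
        n    = substr(head, 6, 3) + 0   # num1: last 3 characters of str1 as a number
        body = substr($0, 9, n * 5)     # str2: the num1*5 characters after the header
        # Emit one line per 5-character chunk, each prefixed with the header
        for (i = 1; i <= length(body); i += 5)
            print head substr(body, i, 5)
    }' "$@"
}
```

It could be called as unbulk "$IN" > "$OUT", so the output file is opened once rather than once per chunk. If a body is shorter than its count implies, this sketch simply prints whatever chunks are present.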