Assumptions:
- generate $durchlauf (a number) random line numbers; we'll refer to a single number as n ...
- delete lines numbered n and n+1 from the input file and in their place ...
- insert $string (a randomly generated base64 string)
- this list of random line numbers must not have any consecutive line numbers
As others have pointed out, you want to limit yourself to a single gawk call per input file.
New approach:
- generate $durchlauf (count) random numbers (see gen_numbers() function)
- generate $durchlauf (count) base64 strings (we'll reuse Ed Morton's code)
- paste these 2 sets of data into a single input stream/file
- feed 2 files to gawk ... the paste result and the actual file to be modified
- we won't be able to use gawk's -i inplace so we'll use an intermediate tmp file
- when we find a matching line in our input file we'll 1) insert the base64 string and then 2) skip/delete the current/next input lines; this should address the issue where we have two random numbers that differ by 1
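The tmp-file step from the list above can be sketched in isolation. This is only an illustration under assumptions: the names target and tmp are made up here, the file is a throwaway created with mktemp, and the NR != 2 filter is a placeholder for the real edit.

```shell
target=$(mktemp)                          # stand-in for the file to be modified (hypothetical)
tmp=$(mktemp)                             # intermediate tmp file

printf '%s\n' a b c > "$target"

# write the modified content to the tmp file, then replace the original
# only if awk exited successfully
awk 'NR != 2' "$target" > "$tmp" && mv "$tmp" "$target"

content=$(cat "$target")
echo "$content"
rm -f "$target"
```

Chaining mv with && means a failed awk call leaves the original file untouched.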
One idea to ensure we do not generate consecutive line numbers:
- break our set of line numbers into ranges, eg, 100 lines split into 5 ranges => 1-20 / 21-40 / 41-60 / 61-80 / 81-100
- reduce the end of each range by 1, eg, 1-19 / 21-39 / 41-59 / 61-79 / 81-99; since we delete lines n and n+1, n=19 removes lines 19-20 and can never collide with n=21 from the next range
- use $RANDOM to generate one number within each range (this tends to be at least an order of magnitude faster than comparable shuf calls)
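The range arithmetic can be checked with a few lines of shell. This is a rough sketch for the 100-line / 5-range example only; the variable names (ranges, i, and so on) are chosen for illustration, and no random numbers are drawn yet.

```shell
max=100                                  # total number of lines
count=5                                  # number of ranges
interval=$(( max / count ))              # 100 / 5 = 20

ranges=""
i=0
while [ "$i" -lt "$count" ]
do
    start=$(( i * interval + 1 ))        # first line of this range
    end=$(( start + interval - 2 ))      # last line of the full range, reduced by 1
    ranges="${ranges}${ranges:+ / }${start}-${end}"
    i=$(( i + 1 ))
done

echo "$ranges"                           # 1-19 / 21-39 / 41-59 / 61-79 / 81-99
```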
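We'll use a function to generate our list of non-consecutive line numbers:
gen_numbers () {
    max=$1                                   # $zeilen    eg, 100
    count=$2                                 # $durchlauf eg, 5
    interval=$(( max / count ))              # eg, 100 / 5 = 20

    # loop exactly count times; looping while start<max would emit an extra
    # (possibly out-of-range) number whenever max is not a multiple of count
    for (( i=0; i<count; i++ ))
    do
        start=$(( i * interval + 1 ))        # 1, 21, 41, ...
        end=$(( start + interval - 2 ))      # range end reduced by 1: 19, 39, 59, ...
        out=$(( ( RANDOM % interval ) + start ))
        [[ $out -gt $end ]] && out=${end}    # clamp so that n+1 stays inside the range
        echo ${out}
    done
}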
Sample run:
$ zeilen=100
$ durchlauf=5
$ gen_numbers ${zeilen} ${durchlauf}
17
31
54
64
86
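A quick sanity check, not part of the original approach: the awk sketch below reads an ascending list of numbers (here the sample output above, hard-coded) and reports whether any two neighbours are closer than 2, which is what would make the n / n+1 deletions overlap. The variable name verdict is made up for this illustration.

```shell
# numbers from the sample run, one per line, in ascending order
verdict=$(printf '%s\n' 17 31 54 64 86 |
    awk 'NR > 1 && $1 - prev < 2 { bad = 1 }   # neighbours closer than 2 would overlap
         { prev = $1 }
         END { print (bad ? "consecutive numbers found" : "ok") }')

echo "$verdict"
```

To check a live run, the printf could be replaced by a call to gen_numbers.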
Demonstration of the paste/gen_numbers/base64/tr/gawk idea:
$ zeilen=300
$ durchlauf=3
$ paste <( gen_numbers ${zeilen} ${durchlauf} ) <( base64 /dev/urandom | tr -dc '[[:print:]]' | gawk -v max="${durchlauf}" -v RS='.{230}' '{print RT} FNR==max{exit}' )
This generates:
74 7VFhnDN4J...snip...rwnofLv
142 ZYv07oKMB...snip...xhVynvw
261 gifbwFCXY...snip...hWYio3e
Main code:
tmpfile=$(mktemp)

while/for loop ...                  # whatever OP is using to loop over list of input files
do
    zeilen=$(wc -l < "testfile${filecount}".txt)
    durchlauf=$(( zeilen / 20 ))

    awk '
    # process 1st file (ie, paste/gen_numbers/base64/tr/gawk)

    FNR==NR      { ins[$1]=$2                 # store base64 in ins[] array
                   del[$1]=del[($1)+1]        # referencing both indexes creates del[n] and del[n+1], ie, marks lines n and n+1 for deletion
                   next
                 }

    # process 2nd file

    FNR in ins   { print ins[FNR] }           # insert base64 string?
    ! (FNR in del)                            # if current line number not in del[] array then print the line

    ' <( paste <( gen_numbers ${zeilen} ${durchlauf} ) <( base64 /dev/urandom | tr -dc '[[:print:]]' | gawk -v max="${durchlauf}" -v RS='.{230}' '{print RT} FNR==max{exit}' )) "testfile${filecount}".txt > "${tmpfile}"

    # the last line with line continuations for readability:
    #' <( paste \
    #         <( gen_numbers ${zeilen} ${durchlauf} ) \
    #         <( base64 /dev/urandom | tr -dc '[[:print:]]' | gawk -v max="${durchlauf}" -v RS='.{230}' '{print RT} FNR==max{exit}' ) \
    #   ) \
    #"testfile${filecount}".txt > "${tmpfile}"

    mv "${tmpfile}" "testfile${filecount}".txt
done
Simple example of awk code in action:
$ cat orig.txt
line1
line2
line3
line4
line5
line6
line7
line8
line9
$ cat paste.out # simulated output from paste/gen_numbers/base64/tr/gawk
1 newline1
5 newline5
$ awk '...' paste.out orig.txt
newline1
line3
line4
newline5
line7
line8
line9
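For anyone who wants to reproduce this example, here is a self-contained sketch that recreates both files in a scratch directory and runs the awk script from the main code in place of the elided '...'. The names work and result are chosen for this illustration; paste.out is hand-written, as above, rather than generated.

```shell
work=$(mktemp -d)                         # scratch directory for the two sample files

printf 'line%s\n' 1 2 3 4 5 6 7 8 9 > "$work/orig.txt"
printf '%s\n' '1 newline1' '5 newline5' > "$work/paste.out"

result=$(awk '
FNR==NR      { ins[$1]=$2                 # 1st file: remember replacement text for line n
               del[$1]=del[($1)+1]        # referencing both indexes creates del[n] and del[n+1]
               next
             }
FNR in ins   { print ins[FNR] }           # 2nd file: print replacement at line n
! (FNR in del)                            # print every line not marked for deletion
' "$work/paste.out" "$work/orig.txt")

echo "$result"
rm -r "$work"
```

This should print the same seven lines as shown above: the two replacement strings, with line1/line2 and line5/line6 removed.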
to:mark-fuso: I copied it from another post. It's too hard to understand awk for a small job. "loop change same fileprint" is a copy error, not from me. I will delete it – kumpel4 Sep 19 '21 at 08:48