Use grep xargs sed to regenerate UUIDs in a file more efficiently

Question

I can successfully replace the UUIDs in a file with freshly generated UUIDs:

FILE=/home/username/sql_inserts_with_uuid.sql
grep -i -o -E "([a-f0-9]{8}-[a-f0-9]{4}-4[a-f0-9]{3}-[89aAbB][a-f0-9]{3}-[a-f0-9]{12})" $FILE | xargs -I {} sed -i "s/{}/`uuidgen -t`/g" $FILE

But it's slow because it rewrites the file for each UUID it generates. Is there a more efficient way to rewrite every UUID in a single pass instead of rewriting the same file over and over?

Save this sample data in a file to test:

INSERT INTO fake_table (uuid) VALUES ('812ab76e-43ca-11ec-b54f-00d8617c2296');
INSERT INTO fake_table (uuid) VALUES ('854f7b36-43ca-11ec-9608-00d8617c2296');
INSERT INTO fake_table (uuid) VALUES ('8a09444a-43ca-11ec-8ae2-00d8617c2296');
INSERT INTO fake_table (uuid) VALUES ('8cd0da58-43ca-11ec-9811-00d8617c2296');
INSERT INTO fake_table (uuid) VALUES ('8f9889c0-43ca-11ec-8bfc-00d8617c2296');

This regex `"([a-f0-9]{8}-[a-f0-9]{4}-4[a-f0-9]{3}-[89aAbB][a-f0-9]{3}-[a-f0-9]{12})"` does not match your sample. — dawg, Nov 12 '21 at 15:27
Generate a batch of UUIDs, place them in a file, and pass the UUID file and the old file to `awk` to perform the replacements; you'll need to capture the output in a temp file and, when done, overwrite the old file with the temp file. Alternatively, dump the series of `sed` commands into a script file and pass it to `sed -f`. I'm not sure of the benefit of using parallel operations to update the same target file (race conditions? intermingled output?); concurrent reads/writes of the same file seem likely to be slower than a single process, and that single process should aim for a single read/write of the file. – markp-fuso, Nov 12 '21 at 15:28

dawg · Accepted Answer · 2021-11-12T16:35:07.427

You can use awk with a system call to replace them all in one pass:

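awk '
# Matches a UUID of any version (some older awk builds lack the {n} interval syntax)
BEGIN{pat="[a-fA-F0-9]{8}-[a-fA-F0-9]{4}-[0-9][a-fA-F0-9]{3}-[89aAbB][a-fA-F0-9]{3}-[a-fA-F0-9]{12}"}

# Run uuidgen once and return its output
function get_uuid(){
    cmd = "uuidgen"
    cmd | getline uuid
    close(cmd)    # close the pipe so the next call starts a fresh uuidgen
    return uuid
}

# On lines containing a UUID, replace the first match with a new one
$0~pat{
    uuid=get_uuid()
    sub(pat,uuid,$0)
}
1    # print every line, changed or not
' file.txt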

Prints:

INSERT INTO fake_table (uuid) VALUES ('473C4331-CC31-4FD0-AE99-37FA7E5F23CF');
INSERT INTO fake_table (uuid) VALUES ('EBEC05AB-4236-4384-AF7A-76D4A0615599');
INSERT INTO fake_table (uuid) VALUES ('23740143-6CC1-41FC-8AE7-038810291026');
INSERT INTO fake_table (uuid) VALUES ('7DBF25AF-4E85-4C55-B8CA-0F6150D5DD3C');
INSERT INTO fake_table (uuid) VALUES ('4365127B-EB46-414E-92D4-B48CC211489E');

With GNU awk, you can make the replacements inplace. Otherwise, you need to redirect the output of this to a temp file and then mv the temp file on top of the source file. This sounds harder than it actually is.
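
The temp-file-then-mv step might look roughly like the sketch below. It is only an illustration: `demo.sql`, the scratch directory, the `rep` helper, and the fallback to `/proc/sys/kernel/random/uuid` when `uuidgen` is not installed are all assumptions added here, not part of the answer above. The pattern is built by repetition so that awk builds without `{n}` interval support can run it too.

```shell
#!/bin/sh
# Sketch only: demo.sql is a hypothetical file created in a scratch directory.
workdir=$(mktemp -d) || exit 1
file=$workdir/demo.sql
cat > "$file" <<'EOF'
INSERT INTO fake_table (uuid) VALUES ('812ab76e-43ca-11ec-b54f-00d8617c2296');
INSERT INTO fake_table (uuid) VALUES ('854f7b36-43ca-11ec-9608-00d8617c2296');
EOF

# Assumption: if uuidgen is not installed, read the Linux kernel UUID source.
if command -v uuidgen >/dev/null 2>&1; then
    uuid_cmd='uuidgen'
else
    uuid_cmd='cat /proc/sys/kernel/random/uuid'
fi

# Create the temp file beside the original so that mv is a same-filesystem rename.
tmp=$(mktemp "$file.XXXXXX") || exit 1

awk -v cmd="$uuid_cmd" '
# Hypothetical helper: repeat string s n times (avoids the {n} interval syntax).
function rep(s, n,    out) { while (n-- > 0) out = out s; return out }
BEGIN {
    h = "[0-9a-fA-F]"
    pat = rep(h, 8) "-" rep(h, 4) "-[0-9]" rep(h, 3) "-[89aAbB]" rep(h, 3) "-" rep(h, 12)
}
$0 ~ pat {
    cmd | getline uuid      # one external call per matching line
    close(cmd)
    sub(pat, uuid)          # replaces the first UUID on the line
}
1                           # print every line
' "$file" > "$tmp" && mv "$tmp" "$file"
```

Because of the `&&`, the original is only replaced if awk exits successfully. Note that `mktemp` creates the temp file with restrictive permissions, so the replaced file may end up with a different mode than the original had.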

Speed test: with your example file multiplied out to 10,000 UUID replacements, the file is processed in 21 seconds on my computer, versus 26 ms if the same file needs no replacements. The system call is not free in terms of efficiency, but this is likely faster than what you are doing...
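
One way to avoid having awk spawn a command for every replacement, along the lines markp-fuso suggests in the comments on the question, is to generate the whole batch of UUIDs first and let a single awk process consume them. The following is a hedged sketch, not a benchmarked solution: the file names, the `gen_uuid` and `rep` helpers, and the `/proc/sys/kernel/random/uuid` fallback are assumptions made for illustration. It still starts one generator process per UUID, so the saving is the per-line pipe from awk, not the generation itself.

```shell
#!/bin/sh
# Sketch only: file names are hypothetical and live in a scratch directory.
workdir=$(mktemp -d) || exit 1
file=$workdir/demo.sql
uuids=$workdir/fresh_uuids.txt
cat > "$file" <<'EOF'
INSERT INTO fake_table (uuid) VALUES ('812ab76e-43ca-11ec-b54f-00d8617c2296');
-- a line without any UUID is passed through unchanged
INSERT INTO fake_table (uuid) VALUES ('854f7b36-43ca-11ec-9608-00d8617c2296'), ('8a09444a-43ca-11ec-8ae2-00d8617c2296');
EOF

hex='[0-9a-fA-F]'
pat="${hex}{8}-${hex}{4}-[0-9]${hex}{3}-[89aAbB]${hex}{3}-${hex}{12}"

# Pass 1: count every UUID occurrence, then generate that many replacements.
n=$(grep -o -E "$pat" "$file" | wc -l | tr -d ' ')

# Assumption: fall back to the Linux kernel UUID source if uuidgen is missing.
gen_uuid() {
    if command -v uuidgen >/dev/null 2>&1; then
        uuidgen
    else
        cat /proc/sys/kernel/random/uuid
    fi
}
i=0
while [ "$i" -lt "$n" ]; do
    gen_uuid
    i=$((i + 1))
done > "$uuids"

# Pass 2: a single awk process loads the batch and substitutes every match.
tmp=$(mktemp "$file.XXXXXX") || exit 1
awk -v uuidfile="$uuids" '
function rep(s, n,    out) { while (n-- > 0) out = out s; return out }
BEGIN {
    while ((getline u < uuidfile) > 0) fresh[++total] = u
    h = "[0-9a-fA-F]"
    pat = rep(h, 8) "-" rep(h, 4) "-[0-9]" rep(h, 3) "-[89aAbB]" rep(h, 3) "-" rep(h, 12)
}
{
    done_part = ""
    rest = $0
    # Walk the line left to right, swapping each UUID for the next fresh one.
    while (match(rest, pat)) {
        done_part = done_part substr(rest, 1, RSTART - 1) fresh[++used]
        rest = substr(rest, RSTART + RLENGTH)
    }
    print done_part rest
}
' "$file" > "$tmp" && mv "$tmp" "$file"
```

Unlike the `sub` call in the one-pass version, this loop replaces every UUID on a line, which is why the count in pass 1 uses `grep -o` (one output line per match) and not `grep -c` (one per matching line).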

*With GNU awk, you can make the replacements inplace* Per your link: "The `inplace` extension emulates GNU sed’s -i option, which performs “in-place” editing of each input file. ..." If it's like GNU sed's "in-place" option, it's not actually in-place. It's just creating a temp file and renaming it. So you can't depend on an unchanged inode or hard links remaining correct. – Andrew Henle, Nov 13 '21 at 13:32
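
If the inode and hard links do need to survive, one possible workaround is to copy the result back into the existing file instead of renaming over it. This is a minimal sketch under assumptions: the file names are hypothetical and `sed 's/old/new/'` merely stands in for the real UUID filter.

```shell
#!/bin/sh
# Sketch only: file names are hypothetical; sed stands in for the awk filter.
workdir=$(mktemp -d) || exit 1
file=$workdir/demo.sql
link=$workdir/hardlink.sql
printf '%s\n' "old contents" > "$file"
ln "$file" "$link"                       # second name for the same inode

inode_before=$(ls -i "$file" | awk '{print $1}')

tmp=$(mktemp) || exit 1
# Write the result elsewhere, then copy it back INTO the existing file.
# Redirecting with > truncates and rewrites the file but keeps its inode.
sed 's/old/new/' "$file" > "$tmp" && cat "$tmp" > "$file"
rm -f "$tmp"

inode_after=$(ls -i "$file" | awk '{print $1}')
echo "inode before: $inode_before, after: $inode_after"
cat "$link"                              # the hard link sees the new contents
```

The trade-off is that the copy back is not atomic: a reader can observe a partially written file, and an interruption during the `cat` can leave the file truncated, whereas a rename swaps the whole file at once.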

score 1 · Answer 2 · answered Nov 12 '21 at 20:25

In plain bash:

cat new_uuids

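#!/bin/bash

# Build a glob pattern for a UUID out of POSIX hex-digit classes
hex='[[:xdigit:]]'
hex3="$hex$hex$hex"
hex4="$hex3$hex"
hex8="$hex4$hex4"
hex12="$hex8$hex4"
pat="$hex8-$hex4-[0-9]$hex3-[89aAbB]$hex3-$hex12"

# Read stdin line by line; replace the first UUID on each matching line
while IFS= read -r line; do
    if [[ $line = *$pat* ]]; then
        echo "${line/$pat/$(uuidgen -t)}"    # uuidgen runs only for lines that match
    else
        echo "$line"
    fi
done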

Call it as

./new_uuids < sql_inserts_with_uuid.sql > new_sql_inserts_with_uuid.sql

M. Nejat Aydin


I was writing something similar (but didn't know about `[[:xdigit:]]`). You don't need the test in the `while` loop; just `echo`ing the replacement is enough. The speed is around 9 sec on my computer from 2012. – Fravadona Nov 12 '21 at 20:38
@Fravadona Correct, but, in that case, the `uuidgen` would be called even if there wasn't a replacement. This may or may not be desirable, depending on the ratio of matching/non-matching lines. – M. Nejat Aydin Nov 12 '21 at 20:49
