
Background: I am running unit tests, and one of them requires calling a PSQL function with a large number of URLs (i.e. 2000+). Building the comma-separated argument string is extremely slow, as shown in this Minimal Working Example (MWE).

MWE:

 #!/bin/bash

 # Generate a random 4096-character alphanumeric URL
 # (upper- and lowercase); 15 characters are reserved
 # for the "http://www." prefix and ".com" suffix.
 URL="http://www.$(cat /dev/urandom | tr -dc 'a-zA-Z0-9' | fold -w $((4096-15)) | head -n 1).com"

 # Create a comma-separated list of 2000 URLs
 for i in $(seq 2000)
 do
     URLS="$URLS,$URL"
 done

We call it and measure the run time like so:

$ time ./generate_urls.sh 

real    1m30.681s
user    1m14.648s
sys     0m16.000s

Question: Is there a faster, more efficient way to achieve this same result?

  • Do you really want all the URLs to be the same? – choroba May 26 '21 at 16:26
  • @choroba I have two tests: one to ensure that duplicate URLs do *not* result in an error (i.e. 5 repetitions are technically valid) and one to ensure that past a maximum length (i.e. 1999) they *do* result in an error. For performance reasons I don't want 2000 random URL generations. – francistheturd May 26 '21 at 16:28

2 Answers


Instead of concatenating over and over, just print them all and store the result. Each `URLS="$URLS,$URL"` copies the entire string built so far, so with 4 KiB URLs the loop does quadratic work; printing and capturing the output assembles the string once.

URLS=$(
    for i in $(seq 2000) ; do
        printf %s, "$URL"
    done
)
echo "${URLS%,}"  # Remove the final comma.

Takes less than 2 seconds on my machine. Even when I move the URL generation inside the loop, it takes just about 8 seconds.
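
For comparison, a variant of the same idea that collects the URLs in a Bash array and joins them in a single expansion (a sketch, assuming URL holds the generated URL from the MWE):

arr=()
for i in $(seq 2000) ; do
    arr+=("$URL")      # Appending to an array is cheap.
done
# Join the elements with commas in one expansion; the subshell
# keeps the IFS change from leaking into the rest of the script.
URLS=$(IFS=,; printf '%s' "${arr[*]}")

This should be comparably fast, since the full string is assembled only once rather than re-copied on every iteration.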

choroba

If you always want 2000 URLs, then this code is much faster than the code in the question:

# Create a comma separated list of 2000 (5*5*5*4*4) URLs
urls=$url,$url,$url,$url,$url               # x 5
urls=$urls,$urls,$urls,$urls,$urls          # x 5
urls=$urls,$urls,$urls,$urls,$urls          # x 5
urls=$urls,$urls,$urls,$urls                # x 4
urls=$urls,$urls,$urls,$urls                # x 4

See Correct Bash and shell script variable capitalization for an explanation of why I changed the variable names to lowercase.
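
If the count is not always 2000, the doubling trick generalizes: combine doubled chunks according to the binary digits of the target count, which needs only O(log n) concatenations. A sketch, where build_urls is a hypothetical helper and url is assumed to hold one URL as in the question:

# Build a comma-separated list of n copies of $url using
# binary decomposition: O(log n) concatenations in total.
build_urls() {
    local n="$1" chunk="$url" out=""
    while (( n > 0 )); do
        if (( n & 1 )); then            # This binary digit of n is set,
            out=${out:+$out,}$chunk     # so append the current chunk.
        fi
        chunk=$chunk,$chunk             # Double the chunk for the next digit.
        (( n >>= 1 ))
    done
    printf '%s\n' "$out"
}

urls=$(build_urls 2000)

Like the hard-coded version above, this keeps the number of shell-level concatenations logarithmic in the count rather than linear.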

pjh