
I want to replace all occurrences of a number with a random number in each line of a file using "sed". For example, if my file has the number 892 in each line, I would like to replace that with a unique random number between 800 and 900.

Input file:-

temp11;djaxfile11;892  
temp12;djaxfile11;892  
temp13;djaxfile11;892  
temp14;djaxfile11;892  
temp15;djaxfile11;892

Expected output file :-

temp11;djaxfile11;805  
temp12;djaxfile11;846  
temp13;djaxfile11;833  
temp14;djaxfile11;881  
temp15;djaxfile11;810

I am trying the below:-

sed -i -- "s/;892/;`echo $RANDOM % 100 + 800 | bc`/g" file.txt

but it is replacing all the occurrences of 892 with a single random number between 800 and 900.

Output file :-

temp11;djaxfile11;821  
temp12;djaxfile11;821  
temp13;djaxfile11;821  
temp14;djaxfile11;821  
temp15;djaxfile11;821

Could you please help in correcting my code ? Thanks in advance.

Uri Agassi
  • Must you absolutely do it in sed? It would be easy in Python or PERL. – smci Mar 29 '15 at 13:01
  • So your file never has more than 101 lines in it, correct? And the number isn't actually random since it is at least partially determined by the previous lines? – Mark Setchell Mar 29 '15 at 14:19
  • My file actually has thousands of records. The sed suggestion that Wintermute gave is working perfectly, although it is taking a bit of time. Is awk faster from a performance point of view ? any thoughts ? – Abhilipsa Mehra Mar 29 '15 at 15:15

1 Answer


With GNU sed, you could do something like

sed '/;892$/ { h; s/.*/echo $((RANDOM % 100 + 800))/e; x; G; s/892\n// }' filename

...but it would be much saner to do it with awk:

awk -F \; 'BEGIN { OFS = FS } $NF == 892 { $NF = int(rand() * 100 + 800) } 1' filename

To make sure that the random numbers are unique, amend the awk code as follows:

awk -F \; 'BEGIN { OFS = FS } $NF == 892 { do { $NF = int(rand() * 100 + 800) } while(seen[$NF]++) } 1' filename

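Doing that with sed would be too crazy for me. Be aware that this only works if no more than 100 lines in the file have a last field of 892: there are only 100 possible values (800 to 899), so a file with more matching lines cannot get a unique number on every line.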

Explanation

The sed code reads

/;892$/ {                              # if a line ends with ;892
  h                                    # copy it to the hold buffer
  s/.*/echo $((RANDOM % 100 + 800))/e  # replace the pattern space with the
                                       # output of echo $((...))
                                       # Note: this is a GNU extension
  x                                    # swap pattern space and hold buffer
  G                                    # append the hold buffer to the PS
                                       # the PS now contains line\nrandom number
  s/892\n//                            # remove the old field and the newline
}
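As an aside, here is a small sketch of why the command in the question put the same number on every line. The shell expands a command substitution once, before sed even starts, so sed only ever sees one fixed replacement string. The sample lines and the variable names `n` and `single_draw_output` are made up for illustration, and awk is used for the draw only so that the sketch does not depend on bash's $RANDOM.

```shell
# The substitution below runs exactly once, when the shell evaluates this line.
n=$(awk 'BEGIN { srand(); print int(rand() * 100 + 800) }')

# By the time sed runs, its script is a constant such as s/;892$/;821/,
# so every matching line gets the same value.
single_draw_output=$(printf 'temp11;djaxfile11;892\ntemp12;djaxfile11;892\n' |
  sed "s/;892\$/;$n/")

printf '%s\n' "$single_draw_output"
```

The `e` flag in the sed code above avoids this by running the command again for every matching line, at the cost of starting a shell each time.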

The awk code is much more straightforward. With -F \;, we tell awk to split the lines at semicolons, then

BEGIN { OFS = FS }  # output field separator is input FS, so the output
                    # is also semicolon-separated
$NF == 892 {        # if the last field is 892
                    # replace it with a random number
  $NF = int(rand() * 100 + 800)
}
1                   # print.
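One thing the one-liner leaves out: in many awk implementations rand() returns the same sequence on every run unless srand() is called first. A minimal sketch with seeding added follows; the function name `randomize_last_field` and the sample input are assumptions for illustration, not part of the original answer.

```shell
# randomize_last_field is a hypothetical helper name used only in this sketch.
randomize_last_field() {
  awk -F ';' '
    BEGIN {
      OFS = FS   # keep the output semicolon-separated
      srand()    # seed from the current time, once per run
    }
    # rand() is in [0, 1), so the result is an integer from 800 to 899
    $NF == 892 { $NF = int(rand() * 100 + 800) }
    1            # print every line, changed or not
  '
}

printf 'temp11;djaxfile11;892\ntemp12;djaxfile11;111\n' | randomize_last_field
```

srand() without an argument seeds from the time of day, so two runs within the same second will probably still produce the same numbers.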

The amended awk code replaces

$NF = int(rand() * 100 + 800)

with

do {
  $NF = int(rand() * 100 + 800)
} while(seen[$NF]++)

...in other words, it keeps a table of random numbers it has already used and keeps drawing numbers until it gets one it hasn't seen before.
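A different way to get unique numbers, sketched here as an alternative rather than as the method above, is to shuffle the 100 candidate values once and hand them out in order. That avoids repeated redraws when most values are taken and lets the script stop with an error instead of looping when the values run out. The function name `unique_last_field` and the error message are assumptions for illustration.

```shell
# unique_last_field is a hypothetical helper name used only in this sketch.
unique_last_field() {
  awk -F ';' '
    BEGIN {
      OFS = FS
      srand()
      # Fill the pool with 800..899, then shuffle it (Fisher-Yates).
      for (i = 0; i < 100; i++) pool[i] = 800 + i
      for (i = 99; i > 0; i--) {
        j = int(rand() * (i + 1))   # random index from 0 to i
        tmp = pool[i]; pool[i] = pool[j]; pool[j] = tmp
      }
    }
    $NF == 892 {
      if (used >= 100) {
        # Only 100 distinct values exist, so give up instead of looping.
        print "more than 100 matching lines; no unique values left" > "/dev/stderr"
        exit 1
      }
      $NF = pool[used++]            # take the next unused value
    }
    1
  '
}

printf 'temp11;djaxfile11;892\ntemp12;djaxfile11;892\n' | unique_last_field
```

With exactly 100 matching lines every value from 800 to 899 is used once; a 101st matching line makes the function exit with a non-zero status.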

Wintermute
  • Thanks a lot ! I tried the sed code you suggested and it worked fine. I will try out the awk options and explore which is the fastest & best way to do it. – Abhilipsa Mehra Mar 29 '15 at 15:16
  • Answered my own (now deleted) questions: (1) awk arrays will accept string keys so this should work for string replacements and (2) if you're getting odd behavior and using the `system` call, it may be returning the status code and printing (not returning) the output. – claytond Jul 31 '19 at 18:20