3

I'm looking to replace characters at specific byte offsets.

Here's what is provided: an input file of plain ASCII text, and an array within a Bash shell script in which each element is a numeric byte offset.

The goal: Take the input file, and at each of the byte-offsets, replace the character there with an asterisk.

So essentially the idea I have in mind is to somehow go through the file, byte-by-byte, and if the current byte-offset being read is a match for an element value from the array of offsets, then replace that byte with an asterisk.

This post seems to indicate that the dd command would be a good candidate for this action, but I can't understand how to perform the replacement multiple times on the input file.

Input file looks like this:

00000
00000
00000

The array of offsets looks like this:

offsetsArray=("2" "8" "9" "15")

The output file's desired format looks like this:

0*000
0**00
00*00

Any help you could provide is most appreciated. Thank you!

stotrami

3 Answers

4

Please check my comment about the newline offset. Assuming it is correct (note that I have changed your offset array), I think this should work for you:

#!/bin/bash

read -r -d ''
offsetsArray=("2" "8" "9" "15")
txt="${REPLY}"
for i in "${offsetsArray[@]}"; do
    txt="${txt:0:$i-1}*${txt:$i}"
done
printf "%s" "$txt"

Explanation:

  • read -d '' reads the whole input (redirected file) in one go into the $REPLY variable. If you have large files, this can run you out of memory.
  • We then loop through the offsets array, one element at a time. We use each index i to grab i-1 characters from the beginning of the string, then insert a * character, then add the remaining bytes from offset i. This is done with bash parameter expansion. Note that while your offsets are one-based, strings use zero-based indexing.
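
To make the parameter expansion concrete, here is a minimal sketch on a short, made-up string. The variable names prefix, suffix, and result are illustrative only and do not appear in the script above:

```bash
#!/bin/bash
# Illustrative values only: a six-character string and a one-based offset.
txt="abcdef"
i=3

# ${txt:0:i-1} takes i-1 characters starting at index 0 -> "ab"
prefix="${txt:0:i-1}"
# ${txt:i} takes everything from zero-based index i onward -> "def"
suffix="${txt:i}"

# The character at one-based offset i ("c") is dropped and "*" takes its place.
result="${prefix}*${suffix}"
printf '%s\n' "$result"   # prints: ab*def
```

The one-based offset i corresponds to zero-based index i-1, which is why the prefix stops one character short of i and the suffix starts at i.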

In use:

$ ./replacechars.sh < input.txt
0*000
0**00
00*00
$ 

Caveat:

This is not a very efficient solution, as it causes the string containing the whole file to be copied for every offset. If you have large files and/or a large number of offsets, it will run slowly. If you need something faster, another language that allows modification of individual characters in a string would be much better.
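
If staying in bash is a requirement, one possible way to reduce the copying is to sort the offsets and assemble the output from the slices between them. This is only a sketch: star_offsets is a hypothetical helper name, it assumes single-byte (ASCII) text and positive integer offsets, and it has not been benchmarked against the loop above.

```bash
#!/bin/bash
# Hypothetical helper (not part of the answer above).
# Usage: star_offsets TEXT OFFSET...
# Offsets are one-based, as in the question.
star_offsets() {
    local txt=$1; shift
    local out="" prev=0 off sorted

    # Sort numerically and drop duplicates so slices never overlap.
    sorted=$(printf '%s\n' "$@" | sort -n -u)

    for off in $sorted; do
        # Ignore offsets that fall outside the text.
        (( off >= 1 && off <= ${#txt} )) || continue
        # Append the untouched slice before this offset, then the asterisk.
        out+="${txt:prev:off-1-prev}*"
        prev=$off
    done

    # Append whatever follows the last replaced character.
    out+="${txt:prev}"
    printf '%s' "$out"
}

# Command substitution strips the trailing newline from the result.
result=$(star_offsets $'00000\n00000\n00000\n' 2 8 9 15)
printf '%s\n' "$result"
```

Each slice of the input is appended once, rather than rebuilding the full string for every offset.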

Digital Trauma
  • Boy, I'm impressed. I'm still struggling to understand `0**00`. But +1 for this great piece of work! – jaypal singh Apr 19 '14 at 21:22
  • Wow, that's like Elven magic! I don't know how that works, but it certainly does. I took this code and plugged it into a script that was already being written. Changed the `read` line to `read -r -d '' < /path/to/input/file`, and changed the `printf` line to `printf "%s" "$txt" > /path/to/output/file` ... and voilà, works like a charm! Thank you! – stotrami Apr 19 '14 at 21:51
  • @stotrami I am with you. His solution is giving me trauma digitally. `;)` – jaypal singh Apr 19 '14 at 21:52
  • @DigitalTrauma Could you take a minute and explain what's going on in the line `txt="${txt:0:10#$i-1}*${txt:10#$i}"` please? – stotrami Apr 19 '14 at 22:10
  • The `10#` is actually not needed; it's the same as `txt="${txt:0:$i-1}*${txt:$i}"`, which creates a new string by taking the first `$i-1` characters starting at index `0`, then an `*`, and then the rest from index `$i`. [reference: bash parameter expansion](http://www.gnu.org/software/bash/manual/bashref.html#Shell-Parameter-Expansion) – mata Apr 19 '14 at 22:19
  • @mata Yes - I added them while debugging and forgot to take them out. I've edited them out now. – Digital Trauma Apr 19 '14 at 22:53
3

The usage of dd can be a bit confusing at times, but it's not that hard:

outfile="test.txt"

# create some test data
echo -n 0123456789abcde > "$outfile"

offsetsArray=("2" "7" "8" "13")
for offset in "${offsetsArray[@]}"; do
    dd bs=1 count=1 seek="$offset" conv=notrunc of="$outfile" <<< '*'
done

cat "$outfile"

The important part of this example is conv=notrunc; without it, dd truncates the output file at the point it seeks to. bs=1 sets the block size to 1 byte, and seek specifies the offset, in blocks, at which to start writing count blocks.

The above produces 01*3456**9abc*e
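
For reference, the same dd technique can probably be applied to the question's one-based offsets by subtracting one before passing the value to seek, since seek counts blocks from zero. A sketch, assuming a scratch directory from mktemp and hypothetical file names; dd writes its status messages to stderr, so they are discarded here with 2>/dev/null:

```bash
#!/bin/bash
# Scratch directory and file names are assumptions for this sketch.
workdir=$(mktemp -d)
infile="$workdir/input.txt"
outfile="$workdir/output.txt"

# Recreate the question's input: three lines of five zeros.
printf '00000\n00000\n00000\n' > "$infile"

# Work on a copy so the original input stays untouched.
cp "$infile" "$outfile"

# One-based offsets, as given in the question.
offsetsArray=("2" "8" "9" "15")
for offset in "${offsetsArray[@]}"; do
    # seek is zero-based, so subtract one; conv=notrunc keeps the rest of the file.
    printf '*' | dd bs=1 count=1 seek="$((offset - 1))" conv=notrunc of="$outfile" 2>/dev/null
done

result=$(cat "$outfile")
printf '%s\n' "$result"

rm -rf "$workdir"
```

Because dd overwrites bytes in place, the newline characters still count toward each offset, exactly as in the question's example.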

mata
  • +1 for addressing the use of `dd`. The explanation is immensely helpful :) Also this works great, thank you! – stotrami Apr 19 '14 at 21:57
  • `dd` is sometimes referred to as the Swiss Army knife of unix utilities. I prefer to think of it as a Swiss Army knife with a chainsaw attachment. You can very easily do serious damage to data on your disk without even knowing it. That being said, this answer looks good and should be much faster than mine for large files. +1 – Digital Trauma Apr 19 '14 at 23:00
  • @DigitalTrauma That's a good analogy, and a wise precaution to keep in mind. Although this option works, I was forced to not use it because I could not figure out how to suppress `dd`'s operations messages from going to stdout. Normally I would redirect to null, but the syntax provided in this example does not appear to allow that. Maybe there's another workaround? – stotrami Apr 20 '14 at 00:35
  • `dd bs=1 count=1 seek="$offset" conv=notrunc of="$outfile" <<< '*' 2>/dev/null` works fine for me... (`dd` writes messages to stderr, not stdout) – mata Apr 20 '14 at 08:57
2

With the same offset considerations as @DigitalTrauma's superior solution, here's a GNU awk-based alternative. This assumes your file contains no null bytes:

(IFS=','; awk -F '' -v RS=$'\0' -v OFS=''  -v offsets="${offsetsArray[*]}" \
'BEGIN{split(offsets, o, ",")};{for (k in o)  $o[k]="*"; print}' file)

0*000
0**00
00*00
iruvar
  • `awk` could be nice for certain reasons. Is there a way to include the array via variable name into this awk command? – stotrami Apr 19 '14 at 21:52