I want to remove specific words from a txt file in bash. Here is my current script:

echo "Sequenzia Import Tag Sidecar Processor v0.2"
echo "=============================================================="
rootfol=$(pwd)
echo "Selecting files from current folder........"
images=$(ls *.jpg *.jpeg *.png *.gif)
echo "Converting sidecar files to folders........"
for file in $images
do
    split -l 8 "$file.txt" tags-
    for block in tags-*
    do
        foldername=$(cat "$rootfol/$block" | tr '\r\n' ' ')
        FOO_NO_EXTERNAL_SPACE="$(echo -e "${foldername}" | sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//')"
        mkdir "$FOO_NO_EXTERNAL_SPACE" > /dev/null
        cd "$FOO_NO_EXTERNAL_SPACE"
    done
    mv "$rootfol/$file" "$file"
    cd "$rootfol"
    rm tags-* "$file.txt"
done
echo "DONE! Move files to import folder"

What it does is read the txt file that is named the same as an image and create folders that are interpreted as tags during an import into a Sequenzia image board (based on myimoutobooru) (https://code.acr.moe/kazari/sequenzia). What I want to do is remove specific words (actually they're symbol combinations) from the sidecar file so that they do not cause issues with the import process.

I want to remove combinations like ">_<" and ":o" from the file.

What can I add to my current script that lets me do this with a list of illegal words?

  • Please provide a bit more detail on what you've tried and why it's not been successful. – Joe C Nov 16 '16 at 20:23

2 Answers

You can create a file which lists out your illegal strings and iterate through the lines of the file, using regex to remove each one from your input like this.
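
A minimal sketch of that idea, not the answerer's own code. It assumes a word-list file with one illegal string per line, and it swaps the regex for bash's literal substring replacement, so characters such as "." or "*" in the list need no escaping. The function name strip_illegal and both file arguments are made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: remove every string listed in a word-list file
# (one literal string per line) from a sidecar file, in place.
strip_illegal() {
    local listfile=$1 sidecar=$2 content word
    content=$(cat "$sidecar")
    # "|| [ -n ... ]" also picks up a last line that lacks a trailing newline
    while IFS= read -r word || [ -n "$word" ]; do
        [ -n "$word" ] || continue      # skip blank lines in the list
        content=${content//"$word"/}    # quoted pattern: matched literally, not as a glob
    done <"$listfile"
    printf '%s\n' "$content" >"$sidecar"
}
```

In the script from the question this could be called as strip_illegal illegal.txt "$file.txt" just before the split line.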

Eric M.
  • Well, I don't want to remove all the symbols. I want to remove only a list of combinations or words, because removing every symbol would mess with other valid lines of the file. – konata_fan337 Nov 16 '16 at 19:58

Before the line "split -l 8 "$file.txt" tags-" I suggest you clean up the $file.txt using something like:

sed -f sedscript <"$file.txt" >tempfile

sedscript is a file that you create beforehand containing all your unwanted strings, e.g.

s/>_<//g
s/:o//g

You'd change your split command to use tempfile.
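
Put together, the cleanup step and the changed split call might look like the sketch below. The function name clean_and_split is made up, and sedscript and tempfile are assumed to sit in the current folder, as in the commands above.

```shell
#!/usr/bin/env bash
# Sketch: clean one sidecar with the sed script, then split the cleaned copy.
clean_and_split() {
    # $1 = image file name; its sidecar is assumed to be "$1.txt"
    sed -f sedscript <"$1.txt" >tempfile || return 1
    split -l 8 tempfile tags-
}
```

Inside the loop from the question, clean_and_split "$file" would take the place of the existing split line, and tempfile can be deleted together with tags-* at the end of each iteration.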

Experimenting with stdin/stdout on my PC suggests that all the substitutions in a sed script are applied in the same pass over the input file. Therefore, if the file is large, this approach avoids reading the file multiple times.

Another variant of this approach is:

sed -e 's/>_<//g' -e 's/:o//g' <infile >outfile

Repeat the

-e 's/xxx//g'

option as many times as required. The single quotes stop the shell from treating > and < as redirections.
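
If the list gets long, the repeated -e options can be built in a loop instead of typed out. A rough sketch, assuming bash (arrays are not available in plain sh); the array name illegal, its contents, and the function name remove_combos are placeholders, and each entry must already be a valid, escaped sed pattern.

```shell
#!/usr/bin/env bash
# Assumed list of unwanted strings; each entry is used as a sed pattern as-is.
illegal=('>_<' ':o')

# Hypothetical filter: reads stdin, writes the cleaned text to stdout.
remove_combos() {
    local args=() w
    for w in "${illegal[@]}"; do
        args+=(-e "s/$w//g")    # one -e option per unwanted string
    done
    sed "${args[@]}"
}
```

It would be used the same way as the one-liner: remove_combos <infile >outfile.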

fidgety
  • By the way, this kind of scripting might be easier in Perl. Perl was made to do this kind of thing. Bash has to call a bunch of external programs such as sed. – fidgety Nov 16 '16 at 20:30
  • That seems to kind of work, but when I ran it, it seemed to remove almost every character and left only a few letters. Here is my script: https://code.acr.moe/kazari/sequenzia/snippets/2 – konata_fan337 Nov 16 '16 at 20:47
  • You are nearly there. Some characters in a sed script are "special" and need to be escaped with a backslash. So, where you have s/...//g this will delete any sequence of three characters - dot being a wildcard. This link has more details "http://unix.stackexchange.com/questions/32907/what-characters-do-i-need-to-escape-when-using-sed-in-a-sh-script". – fidgety Nov 16 '16 at 20:52
  • To summarise: "Sed uses basic regular expressions. In a BRE, the characters $.*[\]^ need to be quoted by preceding them by a backslash". So your sed command s/...//g would need to be s/\.\.\.//g to remove three dots. – fidgety Nov 16 '16 at 20:54
  • OK, that looks to have worked. I can't fully test it because split seems not to be outputting anything for some reason. Here is my current script: https://code.acr.moe/kazari/sequenzia/snippets/3 – konata_fan337 Nov 16 '16 at 21:10
  • Does tempfile now contain the expected cleaned-up file, with all unwanted strings removed ? If so then you at least know the problem is beyond that "sed" line of code. – fidgety Nov 16 '16 at 21:28
  • Yes, it works now. I spelled the directory wrong at line 4. – konata_fan337 Nov 16 '16 at 21:50
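
The escaping rule from the comments can also be applied mechanically, so the list of unwanted strings is kept as plain text and the sed script is generated from it. A sketch under assumptions: the function name make_sedscript and the file name illegal.txt (one literal string per line) are made up, and "/" is escaped as well because it is the delimiter of the generated s commands.

```shell
#!/bin/sh
# Hypothetical helper: turn a plain word list into a sed script.
# $1 = word list (one literal string per line), $2 = sed script to write
make_sedscript() {
    # 1) drop blank lines, 2) backslash-escape [ \ / . ^ $ *,
    # 3) wrap each line as s/<pattern>//g
    sed -e '/^$/d' \
        -e 's|[[\\/.^$*]|\\&|g' \
        -e 's|.*|s/&//g|' "$1" >"$2"
}
```

After make_sedscript illegal.txt sedscript, the generated file can be passed to sed -f sedscript exactly as in the answer. A list entry of three dots, for example, comes out as s/\.\.\.//g.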