
I'd like to replace some file extensions in an SQL file when they match strings from an input file, using the terminal.

I have an input.txt containing a list of file paths.

/2014/02/haru-sushi_copertina_apdesign-300x300.png 
/2014/02/haru-sushi_copertina_apdesign.png 
/2014/02/harusushi_01_apdesign-300x208.png
etc.

Then I have a WordPress.sql file

Whenever a path from input.txt also appears in the database file, I'd like to change its extension there from .png to .jpg. I hope I've made myself clear.

Should I use sed with regular expressions? Something like

cat input.txt | while read -r a; do sed -i 's/$a/.jpg/g' wordpress.sql; done 

Any suggestions? Even for the RegEx.

Tomm
Giacomo Scarpino

2 Answers


I would suggest two steps:

Step 1: Create a sed script from input.txt that contains a list of all substitutions:

sed -r "s/(([^.]*)\.[^ ]+)[ ]*/s#\1#\2.jpg#g;/g" input.txt > input.sed

This creates lines of the form s#png-filename#jpg-filename#g;

  • the funny part \.[^ ]+)[ ]* strips possible trailing spaces in your input.txt
  • the original line (minus trailing spaces) gets captured into \1
  • the original line up to (but not including) the first . gets captured into \2
  • a substitution command is built from \1 and \2.jpg

Step 2: Apply the generated input.sed script to your wordpress.sql file:

sed -f input.sed wordpress.sql > new_wordpress.sql

Depending on the number of lines in your input.txt, that might or might not be faster than your read loop, because there are only two invocations of sed (but with a much larger number of commands).
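
To see the two steps end to end, here is a minimal sketch on throwaway files. The sample paths and the SQL lines are made up for illustration, and -E is used in place of -r on the assumption that your sed accepts it (current GNU and BSD sed both do).

```shell
# Work in a throwaway directory so no real files are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical sample input; the first path has a trailing space,
# like some of the lines in the question.
printf '%s\n' '/2014/02/a-300x300.png ' '/2014/02/b.png' > input.txt

# Hypothetical stand-in for the SQL dump.
printf '%s\n' \
    "INSERT INTO wp_posts VALUES ('/2014/02/a-300x300.png');" \
    "INSERT INTO wp_posts VALUES ('/2014/02/c.png');" > wordpress.sql

# Step 1: turn every path into one substitution command.
sed -E 's/(([^.]*)\.[^ ]+)[ ]*/s#\1#\2.jpg#g;/' input.txt > input.sed

# Step 2: apply all substitutions to the dump in a single pass.
sed -f input.sed wordpress.sql > new_wordpress.sql

cat input.sed
cat new_wordpress.sql
```

Only the path listed in input.txt changes its extension; /2014/02/c.png is not listed and stays as it is. Note that the dots in the generated patterns are still regex wildcards, so a stricter version would escape them.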

Lars Fischer

sed is for simple substitutions on individual lines, that is all. You should never write a shell loop just to manipulate text, see http://unix.stackexchange.com/questions/169716/why-is-using-a-shell-loop-to-process-text-considered-bad-practice.

Try this (uses GNU awk which I assume you have since you were using GNU sed):

awk -i inplace 'NR==FNR { paths[$0]; next }
{
    for (path in paths) {
        gsub(path,gensub(/png$/,"jpg",1,path))
    }
    print
}
' input.txt wordpress.sql

It has some caveats related to partial matching but no worse than if you were trying to use sed and easily fixable if there's a problem (unlike with sed).
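
One such caveat, as I read it, is that gsub() treats each path as a regular expression, so the "." in "b.png" also matches other characters. The following is a rough sketch of a literal-string variant using index() and substr(), which should work in any POSIX awk; the sample data is invented and the tmp-file redirect replaces -i inplace.

```shell
# Work in a throwaway directory so no real files are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical sample data: "bxpng" would match the regex "b.png".
printf '%s\n' '/2014/02/b.png' > input.txt
printf '%s\n' "('/2014/02/b.png', '/2014/02/bxpng')" > wordpress.sql

awk 'NR==FNR {
    sub(/[ \t]+$/, "")               # drop trailing blanks from the path
    if ($0 != "") {
        new = $0
        sub(/png$/, "jpg", new)      # the only regex: swap the extension
        map[$0] = new
    }
    next
}
{
    for (old in map) {
        out = ""
        rest = $0
        # index() compares literal text, so "." is not a wildcard here
        while ((pos = index(rest, old)) > 0) {
            out = out substr(rest, 1, pos - 1) map[old]
            rest = substr(rest, pos + length(old))
        }
        $0 = out rest
    }
    print
}' input.txt wordpress.sql > tmp && mv tmp wordpress.sql

cat wordpress.sql
```

This still loops over every path for every line, so it is no faster than the original; it only removes the regex surprises.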

Ed Morton
  • I've never used awk.. So I'm learning from "man" and examples on the net.. Anyway in mac terminal "-i inplace" doesn't work I guess for this reason https://stackoverflow.com/questions/24332942/why-awk-script-does-not-work-on-mac-os-but-works-on-linux.. – Giacomo Scarpino Dec 27 '17 at 13:49
  • That is a terrible way to try to learn awk since literally about 95% of examples on the net are complete nonsense. Just get the book Effective Awk Programming, 4th Edition, by Arnold Robbins. If `-i` works as shown in your question then you have GNU sed and so should also have GNU awk and so `-i inplace` will work unless you have a very old version. Get a newer version of GNU awk or just do `awk '...' input.txt wordpress.sql > tmp && mv tmp wordpress.sql` which is what gawk, sed, perl, etc. do behind the scenes anyway for "inplace editing". – Ed Morton Dec 27 '17 at 14:03
  • Nothing to do man.. I've installed gawk for Mac but the script empties both wordpress.sql and input.txt – Giacomo Scarpino Dec 27 '17 at 17:40
  • Just do what I suggested as the workaround in my comment. – Ed Morton Dec 27 '17 at 18:27
  • It's taking ages, is that possible? Wordpress.sql almost 10MB and input.txt has 1300 files paths.. – Giacomo Scarpino Dec 29 '17 at 09:54
  • Yes, absolutely it could take ages as it's looping 1300 times for each line of Wordpress.sql. When you have very large input files you've GOT to state that in your question for us to stand a chance of being able to factor that into a proposed solution – Ed Morton Dec 29 '17 at 11:08
  • I start to understand what awk does reading this http://www.delorie.com/gnu/docs/gawk/gawk_toc.html. Script completed but it empties both files again.. So I've tried this command `gawk -i inplace 'NR==FNR { paths[$0]; next } { for (path in paths) { gsub(path,gensub(/png$/,"jpg",1,path)) } print }' input.txt wordpress.sql > tmp && mv tmp wordpress2.sql` but it empties input.txt and wordpress2.sql.. something still wrong – Giacomo Scarpino Dec 29 '17 at 12:49
  • The point of the tmp file is so you DON'T need `-i inplace` so get rid of that. – Ed Morton Dec 29 '17 at 14:39
  • I highly recommend you try this on small files first to get it working rather than running it on your huge files and so waiting for a long time to discover if there's a problem. – Ed Morton Dec 29 '17 at 14:53
  • It works.. I tried this `gawk 'NR==FNR { paths[$0]; next } { for (path in paths) { gsub(path,gensub(/png$/,"jpg",1,path)) } print }' input.txt wordpress.sql > tmp.sql` – Giacomo Scarpino Dec 29 '17 at 16:03
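
On the speed issue raised in the comments (1300 gsub() calls for every line), one possible way around it is to scan each line once for path-shaped tokens and look each token up in the array. This is only a sketch under a strong assumption that is not confirmed anywhere in the thread: every path has the form /YYYY/MM/name.png, with names made of letters, digits, ".", "_" and "-". The sample data is invented.

```shell
# Work in a throwaway directory so no real files are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical sample data; c.png is deliberately not listed.
printf '%s\n' '/2014/02/a.png' '/2014/02/b-300x208.png' > input.txt
printf '%s\n' \
    "('x/2014/02/a.png', 'x/2014/02/c.png', 'x/2014/02/b-300x208.png')" \
    > wordpress.sql

awk 'NR==FNR { sub(/[ \t]+$/, ""); paths[$0]; next }
{
    out = ""
    rest = $0
    # Assumed path shape: /YYYY/MM/name.png
    while (match(rest, /\/[0-9][0-9][0-9][0-9]\/[0-9][0-9]\/[A-Za-z0-9_.-]+\.png/)) {
        tok = substr(rest, RSTART, RLENGTH)
        if (tok in paths)            # one hash lookup instead of 1300 gsub() calls
            sub(/png$/, "jpg", tok)
        out = out substr(rest, 1, RSTART - 1) tok
        rest = substr(rest, RSTART + RLENGTH)
    }
    print out rest
}' input.txt wordpress.sql > tmp.sql

cat tmp.sql
```

The work per line now depends on the length of the line rather than on the number of paths in input.txt. As suggested above, try it on small files first and compare the result against the slower version before trusting it on the real dump.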