Bash - change image urls to base64 in html

Question

I tried to make a script that converts image sources from normal links to base64 encoding in HTML files. But there is a problem: sometimes sed tells me

script.sh: line 25: /bin/sed: Argument list too long

This is the code:

#!/bin/bash
# usage: ./script.sh file.html


mkdir images_temp

for i in `sed -n '/<img/s/.*src="\([^"]*\)".*/\1/p' $1`;

    do echo "######### download the image";
    wget -P images_temp/ $i;

    #echo "######### convert the image for size saving";
    #convert -quality 70 `echo ${i##*/}` `echo ${i##*/}`.temp;

    #echo "######### rename temp image";
    #rm `echo ${i##*/}` && mv `echo ${i##*/}`.temp `echo ${i##*/}`;

    echo "######### encode in base64";
    k="`echo "data:image/png;base64,"`$(base64 -w 0 images_temp/`echo ${i##*/}`)";

    echo "######### deletion of images_temp pictures";
    rm images_temp/*;

    echo "######### remplace string in html";
    sed -e "s|$i|$k|" $1 > temp.html;

    echo "######### remplace final file";
    rm -rf $1 && mv temp.html $1;

    sleep 5;
done;

I think the $k argument is too long for sed when the image is bigger than ~128 KB; sed can't process it.

How do I make it work?

Thank you in advance!

PS1: and sorry for the very very ugly code

PS2: or how do I do that in Python? PHP? I'm open!

I tend not to be overly sensitive when someone wants to edit HTML using `sed`, because in many situations it is possible. But this is a typical example of when it is a **very bad idea**. For this job you need a robust HTML parser, because you need to safely handle tags like `` and so on. Impossible with regexes... See: http://stackoverflow.com/a/1732454/632407 ;) – clt60, Jul 07 '13 at 15:33
Hi jm666, thank you for your answer. I know that, but just to plead my case, it's just for 2 or 3 HTML files that contain 40+ images; it is not for production. Also, the HTML files come from big W3C-verified websites. Anyway, that's just for fun, and for learning to use sed! – diywithbash, Jul 07 '13 at 15:59
Ah so... in this case, of course, you are allowed to try to parse a context-free grammar with sed's regular expressions. Have fun :) :) – clt60, Jul 07 '13 at 16:57
@jm666 Sed is Turing complete and not at all limited to regex operations. – that other guy, Jul 07 '13 at 18:48
@thatotherguy You're probably talking about `gnu` sed. AFAIK, not the POSIX one. And this task can be done with a 15-line Perl script, with correct parsing. But feel free to show an HTML parser written in sed ;) ;) And read the start of my 1st comment too. – clt60, Jul 07 '13 at 18:51
@jm666 No, I'm talking about POSIX sed. Which GNU features are you imagining making a difference? – that other guy, Jul 07 '13 at 18:53

score 1 · Accepted Answer · answered Jul 07 '13 at 18:58

Your base64-encoded image can be multiple megabytes, while the system may place a limit on the maximum length of command-line arguments (traditionally around 128 KB). Sed is also not guaranteed to handle lines over 8 KB, though versions like GNU sed can deal with much more.
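To see the limit in action, here is a small sketch. It assumes a system with `getconf` and `head -c`; the 200,000-character size and the variable names are arbitrary choices for illustration. It passes the same long string once through the shell's builtin `echo` (no new process, so no argument limit) and once as an argument to an external `sed`. On a typical Linux system the second call is expected to fail with "Argument list too long"; other systems may allow it.

```shell
#!/bin/bash
# Total space the kernel allows for arguments plus environment
# when a new program is executed.
arg_max=$(getconf ARG_MAX)
echo "ARG_MAX: $arg_max bytes"

# Build a 200,000-character string (an arbitrary size, chosen to be
# larger than the ~128k mentioned above).
big=$(head -c 200000 /dev/zero | tr '\0' 'A')

# echo is a shell builtin: no program is executed, so the limit on
# argument length does not apply here.
builtin_len=$(echo "$big" | wc -c)

# sed is an external program: the whole s command, including $big,
# has to be passed as one argument.
if printf 'x\n' | sed -e "s|x|$big|" > /dev/null 2>&1; then
    external_status="ok"
else
    external_status="failed"
fi

echo "builtin echo wrote $builtin_len bytes; external sed: $external_status"
```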

If you want to try with your sed, provide the instructions in a file rather than on the command line. Instead of

sed -e "s|$i|$k|" $1 > temp.html;

use

echo "s|$i|$k|" > foo.sed
sed -f foo.sed "$1" > temp.html
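Put together, the replacement step could look roughly like the sketch below. The function name `inline_image` and its three parameters are made up for illustration, and the MIME type is hardcoded to `image/png` as in the question. It builds the sed script with redirection only, so the long base64 string is never passed as a command-line argument, and it escapes regex metacharacters in the URL before using it as a pattern. Unlike the original command, it adds the `g` flag so every occurrence on a line is replaced.

```shell
#!/bin/bash
# inline_image HTML_FILE URL IMAGE_FILE   (hypothetical helper)
# Replaces URL in HTML_FILE with a base64 data URI built from IMAGE_FILE.
inline_image() {
    local html="$1" url="$2" img="$3"
    local script out pattern
    script=$(mktemp) || return 1
    out=$(mktemp) || return 1

    # Escape regex metacharacters and the | delimiter in the URL.
    pattern=$(printf '%s' "$url" | sed 's/[][\.*^$|]/\\&/g')

    # Write the s command into a file. The base64 data goes straight
    # from the pipe into the file and never becomes an argument.
    {
        printf 's|%s|data:image/png;base64,' "$pattern"
        base64 < "$img" | tr -d '\n'
        printf '|g\n'
    } > "$script"

    # Run sed with the script file, then copy the result back over the
    # original (cat keeps the original file's permissions).
    sed -f "$script" "$html" > "$out" && cat "$out" > "$html"
    rm -f "$script" "$out"
}
```

Inside the loop from the question, the call would be something like `inline_image "$1" "$i" "images_temp/${i##*/}"`. The base64 alphabet contains no `|`, `&` or `\`, so the replacement side needs no escaping.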

answered Jul 07 '13 at 18:58

that other guy


8 hours spent searching for how to raise ARG_MAX, or how to use xargs, or how to process the HTML file line by line using "cat base64 >> file.html"... and the solution was *that* simple... Joys of code! Thank you very much! – diywithbash Jul 07 '13 at 19:06
