I am trying to automate a procedure in which the system reads a file (one URL per line), uses wget to download the files from each URL (an HTTPS folder), and then removes that line from the file.

I have made several attempts, but the sed part (at the end) does not handle the string correctly (I tried escaping characters), so the line is never removed from the file!

cat File
https://something.net/xxx/data/Folder1/
https://something.net/xxx/data/Folder2/
https://something.net/xxx/data/Folder3/

My line of code is:

cat File | xargs -n1 -I @ bash -c 'wget -r -nd -l 1 -c -A rar,zip,7z,txt,jpg,iso,sfv,md5,pdf --no-parent --restrict-file-names=nocontrol --user=test --password=pass --no-check-certificate "@" -P /mnt/USB/ && sed -e 's|@||g' File'

It works up until the sed -e 's|@||g' File part..

Thanks in advance!

tfonias74
  • If you're going to read the whole file, why remove its content line by line? couldn't you just `data=$(cat File); echo -n > File` ? – Aaron Mar 10 '17 at 13:33
  • For more advanced cases you might want to consider using `flock`. – Aaron Mar 10 '17 at 13:48

4 Answers

2

Don't use cat if you can avoid it. It's bad practice and can be a problem with big files. You can change

cat File | xargs -n1 -I @ bash -c 

to

for siteUrl in $( < "File" ); do

It is also more correct, and simpler, to use sed with double quotes. My variant:

scriptDir=$( dirname -- "$0" )
for siteUrl in $( < "$scriptDir/File.txt" )
do
    if [[ -z "$siteUrl" ]]; then break; fi # stop the loop if the entry is empty
    wget -r -nd -l 1 -c -A rar,zip,7z,txt,jpg,iso,sfv,md5,pdf --no-parent --restrict-file-names=nocontrol --user=test --password=pass --no-check-certificate "$siteUrl" -P /mnt/USB/ && sed -i "s|$siteUrl||g" "$scriptDir/File.txt"
done
beliy
  • Thanks for the reply! How can I get it to update the File itself? I tested it with 4 lines (3 not valid and 1 valid). On the screen I get the desired output (4 lines with a gap between them, the valid one missing), but the file remains unchanged. – tfonias74 Mar 13 '17 at 06:49
  • I have tried switching to `sed -i "|$siteUrl|d"` but I get `sed: -e expression #1, char 1: unknown command: `|'`. – tfonias74 Mar 13 '17 at 07:24
  • There is also another issue: when the URL contains spaces, this code breaks it up into separate elements. – tfonias74 Mar 13 '17 at 11:11
  • Sorry, I didn't test the code; I just quickly adapted your line. To update the file itself, sed needs `-i`. Use `sed -i "s|$siteUrl||g"` or `sed -i "\|$siteUrl|d"`. Can you give me one problem URL to test with? – beliy Mar 14 '17 at 10:14
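The two issues raised in these comments (URLs containing spaces, and removing the whole line rather than blanking it) could be handled along the following lines. This is only a sketch, not the answer author's code: `process_list` is a hypothetical helper name, the download command is passed in as arguments, and `grep -vxF` is swapped in for sed so that the URL is matched as a fixed string instead of a regular expression.

```shell
# Hypothetical helper (not from the thread): run a command once per line of a
# list file and drop the line from the file only when the command succeeds.
# Usage sketch: process_list File wget -r -nd -l 1 -c --no-parent -P /mnt/USB/
process_list() {
    local list=$1; shift
    local url tmp

    # Iterate over a snapshot so that rewriting the list cannot disturb the loop.
    tmp=$(mktemp) || return 1
    cp -- "$list" "$tmp" || return 1

    # IFS= and -r keep spaces and backslashes in each line intact.
    while IFS= read -r url || [ -n "$url" ]; do
        [ -n "$url" ] || continue          # skip empty lines
        # The URL is appended as the last argument of the given command.
        if "$@" "$url" </dev/null; then
            # -F: fixed string, -x: whole line, -v: keep the non-matching lines.
            # grep exits with 1 when nothing is left to print, which is fine here.
            grep -vxF -e "$url" -- "$list" > "$list.new" || [ $? -eq 1 ] || return 1
            mv -- "$list.new" "$list"
        fi
    done < "$tmp"
    rm -f -- "$tmp"
}
```

Lines whose download fails stay in the file, so running the helper again retries only those.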
1

I believe you just need to use double quotes after sed -e. Instead of:

'...&& sed -e 's|@||g' File'

you would need

'...&& sed -e '"'s|@||g'"' File'
zeehio
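Another way to sidestep the nested quoting, sketched here as an assumption rather than as part of the answer above, is to hand each line to the inner shell as a positional parameter. The inner script then stays inside one pair of single quotes and refers to the URL as "$1". `show_urls` is a hypothetical name, and printf stands in for the real wget and sed calls.

```shell
# Hypothetical demo: each input line replaces @ and reaches the inner shell
# as "$1" (the _ fills $0), so no quotes have to be nested.
show_urls() {
    # $1 of the function is the list file; printf is a stand-in for wget/sed.
    xargs -I @ sh -c 'printf "would fetch: %s\n" "$1"' _ @ < "$1"
}
```

In the real command the inner script could use `"$1"` wherever the original used `@`, for example in a double-quoted sed expression such as `"\|$1|d"`.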
1

@beliy's answer looks good!

If you want a one-liner, you can do:

while read -r line; do \
wget -r -nd -l 1 -c -A rar,zip,7z,txt,jpg,iso,sfv,md5,pdf \
--no-parent --restrict-file-names=nocontrol --user=test \
--password=pass --no-check-certificate "$line" -P /mnt/USB/ \
&& sed -i -e '\|'"$line"'|d' "File.txt"; \
done < File.txt

EDIT: You need to add a \ in front of the first pipe

jraynal
  • Thanks for the reply! An extra space was required after the do. I ran it, and while it tries to download the test locations I added (3 false and 1 good), when it reaches the sed part I get: sed: -e expression #1, char 1: unknown command: `|' – tfonias74 Mar 13 '17 at 06:41
  • Good point, you need a `\` in front of the first `|` apparently, I didn't know that! Thanks! – jraynal Mar 13 '17 at 12:01
  • Correct, and if you change `sed -e` to `sed -i` it is exactly what I was searching for ;) – tfonias74 Mar 14 '17 at 11:42
0

I see what you are trying to do, but I don't understand the sed command that includes pipes. Maybe it is some fancy format that I am not familiar with.

Anyway, I think the sed command should look like this...

sed -e 's/@//g'

This command will remove all @ from the stream.
I hope this helps!

Mario
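For context on the pipes in the sed commands used throughout this thread: sed accepts almost any character as the delimiter of `s`, which is convenient when the text contains slashes, as URLs do. A small sketch follows; the variable names and the sample list are only illustrative.

```shell
# Illustrative data, modelled on the question's File.
url='https://something.net/xxx/data/Folder2/'
list='https://something.net/xxx/data/Folder1/
https://something.net/xxx/data/Folder2/
https://something.net/xxx/data/Folder3/'

# s|...|...| is the same substitution as s/.../.../, just with | as delimiter.
# It blanks the matching text but leaves an empty line behind.
blanked=$(printf '%s\n' "$list" | sed "s|$url||g")

# \|...|d uses | to delimit an address and deletes the whole matching line.
# The leading backslash is required when the address delimiter is not /.
deleted=$(printf '%s\n' "$list" | sed "\|$url|d")

printf '%s\n' "$deleted"
```

Without the backslash, sed reads `|` as a command name, which is the "unknown command" error quoted in the comments above.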