
I'm trying to scrape Bing images using bulk-bing-image-downloader. I have a CSV file that contains keywords and the names of the folders in which I want the images to be saved:

keyword,folder,search
dog's house,animal,1
book.end,read,0
key chains,house,1

I'd like to use the values under keyword and folder as arguments to search for and download images, and the value under search as a condition: if it is 1, the code performs the search; if it is 0, it skips that row. The basic bulk-bing-image-downloader command is:

./bbid.py -s "keyword" --limit 10 --adult-filter-off -o "folder"

where keyword and folder are the values I'd like to fill in from each row of the CSV file. I currently have the command below, but I'm very new to shell commands and have no idea how awk works. Help, please?

awk '
BEGIN {
    -F,
    FPAT = "([^,]+)|(\"[^\"]+\")"
}
{
  if ($1 != "keyword") {
    printf("%s\n", $1)
    ./bbid.py -s $1 --limit 10 --adult-filter-off -o $1
  }
}
' test.csv
Dai

1 Answer


Since you mentioned you have zero idea how awk works - get the book "Effective AWK Programming", 5th Edition, by Arnold Robbins and it will teach you how to use AWK. The most important thing for you to understand given the command you posted, though, is this: awk is not shell. Awk and shell are 2 completely different tools with completely different purposes and their own syntax, semantics, and scope. Awk is a tool for manipulating text while shell is a tool for creating/destroying files and processes and sequencing calls to tools. Awk is the tool that the people who invented shell also invented for shell to call when necessary to manipulate text.

This shell script might be what you're trying to do:

while IFS=',' read -r k f _; do
    echo ./bbid.py -s "$k" --limit 10 --adult-filter-off -o "$f"
done < <(tail -n +2 file)

./bbid.py -s dog's house --limit 10 --adult-filter-off -o animal
./bbid.py -s book.end --limit 10 --adult-filter-off -o read
./bbid.py -s key chains --limit 10 --adult-filter-off -o house

Remove the echo when you're done with initial testing.
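
The loop above ignores the third column, while the question asks to run the search only when search is 1. A minimal sketch of that variant follows, assuming the fields are plain (no embedded commas or quoted fields, as in the sample data). The function name `print_bbid_commands` is hypothetical. It also guards against two input quirks that commonly trip up `read` loops, namely Windows line endings and a last line with no trailing newline; whether the real file has either is an assumption. It uses a pipe rather than process substitution so that it also runs under a plain POSIX `sh`.

```shell
# Sketch: print one bbid.py command per CSV row whose third field is 1.
# Assumes plain fields: no embedded commas or quoted fields.
print_bbid_commands() {
    cr=$(printf '\r')   # a literal carriage return, for CRLF input
    # tail skips the header row; `|| [ -n "$k" ]` keeps a last row
    # that has no trailing newline.
    tail -n +2 "$1" | while IFS=',' read -r k f s || [ -n "$k" ]; do
        s=${s%"$cr"}                 # strip a trailing carriage return
        [ "$s" = 1 ] || continue     # run only when the search flag is 1
        echo ./bbid.py -s "$k" --limit 10 --adult-filter-off -o "$f"
    done
}
```

Call it as `print_bbid_commands test.csv`. As with the loop above, the `echo` only prints the commands; remove it once the printed commands look right.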

Ed Morton
  • Thank you! This code is very promising. The issue now is that the argument "$f" creates a folder with the quotation marks included. When I replace "$f" with just $f, the generated folder name is always followed by a question mark. Is there a way around this? Additionally, this might be an issue with my code, but it's not reading the very last row. Any ideas why? – Dai Sep 27 '21 at 20:42
  • The problems are all in your data, not in my code. I'd bet the fields in your real data are actually within double quotes, unlike the fields in your example, and maybe there's some other weird characters in there, idk. Whatever you do, do NOT leave `$f` unquoted as no matter what is wrong, that is the wrong solution, see https://mywiki.wooledge.org/Quotes. – Ed Morton Sep 27 '21 at 22:12
  • Thanks -- this really helped! I saw a similar [comment](https://stackoverflow.com/questions/12916352/shell-script-read-missing-last-line) and updated the last bit of the code to (tail -n +2 name_code.csv | grep "") and it now works perfectly! – Dai Sep 28 '21 at 01:31
  • You're welcome. I don't know what that other question might have to do with this one, nor what it has to do with piping to `grep ""`, nor do I understand how adding a pipe to `grep ""` solved any problem but - glad it's working for you! – Ed Morton Sep 28 '21 at 12:25