
I have a complex requirement where:

1) I need to scan directory 1 continuously and extract an element from a list of XML files.

2) Based on the element, check if the file it names is present in directory 2.

3) If present, copy the XML file to directory 2.

4) Continue this loop.

XML Sample:

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<Data xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <Main>
    <Rec>Bank</Rec>
  </Main>
  <Code>124</Code>
  <City></City>
  <CompCodes>
    <CompCode>US</CompCode>
    <Vend>13</Vend>
    <File_name>abc.txt</File_name>
  </CompCodes>
  <BankData>
    <Code>123</Code>
    <BankAcctNum>231</BankAcctNum>
  </BankData>
  <BankData>
    <Code>124</Code>
    <BankAcctNum>431</BankAcctNum>
  </BankData>
</Data>
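
For example, with the sample above the extracted element would be abc.txt, so the script should check whether abc.txt exists in directory 2 (/data/test2/ in my script) and, if it does, copy the XML file there.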

I tried the script below, but it is not doing what it is supposed to do:

#!/usr/bin/env bash
dir_list1=(
/data/test/
)

search_for_files() {
        local dir=$(cd "$1"; pwd)
        local target=/data/test2/
        shopt -s nullglob
        for file in "$dir"/*xml; do
                pdf=$(grep -oPm1 "(?<=<File_name>)[^<]+" <<< "$file")
                #base=${file%.*}
                #base=${base##*/}
                if [-d "$target/$pdf" ]; then
                        cp $dir/$file  $target 
                fi
      done
}

for file in "${dir_list1[@]}"; do
        search_for_files "$dir"
done

Appreciate any help!

John
  • Try pasting your code into shellcheck.net – Mark Setchell Jun 04 '20 at 09:35
  • `[-d "$target/$pdf" ]`: if it's a file, you should use `[ -f "$target/$pdf" ]` – Philippe Jun 04 '20 at 10:00
  • Some call it [summoning the daemon](https://www.metafilter.com/86689/), others refer to it as [the Call for Cthulhu](https://blog.codinghorror.com/parsing-html-the-cthulhu-way/) and few just [turned mad and met the Pony](https://stackoverflow.com/a/1732454/8344060). In short, never parse XML or HTML with a regex! Did you try an XML parser such as `xmlstarlet`, `xmllint` or `xsltproc`? – kvantour Jun 04 '20 at 13:17

1 Answer


There were multiple mistakes in the script; below is a corrected version.
I've commented out the erroneous lines and put the corrected lines underneath them.

#!/usr/bin/env bash
dir_list1=(
/data/test/
)

search_for_files() {
    local dir=$(cd "$1"; pwd)
    local target=/data/test2/
    shopt -s nullglob
    for file in "$dir"/*xml; do
            #pdf=$(grep -oPm1 "(?<=<File_name>)[^<]+" <<< "$file")
            pdf=$(grep -oPm1 "(?<=<File_name>)[^<]+" < "$file")
            #base=${file%.*}
            #base=${base##*/}
            #if [-d "$target/$pdf" ]; then
            if [ -f "$target/$pdf" ]; then
                    #cp $dir/$file  $target 
                    cp "$file" "$target" 
            fi
    done
}

#for file in "${dir_list1[@]}"; do
for dir in "${dir_list1[@]}"; do
    search_for_files "$dir"
done
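
As noted in the comments, extracting values from XML with a regex is fragile. If xmllint is available on your system, you could replace the grep line with an XPath query instead (a sketch, not tested against your environment):

            #pdf=$(grep -oPm1 "(?<=<File_name>)[^<]+" < "$file")
            # extract the text content of the first <File_name> element
            pdf=$(xmllint --xpath 'string(//File_name)' "$file" 2>/dev/null)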

In order to satisfy your requirement #1 ("scan directory 1 continuously"), you may want to have a look at the `watch` tool.
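
For example, assuming the script above is saved as /data/scan_and_copy.sh (a hypothetical path), it could be re-run periodically with either of the following (a sketch):

# re-run the script every 60 seconds with watch
watch -n 60 /data/scan_and_copy.sh

# or, without watch, a plain loop
while true; do
    /data/scan_and_copy.sh
    sleep 60
done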

lab9