I have a "|" delimited text file. I need to combine 2 fields and then insert this into an xml file given by another field ($5) in the same record.

awk -F "|" '{print $2$4 >> $5 }' source.txt

does this, but it only appends the data to the end of each file. I need it to replace the contents of <element> blablabla </element>, which is present in each of the XML files.

Thanks in advance

  • Give a complete line so we can see what it looks like. – Jotne Sep 26 '13 at 11:59
  • the text file appears as lots of lines of `projectName|URL|string1|string2|file.xml`, each record with completely unique fields, including different file.xml's. I need to combine **URL and string2** and put the result in the relevant file.xml. This is what the awk I put above does. However, in each of the file.xml's there are hundreds of elements, and I need this **URL and string2** to go in the element labelled `<element>`, which already has data in it that needs to be removed at the same time. Hope that's clear enough. –  Sep 26 '13 at 12:44
  • Still not clear. You have the source.txt that is | separated. Would you like this data stored in other files? You mention different file.xml's. Give a list of files, what data to get, and where and how to store it. – Jotne Sep 26 '13 at 13:02
  • the data is of the form `projectName|URL|string1|string2|file.xml`, e.g. `alpha|http://string/code/|1234|5678|dog.xml` `beta|http://words/text/|9876|5432|cat.xml`, so for each line fields 2 and 4 need to be printed together, i.e. `http://string/code/5678`, and placed in file `dog.xml` in the element called `<element>`, which has the same name in each of the xml's. –  Sep 26 '13 at 13:36

3 Answers

Untested since you didn't provide any sample input or expected output, but this should be close to what you want:

awk -F'|' -v pid="$$" '
# Pass 1: read the |-separated source.txt and queue each target file.
NR==FNR {
    file = $5
    f2s[file,++numSubs[file]] = $2 $4
    if ( !seen[file]++ )
        ARGV[ARGC++] = file
    next
}
# Pass 2: rewrite each queued XML file into a temporary copy.
{
    for (i=1; i <= numSubs[FILENAME]; i++)
        gsub(/<element>.*<\/element>/,"<element>" f2s[FILENAME,i] "</element>")
    print > (FILENAME ".mod_" pid)
}
' source.txt

for f in *.mod_$$
do
   mv -- "$f" "${f%.mod_$$}"
done

Think about what the above is doing and test it on a copy of your files before running it on your real files. IT IS UNTESTED.
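
To see the two-pass mechanism in isolation, here is a minimal sketch that only reports what would be substituted instead of writing anything. The sample record is taken from the question's comments; the contents of dog.xml and the scratch directory are assumptions made for the demo.

```shell
#!/bin/sh
# Work in a scratch directory so no real files are touched (demo assumption).
tmpdir=$(mktemp -d) && cd "$tmpdir" || exit 1

# One record from the question's comments, plus a made-up dog.xml.
printf '%s\n' 'alpha|http://string/code/|1234|5678|dog.xml' > source.txt
printf '%s\n' '<root>' '<element>old</element>' '</root>' > dog.xml

# Pass 1 (NR==FNR): remember the replacement for each target file and append
# that file to ARGV so awk reads it after source.txt.
# Pass 2: for each matching line of a queued file, report the substitution.
out=$(awk -F'|' '
NR==FNR {
    repl[$5] = $2 $4
    if ( !seen[$5]++ )
        ARGV[ARGC++] = $5
    next
}
/<element>.*<\/element>/ {
    printf "%s:%d -> <element>%s</element>\n", FILENAME, FNR, repl[FILENAME]
}
' source.txt)
printf '%s\n' "$out"
```

Only source.txt is named on the command line; dog.xml is read because the first pass appended it to ARGV.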

Ed Morton

You can try this bash:

#!/bin/bash

while read line
do
    arr=(${line//|/ })
    sed -i.bak "s#<element>.*</element>#<element>${arr[1]}${arr[3]}</element>#g" "${arr[4]}"
done < 'source.txt'

Test :

sat:~# cat source.txt 
projectName|URL|string1|string2|file.xml
projectName|URL|hello1|hello2|sample.xml
sat:~#
sat:~# cat file.xml
<element>xmlcontent</element>
sat:~# 
sat:~# cat sample.xml
<element> content </element>
sat:~#
sat:~# bash sample.sh  # Executing script
sat:~#
sat:~# cat file.xml
<element>URLstring2</element>
sat:~#
sat:~# cat sample.xml
<element>URLhello2</element>
sat
  • Be prepared for nasal demons if/when the fields you're using from source.txt contain `#`s or other characters that the sed command will choke on. – Ed Morton Sep 26 '13 at 13:56
  • Great answer. But like Ed said, when implemented in production, it would also be good to detect failure. Check my answer for some hints as to how this might work. Note that a PROPER solution would include an XML parser, which is ... problematic ... in bash alone. – ghoti Sep 26 '13 at 14:05
  • I have found that this ignores the last line of source.txt when I run it. I can just put a blank line under it, but does anyone have an adjustment to the code that would handle it? –  Sep 27 '13 at 08:14
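
One way to handle a final line that lacks a trailing newline is to run the loop body whenever `read` filled in a variable, even if it reported end-of-file. The sketch below also swaps sed for an awk helper that splices the text in literally, so `#`, `&` or backslashes in the data are not interpreted. `replace_element` and `apply_records` are hypothetical names, `mktemp` is assumed to be available, and only an `<element>...</element>` that sits on a single line is handled.

```shell
#!/bin/sh
# replace_element FILE TEXT  (hypothetical helper)
# Replace whatever sits between <element> and </element> on a line with TEXT.
# TEXT travels through the environment and is spliced in with substr(), so no
# character in it is treated as a regex or replacement metacharacter.
replace_element() {
    tmp=$(mktemp) || return 1
    NEW="$2" awk '
        {
            s = index($0, "<element>")
            e = index($0, "</element>")
            if (s && e > s)
                $0 = substr($0, 1, s + 8) ENVIRON["NEW"] substr($0, e)
            print
        }
    ' "$1" > "$tmp" && cat "$tmp" > "$1"
    rc=$?
    rm -f "$tmp"
    return "$rc"
}

# apply_records SRC  (hypothetical helper)
# Read name|URL|string1|string2|file records and update each named file.
apply_records() {
    # The "|| [ -n ... ]" test keeps the last record when SRC has no final newline.
    while IFS='|' read -r name url str1 str2 file || [ -n "$name" ]; do
        [ -f "$file" ] || { echo "ERROR: no such file: '$file'" >&2; continue; }
        replace_element "$file" "$url$str2"
    done < "$1"
}

# Usage (on copies of the real files first):
#   apply_records source.txt
```

Splitting with `IFS='|'` rather than on whitespace also means fields containing spaces stay intact.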

If I understand correctly, you want to modify each XML file in-place using data extracted from another file. For example, the source data might look like:

  one|fluffy|slurm|unicorns|animal.xml
  two|yellow|flarn|moons|mineral.xml
  three|blue|jalaroot|stars|mineral.xml

The XML ... well, I don't need to provide an example. I gather you want to replace the contents of <element> in each XML file with $2 and $4 concatenated. If this is incorrect, please clarify it in your question.

So here's an option.

#!/bin/sh

awk -F'|' '{print $5,$2$4}' source.txt | while read file data; do
  case "$data" in
   *#*) echo "ERROR: invalid data ('$data')" >&2 ;;
   *)   if [ -f "$file" ]; then
            sed -ri -e "s#<element>[^>]+</element>#<element>$data</element>#" "$file"
        else
            echo "ERROR: no such file: '$file'" >&2
        fi
        ;;
  esac
done

The idea here is that we'll take the data as a set of shell variables, $file and $data, then step through each substitution in a while loop. The substitution is done using sed "in-place" (-i). Read the man page for your sed implementation and back up your data before attempting to use this.

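Note that the script itself uses only POSIX shell syntax and doesn't require bash. (Though it will work fine in bash as well.) The -r and -i options to sed are extensions that POSIX does not specify, so check that your sed supports them.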

PROVISOS:

  • In its current state, this fails if filenames contain whitespace.
  • If data must include other XML tags (i.e. ">" characters) then the regex in sed should be improved. (Notwithstanding the fact that you can't parse HTML with regex.)
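
Along the lines of the provisos above, a dry-run pass can flag records that the sed loop would skip or mangle before anything is modified. A minimal sketch, where `check_records` is a hypothetical name and the three checks (missing file, no single-line `<element>`, `#` in the data) are simply the failure modes mentioned in this thread:

```shell
#!/bin/sh
# check_records SRC  (hypothetical helper)
# Report problem records on stderr; return non-zero if any were found.
# Nothing is modified.
check_records() {
    rc=0
    while IFS='|' read -r name url str1 str2 file || [ -n "$name" ]; do
        if [ ! -f "$file" ]; then
            echo "missing file: '$file'" >&2
            rc=1
        elif ! grep -q '<element>.*</element>' "$file"; then
            # Either no <element> at all, or one that spans several lines.
            echo "no single-line <element> in: '$file'" >&2
            rc=1
        fi
        case "$url$str2" in
            *'#'*)
                # "#" is the delimiter of the sed s### command used above.
                echo "data contains '#': '$url$str2'" >&2
                rc=1
                ;;
        esac
    done < "$1"
    return "$rc"
}

# Usage:
#   check_records source.txt && echo "all records look safe to apply"
```

This is still a line-based check, not XML parsing, so it shares the limitations of the regex approach.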
ghoti
  • This will fail if any of the relevant fields from source.txt contains a backslash since it's missing the `-r` arg for read. Also since source.txt is `|`-separated I wouldn't assume that fields can't contain white space so use `|` as the OFS for awk and the IFS for shell. You can get rid of the test on `$data` by using awk for the substitution instead of sed. – Ed Morton Sep 26 '13 at 15:29
  • Excellent points. Of course, without knowing the input data, we can't know if these issues are practical or just theoretical. Another issue that both our answers face is that they can't handle `<element>`s that span multiple lines. My answer, like yours, is The Wrong Way To Do It. – ghoti Sep 26 '13 at 15:58
  • Agreed, just trying to make the wrong way a more robust way as I suspect the OP is not going to do it the right way! – Ed Morton Sep 26 '13 at 16:45