I have a CSV export whose values I need to map to new values before importing it into a different system. I am using ArangoDB to create this data migration mapping.
Below is the full script used:
#!/bin/bash
execute () {
filepath=$1
prefix=$2
keyField=$3
filename=$(basename "${filepath%.csv}")
collection="$prefix$filename"
filepath="/data-migration/$filepath"
# Check for "_key" column
if ! xsv headers "$1" | grep -q _key
# Add "_key" column using the keyfield provided
then
xsv select $keyField "$1" | sed -e "1s/$keyField/_key/" > "$1._key"
xsv cat columns "$1" "$1._key" > "$1.cat"
mv "$1.cat" "$1"
rm "$1._key"
fi
# Import CSV into Arango Collection
docker exec arango arangoimp --collection "$collection" --type csv "$filepath" --server.password ''
}
# This single line runs the execute() above
execute 'myDirectory/myFile.csv' prefix_ OLD_ORG_ID__C
So far I've deduced that the $keyField parameter (OLD_ORG_ID__C) passed to the execute() function is used in the if block of the script. That block looks for the $keyField column and copies its values into a newly created _key column using the xsv toolkit.
OLD_ORG_ID__C | _key
A123          -> A123
B123          -> B123
(blank)       -> ##   <- auto-populate
Unfortunately, not every row has a value in the OLD_ORG_ID__C column. As a result, the _key for that row is also empty, which causes the import into ArangoDB to fail.
Note: The _key field is necessary for my AQL scripts to work properly.
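To confirm how many rows are affected before changing the script, a quick check along these lines might help. This is only a sketch: the function name count_blank_keys is hypothetical, and it assumes a simple CSV with no quoted fields containing commas, because awk splits on every comma.

```shell
#!/bin/bash
# Hypothetical helper: count data rows whose key column is blank.
# Assumes plain CSV on stdin with no quoted commas (awk -F, splits naively).
count_blank_keys () {
  awk -F, -v key="$1" '
    # Header row: find the position of the key column by name
    NR == 1 { for (i = 1; i <= NF; ++i) if ($i == key) field = i; next }
    # Data rows: count those where the key column is empty
    field && $field == "" { ++blank }
    END { print blank + 0 }'
}
```

It could be run as count_blank_keys OLD_ORG_ID__C < myDirectory/myFile.csv.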
How can I rewrite this block to auto-number the blank values?
then
xsv select $keyField "$1" | sed -e "1s/$keyField/_key/" > "$1._key"
xsv cat columns "$1" "$1._key" > "$1.cat"
mv "$1.cat" "$1"
rm "$1._key"
fi
Is there a better way to solve this issue? Perhaps xsv sort by the keyField and then auto-populate from the first blank row to the end?
UPDATE: Per the comments/answer, I tried something along these lines, but so far it is still not working:
#!/bin/bash
execute () {
filepath=$1
prefix=$2
keyField=$3
filename=$(basename "${filepath%.csv}")
collection="$prefix$filename"
filepath="/data-migration/$filepath"
# Check for "_key" column
if ! xsv headers "$1" | grep -q _key
# Add "_key" column using the keyfield provided
then
awk -F, 'NR==1 { for(i=1; i<=NF;++i) if ($i == "'$keyField'") field=i; print; next }
$field == "" { $field = "_generated_" ++n }1' $1 > $1-test.csv
fi
}
# import a single collection if needed
execute 'agas/Account.csv' agas_ OLD_ORG_ID__C
This creates an Account-test.csv file, but unfortunately it has neither the "_key" column nor any changes to the OLD_ORG_ID__C values. Preferably, the "_key" values would be populated with auto-numbered values only when OLD_ORG_ID__C is blank; otherwise they should copy the provided value.
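The desired behavior described above could be sketched in awk roughly as follows. This is a minimal sketch under stated assumptions, not a drop-in fix: the function name add_key_column and the generated_ prefix are hypothetical, the CSV is assumed to have no quoted fields containing commas, and the generated values are assumed not to collide with existing keys. Setting OFS to a comma keeps the output comma-separated, and the key column is left untouched while a new _key column is appended.

```shell
#!/bin/bash
# Hypothetical helper: append a "_key" column to a CSV read from stdin.
# Assumes no quoted fields containing commas (awk -F, splits naively).
add_key_column () {
  local keyField=$1
  awk -F, -v OFS=, -v key="$keyField" '
    NR == 1 {
      # Locate the key column by its header name
      for (i = 1; i <= NF; ++i) if ($i == key) field = i
      if (!field) { print "column not found: " key > "/dev/stderr"; exit 1 }
      print $0, "_key"
      next
    }
    {
      # Copy the existing value, or generate a numbered one when it is blank
      k = $field
      if (k == "") k = "generated_" (++n)
      print $0, k
    }'
}
```

For example, add_key_column OLD_ORG_ID__C < Account.csv > Account-test.csv would leave OLD_ORG_ID__C as it is and fill _key with either the copied value or generated_1, generated_2, and so on.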