I am trying to pre-process UTF-8 CSV files, to create SQL script to create tables in a postgresql DB. Headers come from Excel and are human optimized:
Ex:
GKCode,GKCode Neu,Name,Markttyp,Jahr,Leerstandsrate Büro (MFG) (%),Spitzenmiete Büro City (Euro/qm),Durchschnittsmiete Büro City (Euro/qm),Nettoanfangsrendite Büro zentrale Lagen (%),Nettoanfangsrendite Büro dezentrale Lagen (%),Nettoabsorption (qm)
What I want to do:
get the headers (first line
remove BOM if any
prepend doublequotes after and append doublequotes and ' varchar' before each delimiter
prepend doublequotes at the beggining of stream
append doublequotes and ' varchar' at the end of stream (last varchar replaced with '### bad append ###' for test purpose
What did I try so far:
tried to use a file as intermediate
tried to use echo and AWK
my code (best possible example so far):
csv_delimiter=","
head -1 raw_import |sed 's/^\xEF\xBB\xBF//'| sed -e "s|$csv_delimiter|\" varchar$csv_delimiter\"|g"| sed -e "s|$|\" ### bad append ###|g"
I receive unexpected result, with 'bad append' being in wrong place:
GKCode" varchar,"GKCode Neu" varchar,"Name" varchar,"Markttyp" varchar,"Jahr" varchar,"Leerstandsrate Büro (MFG) (%)" varchar,"Spitzenmiete Büro City (Euro/qm)" varchar,"Durchschnittsmiete Büro City (Euro/qm)" varchar,"Net" ### bad append ###o zentrale Lagen (%)" varchar,"Nettoanfangsrendite Büro dezentrale Lagen (%)" varchar,"Nettoabsorption (qm)
results are stable with simple headers (alphanumeric)
echo "1,wqd15,15,dq5w15,d5qw,4dq,51dqw5d,q51d,qw51d,qw51d,5q,wd,56" |sed '1s/^\xEF\xBB\xBF//'| sed -e "s|$csv_delimiter|\" varchar$csv_delimiter\"|g"| sed -e "s|$|\" varchar|g"