I have a Unicode/UTF-8 text file from some third-party Windows software that contains about ten columns of data.
The header line is tab-delimited, but the remaining lines are space-delimited (not tab-delimited!), as seen when opening the file in Notepad++ or TextWrangler.
Here are the first four lines of the file (as an example):
x y z(ns) z(cm) z-abs(cm) longitude- E latitude- N type_of_object description
728243.03 5993753.83 0 0 0 143.537779835969 -36.1741232463362 linestart DRIVEWAYGRAVEL
728242.07 5993756.02 0 0 0 143.537768534943 -36.1741037476109 line DRIVEWAYGRAVEL
728242.26 5993756.11 0 0 0 143.537770619485 -36.1741028922293 linestart DRIVEWAYGRAVEL
(N.B. there is a space at the start of every line except the header line.)
I'm trying to write a Bash script to reformat the data for import into a different Windows program.
(I realise I could do this on the Windows command line, but I have no experience with it, so would prefer to copy the file onto my Debian machine and create a script in Bash. This means the input file and output file need to be compatible with Windows, but the script itself is obviously running in Linux.)
I need to do the following:
- Extract the first two columns (x and y coordinates), separated by a comma, but ONLY for lines containing "rectangle" in the second-last column.
- Add either a 1 or a 0 at the end of each output line. The first line should have a 1, lines 2-4 should have a 0, line 5 should have a 1, lines 6-8 should have a 0, and so on. That is, every fourth line (starting at the first line) should have a 1, and all other lines should have a 0.
So the output file should look something like this:
728257.89,5993759.24,1
728254.83,5993758.54,0
728251.82,5993762.4,0
728242.45,5993765.07,0
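For reference, the two requirements above can be sketched as a single awk program rather than a shell loop. This is only a sketch under assumptions: the function name `extract_rectangles` is made up for illustration, the file is assumed to be UTF-8 (not UTF-16), the object type is assumed to always be the second-last whitespace-separated field, and the output is written with CRLF line endings on the guess that the Windows program expects them.

```shell
#!/usr/bin/env bash
# Sketch only: extract_rectangles is a hypothetical helper name.
# Assumes UTF-8 input and that the object type is the second-last field.
extract_rectangles() {
    awk '
        { sub(/\r$/, "") }                  # drop a trailing CR left by Windows line endings
        NF >= 2 && $(NF-1) ~ /rectangle/ {
            n++                             # counts matching lines only
            flag = (n % 4 == 1) ? 1 : 0     # 1 on the 1st, 5th, 9th ... matching line
            printf "%s,%s,%d\r\n", $1, $2, flag   # CRLF line ending (assumption)
        }
    ' "$1"
}

# Example call (file names are placeholders):
#   extract_rectangles input.file > output.txt
```

The default awk field splitting treats both the tabs in the header and the leading and separating spaces in the data lines as delimiters, so no explicit `FS` is set.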
I have tried the answer to this question, e.g.:
awk '
NR==1{
    for(i=1;i<=NF;i++)
        if($i!="z(ns)")
            cols[i]
}
{
    for(i=1;i<=NF;i++)
        if(i in cols)
            printf "%s ",$i
    printf "\n"
}' input.file > output.file
...to remove the third column (and then variations on this to get rid of the other unwanted columns). However, all I'm left with is just an empty output file.
I also tried hacking together a solution with grep and awk:
touch output.txt
count=0
IFS=$'\n'
set -f #disable globbing
for i in $( grep "rectangle" $inputFile )
do
    Xcoord=$(awk 'BEGIN { FS=" " } { print $1 }' $i )
    printf "$Xcoord" >> output.txt
    echo ","
    Ycoord=$(awk 'BEGIN { FS=" " } { print $2 }' $i )
    printf "$Ycoord" >> output.txt
    printf ","
    count=$((count+1))
    if [[ count = "1" ]]
    then
        printf "$count\n" >> output.txt
    else
        printf "0\n" >> output.txt
    fi
done
set +f #re-enable globbing for future use of the terminal.
...the idea behind this was, for each line in $inputFile that contains "rectangle":
1. Append the first column (variable "Xcoord") to output.txt
2. Append a comma to output.txt
3. Append the second column (variable "Ycoord") to output.txt
4. Append another comma to output.txt
5. Append the 1 or 0 as per the if test based on the value of the variable "count", along with a new line.
This idea fails. Instead of saving the data to the file, it prints all columns of the file to stdout, with the first column replaced with the text "(No such file or directory)", and output.txt is just full of zeros.
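Three details of the script above would account for these symptoms. awk treats its trailing arguments as file names, so passing the line text in `$i` makes it try to open a file with that name. The test `[[ count = "1" ]]` compares the literal string `count` with `1`, which is never true, so only the `else` branch runs. And the bare `echo ","` and `printf ","` write to stdout, not to output.txt. A minimal sketch of the five intended steps, with the hypothetical function name `append_rectangles` and a modulo test standing in for the 1-or-0 rule, might look like this:

```shell
#!/usr/bin/env bash
# Sketch only: append_rectangles is a hypothetical name; "rectangle" is
# matched anywhere on the line, exactly as the grep in the attempt does.
append_rectangles() {
    local inputFile=$1 outputFile=$2
    local count=0 x y rest flag
    : > "$outputFile"                     # start from an empty output file
    # read splits each line on whitespace, so no awk call is needed per line
    while read -r x y rest; do
        count=$((count + 1))
        if (( count % 4 == 1 )); then     # arithmetic test on the variable's value
            flag=1
        else
            flag=0
        fi
        printf '%s,%s,%s\n' "$x" "$y" "$flag" >> "$outputFile"
    done < <(grep 'rectangle' "$inputFile")
}

# Example call (file names are placeholders):
#   append_rectangles input.file output.txt
```

Because `read` does the splitting, there is no need to change `IFS` or disable globbing, and the leading space on each data line is discarded automatically.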
- How can I fix this?
- Do I need to do anything to make the resulting output.txt file Windows-format?
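On the Windows-format point: Windows text files conventionally end each line with CR LF, whereas Linux tools write LF only. Whether the target program actually insists on CRLF is not known here, so the following is just a sketch of one way to convert, using a hypothetical helper named `to_crlf`. On Debian, `unix2dos` from the dos2unix package does the same job.

```shell
#!/usr/bin/env bash
# Sketch only: to_crlf is a hypothetical helper name.
# Strips any CR already at the end of a line, then appends exactly one,
# so a file that is already CRLF is not given doubled CRs.
to_crlf() {
    awk '{ sub(/\r$/, ""); printf "%s\r\n", $0 }' "$1"
}

# Example call (file names are placeholders):
#   to_crlf output.txt > output_windows.txt
```

The same `sub(/\r$/, "")` step is also worth applying when reading the original file on Linux, since a CR left on the end of each line becomes part of the last field.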
Thanks in advance...