I need to parse a CSV file whose set of columns changes without any notice. Sometimes I'll receive a file with this header:

"No,Latitude,Longitude,Name,Altitude,Satellites,Date,Time"

and other times with this one:

"No,Latitude,Longitude,Name,Altitude,Date,Time"

or some other variation (always drawn from the same set of columns, but sometimes 5 columns, sometimes 6, and so on; that is, sometimes Name is missing, or Satellites, or both).

Therefore I need to read the header, work out what each column contains, and produce a fixed CSV that always has the same columns:

"No,Latitude,Longitude,Name,Altitude,Satellites,Date,Time"

To do that, I was thinking of the following:

  1. Read the header of the source file and store each column name in an array:

IFS=, read -r a[1] a[2] a[3] a[4] a[5] a[6] a[7] a[8] a[9] a[10] a[11] < source.csv

Do the same for the header I need to create (the one I know):

IFS=, read -r b[1] b[2] b[3] b[4] b[5] b[6] b[7] b[8] <<< "No,Latitude,Longitude,Name,Altitude,Satellites,Date,Time"

  2. Loop over each non-empty value of the a array, trying to match it against the headers in the b array:

for i in "${!a[@]}"
do
    # Skip the unused trailing slots, which read leaves empty
    [[ -n ${a[i]} ]] || continue
    for x in {1..8}
    do
        if [[ ${a[i]} = "${b[x]}" ]]
        then
            # Column i of my source file belongs in column x of my output file
            output[$x]=$i
        fi
    done
done
  3. Store this in my new file, with something like this:

while IFS=, read -r a[1] a[2] a[3] a[4] a[5] a[6] a[7] a[8] a[9] a[10] a[11]
do
    # An unmatched output[x] is unset, so it evaluates to index 0 and a[0] expands to nothing
    echo "${a[output[1]]},${a[output[2]]},${a[output[3]]},${a[output[4]]},${a[output[5]]},${a[output[6]]},${a[output[7]]},${a[output[8]]}"
done < source.csv >> new.csv

I think this should work, since it copies the appropriate source column into the desired output column.

But I'm sure there are easier and more elegant ways of doing this.

Any help?

Thanks

alvgarci
  • if name and satellites are both missing, as you say, then where are you reading their values? – jhnc Jun 12 '22 at 01:20
  • what's the ultimate objective ... add a null/empty field for 'missing' columns, with the intention being to end up with a final csv that has, what, 7 total columns? – markp-fuso Jun 12 '22 at 02:04
  • Are the column headings valid shell variable names? If so, KamilCuk's answer to ["How to process all or selected rows in a csv file where column headers and order are dynamic?"](https://stackoverflow.com/questions/55895328#55899356) (mostly the first part) might work (though I'd recommend using a plain redirect, instead of piping from `cat`). Also see rici's answer to ["Use array elements as names of the variables i want to read to from a file"](https://stackoverflow.com/questions/50729979#50730881). – Gordon Davisson Jun 12 '22 at 03:34
  • The main idea is not to produce a truncated CSV. In any case I need latitude, longitude and time to be there; the rest are not really relevant. If one of those values is missing, I can put in a 0 or a null. – alvgarci Jun 12 '22 at 11:25
  • Thanks for the comments.. I'll review KamilCuk's answer – alvgarci Jun 12 '22 at 11:26
  • it would help if you provided a sample input file and the expected output – markp-fuso Jun 12 '22 at 14:10

1 Answer

Miller has the `template` verb for exactly this kind of task:

mlr template [options]
     Places input-record fields in the order specified by list of column names.
     If the input record is missing a specified field, it will be filled with the fill-with.
     If the input record possesses an unspecified field, it will be discarded.
Options:
     -f {a,b,c} Comma-separated field names for template.
     -t {filename} CSV file whose header line will be used for template.
     --fill-with {filler string} What to fill absent fields with. Defaults to the empty string.

So you could do:

mlr --csv template \
    -f 'No,Latitude,Longitude,Name,Altitude,Satellites,Date,Time' \
    --fill-with '0' \
    file.csv

Or use the header of another file as the reference:

mlr --csv template -t header.csv --fill-with '0' file.csv
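
If Miller is not available, roughly the same behaviour can be sketched in plain bash. This is only an illustration under some assumptions: bash 4 or later (for associative arrays), simple CSV with no quoted fields or embedded commas, and Unix line endings. `remap_csv` is a hypothetical helper name chosen for this example, not part of Miller or bash.

```shell
#!/usr/bin/env bash
# Sketch: reorder CSV columns from stdin onto a fixed header, filling absent ones.
# Assumes simple CSV: no quoted fields, no embedded commas, no CRLF line endings.

remap_csv() {
    local target=$1 fill=$2 name i
    local -a want src row out
    local -A pos=()

    # The fixed list of output columns, in the desired order.
    IFS=, read -r -a want <<< "$target"

    # First input line: remember the position of every source column by name.
    IFS=, read -r -a src || return 1
    for i in "${!src[@]}"; do
        pos["${src[i]}"]=$i
    done

    printf '%s\n' "$target"

    # Remaining lines: pick each wanted column, or the filler if it is absent.
    while IFS=, read -r -a row || (( ${#row[@]} )); do
        (( ${#row[@]} )) || continue    # skip blank lines
        out=()
        for name in "${want[@]}"; do
            if [[ -n ${pos[$name]+set} ]]; then
                out+=("${row[${pos[$name]}]}")
            else
                out+=("$fill")
            fi
        done
        (IFS=,; printf '%s\n' "${out[*]}")
    done
}
```

It could then be called along the lines of `remap_csv 'No,Latitude,Longitude,Name,Altitude,Satellites,Date,Time' 0 < source.csv > new.csv`. Looking columns up by name rather than by position is what lets the input columns appear in any order; source columns that are not in the target list are simply dropped.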
Fravadona