I have a table whose columns are separated by spaces and tabs, and I need to separate the fields with semicolons instead. I tried awk directly, but it didn't work. I am now adapting a Perl script that does this for ASCII-style tables drawn with pipes and underscores, but I can't get it to do the same job on input that lacks those delimiters.
Name full        CI       FG        AG   DG Date (UTC)
Virnia Ray       34842865 093161455 -    -  2019-07-12T12:09:31.378Z
Vitoxia Sureez   40151215 094063155 36.3 -  2019-07-14T13:18:11.733Z
I already tried
sed -e 's/^[ \t]*//' -e 's/ /\;/g'
to remove the leading whitespace and replace every space with a semicolon.
This is the Perl script, originally written by L. Scott to convert ASCII-styled tables:
while (<>) {
    @vals = split / /;    # split the line into the vals array on single spaces
    $size = @vals;
    for ( $i = 0 ; $i < $size ; $i++ ) {
        # clean up the value: replace underscores and semicolons with spaces,
        # then trim leading and trailing spaces
        $vals[$i] =~ s/_/ /g;
        $vals[$i] =~ s/;/ /g;
        $vals[$i] =~ s/^ *//;
        $vals[$i] =~ s/ *$//;
        # append the value to the data record for this field
        $data[$i] .= $vals[$i];
        # special handling for the first field: use spaces when joining
        # (not sure this is still needed for the new requirement, since
        # more than the first field now contains spaces)
        $data[$i] .= " " if ($i == 0);
    }
    if (/\R/) {    # treat a line break as the end of the record
        # clean up the first field: trim spaces
        $data[0] =~ s/^ *//;
        $data[0] =~ s/ *$//;
        $data[3] =~ s/\..*//;    # drop the decimal point and decimals in the fourth field
        # join the fields with semicolons
        $line = join(";", @data);
        # collapse multiple spaces
        $line =~ s/ +/ /g;
        # print this line and start over
        print "$line\n" unless ($line eq '');
        @data = ();
    }
}
Expecting:
Name full;CI;FG;AG;DG;Date (UTC)
Virnia Ray;34842865;093161455;-;-;2019-07-12T12:09:31.378Z
Vitoxia Sureez;40151215;094063155;36;-;2019-07-14T13:18:11.733Z
Current output:
Name;full;;;;;;;;;;;;;;;;;;CI;;;;;;;FG;;;AG;DG;Date;(UTC)
Virnia;Ray;;;;;;;;;;;;;;;;;;;34842865;093161455;-;;;;-;;;;;2019-07-12T12:09:31.378Z
Vitoxia;Sureez;;;;;;;;;;;;;;;;;;40151215;094063155;36;;;-;;;;;2019-07-14T13:18:11.733Z
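The runs of semicolons above appear because every single space is treated as a field separator, including the spaces inside the name and the padding between columns. One possible way around this, sketched here in Python rather than sed or Perl, is to split on any whitespace and count fields from the right: the last five tokens are CI, FG, AG, DG and the date, and whatever is left is the name. The function name `row_to_fields` and the hard-coded `HEADER` are assumptions made for illustration. The header row is hard-coded because "Date (UTC)" itself contains a space.

```python
# Hypothetical helper: split a data row from the right so that spaces
# inside the name do not create extra fields.
HEADER = "Name full;CI;FG;AG;DG;Date (UTC)"  # assumed fixed header line


def row_to_fields(line, n_trailing=5):
    """Return [name, CI, FG, AG, DG, date], or None if the row is too short."""
    parts = line.split()  # splits on any run of spaces or tabs
    if len(parts) <= n_trailing:
        return None
    name = " ".join(parts[:-n_trailing])
    fields = parts[-n_trailing:]
    # AG is the third of the trailing fields; drop its decimals (36.3 -> 36)
    fields[2] = fields[2].split(".", 1)[0]
    return [name] + fields


sample = [
    "Virnia Ray       34842865 093161455 -    -  2019-07-12T12:09:31.378Z",
    "Vitoxia Sureez   40151215 094063155 36.3 -  2019-07-14T13:18:11.733Z",
]
print(HEADER)
for row in sample:
    fields = row_to_fields(row)
    if fields is not None:
        print(";".join(fields))
```

This only works when every data row carries all five trailing fields on the same line as the name.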
In some cases the first field looks like this:
Mar▒a Xatia Mecrdiz
M▒ndrz, yrcr▒a
cdcsurtmz at ruy opdx
lxtrb mxs2axs rl tsactfg
re xorts tdz drfod t 33743642 095518568 41 - 2019-06-12T13:48:40.200Z
zude def rtexetggacvc
opyxo ae f▒xuda tcso
dxzdtctfgs ti x9mdfggfhh
sx 7dfgab, asvro oi sz op
dgeto jxgdmszdd.
In the first field I only need the text before the comma; everything after it should be dropped. As you can see, the rest of the row's data does not end up on the same line as the start of the name.
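The rule for the first field can be sketched as a small Python function. The name `clean_name` is hypothetical. Replacing a literal semicolon with a space mirrors what the Perl script above already does, and collapsing whitespace is an assumption about the desired result.

```python
import re


def clean_name(raw):
    """Keep only the part of the name before the first comma."""
    name = raw.split(",", 1)[0]    # drop everything after the first comma
    name = name.replace(";", " ")  # a literal semicolon would break the output
    return re.sub(r"\s+", " ", name).strip()  # collapse line breaks and spaces


print(clean_name("Mpa Fagun;Mor@asd. Prq*yqesla, LEllal4331"))
```

A name without a comma passes through unchanged apart from the whitespace cleanup.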
The original data comes from HTML that was parsed with html2text. The original code is:
<b>Mon Jul 05 2019</b><hr><table style="border: 1px solid
#dddddd;border-collapse: collapse;text-align: left;"><tr><th style="padding: 8px;background-color: #cce6ff">Name Full</th><th style="padding: 8px;background-color: #cce6ff">FG</th><th style="padding: 8px;background-color: #cce6ff">CG</th><th style="padding: 8px;background-color: #cce6ff">AG</th><th style="padding: 8px;background-color: #cce6ff">MG</th><th style="padding: 8px;background-color: #cce6ff">Date (UTC)</th><tr><th style="padding: 8px;background-color: #dddddd">Mrída Xatia Mecrdiz Míndrz, yrcrría cdcsurtmz at ruy opdxlxtrb mxs2axs rl tsactfgre xorts tdz drfod t zude def rtexetggacvcopyxo ae féxuda tcsodxzdtctfgs ti x9mdfggfhhsx 7dfgab, asvro oi sz op
dgeto jxgdmszdd.</th><th style="padding: 8px;background-color: #dddddd">33743642</th><th style="padding: 8px;background-color: #dddddd">095518568</th><th style="padding: 8px;background-color: #dddddd">41</th><th style="padding: 8px;background-color: #dddddd">-</th><th style="padding: 8px;background-color: #dddddd">2019-05-12T13:48:40.200Z</th></tr><tr><th style="padding: 8px;">Cdlga foxa</th><th style="padding: 8px;">45285726</th><th style="padding: 8px;">092641968</th><th style="padding: 8px;">28</th><th style="padding: 8px;">-</th><th style="padding: 8px;">2019-06-11T13:50:52.091Z</th></tr></table>
Maybe there is some utility I could use instead of html2text that would do this job better, working directly from the HTML.
Here is the HTML table with more records:
<b>Mon Jul 05 2019</b><hr>
<table style="border: 1px solid #dddddd;border-collapse: collapse;text-align: left;"><tr>
<th style="padding: 8px;background-color: #cce6ff">Name Full</th>
<th style="padding:8px;background-color: #cce6ff">FG</th>
<th style="padding: 8px;background-color: #cce6ff">CG</th>
<th style="padding: 8px;background-color: #cce6ff">AG</th>
<th style="padding: 8px;background-color: #cce6ff">MG</th>
<th style="padding: 8px;background-color: #cce6ff">Date (UTC)</th></tr>
<tr><th style="padding: 8px;">Mrída Xatia Mecrdiz Míndrz, yrcrría cdcsurtmz at ruy opdxlxtrb mxs2axs rl tsactfgre xorts tdz drfod t zude def rtexetggacvcopyxo ae féxuda tcsodxzdtctfgs ti x9mdfggfhhsx 7dfgab, asvro oi sz op dgeto jxgdmszdd.</th>
<th style="padding: 8px;">33743642</th>
<th style="padding: 8px;">095518568</th>
<th style="padding: 8px;">41</th>
<th style="padding: 8px;">-</th>
<th style="padding: 8px;">2019-05-12T11:47:01.240Z</th></tr>
<tr><th style="padding: 8px;background-color: #dddddd">Cdlga foxa</th>
<th style="padding: 8px;background-color: #dddddd">45285726</th>
<th style="padding: 8px;background-color: #dddddd">092641968</th>
<th style="padding: 8px;background-color: #dddddd">28</th>
<th style="padding: 8px;background-color: #dddddd">-</th>
<th style="padding: 8px;background-color: #dddddd">2019-06-11T11:48:51.806Z</th></tr>
<tr><th style="padding: 8px;">Qrala Xera</th>
<th style="padding: 8px;">33184756</th>
<th style="padding: 8px;">032178032</th>
<th style="padding: 8px;">-</th>
<th style="padding: 8px;">-</th>
<th style="padding: 8px;">2019-03-01T11:55:04.269Z</th></tr>
<tr><th style="padding: 8px;background-color: #dddddd">Mpa Fagun;Mor@asd. Prq*yqesla, LEllal4331</th>
<th style="padding: 8px;background-color: #dddddd">54324252</th>
<th style="padding: 8px;background-color: #dddddd">034021061</th>
<th style="padding: 8px;background-color: #dddddd">-</th>
<th style="padding: 8px;background-color: #dddddd">-</th>
<th style="padding: 8px;background-color: #dddddd">2019-04-12T11:58:15.349Z</th></tr>
<tr><th style="padding: 8px;">xOpàr '00083</th>
<th style="padding: 8px;">13702194</th>
<th style="padding: 8px;">197071330</th>
<th style="padding: 8px;">40.2</th>
<th style="padding: 8px;">-</th>
<th style="padding: 8px;">2019-07-15T12:00:28.617Z</th></tr>
<tr><th style="padding: 8px;background-color: #dddddd">Drlia >·xa1otta</th>
<th style="padding: 8px;background-color: #dddddd">34253138</th>
<th style="padding: 8px;background-color: #dddddd">394995572</th>
<th style="padding: 8px;background-color: #dddddd">68</th>
<th style="padding: 8px;background-color: #dddddd">-</th>
<th style="padding: 8px;background-color: #dddddd">2019-07-12T12:32:19.793Z</th></tr>
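One alternative to html2text is to read the cells straight from the HTML, so that a long name can never wrap across output lines. The following is a minimal sketch using `html.parser` from Python's standard library. The names `TableRows` and `table_to_lines` are hypothetical, and the sketch assumes that the first row is the header, that cells never contain nested tables, and that the fourth column is the one whose decimals should be dropped.

```python
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the text of every <th>/<td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.rows = []
        self._row = None   # cells of the row being read
        self._cell = None  # text chunks of the cell being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._flush_row()  # tolerate a missing </tr>
            self._row = []
        elif tag in ("th", "td"):
            self._flush_cell()
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("th", "td"):
            self._flush_cell()
        elif tag in ("tr", "table"):
            self._flush_row()

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def _flush_cell(self):
        if self._cell is not None and self._row is not None:
            # join chunks and collapse line breaks and repeated spaces
            self._row.append(" ".join("".join(self._cell).split()))
        self._cell = None

    def _flush_row(self):
        self._flush_cell()
        if self._row:
            self.rows.append(self._row)
        self._row = None


def table_to_lines(html):
    """Return one semicolon-separated line per table row."""
    parser = TableRows()
    parser.feed(html)
    parser.close()
    lines = []
    for i, row in enumerate(parser.rows):
        cells = list(row)
        if i > 0:  # assumption: row 0 is the header and is kept as-is
            cells[0] = cells[0].split(",", 1)[0].replace(";", " ").strip()
            if len(cells) > 3:
                cells[3] = cells[3].split(".", 1)[0]  # 40.2 -> 40
        lines.append(";".join(cells))
    return lines
```

Calling `table_to_lines` on the contents of the HTML file and printing the returned lines would give one record per row. The header comes out exactly as it is written in the HTML, so it would still need renaming if different column titles are wanted.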