
I want to highlight every row of a table in an HTML file so that all rows with the same date get the same color. The date is the first column of the table. I tried something like the code below, but it doesn't work. I am also not sure how to switch the color when the date changes.

Code:

tdDate="2020-08-24 2020-08-25 2020-08-26 2020-08-27"
tdSet=0
endTrTag="</tr>"
colors="grey blue"
for x in $tdDate
do
awk '{if (($0 ~ /$x/) & ($tdSet -eq 0)) {
sed -i 's@<td@<td bgcolor="grey"@g' 
$tdSet=1
}
elsif (($0 ~ /$endTrTag/) & ($tdSer -eq 1) {
$tdSet=0}
else {
sed -i 's@<td@<td bgcolor="grey"@g'
}}'

file
done

Sample html file


    <html>
    <table>
    <tr>
    <td>2020-08-24</td>
    <td>NYC</td>
    <td>75</td>
    <td>Sunny</td>
    </tr>
    <tr>
    <td>2020-08-24</td>
    <td>Seattle</td>
    <td>55</td>
    <td>Rainy</td>
    </tr>
    <tr>
    <td>2020-08-24</td>
    <td>Austin</td>
    <td>85</td>
    <td>Sunny</td>
    </tr>
    <tr>
    <td>2020-08-25</td>
    <td>Seattle</td>
    <td>70</td>
    <td>Sunny</td>
    </tr>
    <tr>
    <td>2020-08-25</td>
    <td>Austin</td>
    <td>95</td>
    <td>Sunny</td>
    </tr>
    <tr>
    <td>2020-08-26</td>
    <td>NYC</td>
    <td>68</td>
    <td>Rainy</td>
    </tr>
    <tr>
    <td>2020-08-26</td>
    <td>Austin</td>
    <td>95</td>
    <td>Sunny</td>
    </tr>
    <tr>
    <td>2020-08-26</td>
    <td>San Jose</td>
    <td>85</td>
    <td>Sunny</td>
    </tr>
    </table>
    </html>

Desired output


    <html>
    <table>
    <tr>
    <td bgcolor="grey">2020-08-24</td>
    <td bgcolor="grey"> NYC</td>
    <td bgcolor="grey"> 75</td>
    <td bgcolor="grey"> Sunny</td>
    </tr>
    <tr>
    <td bgcolor="grey"> 2020-08-24</td>
    <td bgcolor="grey"> Seattle</td>
    <td bgcolor="grey"> 55</td>
    <td bgcolor="grey"> Rainy</td>
    </tr>
    <tr>
    <td bgcolor="grey"> 2020-08-24</td>
    <td bgcolor="grey"> Austin</td>
    <td bgcolor="grey"> 85</td>
    <td bgcolor="grey"> Sunny</td>
    </tr>
    <tr>
    <td bgcolor="blue">2020-08-25</td>
    <td bgcolor="blue"> Seattle</td>
    <td bgcolor="blue"> 70</td>
    <td bgcolor="blue"> Sunny</td>
    </tr>
    <tr>
    <td bgcolor="blue"> 2020-08-25</td>
    <td bgcolor="blue"> Austin</td>
    <td bgcolor="blue"> 95</td>
    <td bgcolor="blue"> Sunny</td>
    </tr>
    <tr>
    <td bgcolor="grey">2020-08-26</td>
    <td bgcolor="grey"> NYC</td>
    <td bgcolor="grey"> 68</td>
    <td bgcolor="grey"> Rainy</td>
    </tr>
    <tr>
    <td bgcolor="grey"> 2020-08-26</td>
    <td bgcolor="grey"> Austin</td>
    <td bgcolor="grey"> 95</td>
    <td bgcolor="grey"> Sunny</td>
    </tr>
    <tr>
    <td bgcolor="grey"> 2020-08-26</td>
    <td bgcolor="grey"> San Jose</td>
    <td bgcolor="grey"> 85</td>
    <td bgcolor="grey"> Sunny</td>
    </tr>
    </table>
    </html>
Cyrus
user3423407
  • IMHO, experts always advise parsing HTML files with tools that understand HTML well. – RavinderSingh13 Sep 05 '20 at 12:48
  • [Don't Parse XML/HTML With Regex.](https://stackoverflow.com/a/1732454/3776858) I suggest using an XML/HTML parser (xmlstarlet, xmllint ...). – Cyrus Sep 05 '20 at 13:07

3 Answers


Assuming what you really want is for each date to get its own color, then with input this simple and regular I'd just do:

$ cat tst.awk
BEGIN {
    # See https://www.w3schools.com/colors/colors_names.asp
    # for all portable HTML color names, we are just using 4 here.
    numColorsAvail = split("red green blue yellow",colors)
}
/<tr>/ { tdNr=0 }
/<td>/ {
    if ( ++tdNr == 1 ) {
        date = $0
        sub(/[^>]+>[[:space:]]*/,"",date)
        sub(/[[:space:]]*<[^<]+$/,"",date)
        if ( !(date in date2color) ) {
            date2color[date] = colors[++numColorsUsed]
        }
        color = date2color[date]
    }
    sub(/>/," bgcolor=\""color"\">")
}
{ print }


$ awk -f tst.awk file
    <html>
    <table>
    <tr>
    <td bgcolor="red">2020-08-24</td>
    <td bgcolor="red">NYC</td>
    <td bgcolor="red">75</td>
    <td bgcolor="red">Sunny</td>
    </tr>
    <tr>
    <td bgcolor="red">2020-08-24</td>
    <td bgcolor="red">Seattle</td>
    <td bgcolor="red">55</td>
    <td bgcolor="red">Rainy</td>
    </tr>
    <tr>
    <td bgcolor="red">2020-08-24</td>
    <td bgcolor="red">Austin</td>
    <td bgcolor="red">85</td>
    <td bgcolor="red">Sunny</td>
    </tr>
    <tr>
    <td bgcolor="green">2020-08-25</td>
    <td bgcolor="green">Seattle</td>
    <td bgcolor="green">70</td>
    <td bgcolor="green">Sunny</td>
    </tr>
    <tr>
    <td bgcolor="green">2020-08-25</td>
    <td bgcolor="green">Austin</td>
    <td bgcolor="green">95</td>
    <td bgcolor="green">Sunny</td>
    </tr>
    <tr>
    <td bgcolor="blue">2020-08-26</td>
    <td bgcolor="blue">NYC</td>
    <td bgcolor="blue">68</td>
    <td bgcolor="blue">Rainy</td>
    </tr>
    <tr>
    <td bgcolor="blue">2020-08-26</td>
    <td bgcolor="blue">Austin</td>
    <td bgcolor="blue">95</td>
    <td bgcolor="blue">Sunny</td>
    </tr>
    <tr>
    <td bgcolor="blue">2020-08-26</td>
    <td bgcolor="blue">San Jose</td>
    <td bgcolor="blue">85</td>
    <td bgcolor="blue">Sunny</td>
    </tr>
    </table>
    </html>

Handle numColorsUsed exceeding numColorsAvail however you like: issue a warning, fall back to "grey", or reset numColorsUsed so it starts at the first color again. Any of these is a trivial addition to the script.
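As a minimal sketch of the "start at the first color again" option, the assignment can index the palette with a modulo so colors are reused once they run out. The wrapper name `colorize_rows`, the two-color default palette and the sample rows are illustrative assumptions, not part of the script above; the rest follows tst.awk.

```shell
# Hypothetical wrapper around the tst.awk logic; the palette wraps around
# instead of running out. Reads HTML on stdin, writes colored HTML to stdout.
colorize_rows() {
    awk -v palette="${1:-grey blue}" '
    BEGIN { numColorsAvail = split(palette, colors) }
    /<tr>/ { tdNr = 0 }
    /<td>/ {
        if ( ++tdNr == 1 ) {
            # First cell of the row: extract the date text between the tags.
            date = $0
            sub(/[^>]+>[[:space:]]*/, "", date)
            sub(/[[:space:]]*<[^<]+$/, "", date)
            if ( !(date in date2color) ) {
                # numColorsUsed counts distinct dates seen so far; the modulo
                # maps it back into 1..numColorsAvail so colors get reused.
                date2color[date] = colors[(numColorsUsed++ % numColorsAvail) + 1]
            }
            color = date2color[date]
        }
        sub(/>/, " bgcolor=\"" color "\">")
    }
    { print }
    '
}

# Three dates, two colors: the third date reuses the first color.
printf '%s\n' \
    '<tr>' '<td>2020-08-24</td>' '<td>NYC</td>' '</tr>' \
    '<tr>' '<td>2020-08-25</td>' '<td>Seattle</td>' '</tr>' \
    '<tr>' '<td>2020-08-26</td>' '<td>Austin</td>' '</tr>' |
    colorize_rows "grey blue"
```

With the palette "grey blue" this happens to produce the grey, blue, grey pattern from the question, but only because each date is new when it first appears; a date that shows up again later keeps the color it was first given.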

Here are all the HTML color names and how to retrieve them yourself in case you want to build it into a script:

$ curl -s https://www.w3schools.com/colors/colors_names.asp | grep -o "colARR.push('[^']*')" | cut -d\' -f2
AliceBlue
AntiqueWhite
Aqua
Aquamarine
Azure
Beige
Bisque
Black
BlanchedAlmond
Blue
BlueViolet
Brown
BurlyWood
CadetBlue
Chartreuse
Chocolate
Coral
CornflowerBlue
Cornsilk
Crimson
Cyan
DarkBlue
DarkCyan
DarkGoldenRod
DarkGray
DarkGrey
DarkGreen
DarkKhaki
DarkMagenta
DarkOliveGreen
DarkOrange
DarkOrchid
DarkRed
DarkSalmon
DarkSeaGreen
DarkSlateBlue
DarkSlateGray
DarkSlateGrey
DarkTurquoise
DarkViolet
DeepPink
DeepSkyBlue
DimGray
DimGrey
DodgerBlue
FireBrick
FloralWhite
ForestGreen
Fuchsia
Gainsboro
GhostWhite
Gold
GoldenRod
Gray
Grey
Green
GreenYellow
HoneyDew
HotPink
IndianRed
Indigo
Ivory
Khaki
Lavender
LavenderBlush
LawnGreen
LemonChiffon
LightBlue
LightCoral
LightCyan
LightGoldenRodYellow
LightGray
LightGrey
LightGreen
LightPink
LightSalmon
LightSeaGreen
LightSkyBlue
LightSlateGray
LightSlateGrey
LightSteelBlue
LightYellow
Lime
LimeGreen
Linen
Magenta
Maroon
MediumAquaMarine
MediumBlue
MediumOrchid
MediumPurple
MediumSeaGreen
MediumSlateBlue
MediumSpringGreen
MediumTurquoise
MediumVioletRed
MidnightBlue
MintCream
MistyRose
Moccasin
NavajoWhite
Navy
OldLace
Olive
OliveDrab
Orange
OrangeRed
Orchid
PaleGoldenRod
PaleGreen
PaleTurquoise
PaleVioletRed
PapayaWhip
PeachPuff
Peru
Pink
Plum
PowderBlue
Purple
RebeccaPurple
Red
RosyBrown
RoyalBlue
SaddleBrown
Salmon
SandyBrown
SeaGreen
SeaShell
Sienna
Silver
SkyBlue
SlateBlue
SlateGray
SlateGrey
Snow
SpringGreen
SteelBlue
Tan
Teal
Thistle
Tomato
Turquoise
Violet
Wheat
White
WhiteSmoke
Yellow
YellowGreen

so for example to have your script automatically use all of the portable HTML color names you could do:

awk -v htmlColors="$(curl -s https://www.w3schools.com/colors/colors_names.asp | grep -o "colARR.push('[^']*')" | cut -d\' -f2)" '
BEGIN {
   numColorsAvail = split(htmlColors,colors)
}
... rest of the script as above ...
'
Ed Morton

With xmlstarlet and bash:

#!/bin/bash

tdDate="2020-08-24 2020-08-25 2020-08-26 2020-08-27"
colors=(grey blue)               # put colors in an array

declare -i c=0                   # set integer attribute. Counter for colors

for date in $tdDate; do
  # echo "$date ${colors[$c]}"   # debug 

  xmlstarlet edit -L --insert "//html/table/tr[td='$date']/td" --type attr -n 'bgcolor' -v "${colors[$c]}" file.xml

  c=c+1
  [[ $c -eq ${#colors[@]} ]] && c=0  # reset counter if $c equal array length
done

You could certainly increase the efficiency by not rewriting the XML file for each color.
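One possible way to do that, as a sketch rather than a tested drop-in: collect one `--insert` action per date in a bash array and call xmlstarlet a single time. This assumes xmlstarlet applies several edit actions given on one command line in order; the `args` array and the guard around the final call are additions for illustration.

```shell
#!/bin/bash
# Sketch: build all edit actions first, then rewrite file.xml only once.
tdDate="2020-08-24 2020-08-25 2020-08-26 2020-08-27"
colors=(grey blue)

args=()   # hypothetical accumulator for the xmlstarlet actions
c=0       # index into colors
for date in $tdDate; do
  args+=(--insert "//html/table/tr[td='$date']/td"
         --type attr -n bgcolor -v "${colors[c]}")
  c=$(( (c + 1) % ${#colors[@]} ))   # wrap around to the first color
done

# Run the single edit only if both the tool and the input file are present.
if command -v xmlstarlet >/dev/null 2>&1 && [[ -f file.xml ]]; then
  xmlstarlet edit -L "${args[@]}" file.xml
fi
```

Each date contributes eight array elements, so the command line grows with the number of dates; for a very long date list, the per-date loop above may still be the simpler choice.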


See: xmlstarlet edit --help

Cyrus

Yes, parsing HTML can be tricky, but if the markup is flat like this, why not do it with gawk?

Note that in the output below the color switches on every row, not on every change of date.

#!/bin/bash

tdDate="2020-08-24 2020-08-25 2020-08-26 2020-08-27"
tdSet=0
endTrTag="</tr>"
colors="grey blue"

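gawk -v dates="$tdDate" -v colors="$colors" '
        # Turn the list of dates into array keys so "in" can test membership.
        BEGIN{ split(dates,Date); for(i in Date){ tdDate[Date[i]]="" };
               split(colors,color); c=1;
               # Split on angle brackets: $3 is the text inside <td>...</td>.
               FS="[<>]";
        }
        # Toggle between the two colors on every row that starts with a known date.
        $3 in tdDate { c=(c==1?2:1) }
        $0~"<td>"  { gsub("<td>","<td bgcolor=\""color[c]"\">",$0); }
        1
        ' sample.html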

output:

<html>
<table>
<tr>
<td bgcolor="blue">2020-08-24</td>
<td bgcolor="blue">NYC</td>
<td bgcolor="blue">75</td>
<td bgcolor="blue">Sunny</td>
</tr>
<tr>
<td bgcolor="grey">2020-08-24</td>
<td bgcolor="grey">Seattle</td>
<td bgcolor="grey">55</td>
<td bgcolor="grey">Rainy</td>
</tr>
<tr>
<td bgcolor="blue">2020-08-24</td>
<td bgcolor="blue">Austin</td>
<td bgcolor="blu.....

EDIT (Because "Been trying to print tdDate to see what values it holds"):

add this line to the awk-script:

END{ for(i in tdDate) { print "i:", i," tdDate:",tdDate[i] }}

output will be (at the end):

i: 2020-08-27  tdDate:
i: 2020-08-24  tdDate:
i: 2020-08-25  tdDate:
i: 2020-08-26  tdDate:
Luuk
  • Thanks. Is there a way to keep the same color for the same date? – user3423407 Sep 05 '20 at 13:56
  • Yes, change the line `$3 in tdDate { c=(c==1?2:1) }`. Currently it is switching between `1` and `2`, but you can change that to suit your needs... – Luuk Sep 05 '20 at 14:02
  • thank you for the help. Been trying to print tdDate to see what values it holds, but it throws an error stating trying to print array in scalar context. Was doing this to figure out how to alternate between dates. – user3423407 Sep 05 '20 at 19:50
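
Following up on the comments above, one way to keep the same color for the same date is to advance the color only when the first cell of a row differs from the first cell of the previous row. The sketch below assumes rows with the same date are adjacent and that every tag sits on its own line, as in the sample; the function name `color_by_date` and the variables `prev` and `firstCell` are illustrative. It uses nothing gawk-specific, so plain awk should do.

```shell
# Hypothetical helper: reads HTML on stdin, colors each row by its date.
color_by_date() {
    awk -v colors="${1:-grey blue}" '
    BEGIN {
        n = split(colors, color)
        c = 0                  # no color chosen yet
        FS = "[<>]"            # $2 is the tag name, $3 the text inside it
    }
    /<tr>/ { firstCell = 1 }   # the next <td> is the date column
    /<td>/ {
        if (firstCell) {
            if ($3 != prev) {  # date changed: advance to the next color, wrapping
                c = (c % n) + 1
                prev = $3
            }
            firstCell = 0
        }
        sub(/<td>/, "<td bgcolor=\"" color[c] "\">")
    }
    { print }
    '
}

# Two rows for one date, then two more dates: expect grey, grey, blue, grey.
printf '%s\n' \
    '<tr>' '<td>2020-08-24</td>' '<td>NYC</td>' '</tr>' \
    '<tr>' '<td>2020-08-24</td>' '<td>Seattle</td>' '</tr>' \
    '<tr>' '<td>2020-08-25</td>' '<td>Austin</td>' '</tr>' \
    '<tr>' '<td>2020-08-26</td>' '<td>NYC</td>' '</tr>' |
    color_by_date "grey blue"
```

Unlike the toggle in the answer, this does not need the list of dates at all: it only compares each row's date with the previous one, so an unsorted file would get a new color every time the date changes back.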