I tried all kinds of logic and methods and even googled a lot, but yet not able to think of any satisfactory answer for the question I have. I have wrote a program as shown below to highlight specific xml code where I am facing some problem. Sorry for making this post bit long. I only wanted to clearly explain my problem.
EDIT: For running below given program you will need two xml files which are here: sample1 and sample2. Save this files and in below code edit the location where you want to save your files in C:/Users/editThisLocation/Desktop/sample1.xml
from lxml import etree
from collections import defaultdict
from collections import OrderedDict
from distutils.filelist import findall
from lxml._elementpath import findtext
from Tkinter import *
import Tkinter as tk
import ttk
root = Tk()
class CustomText(tk.Text):
def __init__(self, *args, **kwargs):
tk.Text.__init__(self, *args, **kwargs)
def highlight_pattern(self, pattern, tag, start, end,
regexp=True):
start = self.index(start)
end = self.index(end)
self.mark_set("matchStart", start)
self.mark_set("matchEnd", start)
self.mark_set("searchLimit", end)
count = tk.IntVar()
while True:
index = self.search(pattern, "matchEnd","searchLimit",
count=count, regexp=regexp)
if index == "": break
self.mark_set("matchStart", index)
self.mark_set("matchEnd", "%s+%sc" % (index, count.get()))
self.tag_add(tag, "matchStart", "matchEnd")
def Remove_pattern(self, pattern, tag, start="1.0", end="end",
regexp=True):
start = self.index(start)
end = self.index(end)
self.mark_set("matchStart", start)
self.mark_set("matchEnd", start)
self.mark_set("searchLimit", end)
count = tk.IntVar()
while True:
index = self.search(pattern, "matchEnd","searchLimit",
count=count, regexp=regexp)
if index == "": break
self.mark_set("matchStart", index)
self.mark_set("matchEnd", "%s+%sc" % (index, count.get()))
self.tag_remove(tag, start, end)
recovering_parser = etree.XMLParser(recover=True)
sample1File = open('C:/Users/editThisLocation/Desktop/sample1.xml', 'r')
contents_sample1 = sample1File.read()
sample2File = open('C:/Users/editThisLocation/Desktop/sample2.xml', 'r')
contents_sample2 = sample2File.read()
frame1 = Frame(width=768, height=25, bg="#000000", colormap="new")
frame1.pack()
Label(frame1, text="sample 1 below - scroll to see more").pack()
textbox = CustomText(root)
textbox.insert(END,contents_sample1)
textbox.pack(expand=1, fill=BOTH)
frame2 = Frame(width=768, height=25, bg="#000000", colormap="new")
frame2.pack()
Label(frame2, text="sample 2 below - scroll to see more").pack()
textbox1 = CustomText(root)
textbox1.insert(END,contents_sample2)
textbox1.pack(expand=1, fill=BOTH)
sample1 = etree.parse("C:/Users/editThisLocation/Desktop/sample1.xml", parser=recovering_parser).getroot()
sample2 = etree.parse("C:/Users/editThisLocation/Desktop/sample2.xml", parser=recovering_parser).getroot()
ToStringsample1 = etree.tostring(sample1)
sample1String = etree.fromstring(ToStringsample1, parser=recovering_parser)
ToStringsample2 = etree.tostring(sample2)
sample2String = etree.fromstring(ToStringsample2, parser=recovering_parser)
timesample1 = sample1String.findall('{http://www.example.org/eHorizon}time')
timesample2 = sample2String.findall('{http://www.example.org/eHorizon}time')
for i,j in zip(timesample1,timesample2):
for k,l in zip(i.findall("{http://www.example.org/eHorizon}feature"), j.findall("{http://www.example.org/eHorizon}feature")):
if [k.attrib.get('color'), k.attrib.get('type')] != [l.attrib.get('color'), l.attrib.get('type')]:
faultyLine = [k.attrib.get('color'), k.attrib.get('type'), k.text]
def high(event):
textbox.tag_configure("yellow", background="yellow")
limit_1 = '<p1:time nTimestamp="{0}">'.format(5) #limit my search between timestamp 5 and timestamp 6
limit_2 = '<p1:time nTimestamp="{0}">'.format((5+1)) # timestamp 6
highlightString = '<p1:feature color="{0}" type="{1}">{2}</p1:feature>'.format(faultyLine[0],faultyLine[1],faultyLine[2]) #string to be highlighted
textbox.highlight_pattern(limit_1, "yellow", start=textbox.search(limit_1, '1.0', stopindex=END), end=textbox.search(limit_2, '1.0', stopindex=END))
textbox.highlight_pattern(highlightString, "yellow", start=textbox.search(limit_1, '1.0', stopindex=END), end=textbox.search(limit_2, '1.0', stopindex=END))
button = 'press here to highlight error line'
c = ttk.Label(root, text=button)
c.bind("<Button-1>",high)
c.pack()
root.mainloop()
What I want
If you run above code, it would present an output given below:
As you can see in the image, I only intend to highlight code marked with green tick. Some of you might think of limiting the starting and ending index to highlight that pattern. However, if you see in my program I am already making use of starting and ending indexes for limiting my output to only nTimestamp="5"
and for that I am using limit_1
and limit_2
variables.
So in this type of a data how to correctly highlight one pattern out of many inside individual nTimestamp
?
EDIT: Here I specifically want to highlight 3rd item in nTimestamp="5"
because this item is not present in sample2.xml
as you can see in two xml files and when program runs it also differentiates this. The only problem is to highlight correct item which is 3rd in my case.
I am using highlighting class from Bryan Oakley's code here
EDIT Recent
In context to what kobejohn asked below in comments, the target file won't ever be empty. There are always chances that target file may have extra or missing elements. Finally, my current intention is to highlight only deep elements which are different or missing and timestamps
in which they are located. However, highlighting of timestamps
is done correctly but the issue to highlight deep elements like explained above is still an issue. Thank you kobejohn for clarifying this.
NOTE:
One method which I know and you might suggest that works correctly is to extract the index of green color ticked pattern and simply run highlight tag over it, but this approach is very hard-coded and in large data where you have to deal with lots of variations it is completely ineffective. I am searching for another better option.