Pre-Introduction

I am a beginner in Tkinter. I learn mostly from YouTube videos and by searching for solutions to problems as I run into them. My Python knowledge extends up to object-oriented programming, but it is very rusty, and English isn't my mother tongue. Stopwords are words that occur often in texts but carry little to no useful information, so they need to be removed. The program works with Greek characters.

Introduction

My main goal for my Tkinter program is to let the user load a corpus (text read from a file, which is later tokenized) and a stopwords file (preferably converted from the txt file into a list). Then, for each word in the corpus, the program checks whether it appears in the stopwords list. If it does, that stopword is added to a Listbox (let's call it "Found stopwords", or A). For example, a corpus might contain 65 of the 265 stopwords; those 65 should then be inserted into the Found Listbox.
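The "found stopwords" step described above could be sketched without any Tkinter code, roughly as follows. This is only a minimal sketch: it assumes the corpus is a plain string and the stopwords file holds one word per line, and the names `load_stopwords`, `tokenize` and `find_stopwords` are hypothetical, not taken from the program.

```python
import re


def load_stopwords(path):
    """Read one stopword per line from a UTF-8 text file into a list."""
    with open(path, encoding="utf-8") as handle:
        return [line.strip() for line in handle if line.strip()]


def tokenize(text):
    """Split text into lowercase word tokens."""
    # In Python 3, \w matches Unicode letters, so Greek words are covered.
    return re.findall(r"\w+", text.lower())


def find_stopwords(corpus_text, stopwords):
    """Return the stopwords that occur in the corpus, in stopword-list order."""
    corpus_words = set(tokenize(corpus_text))
    found = []
    for word in stopwords:
        if word.lower() in corpus_words and word not in found:
            found.append(word)
    return found


# Hypothetical sample data; in the real program both would come from files.
stopwords = ["το", "του", "τα", "στο", "και"]
corpus = "Στο σπίτι του Γιάννη τα παιδιά είδαν το σκυλί. Στο τέλος έφυγαν."
found = find_stopwords(corpus, stopwords)
print(found)
```

Each entry of `found` could then be inserted into ListBox A. Using a set for the corpus words keeps each lookup cheap even for a long text.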

The user can then move words into a second ListBox ("stopwords to be deleted", or B). This ListBox holds the stopwords that will be deleted once the user presses a menu button called "Remove stopwords". The program goes through every stopword in ListBox B and deletes all of its occurrences from the corpus text, not just the first one. For example, if ListBox B contains the word "Στο" and the corpus contains this word 7 times, the word should be deleted 7 times. The same applies to every word in ListBox B.
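Moving the selected words from one ListBox to the other might look like the sketch below. It assumes both arguments behave like a `tkinter.Listbox` and relies only on the `curselection`, `get`, `delete` and `insert` methods; the function name `move_selected` is hypothetical.

```python
def move_selected(source, target):
    """Move the currently selected items from one Listbox to another.

    Returns the moved words so the caller can react, for example by
    refreshing highlights in a Text widget.
    """
    indices = list(source.curselection())
    words = [source.get(index) for index in indices]
    # Delete from the bottom up so the remaining indices stay valid.
    for index in reversed(indices):
        source.delete(index)
    for word in words:
        target.insert("end", word)  # "end" is the value of tkinter.END
    return words
```

The same function could serve both directions: A to B when a word is marked for deletion, and B to A when the user changes their mind, simply by swapping the two arguments in the button callbacks.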

Main Problem

The main problem is the logic behind ListBox B: I am not sure whether the function removeStopwords does its job of finding the stopwords to delete correctly:

def removeStopwords(self):
    if len(self.corpusText) == 0:  # corpusText is just a string
        print("Empty")
        return

    # Collect the words currently in ListBox B (the stopwords to delete).
    for word in self.listBoxOfDelStopwords.get(0, END):
        self.updatedStopwords_Greek.append(word)
        print("Words to be added:" + word)

    print("Total of Stopwords words:" + str(len(self.updatedStopwords_Greek)))
    print("Before removal of stopwords from Text :" + str(len(self.corpusText)))

    # Iterating over a string yields single characters, and str has no
    # remove() method, so split the corpus into words and rebuild it.
    # Note: split() also collapses line breaks and repeated spaces, and the
    # comparison is case-sensitive ("Στο" and "στο" are different words).
    removed = []  # used for debugging: every occurrence that gets dropped
    kept = []
    for word in self.corpusText.split():
        if word in self.updatedStopwords_Greek:
            removed.append(word)
        else:
            kept.append(word)
    self.corpusText = " ".join(kept)
    print(removed)  # debugging
    print(str(len(removed)))  # occurrences removed, over all stopwords

    print("-----")
    # len() of a string is its number of characters
    print("After Removal of stopwords from Text : " + str(len(self.corpusText)))

    self.textBoxTest.delete("1.0", END)
    self.textBoxTest.insert("1.0", self.corpusText)
    self.labelNumOfChars.configure(text=str(len(self.corpusText)))

But the most critical problem is finding a way to highlight the words to be deleted in red within the Text widget (or whatever widget fits this problem best). I have read about the Text widget and its tag system, but sadly I did not grasp enough of it, and I am not sure whether the tag logic can work here. Is it possible to write a function, or at least outline the logic, that highlights the words inside the Text widget (or a Label, or whatever fits best)? In other words: each time I move an item from ListBox A to ListBox B, I want the Text widget to update by coloring the words that will be deleted if the user chooses to remove them. If the words "του" and "τα" are in ListBox B, then every occurrence of these words in the Text widget should be colored red. They should return to the original color if they are moved from ListBox B back to ListBox A. This way, the user gets a preview of what will be deleted (and can see whether the system actually works and deletes the selected words).
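Tags in a Text widget are applied between two indices of the form "line.column", where lines count from 1 and columns from 0, and the end index is exclusive. To get a feel for those indices, here is a small sketch that computes them for a plain string, with no widget involved. The function name `word_indices` is hypothetical, and the sketch assumes the text contains only ordinary characters (no embedded images or windows).

```python
import re


def word_indices(text, word):
    """Return (start, end) Text-style indices for each whole-word match."""
    # \b keeps "το" from matching inside longer words such as "αυτοκίνητο".
    pattern = re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE)
    spans = []
    # Text widget lines are numbered from 1, columns from 0.
    for line_no, line in enumerate(text.split("\n"), start=1):
        for match in pattern.finditer(line):
            spans.append((f"{line_no}.{match.start()}",
                          f"{line_no}.{match.end()}"))
    return spans


sample = "Το σπίτι και το δέντρο\nείδα το αυτοκίνητο"
for start, end in word_indices(sample, "το"):
    print(start, end)
```

Each pair could be passed directly to the widget's `tag_add` method together with a tag name, which is the "list of indices" part of the problem.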

Last Notes

Obviously, if there are more efficient ways of dealing with this problem than the methods I used, I would appreciate it if they were mentioned and expanded on. My code is surely a mess, and since I am a beginner I would like the methods that solve these problems to be explained at that level, if possible. Sadly, I do not know how StackOverflow works. I doubt I can upload the Python file; the only way I see is to copy in the entire code, but that would complicate the situation, right?

  • _" Is it possible to make a function or just the logic of it, that highlights the words inside the Text Box widget"_ - yes. See [How to highlight text in a tkinter text widget](https://stackoverflow.com/questions/3781670/how-to-highlight-text-in-a-tkinter-text-widget) – Bryan Oakley Dec 20 '22 at 23:01
  • I still haven't quite grasped the Text widget as a whole; I can't form a picture of how it works in my mind. Do you have any examples? For instance: check the whole Text widget and highlight every occurrence of the word "το" (case-insensitive) in red (foreground red). – RookieProgrammer Dec 23 '22 at 10:50
  • I gave you a link to an example that has code that lets you do exactly what you want. – Bryan Oakley Dec 23 '22 at 16:46
  • I am afraid that no matter how much I read the examples about the text tagging system, I freeze when applying them to my specific project. I have understood the tag system itself, but not how to make it dynamic: first look at which words ListBox B contains, then check the whole Text widget. It surely has to do with indices. So a for loop is needed, but what else exactly? Something should return a list of the indices where each word such as "το", or whatever other word is in ListBox B, is located within the Text widget. I don't know how to write a simple example so that I can picture it. – RookieProgrammer Dec 28 '22 at 14:58
  • Whenever I press the buttons (add to ListBox B from A, and remove accordingly), a method should trigger to update the Text widget tags. That method should loop over the words in ListBox B and search for each one as a pattern. Each instance of a word should yield its start and end indices, and then the search moves on to the next instance. Hmm, now that I think about it, this has the same logic as "Find" and "Replace", correct? Perhaps I should look into how that is done. A tag for each instance of each word seems extreme; maybe it isn't possible after all for my project. – RookieProgrammer Dec 29 '22 at 16:14
  • It's possible, and not as complicated as you think. I already provided a link to an answer that shows how to search for and tag words in a text widget. All you need to do is call that function once for each word you want to tag. – Bryan Oakley Dec 29 '22 at 16:19
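The approach discussed in these comments (clear the old highlights, then tag every occurrence of each word in ListBox B) could be sketched as below. It is not the code from the linked answer: instead of the widget's own search it uses Python's `re` module on the widget's content and addresses matches as "1.0+Nc" (N characters after the start). The tag name `to_delete` and the function names are hypothetical, and a single tag is shared by all occurrences, so there is no need for one tag per word.

```python
import re

TAG = "to_delete"  # hypothetical tag name shared by all highlighted words


def refresh_highlights(text_widget, words):
    """Re-tag every whole-word occurrence of `words`; return the match count."""
    text_widget.tag_configure(TAG, foreground="red")
    # Clear old highlights first, so words moved back to ListBox A
    # return to their normal colour.
    text_widget.tag_remove(TAG, "1.0", "end")
    content = text_widget.get("1.0", "end-1c")
    count = 0
    for word in words:
        pattern = re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE)
        for match in pattern.finditer(content):
            # "1.0+Nc" means N characters after the start of the text.
            text_widget.tag_add(TAG,
                                f"1.0+{match.start()}c",
                                f"1.0+{match.end()}c")
            count += 1
    return count


def demo():
    """Open a small window; needs a display, so it is not run on import."""
    import tkinter as tk
    root = tk.Tk()
    text = tk.Text(root, width=40, height=4)
    text.pack()
    text.insert("1.0", "Το σπίτι του Γιάννη και τα παιδιά του")
    refresh_highlights(text, ["του", "τα"])
    root.mainloop()
```

In the program from the question, a call such as `refresh_highlights(self.textBoxTest, self.listBoxOfDelStopwords.get(0, "end"))` at the end of both move-button callbacks would presumably keep the colouring in sync with ListBox B.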
