How can I search an xlsx file for a specific string held in a variable? Example:
ORIGINAL TEXT FILE:
one - a b c
two - a b c
three - a b c
four - a b c
five - a b c
XLSX FILE:
two
three
five
OUTPUT CSV FILE:
The following in two columns:
two a
two b
two c
three a
three b
three c
five a
five b
five c
In short, I am going line-by-line through my original text file, selecting a specific string such as 'one', 'two', or 'three', and checking whether that string exists in the xlsx file. (I realize it might be better to search the other way around, but I'm trying to keep things very simple, as this is only a section of my code.) Ultimately I then identify the a, b, c parameters and send it all out to a csv file.
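To make the intended transformation concrete, here is a minimal sketch of the text-to-rows step. It assumes every line has the `key - a b c` shape shown in the example, which the real file may not. The helper names `expand_line` and `write_matches` are hypothetical, and the set of wanted keys is passed in directly rather than read from the xlsx file. It is written for Python 3.

```python
import csv
import io


def expand_line(line):
    """Split 'two - a b c' into [('two', 'a'), ('two', 'b'), ('two', 'c')]."""
    # Assumption: key and parameters are separated by ' - ' as in the example.
    key, _, params = line.partition(" - ")
    return [(key.strip(), param) for param in params.split()]


def write_matches(lines, wanted, out_file):
    """Write one (key, parameter) row per parameter for keys found in `wanted`."""
    writer = csv.writer(out_file, lineterminator="\n")
    for line in lines:
        for key, param in expand_line(line):
            if key in wanted:
                writer.writerow([key, param])


# Usage with in-memory data instead of real files.
sample = ["one - a b c", "two - a b c", "three - a b c"]
buffer = io.StringIO()
write_matches(sample, {"two", "three"}, buffer)
print(buffer.getvalue())
```

With a real output file, `out_file` would be something like `open("Output.csv", "w", newline="")` in Python 3.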
-- I already imported os, csv, re, openpyxl, load_workbook (from openpyxl), sys, and set default encoding to utf-8
    with open("OriginalTextFile.txt", 'r') as SearchList:
        for line in SearchList:
            line_text = str(line)
            try:
                test_query = re.search('start (.+?) end', line_text).group(1)
                print test_query # confirms that I AM getting the correct test_query
                if str(test_query) in load_workbook('Subset.xlsx', read_only=True):
                    print 'FOUND IT IN SUBSET!'
    ----- Continues -----
My question is about line 7 specifically: how can I check whether a specific string (e.g. "one") exists in my Subset.xlsx file? Is this line correct and I'm just missing something simple? Any additional suggestions, links, tutorials, or documentation are welcome!
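A workbook object holds worksheets rather than cell text, so one possible approach is to read the cell values into a set once and then test each string against that set. The sketch below assumes the strings sit anywhere on the first worksheet and that openpyxl 2.6 or later is installed (needed for `values_only=True`). The names `collect_cell_strings` and `load_subset` are hypothetical helpers, and the code is written for Python 3.

```python
def collect_cell_strings(rows):
    """Return the set of non-empty cell values, converted to stripped strings."""
    found = set()
    for row in rows:
        for value in row:
            if value is not None:
                found.add(str(value).strip())
    return found


def load_subset(path):
    """Read every cell of the first worksheet in `path` into a set of strings."""
    # Third-party dependency, imported here so the helper above works without it.
    from openpyxl import load_workbook

    workbook = load_workbook(path, read_only=True)
    try:
        # Assumption: the strings to match are on the first worksheet.
        sheet = workbook.worksheets[0]
        return collect_cell_strings(sheet.iter_rows(values_only=True))
    finally:
        workbook.close()


# Usage with in-memory rows shaped like the tuples iter_rows(values_only=True) yields.
rows = [("two",), ("three", None), (" five ",)]
subset = collect_cell_strings(rows)
print("two" in subset, "one" in subset)
```

In the loop above, this would mean calling `subset = load_subset('Subset.xlsx')` once before opening the text file and replacing the line 7 test with `if test_query in subset:`, which also avoids reloading the workbook for every line.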
Thank you very much, in advance!