I have downloaded ~100 stored procs as .txt files from SQL Server. From these .txt files I want to record every occurrence of a keyword beginning with "XXX". Each time such a word occurs in a script, it should be placed into a dataframe with the name of the file next to it.
For example:
File: fileone
Script: "AAA BBB CCC XXXA XXXB DDD"
Would return:
Keyword | File |
---|---|
XXXA | fileone |
XXXB | fileone |
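
The single-file example above could be sketched roughly as follows. This is only an illustration under assumptions: `find_keywords` is a hypothetical helper name, and keywords are assumed to be whitespace-separated words that start with the prefix.

```python
def find_keywords(script, file_name, prefix="XXX"):
    """Return one (keyword, file) row per occurrence of a prefixed word."""
    rows = []
    for word in script.split():
        if word.startswith(prefix):
            rows.append((word, file_name))
    return rows


# The example script and file name from the question
rows = find_keywords("AAA BBB CCC XXXA XXXB DDD", "fileone")
for keyword, file_name in rows:
    print(keyword, "|", file_name)
```

The rows can then be turned into a dataframe with `pd.DataFrame(rows, columns=["Keyword", "File"])`.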
I have a dataframe of my keywords and would like to loop this across all of my files, ideally resulting in an output that looks like:
Keyword | File | File | File |
---|---|---|---|
XXXA | fileone | filetwo | filethree |
XXXB | fileone | filetwo | null |
XXXC | null | null | filethree |
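
One possible way to build this wide layout is sketched below. It is not a definitive implementation: `keyword_file_matrix` is a hypothetical name, the `scripts` contents are made-up sample data chosen to match the table above, and each file is assumed to be already read into a string.

```python
def keyword_file_matrix(scripts, prefix="XXX"):
    """Map each keyword to a list holding the file name (or None) per file."""
    files = list(scripts)
    # Set of prefixed words found in each file
    found = {
        name: {word for word in text.split() if word.startswith(prefix)}
        for name, text in scripts.items()
    }
    keywords = sorted(set().union(*found.values()))
    return {
        kw: [name if kw in found[name] else None for name in files]
        for kw in keywords
    }


# Made-up sample scripts, keyed by file name
scripts = {
    "fileone": "AAA BBB CCC XXXA XXXB DDD",
    "filetwo": "XXXB EEE XXXA",
    "filethree": "XXXC FFF XXXA",
}
matrix = keyword_file_matrix(scripts)
for keyword, row in matrix.items():
    print(keyword, row)
```

Passing the result to `pd.DataFrame.from_dict(matrix, orient="index")` should give one row per keyword and one column per file, with missing entries shown as null values.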
Below is the code that I am using to build the keyword list. It takes the combined script of all of my stored procs (copied and pasted into one .txt file) and finds all of the keywords that begin with "XXX".
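    import pandas as pd

    # allprocs is the path to the combined .txt file of all stored procs
    keywords = []  # avoid naming this `list`, which shadows the built-in
    with open(allprocs, 'r') as f:
        for line in f:
            for word in line.split():
                if word.startswith('XXX.'):
                    keywords.append(word)

    # keep each keyword once, in order of first appearance
    new_List = pd.Series(keywords).unique().tolist()
    df1 = pd.DataFrame(new_List, columns=['Tables'])
    df1 = df1.drop_duplicates()  # redundant after unique(); harmless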
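
The same scan could be extended to loop over the individual files instead of the combined one. The sketch below makes several assumptions: `collect_rows` is a hypothetical helper, every proc is a `.txt` file in a single folder, the file name without its extension is used as the label, and the files open with the default text encoding (files exported from SQL Server may need an explicit `encoding` argument). The demo writes throwaway files to a temporary folder so that it runs on its own.

```python
import tempfile
from pathlib import Path


def collect_rows(folder, prefix="XXX"):
    """Scan every .txt file in folder and return (keyword, file) rows."""
    rows = []
    for path in sorted(Path(folder).glob("*.txt")):
        text = path.read_text()
        for word in text.split():
            if word.startswith(prefix):
                rows.append((word, path.stem))
    return rows


# Demo with throwaway files; a real run would point at the proc folder.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "fileone.txt").write_text("AAA BBB CCC XXXA XXXB DDD")
    Path(tmp, "filetwo.txt").write_text("XXXB EEE")
    demo_rows = collect_rows(tmp)

print(demo_rows)
```

Because each row carries its file name, the result can be loaded with `pd.DataFrame(demo_rows, columns=["Keyword", "File"])` and reshaped afterwards.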