
I have multiple text files in a folder, say "configs". I want to search each file for a particular text, "-cfg", and copy the data that follows -cfg, from the opening to the closing double quote ("data"). The result should be written to another text file, "result.txt", with the filename, test name and config for each file.

NOTE: Each file can have multiple "-cfg" entries, each on a separate line along with the name of the test related to that configuration.

E.g.: cube_demo -cfg "RGB 888; MODE 3"

My approach is to open each text file one at a time, find the pattern and store the required result in a buffer, then copy the entire result into a new file.

I came across Python and it looks like this is easy to do in Python. I'm still learning Python and trying to figure out how to do it. Please help. Thanks.

I know how to open the file and iterate over each line to search for a particular string:

import re
search_term = "Cfg\s(\".*\")"           // Not sure, if it's correct
ifile = open("testlist.csv", "r")
ofile = open("result.txt", "w")
searchlines = ifile.readlines()
for line in searchlines:
    if search_term in line:
        if re.search(search_term, line):
            ofile.write(\1)              // trying to get string with the \number special sequence
ifile.close()
ofile.close()

But this gives me the complete line. I could not find out how to use a regular expression to get only the "data", or how to iterate over the files in the folder to search for the text.

Praful S
  • This sounds like grep would be the better tool. – Holloway Oct 13 '15 at 09:16
  • @Trengot: thanks for the suggestion. I need to do it on Windows. As far as I know, grep is only available for Linux. – Praful S Oct 13 '15 at 09:22
  • Where is your code? What have you tried that didn't work? – bruno desthuilliers Oct 13 '15 at 09:41
  • Have a look at the `open` function and how it iterates over (lines in) files. For simple operations, the `str` methods will do (e.g. `str.split`), but you might need regular expression (available in the `re` module). – MisterMiyagi Oct 13 '15 at 10:26
  • @brunodesthuilliers: Sorry for the inconvenience. Code is updated above. I do not have any experience with Python; I'm learning and trying. – Praful S Oct 13 '15 at 11:36
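MisterMiyagi's suggestion of plain `str` methods can be sketched without any regular expression. This is a minimal illustration, not code from the thread: `extract_cfg` is a hypothetical helper name, and it assumes exactly one space between -cfg and the opening double quote, with no double quotes inside the config itself.

```python
def extract_cfg(line):
    """Return the text between the double quotes after -cfg, or None.

    Assumption: the config is written exactly as -cfg "..." (one space,
    no embedded double quotes).
    """
    # Split the line around the first occurrence of the marker.
    _, found, rest = line.partition('-cfg "')
    if not found:
        return None  # no -cfg on this line
    # Everything up to the next double quote is the config value.
    config, closed, _ = rest.partition('"')
    return config if closed else None  # None if the quote is never closed


print(extract_cfg('cube_demo -cfg "RGB 888; MODE 3"'))  # RGB 888; MODE 3
```

Because it always stops at the first closing quote, this returns an empty string for -cfg "" instead of running on into the next quoted argument.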

1 Answer


Not quite there yet...

import re
search_term = "Cfg\s(\".*\")"           // Not sure, if it's correct

"//" is not a valid comment marker in Python; you want "#".

wrt/ your regexp, you want (from your specs): 'cfg', followed by zero or more spaces, followed by any text between double quotes, stopping at the first closing double quote, and you want to capture the part between these double quotes. This is spelled 'cfg *"(.+?)"'. Since you don't want to deal with escape characters, the best way is to use a raw single-quoted string:

exp = r'cfg *"(.+?)"'
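To see what each piece of that pattern does, the same expression can be written with the `re.VERBOSE` flag, which allows whitespace and `#` comments inside a pattern. The name `annotated` is illustrative only; it should behave the same as the one-line r'cfg *"(.+?)"'.

```python
import re

# re.VERBOSE ignores unescaped whitespace and treats '#' as a comment,
# so the literal space has to be written as '\ '.
annotated = re.compile(r'''
    cfg      # the literal text "cfg"
    \ *      # zero or more spaces
    "        # the opening double quote
    (.+?)    # group 1: at least one character, as few as possible (non-greedy)
    "        # the first closing double quote after that
''', re.VERBOSE)

match = annotated.search('cube_demo -cfg "RGB 888; MODE 3"')
print(match.group(1))  # RGB 888; MODE 3
```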

Now, since you're going to reuse this expression in a loop, you might as well compile it up front:

exp = re.compile(r'cfg *"(.+?)"')

So now exp is a compiled pattern object instead of a string. To use it, you call its search() method, passing your current line as the argument. If the line matches the expression, you get a match object; otherwise you get None:

>>> match = exp.search('foo bar "baaz" boo')
>>> match is None
True
>>> match = exp.search('foo bar -cfg "RGB 888; MODE 3" tagada "tsoin"')
>>> match is None
False
>>> 

To get the part between the double quotes, you call match.group(1), the first captured group; match.group(0) is the whole text matched by the expression:

>>> match.group(0)
'cfg "RGB 888; MODE 3"'
>>> match.group(1)
'RGB 888; MODE 3'
>>> 

Now you just have to learn to use files correctly. First hint: files are context managers that know how to close themselves. Second hint: files are iterable, so there's no need to read the whole file into memory. Third hint: file.write("text") WON'T append a newline after "text".
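The three hints can be seen in a small self-contained sketch. It writes to a throwaway file in a temporary directory (the name demo.txt is arbitrary) purely to show the behaviour.

```python
import os
import tempfile

# Hypothetical scratch file, used only for this demonstration.
path = os.path.join(tempfile.mkdtemp(), "demo.txt")

# Hint 1: 'with' closes the file automatically when the block exits.
with open(path, "w") as f:
    f.write("first")        # Hint 3: write() adds no newline...
    f.write("second\n")     # ...so these two calls end up on one line
    f.write("third\n")

# Hint 2: iterating over the file object yields one line at a time.
with open(path) as f:
    lines = [line.rstrip("\n") for line in f]

print(f.closed)  # True: the with-block already closed it
print(lines)     # ['firstsecond', 'third']
```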

If we glue all this together, your code should look something like:

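import re

# 'cfg', optional spaces, then capture up to the first closing double quote
search_term = re.compile(r'cfg *"(.+?)"')

# both files are closed automatically when their with-blocks exit
with open("testlist.csv", "r") as ifile:
    with open("result.txt", "w") as ofile:
        for line in ifile:  # files are iterable, one line at a time
            match = search_term.search(line)
            if match:
                # write() adds no newline, so append one explicitly
                ofile.write(match.group(1) + "\n")
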
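The code above reads a single file and writes only the config, while the question also asks for every file in a folder together with the filename and test name. One possible extension combines the same idea with os.walk() (suggested in the comments). This is a sketch under stated assumptions, not the answerer's code: the test name is taken to be the first whitespace-separated token on the line (cube_demo in the question's example), the output is tab-separated, and `collect_configs`/`write_results` are hypothetical names. The pattern captures `[^"]*` instead of `.+?` so that an empty -cfg "" comes out as an empty string.

```python
import os
import re

# Assumption: group 1 = test name (first token on the line),
# group 2 = config text between the double quotes after -cfg.
CFG_PATTERN = re.compile(r'^\s*(\S+).*?-cfg\s*"([^"]*)"')


def collect_configs(root):
    """Walk `root` recursively and return (path, test, config) tuples."""
    results = []  # in-memory buffer, written out once at the end
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            with open(path, "r") as ifile:
                for line in ifile:
                    match = CFG_PATTERN.search(line)
                    if match:
                        results.append((path, match.group(1), match.group(2)))
    return results


def write_results(results, out_path="result.txt"):
    """Write one tab-separated line per (path, test, config) tuple."""
    with open(out_path, "w") as ofile:
        for path, test, config in results:
            ofile.write("%s\t%s\t%s\n" % (path, test, config))
```

Usage would then be write_results(collect_configs("configs")), with "configs" being the folder named in the question.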
bruno desthuilliers
  • Minor update to avoid a syntax error ("search"): match = search_term.search(line). Thank you, the code works. Nice explanation; I could understand my errors. Also, I need to figure out how to implement it for a root directory containing sub-folders and files. – Praful S Oct 13 '15 at 12:52
  • @PrafulS typo corrected, thanks. wrt/ traversing a directory hierarchy, `os.walk()` is your friend (https://docs.python.org/2/library/os.html#os.walk) – bruno desthuilliers Oct 13 '15 at 13:38
  • Thanks for the suggestion; I found a great way to traverse the directory using os.walk(). – Praful S Oct 14 '15 at 07:07
  • Now, the next step is to remove duplicate configurations, as the same tests are used in many different test suites. The final aim is to get the frequency of each config used (removing duplicate entries for the same test); I also need to separate the list based on file names belonging to a particular group of tests. – Praful S Oct 14 '15 at 07:16
  • In order to consider these configs valid, I need to compare them with a pre-existing list. I am getting a logic error, as the code works fine for some configs and gives false output for others. Kindly help. Raised it as a separate query: – Praful S Oct 14 '15 at 14:28
  • I came across a case where the pattern (r'cfg *"(.+?)"') does not work. E.g.: cube_demo -cfg "" -path "C:\work". In this case, it gives me the wrong output (" -path "). I could not find a proper pattern to handle this kind of situation. Kindly help. I could work around the error in a hacky way, but just wanted to check whether a proper pattern can be used. – Praful S Oct 15 '15 at 11:19
  • regexp syntax is fully documented at https://docs.python.org/2/library/re.html - maybe you could start by Reading The Fine Manual? – bruno desthuilliers Oct 15 '15 at 11:40
  • Yes, I read the regex manual and tried multiple options, but this is slightly tricky for me. Anyway, I will try further. Thank you. One solution I can think of is to check whether (") is the first character of the match.group(1) string and, if so, continue to the next test (line), as the configuration is empty for that test. – Praful S Oct 15 '15 at 12:14
  • After going through the documentation, I tried these options for the regular expression pattern: 1. -cfg\s+\".*\" 2. -cfg\s+\".*\"[,\s]* 3. -cfg\s+\".*\"[,\s]+ . These are the cases to be verified: 1. -cfg "", -path "C:\temp" (including comma) 2. -cfg "" -path "C:\temp" (no comma) 3. -cfg "MEM 20; w 10", -path "C:\temp" (config with data and comma) 4. -cfg "MEM 20; w 10" -path "C:\temp" (config with data, without comma). I could not get a proper regular expression. Any help would be much appreciated. – Praful S Oct 15 '15 at 13:51
  • @PrafulS please post a new question - comments are not appropriate for this, and the question we are commenting on is solved AFAICT (while we're at it you may want to accept my answer since it solved your initial problem). – bruno desthuilliers Oct 15 '15 at 13:58
  • Sorry, but this is still part of the same question, as the data can be represented in multiple ways. The example was just to give readers a basic understanding of the question; these cases should still hold for the question as asked. I can update the question. Wouldn't that be better, as it's just an add-on? – Praful S Oct 15 '15 at 15:02
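For the cases listed in the comments above, one option is to capture `[^"]*` (any run of characters that are not a double quote, possibly empty) instead of `.+?`. The `+` in the answer's pattern forces at least one character, which is why -cfg "" runs on into the next quoted argument. This sketch assumes the config value itself never contains a double quote; the name `CFG_VALUE` is illustrative.

```python
import re

# [^"]* matches any run of non-quote characters, including an empty one,
# so the capture can never run past the closing quote of the -cfg value.
CFG_VALUE = re.compile(r'-cfg\s+"([^"]*)"')

cases = [
    r'-cfg "", -path "C:\temp"',
    r'-cfg "" -path "C:\temp"',
    r'-cfg "MEM 20; w 10", -path "C:\temp"',
    r'-cfg "MEM 20; w 10" -path "C:\temp"',
    r'cube_demo -cfg "" -path "C:\work"',
]

for line in cases:
    match = CFG_VALUE.search(line)
    print(repr(match.group(1)) if match else None)
```

Changing `.+?` to `.*?` in the original pattern would also accept an empty config; `[^"]*` simply states the "stop at the next quote" intent explicitly, and the trailing comma no longer matters because nothing after the closing quote is part of the pattern.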