
I'm trying to use pyrouge to calculate the similarity between an automated summary and a set of gold standards. Rouge works fine while it processes both sets of summaries, but when it writes the result, it complains "tuple index out of range". Does anyone know what causes this problem and how I can fix it?

2017-09-13 23:54:57,524 [MainThread  ] [INFO ]  Set ROUGE home directory to D:\ComputerScience\Research\ROUGE-1.5.5\ROUGE-1.5.5.
2017-09-13 23:54:57,524 [MainThread  ] [INFO ]  Writing summaries.
2017-09-13 23:54:57,524 [MainThread  ] [INFO ]  Processing summaries. Saving system files to C:\Users\zhuan\AppData\Local\Temp\tmppm193twp\system and model files to C:\Users\zhuan\AppData\Local\Temp\tmppm193twp\model.
2017-09-13 23:54:57,524 [MainThread  ] [INFO ]  Processing files in D:\ComputerScience\Research\summary\Grendel\automated.
2017-09-13 23:54:57,524 [MainThread  ] [INFO ]  Processing automated.txt.
2017-09-13 23:54:57,539 [MainThread  ] [INFO ]  Saved processed files to C:\Users\zhuan\AppData\Local\Temp\tmppm193twp\system.
2017-09-13 23:54:57,539 [MainThread  ] [INFO ]  Processing files in D:\ComputerScience\Research\summary\Grendel\manual.
2017-09-13 23:54:57,539 [MainThread  ] [INFO ]  Processing BookRags.txt.
2017-09-13 23:54:57,539 [MainThread  ] [INFO ]  Processing GradeSaver.txt.
2017-09-13 23:54:57,539 [MainThread  ] [INFO ]  Processing GradeSummary.txt.
2017-09-13 23:54:57,557 [MainThread  ] [INFO ]  Processing Wikipedia.txt.
2017-09-13 23:54:57,562 [MainThread  ] [INFO ]  Saved processed files to C:\Users\zhuan\AppData\Local\Temp\tmppm193twp\model.
Traceback (most recent call last):

  File "<ipython-input-8-bc227b272111>", line 1, in <module>
    runfile('D:/ComputerScience/Research/automate_summary.py', wdir='D:/ComputerScience/Research')

  File "C:\Users\zhuan\Anaconda3\lib\site-packages\spyder\utils\site\sitecustomize.py", line 707, in runfile
    execfile(filename, namespace)

  File "C:\Users\zhuan\Anaconda3\lib\site-packages\spyder\utils\site\sitecustomize.py", line 101, in execfile
    exec(compile(f.read(), filename, 'exec'), namespace)

  File "D:/ComputerScience/Research/automate_summary.py", line 53, in <module>
    output = r.convert_and_evaluate()

  File "C:\Users\zhuan\Anaconda3\lib\site-packages\pyrouge\Rouge155.py", line 361, in convert_and_evaluate
    rouge_output = self.evaluate(system_id, rouge_args)

  File "C:\Users\zhuan\Anaconda3\lib\site-packages\pyrouge\Rouge155.py", line 331, in evaluate
    self.write_config(system_id=system_id)

  File "C:\Users\zhuan\Anaconda3\lib\site-packages\pyrouge\Rouge155.py", line 315, in write_config
    self._config_file, system_id)

  File "C:\Users\zhuan\Anaconda3\lib\site-packages\pyrouge\Rouge155.py", line 264, in write_config_static
    system_filename_pattern = re.compile(system_filename_pattern)

  File "C:\Users\zhuan\Anaconda3\lib\re.py", line 233, in compile
    return _compile(pattern, flags)

  File "C:\Users\zhuan\Anaconda3\lib\re.py", line 301, in _compile
    p = sre_compile.compile(pattern, flags)

  File "C:\Users\zhuan\Anaconda3\lib\sre_compile.py", line 562, in compile
    p = sre_parse.parse(p, flags)

  File "C:\Users\zhuan\Anaconda3\lib\sre_parse.py", line 855, in parse
    p = _parse_sub(source, pattern, flags & SRE_FLAG_VERBOSE, 0)

  File "C:\Users\zhuan\Anaconda3\lib\sre_parse.py", line 416, in _parse_sub
    not nested and not items))

  File "C:\Users\zhuan\Anaconda3\lib\sre_parse.py", line 616, in _parse
    source.tell() - here + len(this))

error: nothing to repeat

The gold standards are BookRags.txt, GradeSaver.txt, GradeSummary.txt, and Wikipedia.txt. The summary to be compared against them is automated.txt.
Shouldn't either *.txt or [a-z0-9A-Z]+ work? The former gives me a "nothing to repeat" error, and the latter a "tuple index out of range" error.

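from pyrouge import Rouge155

# Raw strings keep the Windows backslashes literal ('\a' would otherwise be a bell character).
r = Rouge155(r"D:\ComputerScience\Research\ROUGE-1.5.5\ROUGE-1.5.5")
r.system_dir = r'D:\ComputerScience\Research\summary\Grendel\automated'
r.model_dir = r'D:\ComputerScience\Research\summary\Grendel\manual'
# These are the patterns that trigger the "tuple index out of range" error.
r.system_filename_pattern = '[a-z0-9A-Z]+.txt'
r.model_filename_pattern = '[a-z0-9A-Z]+.txt'
output = r.convert_and_evaluate()
print(output)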

I'm setting both directories manually. It seems the Rouge package can process the .txt files in them.

Nat

2 Answers


I had the same issue with the pyrouge package. It occurs because the source code tries to match the file names we provide against a certain pattern; when that fails, an empty tuple is returned. For more detail, take a look at the Rouge155.py file, in particular the function __get_model_filenames_for_id().

I resolved it by following the exact file-naming instructions on the official page, as given below:

r.system_filename_pattern = 'some_name.(\d+).txt'

r.model_filename_pattern = 'some_name.[A-Z].#ID#.txt'

So, my suggestion would be to:

  • Create two separate directories: system_summaries (system generated) and model_summaries (human generated / gold standard)
  • Provide the exact file paths leading to these directories
  • If you are comparing one system summary (say, SystemSummary.1.txt) to a set of model summaries (say, ModelSummary.A.1.txt, ModelSummary.B.1.txt, ModelSummary.C.1.txt), then provide the following patterns:
      r.system_filename_pattern = 'SystemSummary.(\d+).txt'

      r.model_filename_pattern = 'ModelSummary.[A-Z].#ID#.txt' 

You can extend this depending on the number of summaries you want to evaluate.
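The naming scheme above can be sketched with the standard `re` module alone. This is a minimal illustration, not pyrouge's actual code: the helper `pair_summaries` and the in-memory file lists are hypothetical, and the sketch assumes that the captured `(\d+)` group is used as the ID substituted for `#ID#` in the model pattern.

```python
import re

# Patterns taken from the answer above.
system_pattern = re.compile(r'SystemSummary.(\d+).txt')
model_pattern_template = r'ModelSummary.[A-Z].#ID#.txt'

# Hypothetical file listings standing in for the two directories.
system_files = ['SystemSummary.1.txt']
model_files = [
    'ModelSummary.A.1.txt',
    'ModelSummary.B.1.txt',
    'ModelSummary.C.1.txt',
    'ModelSummary.A.2.txt',  # belongs to a different ID
]


def pair_summaries(system_files, model_files):
    """Map each system summary to the model summaries sharing its ID."""
    pairs = {}
    for name in sorted(system_files):
        match = system_pattern.match(name)
        if not match:
            continue
        # The capture group supplies the ID for this system summary.
        summary_id = match.group(1)
        model_pattern = re.compile(
            model_pattern_template.replace('#ID#', summary_id))
        pairs[name] = sorted(f for f in model_files if model_pattern.match(f))
    return pairs


pairs = pair_summaries(system_files, model_files)
print(pairs)
```

The point of the sketch is that the system pattern needs a capture group: without one there is no ID to substitute into the model pattern.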

Hope this helps! Good Luck!

Pri

The problem is that the rouge library never accounted for the case where no matches are found for your regular expression. The problematic line in the rouge source code is id = match.groups(0)[0]. The documentation says that the groups function will "Return a tuple containing all the subgroups of the match, from 1 up to however many groups are in the pattern...". Because no matches were found, an empty tuple was returned, and the code tries to grab the first item from an empty tuple, which results in an error.
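The behavior of `groups` can be checked with the standard `re` module alone. The sketch below assumes the library runs the quoted line on a match object; it does not call pyrouge. It also shows that `groups` returns an empty tuple when the match succeeds but the pattern contains no capture group, which is the case for the pattern in the question. The variant with parentheses is only an illustration, not a confirmed fix.

```python
import re

filename = 'automated.txt'

# Pattern from the question: it matches, but it has no capture group.
no_group = re.compile(r'[a-z0-9A-Z]+.txt')
match = no_group.match(filename)
print(match is not None)
print(match.groups(0))

# Indexing the empty tuple reproduces the reported error message.
try:
    match.groups(0)[0]
except IndexError as exc:
    error_message = str(exc)
print(error_message)

# With one capture group, groups() has a first element to return.
with_group = re.compile(r'([a-z0-9A-Z]+).txt')
summary_id = with_group.match(filename).groups(0)[0]
print(summary_id)
```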

hostingutilities.com
  • I see. So I changed my regular expression to *.txt, which should match any summary in the folder. But now it gives me a new error: nothing to repeat. – Nat Sep 14 '17 at 03:56
  • A * is often treated as a wildcard that matches any number of any character, but in a regex the * behaves differently. See https://stackoverflow.com/questions/31386552/nothing-to-repeat-from-python-regex for more info on that. As you mentioned, `[a-z0-9A-Z]+` should pick anything up. Could you print out the system_dir variable being used by the write_config_static function and make sure your .txt files are inside this folder, and not in a sub-directory of it? – hostingutilities.com Sep 14 '17 at 06:23
  • It seems Rouge can find the summaries in both the system directory and the model directory, because its output shows that it has processed the .txt files in both. The problem still happens in the write_config_static function. My system_dir and model_dir are set manually to absolute paths. – Nat Sep 14 '17 at 13:43
  • @Nat, were you ever able to solve this issue? I am running into the same problem using the same regex you were. I noticed the files are being processed by rouge and put into a tmp folder. However, when write_config_static is called, it throws the same exception saying it cannot find the files. When I view the tmp folder manually, the files are there. – Sabolis Mar 15 '18 at 04:04
  • @Sabolis I couldn't solve it. I switched to a Java version of Rouge. That one works fine. – Nat Mar 16 '18 at 13:15
  • @Nat, I couldn't either. Switched to the same version. Thanks for your reply! – Sabolis Mar 17 '18 at 15:48
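The "nothing to repeat" error discussed in the comments can be reproduced without pyrouge. In a regex, `*` is a quantifier that needs something before it to repeat, so `*.txt` is a shell-style glob rather than a valid regular expression. The sketch below shows the failure and two standard-library ways to write "any name ending in .txt" as a regex; whether either pattern suits pyrouge's ID extraction is not something this sketch establishes.

```python
import fnmatch
import re

# A leading '*' has nothing to repeat, so compilation fails.
try:
    re.compile('*.txt')
except re.error as exc:
    message = str(exc)
print(message)

# Regex equivalent of the glob: any characters, then a literal ".txt".
regex_any_txt = re.compile(r'.*\.txt')

# fnmatch.translate converts a shell-style glob into a regex string.
glob_as_regex = re.compile(fnmatch.translate('*.txt'))

print(bool(regex_any_txt.match('Wikipedia.txt')))
print(bool(glob_as_regex.match('Wikipedia.txt')))
```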