
I am trying to train a GloVe model in Python on my text corpus, following the implementation described on this page: Glove model. I am running into a problem when the script reads the corpus file from the path I specify.

parser.add_argument('corpus', metavar='corpus_path',
                        type=partial(codecs.open, encoding='utf-8'))

How do I specify the file path for this argument? I used the command-line arguments shown below:

C:\Users\JAYASHREE\Documents\NLP>python Glove_python_bbc.py 'C:/Users/JAYASHREE/Documents/NLP/text-corpus' --vocab-path C:/Users/JAYASHREE/Documents/NLP/vocabulary --cooccur-path C:/Users/JAYASHREE/Documents/NLP/cooccur_matrix -w 10 --min-count 10 --vector-path C:/Users/JAYASHREE/Documents/NLP/word-vector -s 40 --iterations 10 --learning-rate 0.1 --save-often True

I get the following error:

Traceback (most recent call last):
  File "Glove_python_bbc.py", line 380, in <module>
    main(parse_args())
  File "Glove_python_bbc.py", line 70, in parse_args
    return parser.parse_args()
  File "C:\Users\JAYASHREE\Anaconda2\lib\argparse.py", line 1701, in parse_args
    args, argv = self.parse_known_args(args, namespace)
  File "C:\Users\JAYASHREE\Anaconda2\lib\argparse.py", line 1733, in parse_known_args
    namespace, args = self._parse_known_args(args, namespace)
  File "C:\Users\JAYASHREE\Anaconda2\lib\argparse.py", line 1921, in _parse_known_args
    positionals_end_index = consume_positionals(start_index)
  File "C:\Users\JAYASHREE\Anaconda2\lib\argparse.py", line 1898, in consume_positionals
    take_action(action, args)
  File "C:\Users\JAYASHREE\Anaconda2\lib\argparse.py", line 1791, in take_action
    argument_values = self._get_values(action, argument_strings)
  File "C:\Users\JAYASHREE\Anaconda2\lib\argparse.py", line 2231, in _get_values
    value = self._get_value(action, arg_string)
  File "C:\Users\JAYASHREE\Anaconda2\lib\argparse.py", line 2260, in _get_value
    result = type_func(arg_string)
  File "C:\Users\JAYASHREE\Anaconda2\lib\codecs.py", line 896, in open
    file = __builtin__.open(filename, mode, buffering)
IOError: [Errno 22] invalid mode ('rb') or filename: "'C:/Users/JAYASHREE/Documents/NLP/text-corpus'"

How do I pass the corpus path as an argument?

Thanks

Jayashree

1 Answer


The Windows command prompt (cmd.exe) does not treat single quotes as quoting characters, so they are not stripped but passed through as part of the argument string. See the last line of the traceback:

IOError: [Errno 22] invalid mode ('rb') or filename: "'C:/Users/JAYASHREE/Documents/NLP/text-corpus'"

The filename that Python tried to open includes the single quotes.

Replace the single quotes with double quotes or, since your path contains no spaces, omit the quotes entirely, and the file will open.

Unix-like shells do not have this problem, because they strip single quotes before the program sees the argument.
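If you cannot control how the script is invoked, one option is to make the argument's `type` function tolerant of stray quotes. The following is only a minimal sketch, not part of the original GloVe script: `open_corpus` and `build_parser` are hypothetical names, and the temporary file stands in for your real corpus.

```python
import argparse
import codecs
import os
import tempfile


def open_corpus(path):
    # Hypothetical helper: drop literal quote characters that cmd.exe
    # passes through, then open the file as UTF-8 like the original script.
    return codecs.open(path.strip("'\""), encoding='utf-8')


def build_parser():
    parser = argparse.ArgumentParser()
    parser.add_argument('corpus', metavar='corpus_path', type=open_corpus)
    return parser


# Create a small temporary file to stand in for the corpus.
tmp_dir = tempfile.mkdtemp()
corpus_path = os.path.join(tmp_dir, 'text-corpus')
with codecs.open(corpus_path, 'w', encoding='utf-8') as f:
    f.write(u'hello world\n')

# Simulate what cmd.exe delivers when the path is wrapped in single quotes.
args = build_parser().parse_args(["'" + corpus_path + "'"])
first_line = args.corpus.readline()
args.corpus.close()
```

Stripping quotes this way would also mangle a path that genuinely begins or ends with a quote character, so fixing the command line is usually the simpler choice.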

See this and this question for more background.

Chazeon
  • When I use double quotes, or no quotes at all, I get the following error: ' [-h] [--vocab-path VOCAB_PATH] [--cooccur-path COOCCUR_PATH] [-w WINDOW_SIZE] [--min-count MIN_COUNT] [--vector-path VECTOR_PATH] [-s VECTOR_SIZE] [--iterations ITERATIONS] [--learning-rate LEARNING_RATE] [--save-often] corpus_path Glove_python_bbc.py: error: unrecognized arguments: True' – Jayashree Aug 06 '17 at 06:29
  • @Jayashree That is a separate issue. `--save-often` is a flag and does not take a value: it is defined with `action='store_true'`, so simply passing the option sets it to true. Remove the trailing `True` and the command will work. – Chazeon Aug 06 '17 at 08:42
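To illustrate the `store_true` behaviour described in the last comment, here is a minimal sketch. It assumes the option is declared roughly as below; the actual declaration in the GloVe script may differ in its details.

```python
import argparse

parser = argparse.ArgumentParser()
# Assumed declaration: a boolean flag that takes no value.
parser.add_argument('--save-often', action='store_true', default=False)

# Passing the flag on its own sets the option to True.
with_flag = parser.parse_args(['--save-often'])

# Omitting the flag leaves the default, False.
without_flag = parser.parse_args([])

# A trailing "True" is not consumed by the flag. parse_known_args returns
# it as a leftover instead of exiting with "unrecognized arguments".
_, leftovers = parser.parse_known_args(['--save-often', 'True'])
```

With `parse_args`, that leftover `True` is what triggers the "unrecognized arguments: True" error.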