Annotating a Corpus (SyntaxNet)

Question

I downloaded and installed SyntaxNet following the official SyntaxNet documentation on GitHub. Following the "Annotating a Corpus" section of that documentation, I tried to have SyntaxNet read a .conll file named wj.conll and write the results to wj-tagged.conll, but I could not get it to work. My questions are:

1. Does SyntaxNet always read .conll files (and not .txt files)? I know SyntaxNet reads .conll files for training and testing, but I am not sure whether a .txt file has to be converted to a .conll file before I can get its part-of-speech tags and dependency parse.
2. How can I make SyntaxNet read from files? I tried every approach explained in the SyntaxNet documentation on GitHub and none of them worked for me.

What is your question? Edit your post to ask a clear question so people can answer it! – Olivier Moindrot, May 27 '16 at 08:26
Just a few comments. Re your question 1, *"does SyntaxNet always reads .conll files? (not .txt files?)"*: it should not make a difference whether you call a file *XYZ.conll* or *XYZ.txt*. The formatting inside is what matters, not the file extension. Re your question 2, *"How can I make SyntaxNet reads from files"*: have you tried standard shell input or Python file reading? According to the docs [here](https://github.com/tensorflow/tensorflow), that looks like a possibility to me. – patrick, May 27 '16 at 20:41

score 6 · Accepted Answer · edited Jun 21 '16 at 23:50


Add these declarations to the end of "context.pbtxt". Here "inp" and "out" are text files in the SyntaxNet root directory.

   input {
     name: 'inp_file'
     record_format: 'english-text'
     Part {
       file_pattern: 'inp'
     }
   }
   input {
     name: 'out_file'
     record_format: 'english-text'
     Part {
       file_pattern: 'out'
     }
   }

Add the sentences you want tagged to the "inp" file, then pass the names declared above (inp_file and out_file) to the --input and --output flags the next time you run SyntaxNet.
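If you would rather not edit context.pbtxt by hand, the declarations above can be generated and appended with a few lines of Python. This is only a sketch: the helper names `input_block` and `append_inputs` are made up for illustration, and the context.pbtxt path in the usage comment is assumed from the command shown in this answer.

```python
from pathlib import Path


def input_block(name, file_pattern, record_format="english-text"):
    """Return one `input { ... }` declaration in the layout shown above."""
    return (
        "input {\n"
        f"  name: '{name}'\n"
        f"  record_format: '{record_format}'\n"
        "  Part {\n"
        f"    file_pattern: '{file_pattern}'\n"
        "  }\n"
        "}\n"
    )


def append_inputs(context_path, blocks):
    """Append declarations to the end of a context file, skipping any already present."""
    path = Path(context_path)
    existing = path.read_text() if path.exists() else ""
    with path.open("a") as handle:
        for block in blocks:
            if block not in existing:
                handle.write("\n" + block)


# Usage (path is an assumption, taken from the command in this answer):
# append_inputs(
#     "syntaxnet/models/parsey_mcparseface/context.pbtxt",
#     [input_block("inp_file", "inp"), input_block("out_file", "out")],
# )
```

Running it twice does not duplicate the declarations, since blocks already found in the file are skipped.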

Just to help you a bit more, here is an example shell command.

bazel-bin/syntaxnet/parser_eval \
--input inp_file \
--output stdout-conll \
--model_path syntaxnet/models/parsey_mcparseface/tagger-params \
--task_context syntaxnet/models/parsey_mcparseface/context.pbtxt \
--hidden_layer_sizes 64 \
--arg_prefix brain_tagger \
--graph_builder structured \
--slim_model \
--batch_size 1024 | bazel-bin/syntaxnet/parser_eval \
--input stdout-conll \
--output out_file \
--hidden_layer_sizes 512,512 \
--arg_prefix brain_parser \
--graph_builder structured \
--task_context syntaxnet/models/parsey_mcparseface/context.pbtxt \
--model_path syntaxnet/models/parsey_mcparseface/parser-params \
--slim_model --batch_size 1024

In the script above, the output of the first command (POS tagging) is piped into the second command (dependency parsing) as its input; the two commands are separated by "|".
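If the parser result is written in CoNLL format (for example with --output stdout-conll, as the tagger step above uses, redirected to a file), it can be loaded in Python for further analysis. The sketch below assumes the standard ten tab-separated CoNLL-X columns (ID, FORM, LEMMA, CPOSTAG, POSTAG, FEATS, HEAD, DEPREL, PHEAD, PDEPREL) with a blank line between sentences. The name `read_conll` and the sample lines are illustrative only; the sample is not real SyntaxNet output.

```python
CONLL_FIELDS = ["id", "form", "lemma", "cpostag", "postag",
                "feats", "head", "deprel", "phead", "pdeprel"]


def read_conll(lines):
    """Group CoNLL lines into sentences; each token is a dict keyed by CONLL_FIELDS."""
    sentences, current = [], []
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            # A blank line ends the current sentence.
            if current:
                sentences.append(current)
                current = []
            continue
        columns = line.split("\t")
        if len(columns) != len(CONLL_FIELDS):
            raise ValueError(
                f"expected 10 tab-separated columns, got {len(columns)}: {line!r}"
            )
        current.append(dict(zip(CONLL_FIELDS, columns)))
    if current:
        # Keep the last sentence even if the input lacks a trailing blank line.
        sentences.append(current)
    return sentences


# Hypothetical sample in CoNLL layout (not real parser output).
sample = [
    "1\tBob\t_\tNOUN\tNNP\t_\t2\tnsubj\t_\t_\n",
    "2\tsleeps\t_\tVERB\tVBZ\t_\t0\tROOT\t_\t_\n",
    "\n",
]
for token in read_conll(sample)[0]:
    print(token["form"], token["postag"], token["head"], token["deprel"])
```

To read a real file, pass an open file object instead of the sample list, e.g. `with open(path) as f: sentences = read_conll(f)`.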

answered May 28 '16 at 06:58

Harsh Patni


Sure, sorry, I am new to Stack Overflow. I will do it right now. – Nazanin Tajik Jun 03 '16 at 12:40
@HarshPatni the output file only identifies the part of speech of each word; it does not show the dependencies, such as ROOT, nsubj and so forth. I don't know whether I did something wrong or whether it is a problem with the script. – Nazanin Tajik Jun 10 '16 at 19:48
@Nazanin the older script was meant to do just POS tagging; it was only an example, as I mentioned before. I have updated the script to fit your need, though. I hope it helps. :) – Harsh Patni Jun 10 '16 at 20:48
@HarshPatni I am getting this error: F ./syntaxnet/proto_io.h:147] Check failed: input.record_format_size() == 1 (0 vs. 1)TextReader only supports inputs with one record format: name: "inp" – Anish Jun 11 '16 at 14:02
@Nazanin are you still able to run the old script I posted without any error? – Harsh Patni Jun 12 '16 at 20:58
@HarshPatni yes, I just ran the scripts and got the same error. – Nazanin Tajik Jun 12 '16 at 23:36
@Nazanin By old I meant the one that was giving just POS tags. I hope you got that. If you still get the error, then the problem is not in the script but somewhere else, since the old script was running before. Please try to remember what you changed in the meantime. Otherwise, download SyntaxNet again and follow the same instructions as before. – Harsh Patni Jun 13 '16 at 05:19

score -1 · Answer 2 · answered Jun 08 '16 at 20:42


A quick tip if you want to save the output of the demo to a .txt file. Try:

   echo "open file X with application Y" | ./demo.sh > output.txt

It writes the sentence tree to output.txt in the current directory.
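The same redirect can be driven from Python, which may be handy if the text comes from another program and the result is analysed in Python afterwards. This is a minimal sketch using the standard `subprocess` module; the function name `run_and_save` is made up for illustration, and the usage comment assumes demo.sh sits in the current directory as in the command above.

```python
import subprocess


def run_and_save(command, text, output_path):
    """Feed `text` to `command` on stdin and write its stdout to `output_path`."""
    result = subprocess.run(
        command,
        input=text,
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if the command exits with an error
    )
    with open(output_path, "w") as handle:
        handle.write(result.stdout)
    return result.stdout


# Usage (assumes ./demo.sh exists and is executable):
# run_and_save(["./demo.sh"], "open file X with application Y\n", "output.txt")
```

The function also returns the captured text, so the tree can be processed directly without re-reading the file.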

answered Jun 08 '16 at 20:42

Nazanin Tajik


This does not provide an answer to the question. To critique or request clarification from an author, leave a comment below their post. - [From Review](/review/low-quality-posts/12653432) – Eugene S Jun 14 '16 at 03:52
@EugeneS yes, it looks like it does not answer the question. What I wanted was to make SyntaxNet read from a file (say, the output of speech recognition) and save its output as a text file so that I could pass it to Python for further analysis. Unfortunately the answers I got here did not work at all, so I set aside reading from a file and looked for a way to save the tree output to a text file. If you still think it is not relevant, please let me know and I will delete it. – Nazanin Tajik Jun 14 '16 at 12:37
