I'm working on an NLP task and have to analyse a corpus of sentences. Each word of a sentence is on its own line, together with its analysis.

Sentences are separated by a blank line. I would like to give each sentence an ID so that I can look up further information about it that is stored in other fields of another table. The desired result would be:

1 the
1 cat
1 is
1 black

2 the
2 moon
2 is
2 full

and so on, with every word on its own line. I think I should do this in Python, but I'm very confused about how to approach it.

  • Hi Gloria -- unfortunately, StackOverflow is not a place to have code written for you or to get general advice. The site rules require you to attempt the coding yourself, research the problems you have and, if help is still needed, provide your code and explain how you've attempted to solve your problem. I suspect this question will be closed. The good news is that Python is easy to start with and this problem is not particularly complex. [Try this tutorial](https://www.codecademy.com/learn/python) or search Google for Python tutorials; there are plenty of free ones that will get you started. – Lost Nov 30 '16 at 13:54
  • Could you provide an input file? – Navidad20 Nov 30 '16 at 13:54
  • In these cases I like to refer to the "we are not a code writing service" section. With that said I would do something along the lines of: split the sentence into a list (at each space) then use the new line indicator to set your IDs, either saved in a matrix with the sentence or in a separate list depending on how you want to do it. – Steve Byrne Nov 30 '16 at 13:58
  • For StackOverflow reading for your problem, try these questions: [reading lines in a file into Python](http://stackoverflow.com/questions/3277503/how-to-read-a-file-line-by-line-into-a-list-with-python), [Splitting Sentence into list of words](http://stackoverflow.com/questions/743806/split-string-into-a-list-in-python) – Lost Nov 30 '16 at 14:01
  • @Lost I'm sorry I wasn't looking for already-done-code, but only for ideas about how to deal with the problem. Thank you and sorry again – Gloria Malorgio Nov 30 '16 at 14:03
  • @GloriaMalorgio nothing to be sorry about, we're all here to learn. Good luck – Lost Nov 30 '16 at 14:06
  • @Navidad20 Typically we ask that you don't change spelling from UK to US (or vice versa) when editing a post. – TylerH Nov 30 '16 at 21:58

1 Answer

Something like this should do the trick:

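# Assumes one word per line, with a single blank line between sentences.
count = 1
with open('input.txt', 'r') as input_file, open('results.txt', 'w') as output_file:
    for line in input_file:
        if line.strip():
            # Word line: prefix it with the ID of the current sentence.
            new_line = str(count) + ' ' + line.lstrip()
        else:
            # Blank line: the sentence has ended, so keep the separator
            # and move on to the next ID.
            new_line = line
            count = count + 1
        output_file.write(new_line)
# The with-statement closes both files automatically.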
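
If the input may contain several blank lines in a row, or the IDs are needed in memory (for example, to match against another table) rather than in a file, a generator is one option. This is only a minimal sketch: the names `number_sentences` and `sample` are made up for illustration, and it assumes one word per line with sentences separated by one or more blank lines.

```python
def number_sentences(lines):
    """Yield (sentence_id, line_content) pairs from one-word-per-line input.

    Assumption: sentences are separated by one or more blank lines.
    """
    sentence_id = 0
    in_sentence = False
    for line in lines:
        content = line.strip()
        if content:
            if not in_sentence:
                # First non-blank line after a gap starts a new sentence.
                sentence_id += 1
                in_sentence = True
            yield sentence_id, content
        else:
            # Blank line: the current sentence (if any) has ended.
            in_sentence = False


# Hypothetical sample input, including a doubled blank line.
sample = ["the\n", "cat\n", "is\n", "black\n", "\n", "\n",
          "the\n", "moon\n", "is\n", "full\n"]
pairs = list(number_sentences(sample))
for sentence_id, content in pairs:
    print(sentence_id, content)
```

Because the function accepts any iterable of lines, an open file object can be passed in place of `sample`.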