0

I am trying to read and write to the same file. Currently the data in 2289newsML.txt exists as normal sentences, but I want to append the file so it stores only tokenized versions of the same sentences.

I used the code below, but even though it prints out tokenized sentences, it doesn't write them to the file.

from pathlib import Path
from nltk.tokenize import word_tokenize

news_folder = Path("file\\path\\")
news_file = (news_folder / "2289newsML.txt")

f = open(news_file, 'r+')
data = f.readlines()

for line in data:
    words = word_tokenize(line)
    print(words)
    f.writelines(words)

f.close

Any help will be appreciated.

Thanks :)

mbatchkarov
The BrownBatman
  • Are you sure you want to write the tokenized words to the same file? – Anon May 06 '18 at 08:54
  • yes i have saved duplicate versions for this reason- in all fairness i wouldn't mind saving tokenized words to a new file in a different directory either – The BrownBatman May 06 '18 at 08:57

2 Answers

1
from nltk.tokenize import word_tokenize
with open("input.txt") as f1, open("output.txt", "w") as f2:
    # One token per line; the trailing "\n" keeps consecutive input lines apart
    f2.writelines("\n".join(word_tokenize(line)) + "\n" for line in f1)

Using a with statement ensures that the file handles are closed automatically, so you do not need to call f1.close().

This program writes to a different file.
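The same idea can be wrapped in a small function. This is only a sketch: `tokenize_file` is a hypothetical helper name, and `str.split` stands in for `word_tokenize` so that the example runs without NLTK installed.

```python
import tempfile
from pathlib import Path


def tokenize_file(src, dst, tokenize=str.split):
    """Copy src to dst with one token per line; return the number of tokens."""
    count = 0
    # Both files are closed when the with block ends, even if an error occurs.
    with open(src, encoding="utf-8") as fin, open(dst, "w", encoding="utf-8") as fout:
        for line in fin:
            for token in tokenize(line):
                fout.write(token + "\n")
                count += 1
    return count


# Demo in a throwaway directory so that no real file is touched.
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "input.txt"
    dst = Path(tmp) / "output.txt"
    src.write_text("Hello world\nSecond line here\n", encoding="utf-8")
    n_tokens = tokenize_file(src, dst)
    result = dst.read_text(encoding="utf-8")
```

To get NLTK tokens instead of whitespace-separated ones, pass `tokenize=word_tokenize`.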

Of course, you can do it this way too:

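f = open(news_file)  # news_file as defined in the question
data = f.readlines()
f.close()

file = open("output.txt", "w")

for line in data:
    words = word_tokenize(line)
    print(words)
    # One token per line; the trailing newline separates consecutive input lines
    file.write('\n'.join(words) + '\n')

file.close()  # the parentheses are required, otherwise close is never called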

output.txt will then contain the tokenized words.
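One detail worth checking when closing files by hand: `close` is a method, so it has to be called as `f.close()`. Writing `f.close` on its own, as in the question's code, only references the method and leaves the file open, and until a file is flushed or closed, buffered data may not have reached the disk. Whether that explains an empty output file in a given setup is an assumption, but the difference is easy to observe:

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "output.txt"
    f = open(path, "w")
    f.write("tokenized text\n")

    f.close                          # bare attribute access: nothing is called
    closed_without_call = f.closed   # the file is still open here

    f.close()                        # actual call: flushes the buffer and closes
    closed_after_call = f.closed

    content = path.read_text()

print(closed_without_call, closed_after_call)
```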

Anon
  • I get an error: file.write(words) TypeError: write() argument must be str, not list. I changed it to file.writelines too; it runs, but nothing gets written to the file. – The BrownBatman May 06 '18 at 09:16
  • @TheBrownBatman, that is because the tokenize function returns a list. I have fixed it by converting the list to a string with the "join" method. Try now. – Anon May 06 '18 at 09:19
  • the output file is still showing up as blank txt file – The BrownBatman May 06 '18 at 09:26
  • And when execution reaches -> print(words), you see the words printed, is that right? – Anon May 06 '18 at 09:29
  • yes `print(words)` has been working perfectly so far – The BrownBatman May 06 '18 at 09:30
  • Can you post the first few lines of your text file (2289newsML.txt)? – Anon May 06 '18 at 09:32
  • https://drive.google.com/file/d/1RPVStk1cXHSDyyEwJD12D8906nMbbdzq/view?usp=sharing The first line was too long by 426 characters, so I am sharing the file with you. If you are uncomfortable clicking the link, I can paste the first 2 lines in multiple comments. – The BrownBatman May 06 '18 at 09:36
  • I used the same file and ran the program locally, and it works for me. Can you make sure you have write access to the place where you are creating output.txt? If you are on Linux/macOS, try creating the file in /tmp. – Anon May 06 '18 at 09:42
  • I am admin on Windows 10, so I should have access. `output_dir = Path ("output\\path\\") output_file = (output_dir / "2289newsML.txt")` is something I added to replace `output.txt, "w"`, but this isn't an issue, as it creates the file in the folder but does not write anything to it. I also tried to create the file in the same dir as the original, but that has the same result. – The BrownBatman May 06 '18 at 09:47
  • Weird thing: the first code of 2 lines you posted creates the file AND writes to it, BUT without any tokenization in the new file. – The BrownBatman May 06 '18 at 09:54
  • I tried the second method assuming you will prefer that. Let me modify first method then. – Anon May 06 '18 at 09:54
  • Also, for the second method creating empty file, can you make sure file.close() is called? (print statements should be enough.) I assume the file is not closed properly. – Anon May 06 '18 at 09:55
  • @TheBrownBatman , I have modified the first method and tested it. Now every word is printed on new line. Can you try now? – Anon May 06 '18 at 10:05
  • The above `print` was being called, and with the modified first code the words do get written on new lines of a new file. Thanks for your help, I'll figure out the rest from here :) – The BrownBatman May 06 '18 at 10:10
  • Great! What I meant was to add these print lines. print("before calling file close") ; file.close(); print("after file close"). I suspect in the second method the file was not closed (That is the only thing I can think of). In any case always prefer the first way to read/write files as it handles close of files automatically for you. – Anon May 06 '18 at 10:13
  • Yeah, I added something similar to those print statements. I'll see why no. 2 wasn't working for me; it seems weird, since it should. – The BrownBatman May 06 '18 at 10:14
0

I am trying to read and write to the same file. Currently the data in 2289newsML.txt exists as normal sentences but I want to append the file...

That is because you are opening the file in r+ mode.

'r+' Open for reading and writing. The stream is positioned at the beginning of the file.

If you want to append new text at the end of the file, consider opening it in a+ mode.

Read more about open

Read more about file modes
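If the goal is to replace the sentences with their tokens rather than add the tokens after them (which is what a+ would do), r+ can still be used by moving back to the start of the file after reading and truncating once the new content is written. A minimal sketch, assuming the whole file fits in memory; `tokenize_in_place` is a hypothetical helper and `str.split` stands in for `word_tokenize`:

```python
import tempfile
from pathlib import Path


def tokenize_in_place(path, tokenize=str.split):
    """Overwrite the file at path with one token per line."""
    with open(path, "r+", encoding="utf-8") as f:
        lines = f.readlines()   # after this, the position is at the end of the file
        f.seek(0)               # return to the beginning before writing
        for line in lines:
            for token in tokenize(line):
                f.write(token + "\n")
        f.truncate()            # cut off whatever remains of the old content


# Demo in a throwaway directory so that no real file is touched.
with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "news.txt"
    target.write_text("Hello   world\nfoo bar\n", encoding="utf-8")
    tokenize_in_place(target)
    rewritten = target.read_text(encoding="utf-8")
```

Without the `seek(0)` call the writes would start where the read ended, that is, at the end of the file, and without `truncate()` a tail of the old text could survive whenever the new content is shorter than the original.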

Nishant Nawarkhede