
The whole error:

C:\Users\Desktop\texts>python similarity1.py
Traceback (most recent call last):
  File "similarity1.py", line 19, in <module>
    documents = [open(f, encoding="utf-8").read() for f in text_files]
  File "similarity1.py", line 19, in <listcomp>
    documents = [open(f, encoding="utf-8").read() for f in text_files]
FileNotFoundError: [Errno 2] No such file or directory: 'apempe_chunks.txt'

and the code producing the specific error:

import os
import codecs
import string, re
from pathlib import Path


path = "C:\\Users\\Desktop\\texts\\dataset"
text_files = os.listdir(path)

documents = [open(f, encoding="utf-8").read() for f in text_files]
sparse_matrix = tfidf_vectorizer.fit_transform(documents)

The strange thing is that the program does find apempe_chunks.txt, which is inside the dataset folder, yet it cannot open it.

I've searched SO for similar questions, but I couldn't fix it.

lynx
  • 2
    `os.listdir()` only returns the file *names*. You will need to put the path on the front of each file name yourself to be able to open them. – quamrana Jul 21 '20 at 11:21
  • 1
    btw this [answer](https://stackoverflow.com/a/3964691) has an example of using `os.path.join()`. – quamrana Jul 21 '20 at 11:24
  • @quamrana Following the answer you linked, I moved the script inside `dataset` and added `if f.endswith('.txt')]` to my code. It seems to work just fine. – lynx Jul 21 '20 at 11:42
  • 1
    If you found an answer yourself, consider posting it as your own answer to your question (and accepting it after the 2-day waiting period), if you think your Q/A is helpful to others. – Ivo Mori Jul 21 '20 at 11:52
  • @IvoMori It's actually a workaround, not an answer that provides further knowledge of Python. Maybe I will do so anyway. – lynx Jul 21 '20 at 12:02
  • 2
    You're the expert on your own question. You mentioned that you searched SO for answers but none of them were helpful (it would also have been good to include links to them in your question for reference), so you now have the opportunity to contribute "what was missing" so that others have a good Q/A when they run into the same problem. Of course, an answer only makes sense when it's complete (have a look at [How to write a good answer](https://stackoverflow.com/help/how-to-answer)). – Ivo Mori Jul 21 '20 at 12:07
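
The suggestion in the comments above (prefix each name returned by `os.listdir()` with the directory before opening it) can be sketched roughly as follows. This is only an illustration, not code from the question: the helper name `read_text_files` is hypothetical, and the `.txt` filter and the name-ordered sort are assumptions added for the example.

```python
import os


def read_text_files(directory):
    """Return the contents of every .txt file in `directory`, in name order.

    Hypothetical helper for illustration; not part of the original script.
    """
    documents = []
    for name in sorted(os.listdir(directory)):
        # os.listdir() yields bare file names, so rebuild the full path
        # before opening; otherwise open() looks in the current directory.
        full_path = os.path.join(directory, name)
        if name.endswith(".txt") and os.path.isfile(full_path):
            with open(full_path, encoding="utf-8") as handle:
                documents.append(handle.read())
    return documents
```

With a helper like this, the list comprehension in the question could become `documents = read_text_files(path)`, and the script would no longer depend on the directory it is run from.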

1 Answer


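To work around the error, I moved similarity1.py into the dataset folder and added if f.endswith('.txt') to the list comprehension; now it works fine. As the comments point out, os.listdir() returns only file names, and open() resolves a bare name relative to the current working directory, so the names are found only when the script is run from inside dataset.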

So the line that reads the files is now:

documents = [open(f, encoding="utf-8").read() for f in text_files if f.endswith('.txt')]

The filter ensures that only the .txt files inside the dataset directory are read, skipping the Python script itself and any other files.

The idea came from this thread of answers to a question similar to mine.
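
A possible alternative that leaves the script where it is: `pathlib`, which the question already imports, can both filter for .txt files and return paths that include the directory. The sketch below is a minimal illustration under that assumption; the function name `load_documents` is hypothetical, and sorting the paths is an extra choice made here to keep the order predictable.

```python
from pathlib import Path


def load_documents(directory):
    """Read every .txt file directly inside `directory` as UTF-8 text.

    Hypothetical helper for illustration; not part of the original script.
    """
    # Path.glob() yields paths that already include `directory`,
    # so they can be opened from any working directory.
    text_paths = sorted(p for p in Path(directory).glob("*.txt") if p.is_file())
    return [p.read_text(encoding="utf-8") for p in text_paths]
```

Called as `documents = load_documents(path)` with the `path` from the question, this should read the same files as the workaround without requiring the script to live inside dataset.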

lynx