I am having trouble loading a file in jupyter notebook.

Here is my project tree:

-- home
---- cdsw
------ my_main.py
------ notebooks
-------- my_notebook.ipynb
------ dns
-------- assets
---------- stopwords.txt
-------- bilans
---------- my_module.py

Note that "/home/cdsw/" is in my PYTHONPATH, for the same interpreter from which I launch Jupyter.

In my_module.py I have these lines:

PATH_STOPWORDS: Final = os.path.join("dns", "assets", "stopwords.txt")
STOPWORDS: Final = load_stopwords(PATH_STOPWORDS)

load_stopwords is basically just an open(PATH_STOPWORDS, 'r'). My problem is that when I import dns.bilans.my_module inside my_main.py, it works fine: the file is loaded correctly. Yet when I import it from my_notebook.ipynb, it does not:

FileNotFoundError: [Errno 2] No such file or directory: 'dns/assets/stopwords.txt'

So my_module is indeed found by the Jupyter kernel (the traceback shows code lines from the file), but it can't use the relative path the way it does when run from a terminal.

When I use open(relpath, 'r') inside a module, I don't need to spell out the whole project tree, right? After all, it DOES work in my_main.py...

I really don't get it ...

The output of os.getcwd() in jupyter is "/home/cdsw/notebooks".

JADS
  • You've shown no assets folder – OneCricketeer Feb 13 '22 at 16:50
  • Yep, sorry, I have edited that. The assets folder does exist, and so does stopwords.txt... – JADS Feb 13 '22 at 16:52
  • The path and open function are relative to where they are run, not relative to where they are defined. In other words, create a `notebooks/dns/assets` folder, and it'll work. Or move the ipynb file up a folder, and it might work. – OneCricketeer Feb 13 '22 at 16:55
  • The correct path in your case is `../dns/assets/stopwords.txt` but probably a better solution is to specify the full path when starting your script. Perhaps see also [What exactly is current working directory?](https://stackoverflow.com/questions/45591428/what-exactly-is-current-working-directory) – tripleee Feb 13 '22 at 17:18
  • IMO, the pretty much universal "best answer" in these cases is to start with the path to the directory containing your script, computed via `here = os.path.dirname(__file__)`, and then traverse from there with `os.path.join`. In your case, with my_module.py in `dns/bilans`, that is `os.path.join(here, '..', 'assets', 'stopwords.txt')`. The other two alternatives of relying on the cwd or having to somehow specify an absolute path are both yucky. This is the way to go. Your code knows where it's coming from. Use that fact to let it find nearby files. – CryptoFool Feb 13 '22 at 17:29
  • @tripleee - I don't see how this is a duplicate of the question you cite, nor do I see how the answer to that question addresses the OP's problem. I would really have liked to have supplied my answer as an answer vs a comment. What the OP is struggling with is a very common problem, and neither `./` or `~/` are part of the best solution. Jumping the gun on closing questions seems counterproductive. – CryptoFool Feb 13 '22 at 17:35
  • I'll be happy to point to a better duplicate if you can find one. Basic understanding of relative file names is a very common beginner problem and not really suitable for Stack Overflow. – tripleee Feb 13 '22 at 17:41
  • @tripleee - respectfully, I don't see that the OP is asking for an understanding of relative file paths. Rather, they are asking how to access files associated with and accompanying a particular piece of code. The very powerful technique that I explain and use all the time is not inherent to the Python language. Having a clear explanation of the technique in SO is, it seems to me, perfectly appropriate and suitable for SO. Of course, if there's an existing question that describes this technique, then closing this as a dup of that question would be appropriate. – CryptoFool Feb 13 '22 at 17:46
  • Almost certainly a duplicate regardless, but have a go at it. We can close again if we find a duplicate of your answer. – tripleee Feb 13 '22 at 17:47
  • Nice. Thanks for considering my position. I'll look for a duplicate first, and then will write up my answer. – CryptoFool Feb 13 '22 at 17:49

1 Answer

An existing SO question suggests how to find files relative to the location of a Python source file. It isn't exactly the same question, however, and I believe this technique is important enough for every Python programmer to understand that I'm going to provide a more thorough answer.

Given a piece of Python code, one can compute the path of the directory of the source file containing that code via:

here = os.path.dirname(__file__)
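One caveat worth keeping in mind: `__file__` is set for a module imported from a source file, but it is normally not defined inside a notebook cell or an interactive session, so this line belongs in the module itself, not in the notebook. Wrapping it in `os.path.abspath` also guards against a relative `__file__`. A minimal sketch, where `module_dir` is a hypothetical helper name and the standard-library `json` package merely stands in for your own module:

```python
import json
import os


def module_dir(module_file: str) -> str:
    """Return the absolute path of the directory containing module_file."""
    return os.path.dirname(os.path.abspath(module_file))


# Inside your own module you would pass that module's __file__:
#     here = module_dir(__file__)
# Here the json package is used only as a stand-in that always exists.
print(module_dir(json.__file__))
```

The printed directory depends on where Python is installed, but it is always absolute and does not change when the current working directory changes.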

Having the position of the relevant source file, it is easy to compute an absolute path to any data file that has a well-known location relative to that source file. In this case, my_module.py lives in dns/bilans and the data file lives in dns/assets, so a single '..' climbs from bilans up to dns:

stopwords_path = os.path.join(here, '..', 'assets', 'stopwords.txt')

This path can be supplied to open() or used in any other way to refer to the stopwords.txt data file. Here, the way to use this path would be:

load_stopwords(stopwords_path)
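Put together, the top of my_module.py might look like the sketch below. It assumes the layout from the question (my_module.py in dns/bilans, stopwords.txt in dns/assets). The body of `load_stopwords` is not shown in the question, so the version here is a guess that reads one word per line, and `stopwords_path_for` is a hypothetical helper name.

```python
import os
from typing import List


def load_stopwords(path: str) -> List[str]:
    """Assumed loader: one stopword per line, blank lines skipped."""
    with open(path, "r", encoding="utf-8") as handle:
        return [line.strip() for line in handle if line.strip()]


def stopwords_path_for(module_file: str) -> str:
    """Build the stopwords path relative to the module file, not the cwd."""
    here = os.path.dirname(os.path.abspath(module_file))
    # From dns/bilans, one ".." reaches dns, which contains assets.
    return os.path.normpath(os.path.join(here, "..", "assets", "stopwords.txt"))


# In dns/bilans/my_module.py the two constants would then become
# (with "from typing import Final" at the top of the module):
#     PATH_STOPWORDS: Final = stopwords_path_for(__file__)
#     STOPWORDS: Final = load_stopwords(PATH_STOPWORDS)
```

Taking the module file as a parameter keeps the helper easy to test; the module simply passes its own `__file__`.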

I use this technique to not only find files that accompany code in a particular module, but also to find files that are in other locations throughout my source tree. As long as the code and data file exist in the same source repository, or are shipped together in a single Python package, the relative path will not change from installation to installation, and so this technique will work.

In general, you should avoid paths that are resolved relative to the current working directory. Whenever possible, you should also avoid having to tell your code where to find something. For any situation, ask yourself how you can obtain a reliable absolute path that you can then use to locate whatever you want to access.
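To see the difference in action, the sketch below rebuilds the question's layout in a temporary directory, switches the working directory to notebooks (as Jupyter does), and then compares the two approaches. Everything in it is illustrative: the stopwords content is made up, and the module is loaded by file path with `importlib.util` only so the demo does not depend on PYTHONPATH.

```python
import importlib.util
import os
import tempfile
import textwrap

# Source for a throwaway my_module.py that anchors its path on __file__.
MODULE_SOURCE = textwrap.dedent('''
    import os

    HERE = os.path.dirname(os.path.abspath(__file__))
    PATH_STOPWORDS = os.path.normpath(
        os.path.join(HERE, "..", "assets", "stopwords.txt"))

    with open(PATH_STOPWORDS, "r", encoding="utf-8") as handle:
        STOPWORDS = handle.read().split()
''')

original_cwd = os.getcwd()
with tempfile.TemporaryDirectory() as root:
    # Recreate notebooks/, dns/assets/ and dns/bilans/ under the temp root.
    for sub in ("notebooks", os.path.join("dns", "assets"),
                os.path.join("dns", "bilans")):
        os.makedirs(os.path.join(root, sub))
    with open(os.path.join(root, "dns", "assets", "stopwords.txt"),
              "w", encoding="utf-8") as f:
        f.write("the\nand\n")
    module_path = os.path.join(root, "dns", "bilans", "my_module.py")
    with open(module_path, "w", encoding="utf-8") as f:
        f.write(MODULE_SOURCE)

    os.chdir(os.path.join(root, "notebooks"))  # mimic the notebook's cwd
    try:
        # The cwd-relative path from the question does not exist from here.
        relative_found = os.path.exists(
            os.path.join("dns", "assets", "stopwords.txt"))

        # The module still finds its data, because its path starts at __file__.
        spec = importlib.util.spec_from_file_location("my_module", module_path)
        my_module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(my_module)
        anchored_found = os.path.exists(my_module.PATH_STOPWORDS)
        stopwords = my_module.STOPWORDS
    finally:
        os.chdir(original_cwd)  # leave the temp dir before it is removed

print("cwd-relative path found:", relative_found)
print("__file__-anchored path found:", anchored_found)
print("stopwords:", stopwords)
```

The cwd-relative lookup fails from notebooks, while the module loads its data regardless of where the process was started.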

CryptoFool