
I'm new to Deep Learning and PyTorch, so please bear with me if some questions seem silly or I'm not asking in the correct format. I was watching this video as part of a PyTorch series on Deep Learning: https://www.youtube.com/watch?v=8n-TGaBZnk4 . This video is specifically about ETL (using the Fashion-MNIST dataset). I have a few questions about the video at 7:05.

Question 1: In the Fashion-MNIST subclass constructor we passed the argument 'root', which the instructor described as the location on disk where the data is located. Sorry, maybe this is a silly question, but is this where the data is located on the disk of the source server (from the URL), or is it the path where you want to save the data locally on your computer?

Question 2: Also for the Fashion-MNIST is the 'root' always the same location path: i.e. './data/FashionMNIST'?

Question 3: If the 'root' defines the location path where the data is located on the source server, then where would it be downloaded on locally? I checked my 'download' folder (I'm using Windows 7 laptop), and couldn't find the files there?

Question 4: The video mentioned that, in subsequent calls, we should check whether the data has already been downloaded (i.e. when we pass the argument download=True).

4(a): What's a good approach to do this? Do we put an if statement in place to check for this? Or is there a smarter way of checking for downloaded data?

4(b): Also what does it mean by "subsequent calls"? Does it mean when we need to call the 'FashionMNIST' constructor again for the test_data download?

Question 5: Finally, I tried running the code below (which is the one in the video) on Spyder IDE (Python 3.5):

import torch
import torchvision
import torchvision.transforms as transforms

train_set = torchvision.datasets.FashionMNIST(
    root='./data/FashionMNIST',
    train=True,
    download=True,
    transform=transforms.Compose([
        transforms.ToTensor()
    ])
)

I got the output:

Traceback (most recent call last):

  File "<ipython-input-3-3ac000b9e90a>", line 10, in <module>
    transforms.ToTensor()

  File "C:\Program Files\Anaconda3\lib\site-packages\torchvision\datasets\mnist.py", line 68, in __init__
    self.download()

  File "C:\Program Files\Anaconda3\lib\site-packages\torchvision\datasets\mnist.py", line 136, in download
    makedir_exist_ok(self.raw_folder)

  File "C:\Program Files\Anaconda3\lib\site-packages\torchvision\datasets\utils.py", line 41, in makedir_exist_ok
    os.makedirs(dirpath)

  File "C:\Program Files\Anaconda3\lib\os.py", line 241, in makedirs
    mkdir(name, mode)

FileNotFoundError: [WinError 206] The filename or extension is too long: './data/FashionMNIST\\FashionMNIST\\raw'

Not sure why I got that error at the end. In addition I ran the code on Jupyter Notebook, as per the video, and it worked fine. But I'm wondering why it throws that error in Spyder IDE.

Many thanks in advance.

Hazzaldo

1 Answer


No genuine question is a silly question. Answering the questions one by one:

Ans 1 & 2 :

root is the path on your local disk where the data will be saved. You can give any path you like; it will not cause an issue.
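To see where a relative root such as './data/FashionMNIST' would actually land, a minimal sketch like the following may help. It uses only the standard library; `resolve_root` is a hypothetical helper name introduced here for illustration, and it assumes the relative path is resolved against the current working directory of the Python process.

```python
import os

def resolve_root(root):
    """Return the absolute path that a (possibly relative) root refers to right now."""
    # os.path.abspath joins the path onto the current working directory
    # and normalizes it (removing the leading "./").
    return os.path.abspath(root)

print("Working directory:", os.getcwd())
print("Data would be saved under:", resolve_root("./data/FashionMNIST"))
```

Running this in both Spyder and Jupyter should show whether the two environments start from different working directories.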

Ans 3: The URLs are defined inside the library files, so the local path is all you need to supply. To look at the URLs the data is downloaded from, here is a link.

Ans 4: download=True merely gives it permission to download if the data doesn't exist. The downloader automatically checks whether the data already exists; if it does, nothing is downloaded again, even with download set to True. This all happens in the background, so you don't have to worry about it.
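The check-before-download idea can be sketched in plain Python. This is only an illustration of the pattern, not torchvision's actual code: `ensure_file` and `fake_fetch` are hypothetical names, and the "download" is a stand-in that returns fixed bytes instead of touching the network.

```python
import os
import tempfile

def ensure_file(path, fetch):
    """Write fetch()'s bytes to path only if path is missing.

    Returns True if a download happened, False if the existing file was reused.
    """
    if os.path.exists(path):
        return False
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "wb") as f:
        f.write(fetch())
    return True

def fake_fetch():
    # Hypothetical stand-in for a real network download.
    return b"pretend this is the dataset"

with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "data", "FashionMNIST", "raw", "train.bin")
    first = ensure_file(target, fake_fetch)   # file missing, so it is fetched
    second = ensure_file(target, fake_fetch)  # file present, so it is skipped
    print(first, second)  # prints: True False
```

The second call is the "subsequent call" case: the flag still permits downloading, but the existence check makes it a no-op.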

Ans 5: The issue isn't exactly a torch issue; it has more to do with how it runs on Windows. The issue is discussed at length here & here.

Inder
  • Many thanks for the answers. So for answer 1, if root is where the data is saved on your local machine, then where would the path `'./data/FashionMNIST'` be saved on Windows 7 (64-bit)? I.e. what's the directory path (e.g. within Desktop, My Documents, etc.)? For Answer 3, I think there's a slight misunderstanding. My question was where the data would be saved, not where we get the data from. So this is pretty much the same question as the one I'm asking following your Answer 1. – Hazzaldo Mar 11 '19 at 02:07
  • For Answer 5, that makes sense. So it's because the path where the file is saved (including the file name) is too long to save in Windows file-system rules. But why did this work in Jupyter Notebook? Doesn't Jupyter save a `.py` file locally as well on Windows OS? Or is it maybe because Jupyter actually compiles the program on a VM or Browser, rather than directly on Windows OS? – Hazzaldo Mar 11 '19 at 02:08
  • @Hazzaldo In Linux `./data/` means you are creating a directory in the folder the file is executed from, so for example if you have the .ipynb file in Documents, it will create a directory `data` in Documents itself. In Windows you will also be able to see the path in the topmost part of the page, something like `localhost:8888/Documents /` – Inder Mar 11 '19 at 02:55
  • For more information on how Jupyter works, kindly consider this link: https://jupyter.readthedocs.io/en/latest/architecture/how_jupyter_ipython_work.html – Inder Mar 11 '19 at 02:59
  • Thanks @Inder. All makes sense. One last question before I conclude this question thread. So how to solve Q5 (filename or extension too long). Because I tried changing the root to just './data/FMNIST' and moved my .py file to Desktop, so as to shorten the filename, path extension (whatever I can shorten). I still got the error: "FileNotFoundError: [WinError 206] The filename or extension is too long: './data/FMNIST\\FashionMNIST\\processed'". – Hazzaldo Mar 11 '19 at 04:14
  • @Hazzaldo The path convention in Windows is different; as far as I can recall, Windows uses backslashes instead of forward slashes. Kindly go to your Desktop, right-click on any file and click on Properties; it will show the exact path to Desktop. Kindly consider adding that in the root field. – Inder Mar 11 '19 at 07:35
  • Thanks for the suggestion. I tried '.\data\FashionMNIST', but I still got the error: FileNotFoundError: [WinError 206] The filename or extension is too long: '.\\data\\FashionMNIST\\FashionMNIST\\raw'. I promise I'll conclude with an upvote, hopefully, if we can resolve this last issue. – Hazzaldo Mar 11 '19 at 16:36
  • could you try to run the code on Spyder (python 3.5) yourself and see if it works for you. Do you have a Windows computer? – Hazzaldo Mar 11 '19 at 16:52
  • `r"C:\Users\"` try using this @Hazzaldo – Inder Mar 11 '19 at 16:57
  • unfortunately I don't have windows pc @Hazzaldo, but I can access a cloud VM will try to run it over there – Inder Mar 11 '19 at 16:57
  • also kindly update your torch-vision to the latest version – Inder Mar 11 '19 at 17:05
  • Great, it worked, even when I tried a longer path to the actual folder I wanted: `r'C:\Users\username\Desktop\Machine Learning A-Z course\PyTorch\data'`. And thanks, by the way, for the `r` at the beginning; it saved me the time of working out why a Unicode error was happening. It's a shame it doesn't seem to work with just `'.\data\FMNIST\'`, because that would make the code more portable, not having to change the path every time (and `.\data` would be created wherever your source Python file is stored). I guess Windows is just inconvenient for coding sometimes. Many thanks for your help. – Hazzaldo Mar 11 '19 at 17:32
  • The thread really helped. I had the same problem. On Mac, I changed from '/data/miniimagenet' to the full path. – kkgarg Feb 05 '20 at 22:44
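Since two commenters report that an absolute path worked where a relative one did not, one way to keep the code portable is to build the absolute root at run time instead of hard-coding it. The sketch below is an assumption-laden illustration: `data_root` is a hypothetical helper, it uses only `pathlib`, and it has not been verified against the WinError 206 case from the question.

```python
from pathlib import Path

def data_root(base_dir=None, name="data"):
    """Return an absolute data folder under base_dir (default: current working directory)."""
    base = Path(base_dir) if base_dir is not None else Path.cwd()
    # resolve() turns the result into an absolute, normalized path.
    return (base / name).resolve()

root = data_root()
print(root)

# In a script, the script's own folder could be used as the base instead:
#   root = data_root(Path(__file__).resolve().parent)
# The result can then be passed on, e.g. root=str(root), to the dataset constructor.
```

This keeps the `data` folder next to wherever the code runs while still handing the library a full path.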