I'm working through this tutorial using pandas.

This is the code fragment I typed, which produced an error:

users = pd.read_csv('ml-100k//u.users', sep = '|', names = ['User ID', 'Age','Gender', 'Occupation','Zip Code'])

Error produced by code above:

Traceback (most recent call last):
  File "<pyshell#33>", line 1, in <module>
    users = pd.read_csv('ml-100k//u.users', sep = '|', names = ['User ID', 'Age','Gender', 'Occupation','Zip Code'])
  File "C:\Program Files\Python 3.5\lib\site-packages\pandas\io\parsers.py", line 709, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "C:\Program Files\Python 3.5\lib\site-packages\pandas\io\parsers.py", line 449, in _read
    parser = TextFileReader(filepath_or_buffer, **kwds)
  File "C:\Program Files\Python 3.5\lib\site-packages\pandas\io\parsers.py", line 818, in __init__
    self._make_engine(self.engine)
  File "C:\Program Files\Python 3.5\lib\site-packages\pandas\io\parsers.py", line 1049, in _make_engine
    self._engine = CParserWrapper(self.f, **self.options)
  File "C:\Program Files\Python 3.5\lib\site-packages\pandas\io\parsers.py", line 1695, in __init__
    self._reader = parsers.TextReader(src, **kwds)
  File "pandas\_libs\parsers.pyx", line 402, in pandas._libs.parsers.TextReader.__cinit__
  File "pandas\_libs\parsers.pyx", line 718, in pandas._libs.parsers.TextReader._setup_parser_source
FileNotFoundError: File b'ml-100k//u.users' does not exist

I figured out how to make it work by writing:

users = pd.read_csv(r'C:\\Users\\User\\Documents\\Python3\\ml-100k\\ml-100k\\u.user', sep = '|', names = ['User ID', 'Age','Gender', 'Occupation','Zip Code'])

Is there an easier way to do this without writing the full file path? I'm on 64-bit Windows Pro.

Lycopersicum
MRI
  • If you start the Python shell from the `ml-100k` folder, you can specify just the file name rather than the entire path, as the working directory will be `c:\...\ml-100k\` – Haleemur Ali Jan 10 '18 at 03:47
  • `'ml-100k//u.users'` - The problem may be the two forward slashes. Try a single forward slash. On Windows you can use a single forward slash `/` or a double backslash `\\` as a separator. Or, if you make it a raw string, a single backslash, as in `r"C:\mydir"`. – tdelaney Jan 10 '18 at 04:00

2 Answers

I find it useful to work with the pathlib module. I create Path objects at the top of my scripts (or in a dedicated file) like this:

from pathlib import Path
path_1 = Path(r'C:\Users\User\Documents\Python3\ml-100k\ml-100k') # absolute path
path_2 = Path.cwd() # current working directory

These objects are then helpful throughout the rest of your scripts. You can use them like so:

user = pd.read_csv(path_2.joinpath('u.user'), sep = '|', names = ['User ID', 'Age','Gender', 'Occupation','Zip Code'])
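
One way to extend this, sketched under the assumption that a missing file should be reported before pandas is called: the hypothetical helper `data_file` below (not part of pandas or pathlib) joins the parts onto a base directory and raises a `FileNotFoundError` that also names the current working directory.

```python
from pathlib import Path


def data_file(base_dir, *parts):
    """Join parts onto base_dir and return the path, or fail with a clear message.

    Hypothetical helper for illustration; not part of pandas or pathlib.
    """
    path = Path(base_dir).joinpath(*parts)
    if not path.is_file():
        # Including the working directory makes relative-path mistakes obvious.
        raise FileNotFoundError(
            "{} does not exist (working directory: {})".format(path, Path.cwd())
        )
    return path


# Possible usage with the objects defined above (assumes pandas is imported as pd):
# users = pd.read_csv(data_file(path_1, 'u.user'), sep='|',
#                     names=['User ID', 'Age', 'Gender', 'Occupation', 'Zip Code'])
```

With this, a misspelled file name such as 'u.users' fails with a message that shows both the path that was tried and the directory Python was started from.
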
Prikers

The pandas.read_csv() documentation lists several ways to specify the path:

filepath_or_buffer : str, pathlib.Path, py._path.local.LocalPath or any object with a read() method (such as a file handle or StringIO)

The string could be a URL. Valid URL schemes include http, ftp, s3, and file. For file URLs, a host is expected. For instance, a local file could be file://localhost/path/to/table.csv
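
If a file URL is needed, pathlib can build one from an absolute path. This is a small sketch using the path from the question. `PureWindowsPath` is used so the snippet gives the same result on any OS. Note that the empty-host form it produces (`file:///C:/...`) differs from the `file://localhost/...` form quoted above, so it is worth checking that your pandas version accepts it.

```python
from pathlib import PureWindowsPath

# Absolute path taken from the question.
csv_path = PureWindowsPath(r'C:\Users\User\Documents\Python3\ml-100k\ml-100k\u.user')

# as_uri() only works for absolute paths; it raises ValueError for relative ones.
csv_url = csv_path.as_uri()
print(csv_url)
```
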

  • class pathlib.Path(*pathsegments)

    A subclass of PurePath, this class represents concrete paths of the system’s path flavour (instantiating it creates either a PosixPath or a WindowsPath):

    >>> Path('setup.py')
    PosixPath('setup.py')
    

    pathsegments is specified similarly to PurePath.

  • class pathlib.WindowsPath(*pathsegments)

    A subclass of Path and PureWindowsPath, this class represents concrete Windows filesystem paths:

    >>> WindowsPath('c:/Program Files/')
    WindowsPath('c:/Program Files')
    

    pathsegments is specified similarly to PurePath.

  • class io.TextIOBase

    Base class for text streams. This class provides a character and line based interface to stream I/O. There is no readinto() method because Python’s character strings are immutable. It inherits IOBase. There is no public constructor. TextIOBase provides or overrides these data attributes and methods in addition to those from IOBase:

    read(size)

    Read and return at most size characters from the stream as a single str. If size is negative or None, reads until EOF.
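
To illustrate the "any object with a read() method" part: an in-memory `io.StringIO` is a text stream of this kind. The sketch below uses two made-up rows in the pipe-separated layout from the question (the values are illustrative, not taken from the dataset).

```python
import io

# Two made-up rows in the pipe-separated layout used in the question.
sample = "1|30|F|engineer|12345\n2|41|M|teacher|67890\n"

buffer = io.StringIO(sample)

# read(size) returns at most `size` characters as a str.
first_chars = buffer.read(4)

# Rewind before handing the buffer to a parser, which reads from the current position.
buffer.seek(0)

# Assuming pandas is imported as pd, the buffer could be passed in place of a path:
# users = pd.read_csv(buffer, sep='|',
#                     names=['User ID', 'Age', 'Gender', 'Occupation', 'Zip Code'])
```
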

I also found a question about specifying paths on Windows; the answers say it can be done in several ways:

  • you can always use:

    'C:/mydir'
    
  • this works on both Linux and Windows. Another possibility is

    'C:\\mydir'
    
  • if you have problems with some names you can also try raw strings:

    r'C:\mydir'
    
  • however, best practice is to use the os.path module functions, which always select the correct separator for your OS:

    os.path.join(mydir, myfile)
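
A quick check of the spellings listed above, as a sketch. `ntpath` is the Windows implementation behind `os.path`, and `PureWindowsPath` applies Windows rules without touching the filesystem, so the comparison behaves the same on any OS; `myfile.csv` is a placeholder name.

```python
import ntpath
from pathlib import PureWindowsPath

forward = 'C:/mydir'
escaped = 'C:\\mydir'
raw = r'C:\mydir'

# The escaped and raw literals are the same string; only the source spelling differs.
same_string = (escaped == raw)

# Under Windows path rules, forward and back slashes name the same directory.
same_path = (PureWindowsPath(forward) == PureWindowsPath(raw))

# join inserts the separator for you (placeholder file name).
joined = ntpath.join(raw, 'myfile.csv')
```
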
    

TL;DR: The easiest and least complex way is to specify the path variable as

CSV_FILE = r'C:\Users\User\Documents\Python3\ml-100k\ml-100k\u.user'

or

CSV_FILE = 'C:\\Users\\User\\Documents\\Python3\\ml-100k\\ml-100k\\u.user'

and a better, though still easy, way would be

CSV_FILE = os.path.join('C:\\', 'Users', 'User', 'Documents', 'Python3', 'ml-100k', 'ml-100k', 'u.user')
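
One detail worth checking when joining onto a drive letter, shown here as a sketch with `ntpath` (the Windows flavour of `os.path`, importable on any OS): a bare `'C:'` has no root separator, so the join yields a drive-relative path.

```python
import ntpath

# A bare drive letter gives a path relative to the current directory on C:.
drive_relative = ntpath.join('C:', 'Users', 'User')

# Including the root separator gives an absolute path.
absolute = ntpath.join('C:\\', 'Users', 'User')

print(drive_relative)  # C:Users\User
print(absolute)        # C:\Users\User
```
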

You can also specify the path relative to your working directory (for example, if you run your script from C:\Users\User\Documents\Python3\ml-100k\ml-100k, you could simply specify the filename):

CSV_FILE = 'u.user'
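
Relative names are resolved against the current working directory, which is not necessarily the folder containing the script. Here is a sketch with a temporary directory standing in for the data folder; `exists_from` is a hypothetical helper, and the file created here is an empty placeholder.

```python
import os
import tempfile
from pathlib import Path


def exists_from(cwd, relative_name):
    """Report whether relative_name is a file when cwd is the working directory."""
    previous = os.getcwd()
    os.chdir(cwd)
    try:
        return Path(relative_name).is_file()
    finally:
        os.chdir(previous)  # always restore the original working directory


with tempfile.TemporaryDirectory() as parent:
    data_dir = Path(parent) / 'ml-100k'
    data_dir.mkdir()
    (data_dir / 'u.user').touch()  # empty placeholder file

    found_inside = exists_from(data_dir, 'u.user')
    found_outside = exists_from(parent, 'u.user')
    found_with_folder = exists_from(parent, 'ml-100k/u.user')
```

Only the first and third lookups find the file. Calling `os.getcwd()` in the shell shows which directory a relative name will be resolved against.
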

You can also specify a URL, as mentioned at the top.

Lycopersicum