
When I print the path, it looks fine. But read_csv then fails with an error in which a b' prefix appears in front of the file name, which seems to mess up the name. I am reading the files f0.txt, f1.txt, and so on; I plan to modify them and save them under the same names.

I tried the UTF-8 decoding solution offered in a similar question here, but it did not work.

path ='/kaggle/input/'
print (path)
i = 1
part = 'f' + str(i)
print (part) #the byte literal symbol b' doesn't appear here
for i in range(0,3):
    part = 'f' + str(i)
    file_path = path + part + '.csv'
    print (file_path) # the byte literal symbol b' doesn't appear here too
    pd = pd.read_csv(file_path, delim_whitespace = True) # error in this line
    np.savetxt(part.txt, pd,fmt='%.18e', delimiter=',', newline='n', header='Time,ID,lat,long,speed',)

b'/kaggle/input/f0.csv' does not exist

The command

pd.read_csv('/kaggle/input/f2.txt', delim_whitespace = True) 

is working.

The error in the first case is:

FileNotFoundError                         Traceback (most recent call last)
<ipython-input-33-37d9e9ecc542> in <module>
      8     file_path = path + part + '.csv'
      9     print (file_path)
---> 10     pd = pd.read_csv(file_path, delim_whitespace = True)
     11     np.savetxt(part.txt, pd,fmt='%.18e', delimiter=',', newline='n', header='Time,ID,lat,long,speed',)
     12 

/opt/conda/lib/python3.6/site-packages/pandas/io/parsers.py in parser_f(filepath_or_buffer, sep, delimiter, header, names, index_col, usecols, squeeze, prefix, mangle_dupe_cols, dtype, engine, converters, true_values, false_values, skipinitialspace, skiprows, nrows, na_values, keep_default_na, na_filter, verbose, skip_blank_lines, parse_dates, infer_datetime_format, keep_date_col, date_parser, dayfirst, iterator, chunksize, compression, thousands, decimal, lineterminator, quotechar, quoting, escapechar, comment, encoding, dialect, tupleize_cols, error_bad_lines, warn_bad_lines, skipfooter, doublequote, delim_whitespace, low_memory, memory_map, float_precision)
    676                     skip_blank_lines=skip_blank_lines)
    677 
--> 678         return _read(filepath_or_buffer, kwds)
    679 
    680     parser_f.__name__ = name

/opt/conda/lib/python3.6/site-packages/pandas/io/parsers.py in _read(filepath_or_buffer, kwds)
    438 
    439     # Create the parser.
--> 440     parser = TextFileReader(filepath_or_buffer, **kwds)
    441 
    442     if chunksize or iterator:

/opt/conda/lib/python3.6/site-packages/pandas/io/parsers.py in __init__(self, f, engine, **kwds)
    785             self.options['has_index_names'] = kwds['has_index_names']
    786 
--> 787         self._make_engine(self.engine)
    788 
    789     def close(self):

/opt/conda/lib/python3.6/site-packages/pandas/io/parsers.py in _make_engine(self, engine)
   1012     def _make_engine(self, engine='c'):
   1013         if engine == 'c':
-> 1014             self._engine = CParserWrapper(self.f, **self.options)
   1015         else:
   1016             if engine == 'python':

/opt/conda/lib/python3.6/site-packages/pandas/io/parsers.py in __init__(self, src, **kwds)
   1706         kwds['usecols'] = self.usecols
   1707 
-> 1708         self._reader = parsers.TextReader(src, **kwds)
   1709 
   1710         passed_names = self.names is None

pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader.__cinit__()

pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader._setup_parser_source()

FileNotFoundError: File b'/kaggle/input/f0.csv' does not exist
wjandrea
user3656142
  • Are you aware that the filesystem has its own encoding? – wjandrea Aug 05 '19 at 23:43
  • @wjandrea nope, I just started working on python. By file system, you mean the OS or the compiler? – user3656142 Aug 06 '19 at 00:28
  • A [filesystem](https://en.wikipedia.org/wiki/File_system) is basically a data structure that holds files. But anyway, looking at Grismar's answer, it's not relevant. – wjandrea Aug 06 '19 at 00:32

1 Answer

Even this reproduces the behaviour you're seeing:

import pandas as pd

pd.read_csv('test.csv')

Assuming you don't have a file called test.csv in the working folder, it results in:

FileNotFoundError: [Errno 2] File b'test.csv' does not exist: b'test.csv'

So it appears that the read_csv() function accepts a string as a filename but converts it to a byte sequence before using it to open the file; when that fails, the byte sequence is the value that gets reported in the error message.

It is important to note that the value printed (i.e. b'test.csv') does not mean that the b is now part of the filename. It simply means "a byte sequence containing test.csv", to set it apart from 'test.csv', without a b, which means "a string containing test.csv".
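
As a minimal sketch of that distinction, using only the standard library (the variable names `name` and `encoded` are illustrative, not taken from pandas), the following shows that the b prefix belongs to how Python displays a bytes object and is not a character in the data:

```python
# The b prefix is part of how Python displays a bytes object,
# not part of its content.
name = 'test.csv'
encoded = name.encode('utf-8')

print(repr(name))     # 'test.csv'
print(repr(encoded))  # b'test.csv'

# Decoding gives back exactly the original string, with no extra characters.
assert encoded.decode('utf-8') == name
assert len(encoded) == len(name)
```

If the b were really part of the filename, the decoded value would be one character longer than the original string, and it is not.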

Grismar
  • Parts of pandas are written in Cython. Cython can get better performance by using C types instead of regular Python types. I believe that's the reason the file path is encoded to bytes here. – Håken Lid Aug 06 '19 at 00:12
  • @Grismar if I directly refer to the file from the containing directory, I get a similar error as you. If I reference from another directory, the ` b' ` gets added as the first letter as mentioned in the post. – user3656142 Aug 06 '19 at 01:53
  • @user3656142 as far as I can tell, the `b` is always displayed when referring to non-existent files, as the error message is generated as a result of calling `.read_csv()`. It doesn't matter whether you reference a file inside or outside the working directory of your script. If you see different behaviour (for non-existent files) in different cases, please provide examples for both in your question and I will try to update my answer accordingly. Note that the `b` is not actually part of the value - it's merely printed by Python to indicate that it is a byte sequence and not a character string. – Grismar Aug 06 '19 at 01:57
  • @Grismar The file exists in the directory. The ls command shows the files to be there. And in situations where the file I referenced was not there, the `b'` never appeared. – user3656142 Aug 06 '19 at 02:06
  • @user3656142 In your code, double-check that you're in the right place and the file exists with `print(repr(os.getcwd()))` and `print(os.listdir())` – wjandrea Aug 06 '19 at 13:52
  • I'd recommend checking that from your code. Also, do you have a directory called `kaggle` at the root of your file system? It seems likely that you meant to write `kaggle/input` instead of `/kaggle/input`. – Grismar Aug 07 '19 at 00:40
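
The checks suggested in the comments above could be sketched roughly as follows. This is only an illustration: `describe_path` is a hypothetical helper, not part of pandas or the standard library, and the paths are the ones the question's loop builds. Note that the question mentions `.txt` files while the loop appends `.csv`, so listing the folder contents may reveal a mismatch of that kind.

```python
import os

def describe_path(file_path):
    """Collect basic facts about a path (hypothetical helper for debugging)."""
    folder = os.path.dirname(file_path) or '.'
    folder_exists = os.path.isdir(folder)
    return {
        'cwd': os.getcwd(),                        # where relative paths resolve from
        'absolute': os.path.abspath(file_path),    # the path Python will actually look up
        'exists': os.path.isfile(file_path),       # True only if the file is really there
        'folder_exists': folder_exists,
        'folder_contents': sorted(os.listdir(folder)) if folder_exists else [],
    }

# Same path construction as the loop in the question.
for i in range(3):
    candidate = '/kaggle/input/' + 'f' + str(i) + '.csv'
    info = describe_path(candidate)
    status = 'exists' if info['exists'] else 'missing'
    print(candidate, status, info['folder_contents'])
```

Running this just before the `read_csv` call shows whether the folder exists, which files it actually contains, and whether the constructed name matches one of them.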