I am getting this error:

FileNotFoundError: [Errno 2] File path/.csv does not exist: path/.csv

However, the file is still there, and I do not understand what is wrong in my code for accessing it. Could you please have a look and see if you spot any error? Thank you.

import pandas as pd
from os import listdir
from os.path import join, isfile
import os


def create_dataframe(paths):

    def get_files_in_path(path):
        return [f.split('.')[0] for f in listdir(path) if isfile(join(path, f))]

    dataframes = {
        (path, file): pd.read_csv(path + file + '.csv')
        for path in paths
        for file in get_files_in_path(path)
    }

    df = pd.concat(dataframes, names=['path', 'file', '_'])

paths = [f"path/My folder {f}/" for f in ['file1', 'file2', 'file3']]
data = create_dataframe(paths)

The error is in this line:

---> 18 for file in get_files_in_path(path)

The code should concatenate into a single dataframe all the csv files stored in the file1, file2 and file3 folders. The files are called test+first.csv and another_test.csv. The folders are path/My folder file1, path/My folder file 2 and path/My folder file 3. The expected output would be indexed by path and file, something like the image below, where path would take the place of user_id and file the place of date.

[screenshot of the expected output]

  • Can you please mention the files on which you want to run this? Is the expected file path `pathfile1/` or `path/file1`? – ranka47 Jun 30 '20 at 20:11
  • yes, sorry ranka47. The files are `csv`. They are called `test+first.csv`, `another_test.csv`. The path is `path/My folder file1` and `path/My folder file 2` and `path/My folder file 3` respectively. –  Jun 30 '20 at 20:16
  • So in the second last line, where you are creating the list `paths`, shouldn't it be `f"path/{f}/"`? – ranka47 Jun 30 '20 at 20:19
  • Yes, it was my mistake in the post. Unfortunately it does not change the result. I am still getting the error. I updated the code –  Jun 30 '20 at 20:21
  • I checked one folder at a time and it does not seem to work at all. The files exist in the folder but it gives me the error. I have also tried deleting the folder and creating a new one, but nothing has changed. Could something be wrong with listdir or isfile? –  Jun 30 '20 at 20:38
  • @Val Do all of the My folder paths have spaces between "file" and the number? I see some that do and some that don't in the examples you gave, but the code has them all without any space (file1, file2, file3) – kcontr Jun 30 '20 at 20:39
  • Yes, the names of the folders are: My folder file1, My folder file2, My folder file3. There is a space between "My folder" and "file". Some folders can be opened, others cannot. I really do not get what I am doing wrong :/ –  Jun 30 '20 at 20:41
  • I am not able to duplicate the problem; which folders/files specifically is it throwing the error on? Could it be improperly encoded names (special/unallowed characters)? Older versions of Python were plagued with encoding problems imho – Steve Byrne Jun 30 '20 at 21:49
  • @SteveByrne, I used the same folders, without changing their names, this afternoon and yesterday evening, and I was able to create the dataframe using my code. This evening I have had some issues accessing the folders and files using the same code. I am using Python 3. –  Jun 30 '20 at 21:53

1 Answer

If you really want every csv inside every directory of path/, you could also consider going straight for a glob.

That was the technique used on a similar problem here: https://stackoverflow.com/a/21232849/11199887

For your case it would look something like:

import glob
from os.path import join

import pandas as pd

# the parent of the `/My folder` stuff
base_path = 'path/'
# the ** in combination with recursive=True descends into all directories
all_csv_filenames = glob.glob(
    join(base_path, "My folder*/**/*.csv"), recursive=True
)
# names= only takes effect when keys are supplied (see the update below),
# so it is left out of this plain concatenation
df = pd.concat(pd.read_csv(filename) for filename in all_csv_filenames)

This should match all csv files in folders whose names start with "My folder", such as "path/My folder file1/random.csv" or "path/My folder file 2/another_test.csv". Want to give that a try?
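To see what that pattern picks up without touching your real data, here is a small sketch that builds a throwaway directory tree and runs the glob against it. The folder layout, the extra file names (deep.csv, ignored.csv, notes.txt) and the helper find_csvs are made up for illustration; only glob.glob and its recursive flag come from the snippet above.

```python
import glob
import tempfile
from os.path import join
from pathlib import Path


def find_csvs(base_path, pattern, recursive=False):
    """Return matches relative to base_path, sorted, with forward slashes."""
    matches = glob.glob(join(base_path, pattern), recursive=recursive)
    return sorted(Path(m).relative_to(base_path).as_posix() for m in matches)


with tempfile.TemporaryDirectory() as base:
    # Hypothetical layout, for illustration only
    for rel in [
        "My folder file1/test+first.csv",
        "My folder file 2/another_test.csv",
        "My folder file 2/nested/deep.csv",
        "Other/ignored.csv",
        "My folder file1/notes.txt",
    ]:
        target = Path(base, rel)
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text("a,b\n1,2\n")

    # ** with recursive=True matches zero or more directory levels
    recursive_matches = find_csvs(base, "My folder*/**/*.csv", recursive=True)
    # without **, only csv files directly inside "My folder*" match
    flat_matches = find_csvs(base, "My folder*/*.csv")

print(recursive_matches)
print(flat_matches)
```

In this made-up tree the recursive pattern finds the csv files directly inside each "My folder..." directory as well as the nested one, while the flat pattern skips the nested file. Neither picks up notes.txt or anything under Other/.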

Update:

After learning more from the screenshot and comments, it may be something like this:

import glob
from pathlib import Path

import pandas as pd

# the parent of the `/My folder` stuff
base_path = Path('path/')
glob_pattern = str(base_path.joinpath("My folder*/*.csv"))
all_csv_filenames = glob.glob(glob_pattern)
df = pd.concat(
    {
        (str(Path(filename).parent), str(Path(filename).stem)): pd.read_csv(
            filename
        )
        for filename in all_csv_filenames
    },
    names=['path', 'file', '_'],
)

parent gets the file's parent directory and stem gets the file name without its extension. Both are converted to strings and passed as a tuple, so each key in the mapping is a (path, file) pair, which is what lets pd.concat make use of the names parameter.
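A quick way to check what parent and stem return, and to compare them with the f.split('.')[0] approach from the question, is the sketch below. It uses PurePosixPath so the output is the same on every OS. The helpers key_for and split_name are hypothetical names. The ".DS_Store" entry is an assumption: the question does not show a directory listing, so a hidden file like that is only one possible explanation for a path ending in "/.csv".

```python
from pathlib import PurePosixPath


def key_for(filename):
    """Return the (parent directory, stem) pair used as a concat key."""
    p = PurePosixPath(filename)
    return (str(p.parent), p.stem)


def split_name(filename):
    """The question's approach: keep only the text before the first dot."""
    return filename.split('.')[0]


csv_key = key_for("path/My folder file 1/test+first.csv")
print(csv_key)  # ('path/My folder file 1', 'test+first')

# Assumption: a hidden file sits next to the csv files (name is hypothetical)
hidden = ".DS_Store"
rebuilt = "path/" + split_name(hidden) + ".csv"
print(rebuilt)  # path/.csv
```

If something like that is going on, the name before the first dot is empty, and path + file + '.csv' rebuilds a file that was never there. Globbing for *.csv avoids this because only real csv files are ever opened.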

kcontr
  • @Val is there any nesting below the level of "My folder ...", for example "My folder file2/nested_folder/random.csv" or is it all csv files and no folder inside as in "My folder file2/random.csv"? – kcontr Jun 30 '20 at 21:58
  • The complete paths are the following: `User/Desktop/New_Folder/My folder file 1/test+first.csv`, `User/Desktop/New_Folder/My folder file 1/test+second_2.csv`, `User/Desktop/New_Folder/My folder file 2/another_test.csv`, `User/Desktop/New_Folder/My folder file 3/test.csv` –  Jun 30 '20 at 22:01
  • @Val I've made an update to my answer with that information. I think it gets us close. Can you take a look? – kcontr Jun 30 '20 at 22:08
  • Holdup, let me fix imports – kcontr Jun 30 '20 at 22:10
  • Thank you kcontr for your patience in helping me. May I ask which modules are needed to run the code? I got: `ImportError: cannot import name 'joinpath' from 'pathlib' (/anaconda3/lib/python3.7/pathlib.py)` –  Jun 30 '20 at 22:18
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/216983/discussion-between-kcontr-and-val). – kcontr Jun 30 '20 at 22:18