Creating a Dataframe from Zenodo zip file with multiple CSVs about Spotify

Question

I am trying to build a DataFrame that combines all the CSVs in a zip of Spotify data from Zenodo. The zip is charts.zip from https://zenodo.org/record/4778563. I have tried several approaches, but none of them work. The first code I tried (which I found ready-made) is this:

header = 0
dfs = []
for file in glob.glob('Charts/*/201?/*.csv'):
    region = file.split('/')[1]
    dates = re.findall(r'\d{4}-\d{2}-\d{2}', file.split('/')[-1])
    weekly_chart = pd.read_csv(file, header=header, sep='\t')
    weekly_chart['week_start'] = datetime.strptime(dates[0], '%Y-%m-%d')
    weekly_chart['week_end'] = datetime.strptime(dates[1], '%Y-%m-%d')
    weekly_chart['region'] = region
    dfs.append(weekly_chart)

all_charts = pd.concat(dfs)

The error I got is this:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
C:\Users\EAEB~1\AppData\Local\Temp/ipykernel_20032/1761992769.py in <module>
     10     dfs.append(weekly_chart)
     11 
---> 12 all_charts = pd.concat(dfs)

C:\ProgramData\Anaconda3\lib\site-packages\pandas\util\_decorators.py in wrapper(*args, **kwargs)
    309                     stacklevel=stacklevel,
    310                 )
--> 311             return func(*args, **kwargs)
    312 
    313         return wrapper

C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\reshape\concat.py in concat(objs, axis, join, ignore_index, keys, levels, names, verify_integrity, sort, copy)
    292     ValueError: Indexes have overlapping values: ['a']
    293     """
--> 294     op = _Concatenator(
    295         objs,
    296         axis=axis,

C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\reshape\concat.py in __init__(self, objs, axis, join, keys, levels, names, ignore_index, verify_integrity, copy, sort)
    349 
    350         if len(objs) == 0:
--> 351             raise ValueError("No objects to concatenate")
    352 
    353         if keys is None:

ValueError: No objects to concatenate

I found someone who said that replacing / with \ makes the code run, but I got the same error. Then I found a similar, unanswered question that suggested this:

file_list = []
for path, subdirs, files in os.walk("Charts"):
    file_list.extend([os.path.join(path, x) for x in files if x.endswith('.csv')])

dfs = []
for file in file_list:
    region = file.split('/')[1]
    dates = re.findall(r'\d{4}-\d{2}-\d{2}', file.split('/')[-1])
    df = pd.read_csv(file, sep='\t')
    df['week_start'] = dates[0]
    df['week_end'] = dates[1]
    df['region'] = region
    dfs.append(df)
all_charts = pd.concat(dfs, ignore_index=True)
print(all_charts)

I tried it, but I got the same error:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
C:\Users\EAEB~1\AppData\Local\Temp/ipykernel_20032/1953964848.py in <module>
     12     df['region'] = region
     13     dfs.append(df)
---> 14 all_charts = pd.concat(dfs, ignore_index=True)
     15 print(all_charts)

C:\ProgramData\Anaconda3\lib\site-packages\pandas\util\_decorators.py in wrapper(*args, **kwargs)
    309                     stacklevel=stacklevel,
    310                 )
--> 311             return func(*args, **kwargs)
    312 
    313         return wrapper

C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\reshape\concat.py in concat(objs, axis, join, ignore_index, keys, levels, names, verify_integrity, sort, copy)
    292     ValueError: Indexes have overlapping values: ['a']
    293     """
--> 294     op = _Concatenator(
    295         objs,
    296         axis=axis,

C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\reshape\concat.py in __init__(self, objs, axis, join, keys, levels, names, ignore_index, verify_integrity, copy, sort)
    349 
    350         if len(objs) == 0:
--> 351             raise ValueError("No objects to concatenate")
    352 
    353         if keys is None:

ValueError: No objects to concatenate

How can I fix this issue?

The first code you posted works fine for me. Make sure you are running it from the parent directory of the Charts folder, though (not from inside the Charts folder). If you are running it inside the Charts folder, just replace ```for file in glob.glob('Charts/*/201?/*.csv'):``` with ```for file in glob.glob('*/201?/*.csv'):```. - rob_med, May 03 '22 at 10:36
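
To make the working-directory problem visible instead of failing later in pd.concat, here is a minimal sketch of the same loop built on pathlib. It is not the original poster's code: the function name `load_charts` and its parameters are hypothetical, and the default `sep='\t'` and the `*/201?/*.csv` layout are simply carried over from the question rather than verified against the archive. It raises an explicit error when the pattern matches nothing, and it reads the region from `Path.parts`, so it does not depend on whether the separator is / or \.

```python
import re
from pathlib import Path

import pandas as pd

# Matches dates such as 2019-01-04 in a chart file name.
DATE_RE = re.compile(r"\d{4}-\d{2}-\d{2}")


def load_charts(root="Charts", pattern="*/201?/*.csv", sep="\t"):
    """Combine every chart CSV under `root` into one DataFrame.

    Hypothetical helper: `root`, `pattern` and `sep` are assumptions
    taken from the question, not verified against the archive.
    """
    root = Path(root)
    files = sorted(root.glob(pattern))
    if not files:
        # An empty match is what leads to "No objects to concatenate".
        raise FileNotFoundError(
            f"No files match {pattern!r} under {root.resolve()} "
            f"(current directory: {Path.cwd()})"
        )

    dfs = []
    for file in files:
        # First folder below root is the region; works with / and \.
        region = file.relative_to(root).parts[0]
        dates = DATE_RE.findall(file.name)
        if len(dates) < 2:
            continue  # skip names without both a start and an end date
        df = pd.read_csv(file, sep=sep)
        df["week_start"] = pd.to_datetime(dates[0])
        df["week_end"] = pd.to_datetime(dates[1])
        df["region"] = region
        dfs.append(df)

    if not dfs:
        raise ValueError("Files were found, but none had two dates in the name")
    return pd.concat(dfs, ignore_index=True)
```

Run from the parent directory of the Charts folder, `all_charts = load_charts("Charts")` should give the combined frame. If the FileNotFoundError appears, the paths in its message show where Python was actually looking.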
