I am trying to build a single DataFrame that combines all the CSVs from a zip of Spotify chart data hosted on Zenodo. The archive is charts.zip from https://zenodo.org/record/4778563. I have tried several approaches, but none of them work. The first code I tried (which I found ready-made) is this:
header = 0
dfs = []
for file in glob.glob('Charts/*/201?/*.csv'):
    region = file.split('/')[1]
    dates = re.findall('\d{4}-\d{2}-\d{2}', file.split('/')[-1])
    weekly_chart = pd.read_csv(file, header=header, sep='\t')
    weekly_chart['week_start'] = datetime.strptime(dates[0], '%Y-%m-%d')
    weekly_chart['week_end'] = datetime.strptime(dates[1], '%Y-%m-%d')
    weekly_chart['region'] = region
    dfs.append(weekly_chart)

all_charts = pd.concat(dfs)
The error I got is this:
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
C:\Users\EAEB~1\AppData\Local\Temp/ipykernel_20032/1761992769.py in <module>
     10     dfs.append(weekly_chart)
     11
---> 12 all_charts = pd.concat(dfs)

C:\ProgramData\Anaconda3\lib\site-packages\pandas\util\_decorators.py in wrapper(*args, **kwargs)
    309                     stacklevel=stacklevel,
    310                 )
--> 311             return func(*args, **kwargs)
    312
    313         return wrapper

C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\reshape\concat.py in concat(objs, axis, join, ignore_index, keys, levels, names, verify_integrity, sort, copy)
    292     ValueError: Indexes have overlapping values: ['a']
    293     """
--> 294     op = _Concatenator(
    295         objs,
    296         axis=axis,

C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\reshape\concat.py in __init__(self, objs, axis, join, keys, levels, names, ignore_index, verify_integrity, copy, sort)
    349
    350         if len(objs) == 0:
--> 351             raise ValueError("No objects to concatenate")
    352
    353         if keys is None:

ValueError: No objects to concatenate
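The last frame shows that `pd.concat` received an empty list, which suggests the loop body never ran and `glob.glob` matched no files. A minimal sketch of how that could be checked, assuming the archive was extracted into a `Charts` folder relative to the notebook's working directory (the helper name `find_chart_files` is hypothetical, not part of any library):

```python
import glob
import os


def find_chart_files(pattern="Charts/*/201?/*.csv"):
    """Return the files matching the pattern, printing where the search ran."""
    # Relative patterns are resolved against the current working directory,
    # so an unexpected working directory is one possible cause of an empty result.
    print("Working directory:", os.getcwd())
    files = sorted(glob.glob(pattern))
    print("Matched", len(files), "files")
    return files


files = find_chart_files()
if not files:
    print("Nothing matched, so pd.concat would receive an empty list.")
```

If the count is zero, the folder name, its location, or the year pattern `201?` would be the first things to compare against what is actually on disk.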
I found someone who said that replacing / with \ makes the code run, but I still got the same error. Then I found a similar, unanswered question that suggested this:
file_list = []
for path, subdirs, files in os.walk("Charts"):
    file_list.extend([os.path.join(path, x) for x in files if x.endswith('.csv')])

dfs = []
for file in file_list:
    region = file.split('/')[1]
    dates = re.findall('\d{4}-\d{2}-\d{2}', file.split('/')[-1])
    df = pd.read_csv(file, sep='\t')
    df['week_start'] = dates[0]
    df['week_end'] = dates[1]
    df['region'] = region
    dfs.append(df)
all_charts = pd.concat(dfs, ignore_index=True)
print(all_charts)
I tried it, but I got the same error:
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
C:\Users\EAEB~1\AppData\Local\Temp/ipykernel_20032/1953964848.py in <module>
     12     df['region'] = region
     13     dfs.append(df)
---> 14 all_charts = pd.concat(dfs, ignore_index=True)
     15 print(all_charts)

C:\ProgramData\Anaconda3\lib\site-packages\pandas\util\_decorators.py in wrapper(*args, **kwargs)
    309                     stacklevel=stacklevel,
    310                 )
--> 311             return func(*args, **kwargs)
    312
    313         return wrapper

C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\reshape\concat.py in concat(objs, axis, join, ignore_index, keys, levels, names, verify_integrity, sort, copy)
    292     ValueError: Indexes have overlapping values: ['a']
    293     """
--> 294     op = _Concatenator(
    295         objs,
    296         axis=axis,

C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\reshape\concat.py in __init__(self, objs, axis, join, keys, levels, names, ignore_index, verify_integrity, copy, sort)
    349
    350         if len(objs) == 0:
--> 351             raise ValueError("No objects to concatenate")
    352
    353         if keys is None:

ValueError: No objects to concatenate
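Separately from the empty match, both snippets take the region from `file.split('/')[1]`, which assumes forward slashes. On Windows, `os.path.join` builds paths with backslashes, so that split may not find the folder names once files are actually matched. A sketch of a separator-independent alternative using `pathlib`, assuming the layout `Charts/<region>/<year>/<file>.csv` implied by the glob pattern; the file name below is made up for illustration and `region_and_dates` is a hypothetical helper:

```python
import re
from pathlib import PurePosixPath, PureWindowsPath

DATE_RE = re.compile(r"\d{4}-\d{2}-\d{2}")


def region_and_dates(path):
    """Split a chart path into (region, list of dates found in the file name)."""
    # .parts splits on the path flavour's own separator, so no '/' is hard-coded.
    # Assumed layout: .../<region>/<year>/<file>.csv
    region = path.parts[-3]
    dates = DATE_RE.findall(path.name)
    return region, dates


# Hypothetical file name containing a start date and an end date.
name = "top200-2019-01-04--2019-01-11.csv"
print(region_and_dates(PureWindowsPath("Charts\\us\\2019\\" + name)))
print(region_and_dates(PurePosixPath("Charts/us/2019/" + name)))
```

Both calls should give the same region and the same two dates, regardless of which separator the path was written with.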
How can I fix this issue?
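One direction that may be worth trying is to skip the extracted folder and read the CSVs straight from the archive with the standard-library `zipfile` module, since member names inside a zip should use forward slashes on every platform. This is a sketch under assumptions, not a confirmed fix: it assumes the members are laid out as `<region>/<year>/<file>.csv` under some top-level folder, that the files are tab-separated as in the code above, and that each file name contains the week's start and end dates. `load_charts_from_zip` is a hypothetical helper name.

```python
import re
import zipfile
from pathlib import PurePosixPath

import pandas as pd

DATE_RE = re.compile(r"\d{4}-\d{2}-\d{2}")


def load_charts_from_zip(zip_path, sep="\t"):
    """Concatenate every matching CSV in the archive into one DataFrame."""
    frames = []
    with zipfile.ZipFile(zip_path) as archive:
        for name in archive.namelist():
            if not name.lower().endswith(".csv"):
                continue
            member = PurePosixPath(name)
            dates = DATE_RE.findall(member.name)
            if len(member.parts) < 3 or len(dates) < 2:
                continue  # skip files that do not fit the assumed layout
            with archive.open(name) as handle:
                chart = pd.read_csv(handle, sep=sep)
            chart["week_start"] = pd.to_datetime(dates[0])
            chart["week_end"] = pd.to_datetime(dates[1])
            chart["region"] = member.parts[-3]
            frames.append(chart)
    if not frames:
        # Fail with a clearer message than "No objects to concatenate".
        raise FileNotFoundError(f"No matching CSV files found in {zip_path}")
    return pd.concat(frames, ignore_index=True)
```

Usage would then look like `all_charts = load_charts_from_zip("charts.zip")`, with the path adjusted to wherever the download was saved.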