I just want to know how to concat multiple data frames assuming the columns are the same but the indexes might be duplicates. Like this:
df1 = pd.read_csv("some_data1.csv")
df2 = pd.read_csv("some_data2.csv")
df3 = pd.read_csv("some_data3.csv")
# DF1
CITY COUNT DAY TIME_BIN
0 ATLANTA 514 Friday 19:30:00
1 ATLANTA 398 Monday 01:20:00
2 CHICAGO 591 Saturday 12:20:00
3 ATLANTA 339 Friday 21:10:00
# DF2
CITY COUNT DAY TIME_BIN
0 DENVER 98 Wednesday 18:10:00
1 MIAMI 666 Wednesday 13:20:00
2 CHICAGO 165 Saturday 12:40:00
# DF3
CITY COUNT DAY TIME_BIN
0 CHICAGO 457 Thursday 06:00:00
1 CHICAGO 590 Saturday 19:00:00
2 PHOENIX 301 Saturday 20:30:00
How do I do this
CITY COUNT DAY TIME_BIN
0 ATLANTA 514 Friday 19:30:00
1 ATLANTA 398 Monday 01:20:00
2 CHICAGO 591 Saturday 12:20:00
3 ATLANTA 339 Friday 21:10:00
4 DENVER 98 Wednesday 18:10:00
5 MIAMI 666 Wednesday 13:20:00
6 CHICAGO 165 Saturday 12:40:00
7 CHICAGO 457 Thursday 06:00:00
8 CHICAGO 590 Saturday 19:00:00
9 PHOENIX 301 Saturday 20:30:00
Surely someone must have asked this by now but I can't seem to find the question. I need to concat 15 df's. Here's the code to generate test data.
import pandas as pd
import numpy as np
df = pd.DataFrame({ 'CITY' : np.random.choice(['PHOENIX','ATLANTA','CHICAGO', 'MIAMI', 'DENVER'], 10000),
'DAY': np.random.choice(['Monday','Tuesday','Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday'], 10000),
'TIME_BIN': np.random.randint(1, 86400, size=10000),
'COUNT': np.random.randint(1, 700, size=10000)})
df['TIME_BIN'] = pd.to_datetime(df['TIME_BIN'], unit='s').dt.round('10min').dt.strftime('%H:%M:%S')
print(df)