0

I just want to know how to concat multiple data frames assuming the columns are the same but the indexes might be duplicates. Like this:

df1 = pd.read_csv("some_data1.csv")
df2 = pd.read_csv("some_data2.csv")
df3 = pd.read_csv("some_data3.csv")


# DF1
         CITY  COUNT        DAY  TIME_BIN
0     ATLANTA    514     Friday  19:30:00
1     ATLANTA    398     Monday  01:20:00
2     CHICAGO    591   Saturday  12:20:00
3     ATLANTA    339     Friday  21:10:00

# DF2
         CITY  COUNT        DAY  TIME_BIN
0      DENVER     98  Wednesday  18:10:00
1       MIAMI    666  Wednesday  13:20:00
2     CHICAGO    165   Saturday  12:40:00

# DF3
         CITY  COUNT        DAY  TIME_BIN
0     CHICAGO    457   Thursday  06:00:00
1     CHICAGO    590   Saturday  19:00:00
2     PHOENIX    301   Saturday  20:30:00

How do I do this

         CITY  COUNT        DAY  TIME_BIN
0     ATLANTA    514     Friday  19:30:00
1     ATLANTA    398     Monday  01:20:00
2     CHICAGO    591   Saturday  12:20:00
3     ATLANTA    339     Friday  21:10:00
4      DENVER     98  Wednesday  18:10:00
5       MIAMI    666  Wednesday  13:20:00
6     CHICAGO    165   Saturday  12:40:00
7     CHICAGO    457   Thursday  06:00:00
8     CHICAGO    590   Saturday  19:00:00
9     PHOENIX    301   Saturday  20:30:00

Surely someone must have asked this by now but I can't seem to find the question. I need to concat 15 df's. Here's the code to generate test data.

import pandas as pd
import numpy as np

df = pd.DataFrame({ 'CITY' : np.random.choice(['PHOENIX','ATLANTA','CHICAGO', 'MIAMI', 'DENVER'], 10000),
                    'DAY': np.random.choice(['Monday','Tuesday','Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday'], 10000),
                    'TIME_BIN': np.random.randint(1, 86400, size=10000),
                    'COUNT': np.random.randint(1, 700, size=10000)})

df['TIME_BIN'] = pd.to_datetime(df['TIME_BIN'], unit='s').dt.round('10min').dt.strftime('%H:%M:%S')
print(df)
Calculus
  • 781
  • 7
  • 20

0 Answers0