Splitting Pandas dataframe by column when duplicate column names exist

Question

I have a dataset where I must read in the data as a 2d array using

import csv

with open('#Name.csv', newline='') as csvfile:
    arrayFull = list(csv.reader(csvfile))

which creates a 2d array. I then use

for i in range(2):
    arrayFull.pop(0)

to remove the first two rows of the 2d array (my dataset only needs the data from the 3rd row onward). I then build a pandas DataFrame from the 2d array using

import pandas as pd
dataframe_1 = pd.DataFrame(arrayFull)

Now I am trying to split "dataframe_1" into 2 dataframes by column. I have 8 columns and I want 2 dataframes with 4 columns each. The issue arises due to the column names being A_first, A_second, A_third, A_fourth, A_first, A_second, A_third, A_fourth.

I cannot use the pandas DataFrame copy() function because there are duplicate column names. As far as I understand, mangle_dupe_cols also does not work, because it requires reading the csv directly into a DataFrame from the beginning, whereas I created the DataFrame from a 2d array. Any ideas on what to do?

please include [`reproducible example`](https://stackoverflow.com/a/20159305/4985099) — sushanth, Aug 30 '20 at 05:06

score 0 · Accepted Answer · answered Aug 30 '20 at 05:32


You can use df.iloc to select columns by integer position instead of by name, so the duplicate column names do not matter.

df_first_four_cols = dataframe_1.iloc[:,0:4]
df_second_four_cols = dataframe_1.iloc[:,4:]
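For a self-contained check, here is a minimal sketch of the same split on made-up data. The list `array_full` and its values are hypothetical stand-ins for the asker's `arrayFull`, and the sketch assumes the first remaining row holds the duplicated column names, which the question does not confirm.

```python
import pandas as pd

# Hypothetical stand-in for arrayFull after the first two rows were popped.
# Assumption: the first remaining row holds the (duplicated) column names.
array_full = [
    ["A_first", "A_second", "A_third", "A_fourth",
     "A_first", "A_second", "A_third", "A_fourth"],
    ["1", "2", "3", "4", "5", "6", "7", "8"],
    ["9", "10", "11", "12", "13", "14", "15", "16"],
]

# pandas accepts duplicate labels when they are passed via `columns`.
dataframe_1 = pd.DataFrame(array_full[1:], columns=array_full[0])

# Positional slicing ignores the labels, so duplicates cause no ambiguity.
df_first_four_cols = dataframe_1.iloc[:, 0:4]
df_second_four_cols = dataframe_1.iloc[:, 4:]

print(df_first_four_cols)
print(df_second_four_cols)
```

Each half keeps the labels A_first through A_fourth, now unique within its own frame, so name-based selection works again on either one.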

answered Aug 30 '20 at 05:32

fusion

