Adding 2 Separate Dataframes into 1 csv File

Question

I am completely new to Python but am currently doing some web scraping of some tables and I am trying to combine the 2 dataframes that I get into 1 csv file.

So lets say I have df_1 which looks like:

Rank	Team	Points
64	AR Lit Rock	35
102	Abl Christian	75
34	Air Force	45

And df_2 which looks like:

Rank	Team	Points
354	AR Lit Rock	46
284	Abl Christian	68
82	Air Force	42

(I have both dataframes being sorted by Team already) I am wanting to combine both of these into 1 csv file to eventually end up looking like:

Rank	Team	Points	Rank	Team	Points
64	AR Lit Rock	35	354	AR Lit Rock	46
102	Abl Christian	75	284	Abl Christian	68
34	Air Force	45	82	Air Force	42

or if at all possible to separate the 2 tables with a blank column inbetween:

a	b	c	d		e	f	g	h
1	2	3	4		5	6	7	8
1	2	3	4		5	6	7	8

I am looking at doing some calculations between the 2 tables within the csv at a later point and just want it to be more readable. Since I am completely new to Python, this already may be a bad idea to have both tables in 1 csv file. If that is also the case, please let me know.

Currently I have both dataframes in their own respective objects and so far have only attempted concat.

df = pd.concat([df_1, df_2], axis=1)

This had kind of done what I want except the fact the I also have both dataframes sorted alphabetically based off of a certain column and then when they concat, the 1st dataframe is sorted and the 2nd gets sorted dependent on another value from the 1st dataframe. So end result looked like this instead:

Rank	Team	Points	Rank	Team	Points
64	AR Lit Rock	35	64	Oregon	57
102	Abl Christian	75	102	Sac State	81
34	Air Force	45	34	Ark Pine Bl	59

2nd dataframe getting added by Rank rather than acting independently.

(Sorry, had trouble making sample tables)

Actual sample data

I have added a little snippet of what the current csv looks like after putting both dataframes into the csv. Both dataframes are sorted alphabetically by the Team name initially, but when the 2nd dataframe gets added it gets sorted by the Rank from the 1st dataframe.

UPDATE (edited code from Soroosh to better show the issue)

import pandas as pd
data1 = {
    'Rank': [64, 102, 34,],
    'Team': ["AR Lit Rock", "Abl Christian", "Air Force",],
    'Points': [35, 75, 45,],
}

data2 = {
    'Rank': [354, 284, 82,],
    'Team': ["Air Force", "Abl Christian", "AR Lit Rock",],
    'Points': [46, 68, 42,],
}



df_1 = pd.DataFrame(data1)
df_2 = pd.DataFrame(data2)
df_1 = df_1.sort_values('Team')
df_2 = df_2.sort_values('Team')
data_blk = {
    "Blank": [""]*len(df_1["Rank"]),
}
df_blk = pd.DataFrame(data_blk)

new_df = pd.concat([df_1, df_blk], axis=1)
new_df = pd.concat([new_df, df_2], axis=1)
print(new_df.head())

I take 2 unsorted tables, sort by the Team column, and then both dataframes to stay sorted by that Team column when concat the dataframes.

Look at this link: https://stackoverflow.com/questions/28097222/pandas-merge-two-dataframes-with-different-columns — Soroosh Noorzad, Feb 20 '23 at 21:46
You can also use this method: new_df = pd.concat([df_1, df_2], axis=1, ignore_index=True) — Soroosh Noorzad, Feb 20 '23 at 21:49
I added a sample snippet of the actual data if that may help at all. The ignore_index doesn't seem to change anything for me. — Link, Feb 20 '23 at 22:09
I paved the way. I hope others can answer your question. But I think you are trying to solve your whole issue in one question. I'm here to answer one of your questions. Not solving the whole project. If you think my answer was helpful, vote me up and search for the solution of next steps. Good luck with your project. — Soroosh Noorzad, Feb 20 '23 at 23:16
I edited your question to remove the **ANSWER** section. Since you have an answer to your question, you should post that as an _answer_, not add it to your _question_. — Pranav Hosangadi, Feb 21 '23 at 17:32

Soroosh Noorzad · Answer 1 · 2023-02-20T22:49:20.230

0

after giving the samples, I tried this and it worked for me:

import pandas as pd

data1 = {
    'A': [1, 1,],
    'B': [2, 2,],
    'C': [3, 3,],
    'D': [4, 4,],
}
data2 = {
    'E': [5, 5,],
    'F': [6, 6,],
    'G': [7, 7,],
    'H': [8, 8,],
}
df_1 = pd.DataFrame(data1)
df_blk = pd.DataFrame({"Blank":["", "",]})
df_2 = pd.DataFrame(data2)
new_df = pd.concat([df_1, df_blk], axis=1)
new_df = pd.concat([new_df, df_2], axis=1)
print(new_df.head())

and the output:

   A  B  C  D Blank  E  F  G  H
0  1  2  3  4        5  6  7  8
1  1  2  3  4        5  6  7  8

So, in your case:

import pandas as pd
data1 = {
    'Rank': [64, 102, 34,],
    'Team': ["AR Lit Rock", "Abl Christian", "Air Force",],
    'Points': [35, 75, 45,],
}

data2 = {
    'Rank': [354, 284, 82,],
    'Team': ["AR Lit Rock", "Abl Christian", "Air Force",],
    'Points': [46, 68, 42,],
}

df_1 = pd.DataFrame(data1)
df_2 = pd.DataFrame(data2)
data_blk = {
    "Blank": [""]*len(df_1["Rank"]),
}
df_blk = pd.DataFrame(data_blk)

new_df = pd.concat([df_1, df_blk], axis=1)
new_df = pd.concat([new_df, df_2], axis=1)
print(new_df.head())

will result in:

   Rank           Team  Points Blank  Rank           Team  Points
0    64    AR Lit Rock      35         354    AR Lit Rock      46
1   102  Abl Christian      75         284  Abl Christian      68
2    34      Air Force      45          82      Air Force      42

edited Feb 20 '23 at 22:49

answered Feb 20 '23 at 22:26

Soroosh Noorzad

503
1
6
18

Thank you for this. This helped me understand how to get the blank column in between. Current problem is that my 2nd dataframe is being sorted based off the 1st dataframes Rank column. I have edited a lot of information above in the original question to maybe help with some more understanding. – Link Feb 20 '23 at 22:41
Isn't the concat in this case just sorting based off of the index though? I pull 2 unsorted tables into dataframes, sort by Team name, and then want that to be my output when exporting to csv. – Link Feb 20 '23 at 22:54
@Link Yes. Concat is merging these two dfs based on their indexes. So, If you want to concat() them with different orders, sort them before this process. – Soroosh Noorzad Feb 20 '23 at 22:59
I added another answer which has edited some of your sample to better reflect the issue. – Link Feb 20 '23 at 23:07
@Link Remove your answer and add your description to your question. You can reference me as "based on Soroosh Answer I edited the question like this. But..." Don't add your content in the answer format. That method is for some different uses. – Soroosh Noorzad Feb 20 '23 at 23:13

Adding 2 Separate Dataframes into 1 csv File

1 Answers1