2

I am completely new to Python but am currently doing some web scraping of some tables and I am trying to combine the 2 dataframes that I get into 1 csv file.

So lets say I have df_1 which looks like:

Rank Team Points
64 AR Lit Rock 35
102 Abl Christian 75
34 Air Force 45

And df_2 which looks like:

Rank Team Points
354 AR Lit Rock 46
284 Abl Christian 68
82 Air Force 42

(I have both dataframes being sorted by Team already) I am wanting to combine both of these into 1 csv file to eventually end up looking like:

Rank Team Points Rank Team Points
64 AR Lit Rock 35 354 AR Lit Rock 46
102 Abl Christian 75 284 Abl Christian 68
34 Air Force 45 82 Air Force 42

or if at all possible to separate the 2 tables with a blank column inbetween:

a b c d e f g h
1 2 3 4 5 6 7 8
1 2 3 4 5 6 7 8

I am looking at doing some calculations between the 2 tables within the csv at a later point and just want it to be more readable. Since I am completely new to Python, this already may be a bad idea to have both tables in 1 csv file. If that is also the case, please let me know.

Currently I have both dataframes in their own respective objects and so far have only attempted concat.

df = pd.concat([df_1, df_2], axis=1)

This had kind of done what I want except the fact the I also have both dataframes sorted alphabetically based off of a certain column and then when they concat, the 1st dataframe is sorted and the 2nd gets sorted dependent on another value from the 1st dataframe. So end result looked like this instead:

Rank Team Points Rank Team Points
64 AR Lit Rock 35 64 Oregon 57
102 Abl Christian 75 102 Sac State 81
34 Air Force 45 34 Ark Pine Bl 59

2nd dataframe getting added by Rank rather than acting independently.

(Sorry, had trouble making sample tables)

Actual sample data

I have added a little snippet of what the current csv looks like after putting both dataframes into the csv. Both dataframes are sorted alphabetically by the Team name initially, but when the 2nd dataframe gets added it gets sorted by the Rank from the 1st dataframe.

UPDATE (edited code from Soroosh to better show the issue)

import pandas as pd
data1 = {
    'Rank': [64, 102, 34,],
    'Team': ["AR Lit Rock", "Abl Christian", "Air Force",],
    'Points': [35, 75, 45,],
}

data2 = {
    'Rank': [354, 284, 82,],
    'Team': ["Air Force", "Abl Christian", "AR Lit Rock",],
    'Points': [46, 68, 42,],
}



df_1 = pd.DataFrame(data1)
df_2 = pd.DataFrame(data2)
df_1 = df_1.sort_values('Team')
df_2 = df_2.sort_values('Team')
data_blk = {
    "Blank": [""]*len(df_1["Rank"]),
}
df_blk = pd.DataFrame(data_blk)

new_df = pd.concat([df_1, df_blk], axis=1)
new_df = pd.concat([new_df, df_2], axis=1)
print(new_df.head())

I take 2 unsorted tables, sort by the Team column, and then both dataframes to stay sorted by that Team column when concat the dataframes.

Pranav Hosangadi
  • 23,755
  • 7
  • 44
  • 70
Link
  • 113
  • 7
  • Look at this link: https://stackoverflow.com/questions/28097222/pandas-merge-two-dataframes-with-different-columns – Soroosh Noorzad Feb 20 '23 at 21:46
  • You can also use this method: new_df = pd.concat([df_1, df_2], axis=1, ignore_index=True) – Soroosh Noorzad Feb 20 '23 at 21:49
  • I added a sample snippet of the actual data if that may help at all. The ignore_index doesn't seem to change anything for me. – Link Feb 20 '23 at 22:09
  • Recheck my answer. I edited it and hope it helps. – Soroosh Noorzad Feb 20 '23 at 22:50
  • I paved the way. I hope others can answer your question. But I think you are trying to solve your whole issue in one question. I'm here to answer one of your questions. Not solving the whole project. If you think my answer was helpful, vote me up and search for the solution of next steps. Good luck with your project. – Soroosh Noorzad Feb 20 '23 at 23:16
  • I edited your question to remove the **ANSWER** section. Since you have an answer to your question, you should post that as an _answer_, not add it to your _question_. – Pranav Hosangadi Feb 21 '23 at 17:32

1 Answers1

0

after giving the samples, I tried this and it worked for me:

import pandas as pd

data1 = {
    'A': [1, 1,],
    'B': [2, 2,],
    'C': [3, 3,],
    'D': [4, 4,],
}
data2 = {
    'E': [5, 5,],
    'F': [6, 6,],
    'G': [7, 7,],
    'H': [8, 8,],
}
df_1 = pd.DataFrame(data1)
df_blk = pd.DataFrame({"Blank":["", "",]})
df_2 = pd.DataFrame(data2)
new_df = pd.concat([df_1, df_blk], axis=1)
new_df = pd.concat([new_df, df_2], axis=1)
print(new_df.head())

and the output:

   A  B  C  D Blank  E  F  G  H
0  1  2  3  4        5  6  7  8
1  1  2  3  4        5  6  7  8

So, in your case:

import pandas as pd
data1 = {
    'Rank': [64, 102, 34,],
    'Team': ["AR Lit Rock", "Abl Christian", "Air Force",],
    'Points': [35, 75, 45,],
}

data2 = {
    'Rank': [354, 284, 82,],
    'Team': ["AR Lit Rock", "Abl Christian", "Air Force",],
    'Points': [46, 68, 42,],
}

df_1 = pd.DataFrame(data1)
df_2 = pd.DataFrame(data2)
data_blk = {
    "Blank": [""]*len(df_1["Rank"]),
}
df_blk = pd.DataFrame(data_blk)

new_df = pd.concat([df_1, df_blk], axis=1)
new_df = pd.concat([new_df, df_2], axis=1)
print(new_df.head())

will result in:

   Rank           Team  Points Blank  Rank           Team  Points
0    64    AR Lit Rock      35         354    AR Lit Rock      46
1   102  Abl Christian      75         284  Abl Christian      68
2    34      Air Force      45          82      Air Force      42
Soroosh Noorzad
  • 503
  • 1
  • 6
  • 18
  • Thank you for this. This helped me understand how to get the blank column in between. Current problem is that my 2nd dataframe is being sorted based off the 1st dataframes Rank column. I have edited a lot of information above in the original question to maybe help with some more understanding. – Link Feb 20 '23 at 22:41
  • Isn't the concat in this case just sorting based off of the index though? I pull 2 unsorted tables into dataframes, sort by Team name, and then want that to be my output when exporting to csv. – Link Feb 20 '23 at 22:54
  • @Link Yes. Concat is merging these two dfs based on their indexes. So, If you want to concat() them with different orders, sort them before this process. – Soroosh Noorzad Feb 20 '23 at 22:59
  • I added another answer which has edited some of your sample to better reflect the issue. – Link Feb 20 '23 at 23:07
  • @Link Remove your answer and add your description to your question. You can reference me as "based on Soroosh Answer I edited the question like this. But..." Don't add your content in the answer format. That method is for some different uses. – Soroosh Noorzad Feb 20 '23 at 23:13