Create the cartesian product (cross join) of two csv files in python

Question

I tried to create a new csv file by cross joining two existing csv files.

csv file #1:

hour    Elevation   Azimuth x   y   z   sunx    suny    sunz
06:29:00    -0.833  67.72   0.379094033 0.925243946 -0.014538068    0.379094033 0.925243946 -0.014538068
07:00:00    6.28    68.75   0.360264063 0.92641472  0.109387255 0.360264063 0.92641472  0.109387255

csv file #2:

ID  SURFACES    A1X A1Y A1Z A2X A2Y A2Z B1X B1Y B1Z B2X B2Y B2Z AX  AY  AZ  BX  BY  BZ  ABX ABY ABZ planex  planey  planez
1   GROUND  800085.3323 961271.977  -3.07E-18   800080.8795 961246.1978 -3.07E-18   800097.1572 961269.9344 -3.07E-18   800085.3323 961271.977  -3.07E-18   4.4528  25.7792 0.00E+00    11.8249 -2.0426 0.00E+00    0   0   -313.9317514    0   0   -1
2   ROOF    800019.3994 961242.7732 12  800021.442  961254.5981 12  800090.3488 961230.5181 12  800019.3994 961242.7732 12  -2.0426 -11.8249    0.00E+00    70.9494 -12.2551    0.00E+00    0   0   864.0018273 0

I want the cartesian product of the files (each hour with all surfaces, just like performing a SQL cross join).

An illustration of what I am asking:
http://dotnetslackers.com/images/articleimages/sqljoins5.jpg
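For reference, the SQL behaviour the question points to can be reproduced with Python's built-in `sqlite3` module. This is a minimal sketch, not part of the original question. The table names `sun` and `surfaces` and the trimmed-down columns are hypothetical stand-ins for the two CSV files:

```python
import sqlite3

# Two tiny in-memory tables standing in for the two CSV files
# (hypothetical names, only a couple of the question's columns).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sun (hour TEXT, elevation REAL)")
conn.execute("CREATE TABLE surfaces (id INTEGER, surface TEXT)")
conn.executemany("INSERT INTO sun VALUES (?, ?)",
                 [("06:29:00", -0.833), ("07:00:00", 6.28)])
conn.executemany("INSERT INTO surfaces VALUES (?, ?)",
                 [(1, "GROUND"), (2, "ROOF")])

# CROSS JOIN pairs every row of `sun` with every row of `surfaces`.
rows = conn.execute(
    "SELECT sun.hour, surfaces.surface FROM sun CROSS JOIN surfaces "
    "ORDER BY sun.hour, surfaces.id"
).fetchall()
for row in rows:
    print(row)
conn.close()
```

Two rows times two rows gives four result rows. This is the shape each of the answers below should reproduce.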

Can you give us a small example just to make sure what you mean? — import this, Jun 18 '14 at 10:43

score 8 · Answer 1 · edited May 23 '17 at 11:46

You can do this with pandas, as if it were an SQL join of two tables. The differences are:

- the general-purpose function for performing joins is called `merge` (there is also a convenience `join` method);
- to get a cartesian product, you add an extra column that holds the same constant value in both the left and the right table, and join on that column.

Using exactly the files you posted, saved as example1.csv and example2.csv:

import pandas as pd

# sep=r'\s+' splits on runs of whitespace; newer pandas deprecates delim_whitespace=True
df_1 = pd.read_csv('example1.csv', sep=r'\s+')
df_2 = pd.read_csv('example2.csv', sep=r'\s+')
df_1['key'] = 1
df_2['key'] = 1

product = pd.merge(df_1, df_2, on='key')

product[['hour', 'SURFACES']]

Results in:

       hour  SURFACES
0  06:29:00    GROUND
1  06:29:00      ROOF
2  07:00:00    GROUND
3  07:00:00      ROOF
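On pandas 1.2 or later, `merge` also accepts `how='cross'`, which builds the cartesian product without the helper `key` column (so there is nothing to drop afterwards). A minimal sketch, assuming pandas >= 1.2. The `io.StringIO` objects are hypothetical stand-ins for example1.csv and example2.csv with only a few of the columns:

```python
import io

import pandas as pd

# Small in-memory stand-ins for example1.csv and example2.csv
# (only a few of the question's columns, for brevity).
csv1 = io.StringIO("hour Elevation Azimuth\n06:29:00 -0.833 67.72\n07:00:00 6.28 68.75\n")
csv2 = io.StringIO("ID SURFACES\n1 GROUND\n2 ROOF\n")

# sep=r"\s+" splits on runs of whitespace.
df_1 = pd.read_csv(csv1, sep=r"\s+")
df_2 = pd.read_csv(csv2, sep=r"\s+")

# how="cross" (pandas >= 1.2) produces the cartesian product directly,
# so no constant 'key' column is needed.
product = pd.merge(df_1, df_2, how="cross")
print(product[["hour", "SURFACES"]])
```

The row order matches the output above: each row of `df_1`, in order, paired with every row of `df_2`.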

import this · Accepted Answer · 2014-06-18T11:43:18.343

I don't know of any out-of-the-box solution, so I made this:

import csv
from itertools import product

def main():
    # Python 3: open CSV files in text mode with newline='' (not 'rb'/'wb').
    with open('file1.csv', newline='') as f1, open('file2.csv', newline='') as f2:
        reader1 = csv.reader(f1, dialect=csv.excel_tab)
        reader2 = csv.reader(f2, dialect=csv.excel_tab)

        # Step 1: Read and write the headers separately.
        header1, header2 = next(reader1), next(reader2)
        with open('output.csv', 'w', newline='') as out:
            writer = csv.writer(out, dialect=csv.excel_tab)
            writer.writerow(header1 + header2)
            # Step 2: Write the product of the rest of the rows.
            writer.writerows(
                row1 + row2 for row1, row2 in product(reader1, reader2))

if __name__ == '__main__':
    main()

With files:

file1.csv

hour    Elevation   Azimuth
06:29:00    -0.833  67.72
07:00:00    6.28    68.75

file2.csv

ID  SURFACES
1   GROUND
2   ROOF

you get the following output.csv:

hour    Elevation   Azimuth ID  SURFACES
06:29:00    -0.833  67.72   1   GROUND
06:29:00    -0.833  67.72   2   ROOF
07:00:00    6.28    68.75   1   GROUND
07:00:00    6.28    68.75   2   ROOF

That's what it does, a cartesian product. `itertools.product` does the work. — import this, Jun 18 '14 at 11:36
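To check the same logic without touching the filesystem (and under Python 3), the readers can be pointed at `io.StringIO` objects. This is a sketch that assumes tab-separated input, as `csv.excel_tab` does in the answer; the in-memory data is a trimmed copy of file1.csv and file2.csv above:

```python
import csv
import io
from itertools import product

# In-memory stand-ins for file1.csv and file2.csv (tab-separated,
# matching dialect=csv.excel_tab in the answer above).
file1 = io.StringIO("hour\tElevation\tAzimuth\n06:29:00\t-0.833\t67.72\n07:00:00\t6.28\t68.75\n")
file2 = io.StringIO("ID\tSURFACES\n1\tGROUND\n2\tROOF\n")

reader1 = csv.reader(file1, dialect=csv.excel_tab)
reader2 = csv.reader(file2, dialect=csv.excel_tab)
header1, header2 = next(reader1), next(reader2)

# product() consumes each input into a tuple before it starts, so
# reader2 is read once and then reused for every row of reader1.
rows = [header1 + header2] + [r1 + r2 for r1, r2 in product(reader1, reader2)]

out = io.StringIO()
csv.writer(out, dialect=csv.excel_tab).writerows(rows)
print(out.getvalue())
```

Collecting everything into `rows` is only for inspection here; passing the generator straight to `writer.writerows`, as the answer does, avoids building an extra list. Note that `product()` still keeps both inputs in memory.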

dmitry_romanov · Answer 3 · score 0 · 2014-06-19T13:46:12.147

Updated version after the comment by user*:

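    with open("file1") as f1, open("file2") as f2, open("result", "wt") as f3:
        header1, header2 = next(f1), next(f2)  # pair the header lines only once
        f3.write(header1.rstrip('\n') + ' ' + header2.rstrip('\n') + '\n')
        lines2 = f2.readlines()  # a file can only be iterated once, so cache file2
        for a in f1:
            for b in lines2:
                f3.write(a.rstrip('\n') + ' ' + b.rstrip('\n') + '\n')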

edited Jun 19 '14 at 13:46

answered Jun 18 '14 at 09:51


It's connecting the two files row by row. I want a cross join between the two files, i.e. each row in one file connected to all rows in the other file. – user2177232 Jun 18 '14 at 10:29
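Nested loops directly over open file objects run into a general Python pitfall. A file object is an iterator, so once the inner loop has read file2 to the end, later passes get nothing, and only the first line of file1 is ever paired up. A small demonstration, with `io.StringIO` standing in for a real file and hypothetical hour values on the outer side:

```python
import io

# io.StringIO stands in for an open text file; both are iterators
# over their lines.
f2 = io.StringIO("1 GROUND\n2 ROOF\n")

first_pass = [line for line in f2]
second_pass = [line for line in f2]   # the iterator is already exhausted
print(first_pass)   # ['1 GROUND\n', '2 ROOF\n']
print(second_pass)  # []

# Caching the lines once (or calling f2.seek(0) before every pass)
# lets every outer row be paired with all of them.
f2.seek(0)
lines2 = f2.readlines()
pairs = [(a, b.rstrip("\n")) for a in ("06:29:00", "07:00:00") for b in lines2]
print(pairs)
```

Caching with `readlines()` keeps file2 in memory; for a small file2 that is usually the simplest choice, while `seek(0)` trades memory for re-reading the file once per outer row.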
