I am trying to cross join two tables.
import os
import pandas as pd
import sys
import pyodbc
con = pyodbc.connect("DRIVER= {NetezzaSQL};SERVER=netezza.xxx.com;PORT=5480;DATABASE=XXXXXX;UID=xxxxx;PWD=xxxxxxx;")
con.setdecoding(pyodbc.SQL_CHAR, encoding='utf-8')
con.setdecoding(pyodbc.SQL_WCHAR, encoding='utf-8')
con.setdecoding(pyodbc.SQL_WMETADATA, encoding='utf-8')
con.setencoding(encoding='utf-8')
sql1 = """select * from table1"""
input1 = pd.read_sql_query(sql1,con)
student_name
0 abc
1 def
2 ghi
sql2 = """select * from table2"""
input2 = pd.read_sql_query(sql2,con)
subjects
0 maths
1 english
What I need is a cross join on table1 and table2. If I were to use SQL, it would be:
select table1.*, table2.* from table1 cross join table2
student_name subjects
0 abc maths
0 abc english
1 def maths
1 def english
2 ghi maths
2 ghi english
I am trying to get the above output using the line of code below, but it throws a MemoryError. Is there any way of improving the code?
input1.assign(foo=1).merge(input2.assign(foo=1), on='foo').drop('foo', axis=1)
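For reference, a minimal sketch of the same cross join written for a newer pandas (1.2 or later, which is an assumption about the environment), where merge supports how='cross' directly; the result still has len(input1) * len(input2) rows, so the memory cost does not change:

# cross join built into merge since pandas 1.2
expected = input1.merge(input2, how='cross')
# expected has the two columns shown above: student_name, subjects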
Edited 3/28/2019: Since the above is not working, I changed the code to reduce the number of columns. For this I converted the dataframes to lists and loop over them to build the output as a list. But the code is very slow. Any help to improve performance? I have a dataframe with a key as its first column and five columns with values between 0 and 1. The second dataframe has a different key as its first column and five columns with 0 or 1 as values.
a = input5.values.tolist()
b = input6.values.tolist()
c = []
for i in range(len(a)):
    for j in range(len(b)):
        # weighted sum of a[i]'s five values, using b[j]'s 0/1 columns as weights
        denominator = (a[i][1] * float(b[j][1]) + a[i][2] * float(b[j][2]) +
                       a[i][3] * float(b[j][3]) + a[i][4] * float(b[j][4]) +
                       a[i][5] * float(b[j][5]))
        # plain sum of a[i]'s five values
        numerator = a[i][1] + a[i][2] + a[i][3] + a[i][4] + a[i][5]
        mean = numerator / denominator
        new_row = [a[i][0], b[j][0], mean]
        c.append(new_row)
len(c)
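In case it helps to see what I am after, here is a vectorized sketch of the same calculation with NumPy, assuming input5 and input6 have the layout described above (key in the first column, five numeric columns after it); the key_a/key_b column names are placeholders I made up:

import numpy as np

# value columns as float matrices (assumption: columns 1..5 hold the five values)
av = input5.values[:, 1:6].astype(float)   # shape (n, 5), values between 0 and 1
bv = input6.values[:, 1:6].astype(float)   # shape (m, 5), 0/1 flags

denominator = av @ bv.T                    # (n, m): sum of a-values picked out by b's flags
numerator = av.sum(axis=1)[:, None]        # (n, 1): row sums of input5's value columns
mean = numerator / denominator             # (n, m): same numbers the loop computes

# expand back to one row per (input5 key, input6 key) pair, in the loop's order
c = pd.DataFrame({
    'key_a': np.repeat(input5.values[:, 0], len(input6)),
    'key_b': np.tile(input6.values[:, 0], len(input5)),
    'mean': mean.ravel(),
})
len(c)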