
As of now, I am iterating through a pandas DataFrame row by row and, for each row, firing a SELECT query against the database (based on a few of the row's values) to check whether the data is already present there.

    for index, frame in dataframe.iterrows():
        # fire a SELECT based on frame['column_1'] and frame['column_2']

This would take forever if the data is huge. I was just curious: is there a way I can fire all the SELECTs at once and filter my DataFrame down to the rows for which I get a result back?

user1896796
  • Can you bring the data in the database into memory? If so, you could do that and then use a join or anti-join to find which rows are present or missing. – RoachLord Oct 22 '18 at 07:25
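
A minimal sketch of the join approach that comment suggests, assuming SQLAlchemy is available; the connection string, table name (my_table), and key columns here are all illustrative:

    import pandas as pd
    from sqlalchemy import create_engine

    # Hypothetical connection string and table name
    engine = create_engine("mysql+pymysql://user:password@host/db")

    # Pull only the key columns from the database into memory
    db_keys = pd.read_sql("SELECT column_1, column_2 FROM my_table", engine)

    # Left merge with indicator=True: '_merge' is 'both' where the row exists
    # in the database and 'left_only' where it does not
    merged = dataframe.merge(db_keys.drop_duplicates(),
                             on=["column_1", "column_2"],
                             how="left", indicator=True)
    present = merged[merged["_merge"] == "both"].drop(columns="_merge")
    missing = merged[merged["_merge"] == "left_only"].drop(columns="_merge")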

2 Answers

    import pandas as pd

    columns = ["A", "B", "C"]

    data = [
        [1, 2, 3],
        [1, 1, 1],
        [3, 5, 4],
        [2, 5, 3],
    ]

    my_pandas = pd.DataFrame(data, columns=columns)

    # Filter the DataFrame on a single condition
    my_pandas[my_pandas["A"] == 1]

    # Combine two filters (don't forget the parentheses; they matter for operator precedence)
    my_pandas[(my_pandas["A"] == 1) & (my_pandas["C"] == 3)]

Result

    A   B   C
0   1   2   3
Kinnay
  • I could create a DataFrame filtering out the columns I need for my SELECT query. The question was: is there a way I can fire multiple SELECT queries at once to avoid the loop iterations? – user1896796 Oct 22 '18 at 07:56
  • Could you post an example? I don't see what you mean. – Kinnay Oct 22 '18 at 11:03

Maybe this post helps you: how to transform pandas dataframe for insertion via executemany() statement?

MySQL's Python connector, for example, has an executemany() method that does pretty much exactly what I understand you want. Perhaps there is something like that for pandas and NumPy; I haven't worked with those two yet.
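
For the lookup in the question specifically, executemany() is geared toward batched INSERT/UPDATE statements; a single SELECT with a row-constructor IN list gets the same "all at once" effect for reads. A rough sketch, assuming mysql.connector and the same hypothetical table and column names as above:

    import mysql.connector

    # Hypothetical connection details
    conn = mysql.connector.connect(user="user", password="password", database="db")
    cur = conn.cursor()

    # Build one SELECT covering every (column_1, column_2) pair in the DataFrame
    pairs = list(dataframe[["column_1", "column_2"]].itertuples(index=False, name=None))
    placeholders = ", ".join(["(%s, %s)"] * len(pairs))
    query = ("SELECT column_1, column_2 FROM my_table "
             f"WHERE (column_1, column_2) IN ({placeholders})")
    cur.execute(query, [value for pair in pairs for value in pair])

    # Keep only the DataFrame rows the database returned
    found = set(cur.fetchall())
    mask = dataframe.apply(lambda r: (r["column_1"], r["column_2"]) in found, axis=1)
    present = dataframe[mask]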

dennisdee