I have two CSV files with 30,000 to 40,000 records each, and I have loaded them into two corresponding pandas DataFrames. Now I want to perform the following SQL operation on the DataFrames instead of in SQLite:

update table1 set column1 = (select column1 from table2 where table1.Id = table2.Id), column2 = (select column2 from table2 where table1.Id = table2.Id) where column3 = 'some_value';
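For context, the loading part is just plain read_csv; the file names below are placeholders for the real paths:

```python
import pandas as pd

# Load both CSVs (placeholder file names)
df1 = pd.read_csv("table1.csv")   # contains Id, column1, column2, column3, ...
df2 = pd.read_csv("table2.csv")   # contains Id, column1, column2, ...
```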
I tried to perform the update on the DataFrames in four steps: 1. merge the two DataFrames on the common Id; 2. get the Ids from the DataFrame where column3 has 'some_value'; 3. filter the merged DataFrame from step 1 down to the Ids found in step 2; 4. use a lambda function to write the values back into the DataFrame where the Id matches.
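Roughly, my attempt looks like the sketch below (column names simplified to column1/column2/column3, df1 and df2 are the two loaded DataFrames, and I'm assuming Id is unique in table2):

```python
import pandas as pd

# Step 1: merge the two DataFrames on the common Id
merged = df1.merge(df2, on="Id", how="left", suffixes=("", "_t2"))

# Step 2: the Ids whose rows should be updated (column3 == 'some_value')
ids_to_update = set(df1.loc[df1["column3"] == "some_value", "Id"])

# Step 3: restrict the merged frame to those Ids
# (assumes Id is unique in table2, so the .at lookup below returns a scalar)
lookup = merged[merged["Id"].isin(ids_to_update)].set_index("Id")

# Step 4: write table2's values back where the Id matches, row by row
def pick(row, col):
    if row["Id"] in lookup.index and pd.notna(lookup.at[row["Id"], col + "_t2"]):
        return lookup.at[row["Id"], col + "_t2"]
    return row[col]

df1["column1"] = df1.apply(lambda r: pick(r, "column1"), axis=1)
df1["column2"] = df1.apply(lambda r: pick(r, "column2"), axis=1)
```

The apply step in step 4 is what worries me performance-wise, since it runs Python code for every row.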
I just want other views on this approach and to know whether there are better solutions. One important point is that the DataFrames are quite large, so I feel like using SQLite would be better than pandas, since it gives the result in a single query and seems much faster. Should I use SQLite, or is there a better way to perform this operation on the DataFrames? Any views on this will be appreciated. Thank you.
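For comparison, the SQLite route I have in mind would look something like this: push both frames into an in-memory database, run the single UPDATE there, and read the result back (a sketch only):

```python
import sqlite3
import pandas as pd

con = sqlite3.connect(":memory:")
df1.to_sql("table1", con, index=False)
df2.to_sql("table2", con, index=False)

# Single UPDATE, exactly as in the SQL above
con.execute("""
    UPDATE table1
    SET column1 = (SELECT column1 FROM table2 WHERE table1.Id = table2.Id),
        column2 = (SELECT column2 FROM table2 WHERE table1.Id = table2.Id)
    WHERE column3 = 'some_value';
""")
con.commit()

df1 = pd.read_sql("SELECT * FROM table1", con)
con.close()
```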