
I have two CSV files with 30 to 40 thousand records each. I loaded the CSV files into two corresponding dataframes. Now I want to perform this SQL operation on the dataframes instead of in sqlite:

```sql
UPDATE table1
SET column1 = (SELECT column1 FROM table2 WHERE table1.Id = table2.Id),
    column2 = (SELECT column2 FROM table2 WHERE table1.Id = table2.Id)
WHERE column3 = 'some_value';
```

I tried to perform the update on the dataframes in 4 steps:

1. Merge the dataframes on the common `Id`.
2. Get the Ids from the dataframe where `column3` has `'some_value'`.
3. Filter the merged dataframe from step 1 on the Ids obtained in step 2.
4. Use a lambda function to insert values into the dataframe where the Id matches.
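The four steps above can be collapsed into a single vectorized update, which avoids the row-by-row lambda entirely. A minimal sketch, assuming both frames have columns named `Id`, `column1`, `column2` (and `column3` on the first), with the small frames here standing in for the two CSVs:

```python
import pandas as pd

# Stand-ins for the two loaded CSV files
df1 = pd.DataFrame({
    "Id": [1, 2, 3],
    "column1": ["a", "b", "c"],
    "column2": ["x", "y", "z"],
    "column3": ["some_value", "other", "some_value"],
})
df2 = pd.DataFrame({
    "Id": [1, 3],
    "column1": ["A", "C"],
    "column2": ["X", "Z"],
})

# Align df2's values to df1 row-by-row via a left merge on Id.
# The result has the same length and order as df1.
mapped = df1[["Id"]].merge(df2, on="Id", how="left")

# Overwrite only the rows where column3 matches AND df2 had that Id.
mask = (df1["column3"] == "some_value") & mapped["column1"].notna()
df1.loc[mask, ["column1", "column2"]] = (
    mapped.loc[mask, ["column1", "column2"]].to_numpy()
)
```

The `.to_numpy()` call sidesteps index alignment on assignment; the `notna()` guard mirrors the SQL behaviour of leaving rows untouched only where intended (note the plain SQL subquery would instead set unmatched rows to NULL, so the two differ slightly on Ids missing from `table2`).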

I just want other views on this approach, and whether there are any better solutions. One important consideration is that the dataframes are quite large, so I suspect sqlite will be better than pandas, since it produces the result in a single query and may be much faster. Should I use sqlite, or is there a better way to perform this operation on the dataframes? Any views on this will be appreciated. Thank you.
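For comparison, the sqlite route can also be driven entirely from pandas, so you don't have to maintain two loading paths: push both frames into an in-memory database, run the original `UPDATE`, and read the result back. A rough sketch (the frames below are stand-ins for the two CSVs; in practice you would pass the result of `pd.read_csv` to `to_sql`):

```python
import sqlite3
import pandas as pd

# Stand-ins for the two loaded CSV files
df1 = pd.DataFrame({
    "Id": [1, 2],
    "column1": ["a", "b"],
    "column2": ["x", "y"],
    "column3": ["some_value", "other"],
})
df2 = pd.DataFrame({"Id": [1, 2], "column1": ["A", "B"], "column2": ["X", "Y"]})

con = sqlite3.connect(":memory:")
df1.to_sql("table1", con, index=False)
df2.to_sql("table2", con, index=False)

# The original SQL, unchanged
con.execute("""
    UPDATE table1
    SET column1 = (SELECT column1 FROM table2 WHERE table1.Id = table2.Id),
        column2 = (SELECT column2 FROM table2 WHERE table1.Id = table2.Id)
    WHERE column3 = 'some_value'
""")
con.commit()

result = pd.read_sql_query("SELECT * FROM table1", con)
con.close()
```

At 30-40 thousand rows, both this and a vectorized pandas merge should run in well under a second, so it is worth benchmarking on your actual data before assuming either is faster.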

  • Right now it's a broad question; please refer to [How to make good reproducible pandas examples](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) so we can replicate it and find you answers, thanks! – anky Mar 30 '19 at 15:17
  • I don't have any code for this. I just want to know which approach will be better to perform the update operation, sqlite or pandas? @anky_91 – Preeti singh Mar 30 '19 at 15:31
  • Yes, I tried that. But the data is large, so I think it's complicated to merge and then combine, compared to sqlite. What do you think? @anky_91 – Preeti singh Mar 30 '19 at 15:38
  • Yes, I generally prefer sqlite for SQL operations inside a function, using [pd.read_sql_query](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_sql_query.html) and to_sql – anky Mar 30 '19 at 15:39
  • Is pd.read_sql_query faster? @anky_91 – Preeti singh Mar 30 '19 at 15:41
  • I haven't compared, but it works well for me – anky Mar 30 '19 at 15:42
  • Okay. In that case I'll have to load the CSV files into both pandas and sqlite, but I only need to use one of them. So I should go with sqlite, right? @anky_91 – Preeti singh Mar 30 '19 at 15:45
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/190964/discussion-between-anky-91-and-preeti-singh). – anky Mar 30 '19 at 16:34

0 Answers