Suppose I run a Sqoop query against an RDBMS, for example "select id,name from customer where country='US'". In this case Sqoop runs the query in the RDBMS and fetches only the filtered records, which are then split across the number of mappers we defined.

I want to understand how this works if I run the same query in Spark against the RDBMS. Will the filtering (country='US') happen on the RDBMS side or in Spark memory? My understanding is that the filtering should happen in the RDBMS, but some articles say it happens in Spark memory after all records are fetched. Please clarify.
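
For reference, here is a minimal sketch of the two ways I could issue this read from PySpark through the JDBC data source. The URL, credentials, and function names are hypothetical placeholders, not part of my real setup. As far as I can tell, the `query` option hands the full SELECT to the database, while a `.filter()` on a `dbtable` read relies on Spark's JDBC predicate pushdown (the `pushDownPredicate` option, which defaults to true); calling `explain()` on the result should show whether the filter was pushed.

```python
# Hypothetical connection details, for illustration only.
JDBC_URL = "jdbc:mysql://dbhost:3306/sales"
USER = "report_user"
PASSWORD = "secret"


def query_options(url, user, password):
    """JDBC options where the database itself runs the full SELECT."""
    return {
        "url": url,
        "user": user,
        "password": password,
        # The WHERE clause is part of the statement sent to the RDBMS.
        "query": "select id, name from customer where country = 'US'",
    }


def table_options(url, user, password):
    """JDBC options that point at the whole table; filtering is added later."""
    return {
        "url": url,
        "user": user,
        "password": password,
        "dbtable": "customer",
    }


def read_us_customers(spark):
    """Build both DataFrames; `spark` is an existing SparkSession."""
    # Variant 1: the filter lives inside the SQL passed to the database.
    via_query = (
        spark.read.format("jdbc")
        .options(**query_options(JDBC_URL, USER, PASSWORD))
        .load()
    )

    # Variant 2: filter on the DataFrame; Spark may push it down as a WHERE clause.
    via_filter = (
        spark.read.format("jdbc")
        .options(**table_options(JDBC_URL, USER, PASSWORD))
        .load()
        .filter("country = 'US'")
        .select("id", "name")
    )

    # The physical plan lists any filters pushed to the JDBC source.
    via_filter.explain()
    return via_query, via_filter
```

Note that `query` and `dbtable` cannot be set together on the same read, which is why the sketch keeps them in separate option sets.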

mazaneicha

0 Answers