
One of the nice things about Spark is the ability to query dataframes directly with SQL expressions, via Spark SQL. However, Spark is somewhat "heavy": it's best suited for large datasets and distributed computing. In my particular case I have a very small dataset that I'm reading from a single CSV file into a Pandas dataframe, and I would like to be able to write a SQL-like expression to query that dataframe. Does Pandas support this syntax/feature? If so, how is it used?
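For context, the closest built-in pandas analogue I'm aware of is `DataFrame.query`, which accepts a boolean expression over column names, roughly like a SQL `WHERE` clause (it is not full SQL). A minimal sketch with made-up sample data standing in for my CSV:

```python
import pandas as pd

# Hypothetical sample data standing in for the CSV described above.
df = pd.DataFrame({
    "name": ["alice", "bob", "carol"],
    "age": [34, 29, 41],
})

# DataFrame.query evaluates a boolean expression against the columns,
# similar to a SQL WHERE clause (filtering only, not joins/aggregates).
adults_over_30 = df.query("age > 30")
print(adults_over_30["name"].tolist())  # ['alice', 'carol']
```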

soapergem
    Does this answer your question? [Executing an SQL query over a pandas dataset](https://stackoverflow.com/questions/45865608/executing-an-sql-query-over-a-pandas-dataset) – Dan Feb 14 '20 at 16:39
  • is the query quite complex? I'd re-write it into pandas for better readability – Umar.H Feb 14 '20 at 16:42
  • @Datanovice yes it is pretty complex. One of our data scientists wrote a very long query with many different joins that runs as part of a Spark job. But we need to mirror that functionality in a web service that operates on a single record (instead of a large batch), and I'm trying to see how much of his code I can re-use, hopefully without needing to rewrite too much. – soapergem Feb 14 '20 at 16:43
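Regarding rewriting the joins in pandas: each SQL join maps fairly directly to `DataFrame.merge`. A hedged sketch with hypothetical tables (not the actual query from the Spark job):

```python
import pandas as pd

# Hypothetical tables standing in for two of the query's inputs.
users = pd.DataFrame({"user_id": [1, 2, 3], "name": ["alice", "bob", "carol"]})
orders = pd.DataFrame({"user_id": [1, 1, 3], "total": [10.0, 5.0, 7.5]})

# SQL "SELECT ... FROM users JOIN orders USING (user_id)" maps to merge;
# how="inner"/"left"/"outer" mirror INNER/LEFT/FULL OUTER JOIN.
joined = users.merge(orders, on="user_id", how="inner")
print(len(joined))  # 3 matching rows (alice twice, carol once)
```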
