0

I have made an test table in sql with the following information schema as shown:

Database Schema and Table data

Now I extract this information using the python script the code of which is as shown:

import pandas as pd
import mysql.connector
 
db = mysql.connector.connect(host="localhost", user="root", passwd="abcdef")
pointer = db.cursor()
pointer.execute("use holdings")
x = "Select * FROM orders where tradingsymbol like 'TATACHEM'"
pointer.execute(x)
rows = pointer.fetchall()
rows = pd.DataFrame(rows)
stock = rows[1]

SQL TABLE Imported as Pandas DF

The production table contains 200 unique trading symbols and has the schema similar to the test table.

My doubt is that for the following statement:

x = "Select * FROM orders where tradingsymbol like 'TATACHEM'"

I will have to replace value of tradingsymbols 200 times which is ineffective.

Is there an effective way to do this?

Huzefa Sadikot
  • 561
  • 1
  • 7
  • 22
  • 1
    [Python MySQL parameterized queries](https://stackoverflow.com/q/775296/3890632) – khelwood Dec 08 '20 at 09:13
  • 1
    If you have a matching string "=" will give you better performance than LIKE but the above link explains it well what to do – Anna Semjén Dec 08 '20 at 09:23
  • Why does a matching string have better performance than LIKE – Huzefa Sadikot Dec 08 '20 at 09:28
  • 1
    LIKE accepts wildcard characters. This means that it needs to parse whatever you give it and thus creates a significant overhead if you do a direct comparison anyways – C Hecht Dec 08 '20 at 09:41
  • @AnnaSemjén - `x LIKE '...'` _without wildcards_ is turned into `x = '...'`. So, no performance diff. – Rick James Dec 08 '20 at 17:48
  • @Huzefa - You used the term "replace", yet I don't see an `UPDATE` statement. Please clarify what you are doing. – Rick James Dec 08 '20 at 17:50
  • @Rick James I am trying to filter rows as per the column Value. It is by trading symbol in this case. As @ C Hect said I want to avoid using multiple SQL queries for trading symbols as that is too ineffective and time consuming as better statements and procedures exist to get the same result – Huzefa Sadikot Dec 09 '20 at 01:47
  • @Huzefa - is `average_price` different for each day? Are you interested only in the _latest_ `average_price`? – Rick James Dec 09 '20 at 01:50
  • Yes the average price changes. But the close price is what is what I am interested in. The table values get updated every ten minutes so of I want to know what was the close price yesterday compare it with price today and perform further operation if the value was greater or lesser – Huzefa Sadikot Dec 09 '20 at 02:03

1 Answers1

1

If I understand you correctly, your problem is that you want to avoid sending multiple queries for each trading symbol, correct? In this case the following MySQL IN might be of help. You could then simply send one query to the database containing all tradingsymbols you want. If you want to do different things with the various trading symbols, you could select the subsets within pandas.

Another performance improvement could be pandas.read_sql since this speeds up the creation of the dataframe somewhat

Two more things to add for efficiency:

  • Ensure that tradingsymbols is indexed in MySQL for faster lookup processes
  • Make tradingsymbols an ENUM to ensure that no typos or alike are accepted. Otherwise the above-mentioned "IN" method also does not work since it does a full-text comparison.
Dharman
  • 30,962
  • 25
  • 85
  • 135
C Hecht
  • 932
  • 5
  • 14