
I have a table with the following contents in MySQL:

[screenshot of the table contents]

I am trying to query a DATETIME column called 'trade_time' with a WHERE clause as follows:

SELECT * FROM tick_data.AAPL
WHERE trade_time between '2021-01-01 09:30:00' and '2021-01-01 16:00:00';

What I'm getting is error 2013: Lost connection to MySQL server during query, after about 30 seconds.

I'm pretty new to SQL, so I might be doing something wrong here, but surely such a simple query shouldn't take longer than 30 seconds?

The table has 298M rows, which is huge, but I was under the impression that MySQL should handle this kind of operation.

The table has just 3 columns: trade_time, price, and volume. I just want to query data by dates and times in a reasonable time for further processing in Python.

Thanks for any advice.

EDIT: I've raised the timeout limit in MySQL Workbench to 5 minutes; the query described above took 291 seconds to run, just to get one day of data. Is there some way I can speed up the performance?

orie
  • The lost connection could be a client timeout (especially if you see it at almost exactly the same time every time). To improve performance for this specific query, make sure `trade_time` has an index. 298M rows is large, but MySQL (given the right hardware and indexing) can handle that query just fine. Also, if you expect a lot of rows to come back for that range, you may need to do some form of pagination with the query (limit/offset, ordering with a limit, etc.; see the sketch after these comments). – Marc Baumbach May 13 '21 at 20:29
  • Hi, thanks for the response. What kind of indexing could I do to improve the performance here? – orie May 13 '21 at 21:17
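
A minimal sketch of the keyset-style pagination mentioned in the first comment above, assuming the table and time window from the question; the chunk size of 100000 rows is an arbitrary illustration:

-- Fetch the trading window in fixed-size chunks, resuming each chunk
-- from the last trade_time returned by the previous one.
SELECT trade_time, price, volume
FROM tick_data.AAPL
WHERE trade_time > '2021-01-01 09:30:00'   -- last trade_time seen in the previous chunk
  AND trade_time <= '2021-01-01 16:00:00'
ORDER BY trade_time
LIMIT 100000;

Note that if many rows share the exact same trade_time, paginating on the timestamp alone can skip or repeat rows; a unique tie-breaker column would be needed for exact chunking.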

1 Answer


298M rows is a lot to go through. I can definitely see that taking more than 30 seconds, but not much more. The first thing I would do is raise your default disconnection time limit; personally I always make mine around 300 seconds, or 5 minutes. If you're using MySQL Workbench, that can be done as described here: MySQL Workbench: How to keep the connection alive
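
If you want to see what the server-side limits currently are, you can list them as below, although a cutoff at almost exactly 30 seconds usually points to the Workbench client-side "DBMS connection read timeout" preference rather than a server variable:

-- List the current server-side timeout variables (most values are in seconds).
SHOW VARIABLES LIKE '%timeout%';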

Also, I would check whether the trade_time column has an index on it. Indexing the columns you query on frequently is a good strategy for making queries faster.

SHOW INDEX FROM tablename;

Look to see if trade_time is in the list. If not, you can create an index like so:

CREATE INDEX dateTime ON tablename (trade_time);
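
Once the index exists, a quick way to confirm the query actually uses it (a sketch reusing the table and time window from the question; the dateTime index name matches the CREATE INDEX statement above):

-- The "key" column of EXPLAIN's output should show the dateTime index
-- rather than NULL (NULL would mean a full table scan).
EXPLAIN SELECT * FROM tick_data.AAPL
WHERE trade_time BETWEEN '2021-01-01 09:30:00' AND '2021-01-01 16:00:00';

Bear in mind that building the index on a 298M-row table will itself take a while and use extra disk space.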

  • I don't see any indices on it. Could you explain a bit about what that does and what kind of indexing it uses? The trade_time column, by the way, is already 'datetime' in its type. – orie May 13 '21 at 21:14
  • Sure thing. Basically, the important thing to know is that indexes store the column(s) you index in a sorted BTREE structure. The way indexed columns are structured allows for faster searching and retrieval of specific values in the column. They also allow the MySQL optimizer to avoid performing full table scans, which are slower, especially the larger your table is. This comes at the cost of some storage space, and inserts/updates/deletes will take a little longer as well. See https://dev.mysql.com/doc/refman/8.0/en/optimization-indexes.html – SenorCardgage May 13 '21 at 21:35
  • I'll look into it more in depth, sounds like this is something worth investigating for me, thank you! – orie May 14 '21 at 12:36