
I'm a final-year college student working on my thesis. Its title is "Index Suggestion Based on Log Analysis". The project will analyze the PostgreSQL transaction log to produce index recommendations for a test database.

This research will develop an index recommender tool by analyzing which attributes are frequently accessed (via SELECT statements).

However, I'm finding it hard to get hold of a PostgreSQL log file. My question is: where can I find a PostgreSQL transaction log dataset? Or perhaps a transaction log dataset from another database?

wdwilhelmina
  • But `SELECT` is not logged in the transaction log. I don't mean to put you down, but shouldn't you learn the basics before writing your thesis? – Laurenz Albe Feb 20 '19 at 07:18
  • You'd have to hook into the query optimizer somewhere, but it is not a simple task. – Laurenz Albe Feb 20 '19 at 07:22
  • Are you talking about the [write-ahead log](https://www.postgresql.org/docs/current/wal-intro.html), or the [server log](https://www.postgresql.org/docs/current/runtime-config-logging.html)? The WAL (a.k.a. the transaction log) doesn't record queries, but the server log will (if you tell it to). If you just want a record of `SELECT` statements, then [`pg_stat_statements`](https://www.postgresql.org/docs/current/pgstatstatements.html) is probably a better starting point, but parsing the queries accurately sounds like a lot more work than the actual analysis... – Nick Barnes Feb 20 '19 at 10:03
  • I'd start by reading all the existing StackOverflow questions about Postgres query optimisation and/or performance. There are some patterns to be detected. – wildplasser Feb 20 '19 at 11:12
  • @NickBarnes I'm talking about the write-ahead log. Yes, it's hard to parse the log file. I already tried to parse the log using regular expressions. What type of parsing might be a good fit for the log file? – wdwilhelmina Feb 21 '19 at 02:40
  • @LaurenzAlbe I once found SELECT statements in the PostgreSQL log, but they don't appear anymore. – wdwilhelmina Feb 21 '19 at 02:41
  • @wildplasser Thanks. Do you know where I can download a PostgreSQL log file dataset? – wdwilhelmina Feb 21 '19 at 02:43
  • Use the optimisation cases from SO (or DBA@SO) as training material, and fetch the log files yourself. – wildplasser Feb 21 '19 at 08:51
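Following Nick Barnes's suggestion above, a minimal sketch of enabling `pg_stat_statements` (assuming superuser access; the preload change takes effect only after a server restart, and the `ALTER SYSTEM` call overwrites any existing `shared_preload_libraries` value):

```sql
-- pg_stat_statements must be preloaded at server start
ALTER SYSTEM SET shared_preload_libraries = 'pg_stat_statements';

-- after restarting the server, create the extension in the target database
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

-- normalized queries and their call counts are then available here
SELECT query, calls FROM pg_stat_statements;
```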

3 Answers


You are mixing up the transaction log (WAL) and the regular text log file.

The latter does contain statements (if logging is configured accordingly), while the transaction log contains no statements at all, just binary information about what has changed in which block.

You won't be able to recommend an index just from looking at the query; I can't do that either.

I have a suggestion for you: if you want to write a tool that suggests indexes, it should take the output of `EXPLAIN (ANALYZE, BUFFERS, FORMAT JSON) SELECT /* your query */` as input.

Moreover, the tool will have to be connected to the database to query table and index metadata (and perhaps statistics). That makes you dependent on the database version, because metadata can change (and does – see partitioned tables), but that won't concern you so much in a thesis paper.

The task is still not simple (query optimization is AI), but then you have at least a chance.
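As an aside going beyond the answer itself: the `auto_explain` contrib module can collect such plans automatically by writing them to the server log, which fits the log-analysis angle of the thesis. A minimal per-session sketch:

```sql
-- load the contrib module for this session (superuser only);
-- set session_preload_libraries to enable it server-wide instead
LOAD 'auto_explain';

SET auto_explain.log_min_duration = 0;   -- log the plan of every statement
SET auto_explain.log_analyze = true;     -- include actual row counts and timings
SET auto_explain.log_buffers = true;     -- include buffer usage
SET auto_explain.log_format = 'json';    -- machine-readable plans in the log
```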

Laurenz Albe
  • I'm a beginner with PostgreSQL. I've already set `log_statement` to `on` in postgresql.conf, but I still can't find SELECT statements in my log. What configuration do I need so that statements show up in the regular text log file? – wdwilhelmina Feb 27 '19 at 02:45
  • Can you explain more about taking the output of `EXPLAIN (ANALYZE, BUFFERS, FORMAT JSON)` as input? – wdwilhelmina Feb 27 '19 at 02:48
  • Sure: you get the execution plan that way. – Laurenz Albe Feb 27 '19 at 04:17

You need to run the query below and then reload the configuration (or restart PostgreSQL) to enable statement logging persistently. A parameter set with `ALTER SYSTEM SET` is written to postgresql.auto.conf rather than postgresql.conf:

ALTER SYSTEM SET log_statement = 'all';

To disable logging again, run either of the queries below and reload:

ALTER SYSTEM RESET log_statement;

Or:

ALTER SYSTEM SET log_statement = 'none';

Alternatively, you can log every statement together with its execution time:

ALTER SYSTEM SET log_min_duration_statement = 0;

And disable that again with either of the queries below:

ALTER SYSTEM RESET log_min_duration_statement;

Or:

ALTER SYSTEM SET log_min_duration_statement = -1;

You can see my answer explaining more about how to enable and disable query logs in PostgreSQL.

Super Kai - Kazuya Ito

A bit late to the party here, but the thing you would probably want in practice is `pg_stat_statements`. Use it to list the queries with the highest `total_exec_time` and look at their query plans. Then consider adding indexes that would speed up the joins or scans in those queries.

This should be possible to automate to some extent. Similarly, recommending indexes to drop should be possible using index usage statistics. Personally, I'd love to have a tool that makes this kind of suggestion automatically; it would be a great example of profile-guided optimization.
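As a starting point for such a tool, a sketch of both queries (assuming PostgreSQL 13 or later, where the column is named `total_exec_time`; older versions call it `total_time`):

```sql
-- the most expensive queries by cumulative execution time
SELECT query, calls, total_exec_time, mean_exec_time
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10;

-- candidate indexes to drop: never scanned since statistics were last reset
-- (exclude indexes that enforce unique or primary key constraints)
SELECT schemaname, relname, indexrelname, idx_scan
FROM pg_stat_user_indexes
WHERE idx_scan = 0;
```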

saolof