2

Hello and thank you for reading my question!

Currently, we use PostgreSQL v.10 on 3 nodes through stolon (https://github.com/sorintlab/stolon) We have 3 tables (I want to make my question simple):

  1. Invoice (150 000 000 records)
  2. User (35 000 000 records)
  3. User_Address (20 000 000 records)

The main query looks like this (The original query is a large, using a temporary table and have a lot of where conditions, but the sample shows my problem.)

select 
    i.* 
from invoice as i 
inner join get_similar_name('Jon') as s on i.name ilike s.name 
left join user_address as a on i.user_id = a.user_id
where 
    a.state = 'OH'
    and
    i.last_name = 'Smith'
    and
    i.date between '2016-01-01'::date and '2018-12-31'::date;

The function get_similar_name returns similar names (example: get_similar_name('Jon') will return John, Jonny, Jonathan ... etc) average 200 - 1000 names. I must use the function :\

The query was executed a long time, around 30 - 120 seconds, but if I exclude the function get_similar_name from the query, then execution time will be not more then 1 second.

I already configured PostgreSQL and the server working pretty good. I also created indexes and my query don't use seq scan and etc.

We don't have the possibility to make partitioned tables because we have a lot of columns for this. We can't divide a table only by one row.

I think about migrating my warehouse to MongoDB

My questions are:

  1. Am I right about moving to MongoDB?
  2. Increase performance if I move warehouse from PostgreSQL to 20-40 nodes under MongoDB control?
  3. Is it possible to have the function get_similar_name on MongoDB or similar solution? If yes, how?
  4. Do you have good experience to use fulltext search in MongoDB?
  5. Is it right way to use MongoDB on production?
  6. Can you please advise a "google-vector" to right solution on your opinion?
kalloc
  • 61
  • 1
  • 4
  • The question is way to broad, but can probably be best summed up by *"Simply using another Database Engine will provide little if any benifits, from that action **alone**"*. Moving from SQL to NoSQL is not just a matter of moving over your data. The whole point is redesigning the data, and of course understanding when/if that was really the best thing to do. I suggest you start with [MongoDB relationships: embed or reference?](https://stackoverflow.com/questions/5373198/mongodb-relationships-embed-or-reference) to at least understand why this is not a simple "migration". – Neil Lunn Nov 22 '18 at 20:34
  • Also the other huge problem in the question is that it is actually "multiple questions". Whilst I sympathize with the eagerness, instead of asking "one big question" break your questions down into separate smaller issues and ask each one of those instead. See [How do I ask a good question?](https://stackoverflow.com/help/how-to-ask) and [What types of questions should I avoid asking?](https://stackoverflow.com/help/dont-ask) for some guidance to how to post your questions here. – Neil Lunn Nov 22 '18 at 20:38
  • You are joining only by name ? Can't someone share the same name and be a different person ? Another thing, the call to `get_similar_name()` it will be only 3 letters or that can change? – David Magalhães Nov 22 '18 at 21:22
  • *Am I right about moving to MongoDB?* Did you examine your hardware configuration? Because if you're storing your data on commodity (read S-L-O-W) hardware such as 5400-rpm consumer-grade SATA disks from Newegg, no. You should simply buy faster hardware. Because all the fastest software in the world can't run through gigabytes and gigabytes of data quickly if it's on slow hardware. – Andrew Henle Nov 23 '18 at 06:22

1 Answers1

1

I don't know if moving to MongoDB will solve a text search problem, but Postgres has excellent features like Vector and trigram. Have you tired any of this?

https://www.compose.com/articles/mastering-postgresql-tools-full-text-search-and-phrase-search/

https://www.postgresql.org/docs/9.6/pgtrgm.html

On my previous project, we used pg_trgm and were pretty happy with its performance.

Naren
  • 56
  • 4