2

I have a table, let's call it Widget.

I do some complex processing to get various type of Widgets. These end up in two different variables.

To keep things simple, let's say we have...

widgetsA = Widget.where("blah blah blah")
widgetsB = Widget.where("blah blah blah blah")

We can still perform ActiveRecord functions like .where on widgetsA and widgetsB.

Now, after retrieving the sets for A and B, I need to union them, and then perform additional ActiveRecord functions on them.

I want to do something like this...

widgetsAll = widgetsA | widgetsB
widgetsAll = widgetsAll.order("RANDOM()")
widgetsAll = widgetsAll.where(answers_count: 0).limit(10) + widgetsAll.where("answers_count > 0").limit(10)

This will take all the widgets (union) found in A & B, randomize them, and select 10 with answers and 10 without answers.

The problem is, I cannot user .order and widgetsAll is no longer an ActiveRecord object, but it's an Array because of the widgetsAll = widgetsA | widgetsB line. How do I either

A) Union/Intersect two ActiveRecord sets, into an ActiveRecord set

B) How can I order and perform a 'where' style query on an Array.

Either will solve the issue. I assume B is a bit better for performance, so I suppose that would be the better answer.

Any ideas?

Lastly, lets say the Widget table has columns id, name, description. In the end we want an ActiveRecord or Array (likely preferred) of everything.

EDIT: (Attempting to combine via SQL UNION... but not working)

w1 = Widget.where("id = 1 OR id = 2")
w2 = Widget.where("id = 2 OR id = 3")
w3 = Widget.from("(#{w1.to_sql} UNION #{w2.to_sql})")

PG::SyntaxError: ERROR:  subquery in FROM must have an alias
LINE 1: SELECT "widgets".* FROM (SELECT "widgets".* FROM "widge...
chris P
  • 6,359
  • 11
  • 40
  • 84

3 Answers3

2

I see two options:

1) Do the union in SQL: Instead of widgetsA | widgetsB that return an array you can do an union in the database, so that the result is still a relation object:

Widget.from("(#{widgetA.to_sql} UNION #{widgetB.to_sql}) AS widgets")

2) Use normal array methods. Your example:

widgetsAll = widgetsAll.order("RANDOM()")
widgetsAll = widgetsAll.where(answers_count: 0).limit(10) + widgetsAll.where("answers_count > 0").limit(10)

would translate to something like this:

widgetsAll = widgetsAll.shuffle
widgetsAll = widgetsAll.select { |answer| widget.answer_count == 0 }.take(10) + 
             widgetsAll.select { |answer| widget.answers_count > 0).take(10)

Read more about Ruby arrays.

spickermann
  • 100,941
  • 9
  • 101
  • 131
  • Thanks, this looks like what I want. What do you think is better from a performance point of view? If I do 1, would it be more costly since there is going to be an extra query involved? Would it be better to do 2 with the array manipulation? Or are both likely to be the same, as the SQL is just a UNION anyway on already determined ActiveRecords? – chris P May 26 '15 at 22:38
  • Dont ever do 2. For 1, I'd use OR instead of UNION – apneadiving May 26 '15 at 22:41
  • The first version will be faster, because it does the union in the database, so it runs only on query intead of your example that runs at least two. I add the seconds version only because you asked for it. I would not recommend the second version, the database version is very likely always faster. – spickermann May 26 '15 at 22:42
  • @apneadiving: OR is different to UNION. In this case, when you combine two relation objects, you need to use UNION. – spickermann May 26 '15 at 22:44
  • Either option is fine @chrisP - some people here are recommending against options (like using the array methods) which are often orders of magnitude faster than doing an SQL query. Even those who are concerned about doing two queries instead of one are worried about queries that usually take fractions of a millisecond. Do whichever makes the most sense to you, they're all great solutions. – smathy May 26 '15 at 22:47
  • @smathy I insist they are not equivalent whenever you'd need to include pagination and ordering which is always the case unless you have hundreds of entries – apneadiving May 26 '15 at 22:52
  • Actually union vs or depends on db optimization hence structure. So i confirm you're right suggesting union – apneadiving May 26 '15 at 22:54
  • @smathy: It is not about miliiseconds. Imagine you have a table with millions of rows. It makes huge difference if you do just one query that returns 20 records and all the heavy lifting is done in the database. Or if you do 3 queries with millions of results and all operations are done in Ruby in memory. The database options ways faster, please do not recommend the array option. – spickermann May 26 '15 at 22:57
  • @spickermann Hey. I tried your example to UNION the SQL, but it's not working for me. I added an EDIT to the OP. Any ideas? – chris P May 26 '15 at 23:10
  • @chrisP: You missed the `AS widgets` part behind the UNION. – spickermann May 26 '15 at 23:14
  • @spickermann With OR and AS widgets.. I'm getting this error: "PG::SyntaxError: ERROR: syntax error at or near "SELECT" LINE 1: ...".* FROM "widgets" WHERE (id = 1 OR id = 2) OR SELECT "wi..." – chris P May 26 '15 at 23:17
  • @spickermann Also, how come there is no OR version of merge? Like w1.merge(w2)? – chris P May 26 '15 at 23:29
  • @chrisP: As I wrote before `OR` is not an option in this example, because `OR` works complete different than `UNION`. I am glad that I could help with my solution. – spickermann May 26 '15 at 23:39
1

Using the any_of gem you could do:

widgetsAll = Widget.where.any_of(widgetsA, widgetsB)
apneadiving
  • 114,565
  • 26
  • 219
  • 213
-1

One way would be to do as follows:

widgetsAllActual = Widget.where(id: widgetsAll)

This is creating a new Widget::ActiveRecord_Relation collection containing all the elements in widgetsA and widgetsB, and allows for making further active record scoping.

Ref: https://stackoverflow.com/a/24448317/429758

Community
  • 1
  • 1
Prakash Murthy
  • 12,923
  • 3
  • 46
  • 74
  • I edited this to remove the relatively slow `.map(&:id)` - ActiveRecord just does the right thing for you. – smathy May 26 '15 at 22:31
  • @smathy Do you know if this would be costly at all (since there is an additional query involved), as opposed to manipulating the already created Array object? Or not likely because the id's are going to be indexed anyway? – chris P May 26 '15 at 22:33
  • @prakash Do you know if this would be costly at all (since there is an additional query involved), as opposed to manipulating the already created Array object? Or not likely because the id's are going to be indexed anyway? – chris P May 26 '15 at 22:33
  • Its not a great solution: it makes several queries and would not allow you to do order and use pagination – apneadiving May 26 '15 at 22:33
  • That is false @apneadiving - you will end up with an `AR_Relation` in `widgetsAllActual` and so you will obviously still be able to do `order` or pagination or any operation that you'd perform on an AR scope. – smathy May 26 '15 at 22:37
  • Also, "several queries" is false too, by doing the above you will cause both the widgets A and B queries to be executed, so that's two additional queries on top of the queries he wants to run on `widgetsAllActual` – smathy May 26 '15 at 22:39
  • Wrong: except if you pass all ids from widgetsA and widgetsB queries to your new scope, which would be quite expensive, you depend on the first results of A and B which could not be orderred paginated in an absolute manner – apneadiving May 26 '15 at 22:39
  • 3 queries is terrible whenever you need 1. And again if you irder or paginate, the union might not make sense – apneadiving May 26 '15 at 22:40
  • You can see that this solution **is precisely** passing all the IDs from widgetsA and widgetsB, I don't think you've understood what the above query is doing. – smathy May 26 '15 at 22:42
  • Well I guess you precisely miss the point of db queries if you pass all ids from one query to another. Sorry – apneadiving May 26 '15 at 22:43
  • ...and I guess you've never heard of a sub-select if you don't ever do this. – smathy May 26 '15 at 23:10
  • Wow, you thought this was my code? I guess that explains the two downvotes of this poor guy's perfectly valid answer. – smathy May 26 '15 at 23:18