0

I have a model Foo with a String bar and a String name. Some records' bar contain the name of other records in them. This is intentional.

I want to find the "root Foo" records - that is, the ones where their name do not appear in the bar records of any other Foo records.

Example:

Foo
id: 1
name: 'foo1'
bar:  'something something'

id: 2
name: 'foo2'
bar:  'foo1 something'

id: 3
name: 'foo3'
bar:  'foo1, foo4'

My method root_foos would return foo2 and foo3 since their names do not appear in any bar string.

edit: I don't want to use a relation or foreign key here - just this method.

charlie
  • 181
  • 13
  • I think you need to restructure your database and make a self many-to-many relationship for Foo (Foo has and belongs to many Foos). That way you can achieve what you want in just two queries maybe. As it stands, it's an extremely inefficient thing to do `root_foos`. – Tamer Shlash May 31 '14 at 03:18

2 Answers2

2
SELECT f.*
FROM   foo f
WHERE  NOT EXISTS (
   SELECT 1
   FROM   foo f2
   WHERE  f.name <@ string_to_array(f2.bar, ', ')
   );

Replace ', ' with the delimiter you actually use.
This would be much faster with an index. You can build a functional GIN index to work with your current unfortunate design:

CREATE INDEX foo_bar_arr_gin_idx ON foo USING GIN (string_to_array(bar, ', '));

bar should really be implemented as array instead of a string. Then you can simplify both query and index:
Table indexes for Text[] array columns
Why isn't my PostgreSQL array index getting used (Rails 4)?

Normalize schema

As mentioned in the linked answer, the database schema would be much cleaner (queries and indexing simpler and faster) if you implement the n:m relationship properly. Performance depends on many variables.

CREATE TABLE foo
   foo_id serial PRIMARY KEY
 , foo    text
);

CREATE TABLE foo_foo
   foo_id1 int REFERENCES foo
 , foo_id2 int REFERENCES foo
 , PRIMARY KEY (foo_id1, foo_id2) -- provides necessary index automatically
);

Then your data would look like this:

foo
foo_id: 1
foo: 'foo1'

foo_id: 2
foo: 'foo2'

foo_id: 3
foo:   'foo3'


foo_foo
foo_id1: 1
foo_id2: some_id

foo_id1: 1
foo_id2: some_other_id

foo_id1: 2
foo_id2: 1

foo_id1: 2
foo_id2: some_id

foo_id1: 3
foo_id2: 1

foo_id1: 3
foo_id2: 4

And the query:

SELECT f.*
FROM   foo f
WHERE  NOT EXISTS (
   SELECT 1
   FROM   foo_foo f2
   WHERE  f2.foo_id2 = f.foo_id
   );
Community
  • 1
  • 1
Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
  • 1
    And to fill in ActiveRecord bits: wrap that query in `Foo.find_by_sql(%q{ ... })` to get the model instances. And if the `bar` columns aren't free form text then, yeah, that schema needs to be beaten with a stick. – mu is too short May 31 '14 at 03:28
  • @muistooshort it's not a standard schema because it's the sort of thing relying heavily on an OO model - ie. maybe this should have been using mongodb instead... – charlie May 31 '14 at 08:35
  • 2
    @charlie: By "relying heavily on an OO model" do you mean the usual "no logic in the database and do everything in Ruby" nonsense that is so prevalent in the Rails world? BTW, PostgreSQL understands array columns, Rails4 supports then natively, and the `postgres_ext` gem lets you use them with Rails3. My experience is that MongoDB causes a lot more work than it saves. – mu is too short May 31 '14 at 17:05
  • I love Ruby because it's got such great support for OO principles, much like my other favourite Smalltalk - it's such a shame that it's popular to mutilate objects to fit them into relational databases. I'm not using this column as an array, or any other 'standard' usage. Thanks for letting me know the other resources, but they don't actually fit what I'm doing here. I've also rethought my strategy and am using a tree parser instead. – charlie Jun 01 '14 at 01:01
  • I don't think there's any shame in using some logic in ruby if it's OO - e.g what a relational DB just can't do. That's what I'm doing here. I appreciate the help though. – charlie Jun 01 '14 at 01:03
-1

This is going to be horrifically slow with any number of records, and I'd highly recommend restructuring your schema if this is a query you need with any frequency but:

objs = Model.all.to_a
objs.select { |obj| !objs.any? { |inner_obj| inner_obj[:bar].index(obj[:name]) } }
ctide
  • 5,217
  • 1
  • 29
  • 26