How to represent this query in Elixir's Ecto Query format?

Question

I'm having trouble "translating" the following SQL query into a valid Ecto query:

SELECT *
FROM file_modules 
WHERE file_id =
  (SELECT f.file_id
  FROM files AS f
  LEFT JOIN file_modules AS fm ON f.file_id = fm.file_id
  WHERE f.storage_id = '20:1:0:86d:1591:c89c:512:de52' AND fm.name = 'bruteforce'
  ORDER BY f.version DESC
  LIMIT 1)

I believe the way to go is to split this query into two. Here's what I've tried:

q1 = 
  File
  |> where([f], f.storage_id == ^storage_id)
  |> join(:left, [f], fm in FileModule, f.file_id == fm.file_id)
  |> where([..., fm], fm.name == ^module_name)
  |> order_by([..., fm], desc: fm.version)
  |> select([fm], fm.file_id)
  |> limit(1)

# q1 works and returns the expected file_id

q2 =
  q1
  |> subquery()
  |> where([fm, fm2], fm.file_id == fm2.file_id)  # Here's where I'm stuck
  |> preload([..., m], [modules: m])

q1 has the following Ecto.Query result:

#Ecto.Query<from f0 in Helix.Software.Model.File,
 left_join: f1 in Helix.Software.Model.FileModule, on: f0.file_id == f1.file_id,
 where: f0.storage_id == ^(storage_id),
 where: f1.name == ^:bruteforce, order_by: [desc: f1.version], limit: 1,
 select: f0.file_id>

My problem seems to be with q2. How should I format it in order to use the results from q1 as input?

Thanks in advance.

I believe this question is agnostic to my underlying schema, but here it is:

schema "files" do
  field :file_id, ID,
    primary_key: true

  field :name, :string
  field :path, :string
  field :software_type, Constant
  field :file_size, :integer
  field :storage_id, Storage.ID

  field :crypto_version, :integer

  field :full_path, :string

  belongs_to :type, SoftwareType,
    foreign_key: :software_type,
    references: :software_type,
    define_field: false
  belongs_to :storage, Storage,
    foreign_key: :storage_id,
    references: :storage_id,
    define_field: false

  has_many :modules, FileModule,
    foreign_key: :file_id,
    references: :file_id,
    on_replace: :delete

  timestamps()
end



schema "file_modules" do
  field :file_id, File.ID,
    primary_key: true
  field :name, Constant,
    primary_key: true
  field :version, :integer

  belongs_to :file, File,
    foreign_key: :file_id,
    references: :file_id,
    define_field: false,
    on_replace: :update
end

As mentioned by Jablanović on the comments, it's possible that the ORM abstraction fails sometimes. In that case, would it be possible to use the raw sql above while:

1) casting the File.ID and Storage.ID types, in order to avoid ugly string concatenation?

2) After returning the result, preloading or saving this result into the schema?

The interface I have in mind is something like:

q = "SELECT * FROM files WHERE file_id = $1", ^file_id
file = Repo.get(q) |> Repo.preload()

file.file_id  # Returns file id

As per the examples of Ecto.Adapters.SQL.query/4, it looks like I can achieve 1. How about 2?

My end goal would be to use the nested SQL query above, while casting file_id and storage_id safely, and using the %File{} schema correctly, with a loaded association at file.modules.

Note that a stored procedure or view would give me a better interface, which I could cast correctly using fragment, but I feel I'd still have problems with "preloading" that data into the schema.

Why don't you execute the two queries separately? You are not losing much. — Mladen Jablanović, Oct 09 '17 at 20:33
That's what I'm doing right now, as a workaround.. but without being able to work with subqueries, I end up being extremely limited. I'm sure Ecto supports, so it's a matter of me figuring out how it works. But so far, even after reading several articles and guides on Ecto subqueries specifically, I'm still having some trouble grasping it. — Renato Massaro, Oct 10 '17 at 12:00
Unfortunately, after working with multiple ORMs in the past, I can say that everyone sooner or later [hits the wall](https://stackoverflow.com/a/41429995/82592) trying to translate an SQL query into an ORM expression. Sometimes, keeping SQL as-is, either in a string, or in some form of database view, saves you and future maintainers lots of wasted hours. — Mladen Jablanović, Oct 11 '17 at 13:05
To be honest, I'm in the same camp and I prefer SQL over ORMs. In that case, is there any sane way to mix "raw" sql queries while having some type safety from Ecto's casting, plus being able to preload the returned data into the schema? (I've edited the question at the bottom) — Renato Massaro, Oct 11 '17 at 15:36

score 1 · Accepted Answer · answered Oct 12 '17 at 23:05

Subquerys are currently not allowed in where clauses; the documentation recommends using a JOIN instead.

I haven't tested it but the following query should produce the same results as your original SQL:

q1 = ... # Same as in the question.
q2 =
  FileModule
  |> join(:left, [fm], fid in subquery(q1), fm.file_id == fid)
  |> select([fm], fm)

score 0 · Answer 2 · answered Oct 12 '17 at 00:41

I didn't intend to answer my own question, but hey here's an alternative solution I've come up. As noted on the question, I'd be equally happy if I could find a nice interface that would let me:

use raw SQL queries
while still taking advantage of Ecto type casting
and being able to preload association data into the schema.

With a mix of Repo.load and Ecto.Adapters.SQL.query, I was able to do just that. In case it's helpful to someone else, I'll share the interface and the code.

sql = "
  SELECT *
  FROM file_modules 
  WHERE file_id =
    (SELECT f.file_id
    FROM files AS f
    LEFT JOIN file_modules AS fm ON f.file_id = fm.file_id
    WHERE f.storage_id = ##1::storage_id AND fm.name = ##2::module_name
    ORDER BY f.version DESC
    LIMIT 1)"

caster = fn type, value -> 
  case type do
    :storage_id ->
      Storage.ID.cast!(value) && to_string(value)
    :module_name ->
      Software.Module.exists?(value) && value || {:error, :bad_value}
    _ ->
      Hector.std_caster(type, value)
  end
end

query = Hector.query(sql, [storage_id, module_name], caster)

loader = fn repo, {columns, rows} ->
  rows
  |> Enum.map(fn row ->
    repo
    |> apply(:load, [File, {columns, row}])
    |> File.format()
  end)
end

{:ok, entries} = Hector.get(Repo, query, loader)

# Where `entries` is a list of %File{}, a valid Ecto Schema.

Hector is the library that handles this interface. The example above is the most complex case: one where we need a custom caster and a custom loader. For most cases, Hector's default loader will do just fine, and a custom caster is required any time a potentially unsafe input is present.

The code can be found here, with some extra examples on the tests.

Of course this is far from the best solution, and even though I may be able to figure out the correct Ecto syntax for this or that query, it's always handy to have a nice abstraction/interface for raw queries.

How to represent this query in Elixir's Ecto Query format?

2 Answers2