13

I have a Hibernate database with a single table that looks like:

PURCHASE_ID | PRODUCT_NAME | PURCHASE_DATE | PURCHASER_NAME | PRODUCT_CATEGORY
------------------------------------------------------------------------------
     1          Notebook      09-07-2018          Bob            Supplies
     2          Notebook      09-06-2018          Bob            Supplies
     3           Pencil       09-06-2018          Bob            Supplies
     4            Tape        09-10-2018          Bob            Supplies
     5           Pencil       09-09-2018         Steve           Supplies
     6           Pencil       09-06-2018         Steve           Supplies
     7           Pencil       09-08-2018         Allen           Supplies

And I want to return only the newest purchases, based on some other limitations. For example:

List<Purchase> getNewestPurchasesFor(Array<String> productNames, Array<String> purchaserNames) { ... }

Could be called using:

List<Purchase> purchases = getNewestPurchasesFor(["Notebook", "Pencil"], ["Bob", "Steve"]);

In English, "Give me the newest purchases, for either a Notebook or Pencil, by either Bob or Steve."

And would provide:

PURCHASE_ID | PRODUCT_NAME | PURCHASE_DATE | PURCHASER_NAME
-----------------------------------------------------------
     1          Notebook      09-07-2018          Bob            
     3           Pencil       09-06-2018          Bob            
     5           Pencil       09-09-2018         Steve           

So it's like a "distinct" lookup on multiple columns, or a "limit" based on some post-sorted combined-column unique key, but all the examples I've found show using the SELECT DISTINCT(PRODUCT_NAME, PURCHASER_NAME) to obtain those columns only, whereas I need to use the format:

from Purchases as entity where ...

So that the model types are returned with relationships intact.

Currently, my query returns me all of the old purchases as well:

PURCHASE_ID | PRODUCT_NAME | PURCHASE_DATE | PURCHASER_NAME | PRODUCT_CATEGORY
------------------------------------------------------------------------------
     1          Notebook      09-07-2018          Bob            Supplies
     2          Notebook      09-06-2018          Bob            Supplies
     3           Pencil       09-06-2018          Bob            Supplies
     5           Pencil       09-09-2018         Steve           Supplies
     6           Pencil       09-06-2018         Steve           Supplies

Which, for repeat purchases, causes quite the performance drop.

Are there any special keywords I should be using to accomplish this? Query languages and SQL-fu are not my strong suits.

Edit:

Note that I'm currently using the Criteria API, and would like to continue doing so.

Criteria criteria = session.createCriteria(Purchase.class);
criteria.addOrder(Order.desc("purchaseDate"));
// Product names
Criterion purchaseNameCriterion = Restrictions.or(productNames.stream().map(name -> Restrictions.eq("productName", name)).toArray(Criterion[]::new));
// Purchaser
Criterion purchaserCriterion = Restrictions.or(purchaserNames.stream().map(name -> Restrictions.eq("purchaser", name)).toArray(Criterion[]::new));
// Bundle the two together
criteria.add(Restrictions.and(purchaseNameCriterion, purchaserCriterion));

criteria.list(); // Gives the above results

If I try to use a distinct Projection, I get an error:

ProjectionList projections = Projections.projectionList();
projections.add(Projections.property("productName"));
projections.add(Projections.property("purchaser"));
criteria.setProjection(Projections.distinct(projections));

Results in:

17:08:39 ERROR Order by expression "THIS_.PURCHASE_DATE" must be in the result list in this case; SQL statement:

Because, as mentioned above, adding a projection/distinct column set seems to indicate to Hibernate that I want those columns as a result/return value, when what I want is to simply limit the returned model objects based on unique column values.

Craig Otis
  • 31,257
  • 32
  • 136
  • 234
  • so if "Bob" buys a new Pencil, the record 3 should be "replaced" by the new record in the query result, right? – Leviand Sep 17 '18 at 13:33
  • @Leviand Yes, exactly. If there were hypothetically an 8th row for `Bob/Pencil` that occurred on September 10th, it would be returned instead of the entry with ID 3. (Accurately answering the query, "Give me the newest purchases, for either a Notebook or Pencil, by either Bob or Steve.") – Craig Otis Sep 17 '18 at 13:34
  • do you already have a sql query that succeed in this? Or should I build that from scratch? – Leviand Sep 17 '18 at 13:37
  • Unfortunately I don't - and I'm looking to keep the query at the Hibernate level. (Criteria API if possible, Hibernate query otherwise.) The ideal result is not actually a database-level ResultSet, but rather the Hibernate-instantiated model objects with their relationships intact. (Relationships not shown in example above.) – Craig Otis Sep 17 '18 at 13:39
  • ok, I'm working on that – Leviand Sep 17 '18 at 13:41
  • I see an error in your question: `4 Pencil 09-09-2018 Steve` as result, but the data is `4 Tape 09-10-2018 Bob Supplies` , can you fix with the correct example? Thanks – Leviand Sep 17 '18 at 13:43
  • @Leviand Good eye! Thanks, and updated. – Craig Otis Sep 17 '18 at 14:00
  • Did you have a look at SO entries tagged with [greatest-n-per-group] tag in combination with hibernate? It seems to be a not so unusual question... – xwoker Sep 17 '18 at 14:20
  • @xwoker Not yet, but thank you - I didn't know that terminology. – Craig Otis Sep 17 '18 at 16:13

10 Answers10

1

First, use aggregation query to get last purchase date for product + purchaser combination.

Use that query as subselect matching the tuples:

from Puchases p 
where (p.PRODUCT_NAME, p1.PURCHASER_NAME, p1.PURCHASE_DATE) in
    (select PRODUCT_NAME, PURCHASER_NAME , max(PURCHASE_DATE) 
     from Purchases 
     where 
        PRODUCT_NAME in :productNames and 
        PURCHASER_NAME in :purchaserNames 
     group by PRODUCT_NAME, PURCHASER_NAME)

It should be possible to implement the same using criteria API as well, using Subqueries.propertiesIn.

See Hibernate Criteria Query for multiple columns with IN clause and a subselect

If your PURCHASE_ID's are guaranteed to be 'chronologically ascending', then you can simply use max(PURCHASE_ID) in subselect.

ckedar
  • 1,859
  • 4
  • 7
0

You can create temp variables and tables using @. i'm not sure how to create an array.

declare @product1 = 'Pencil'
declare @product2 = 'Notebook'
declare @purchaser_name1 = 'Bob'
declare @purchaser_name2= 'Steve'

This should get the most recent purchase date for each cust/prod combination

select 
product_name, purchaser_name, max(purchase_date) as max_purchase_date
into @temp
from purchases with(nolock) where 
product_name in (@product1,@product2) and
purchaser_name in (@purchaser_name1,@purchaser_name2)
group by product_name, purchaser_name

If you need circle back and get the id's, you can join back to the purchases to get them

select p.* from purchases p with(nolock) 
inner join @temp t 
on p.product_name = t.product_name
and p.purchaser_name = t.purchaser_name
and p.purchase_date = t.max_purchase_date

Note the "with(nolock)" after the table names. that may help with the performance.

Kevin D
  • 98
  • 8
  • Thanks @Kevin, I'm looking to avoid dropping all the way to variable declarations and lower level scripting, as I'm using Hibernate and would like to have model objects returned with relationships intact. – Craig Otis Sep 17 '18 at 13:16
0

Ok, as first I've build the query that extracts only the requested records:

select p1.* from Purchase p1

  join (
    select 
        max(PURCHASE_DATE) as maxdate, 
        purchaser_name, 
        PRODUCT_NAME from Purchase 
    where 
        (product_name ='Notebook' or product_name = 'Pencil') 
        and purchaser_name in ('Bob','Steve')
    group by 
        purchaser_name, 
        PRODUCT_NAME) p2

  on p1.PURCHASE_DATE = p2.maxDate
  and p1.PRODUCT_NAME = p2.PRODUCT_NAME
  and p1.PURCHASER_NAME = p2.PURCHASER_NAME;

which gave as output

PURCHASE_ID PRODUCT_NAME    PURCHASE_DATE             PURCHASER_NAME    PRODUCT_CATEGORY
1           Notebook        2018-07-09 00:00:00.000   Bob               Supplies
3           Pencil          2018-06-09 00:00:00.000   Bob               Supplies
5           Pencil          2018-09-09 00:00:00.000   Steve             Supplies

Now we can transform that query in a SQLQuery, and trasform that into a bean with .setResultTransformer(Transformers.aliasToBean(Purchase.class)). Note that I've named yourSession the session, change that accordingly:

List<Purchase> list = yourSession.createSQLQuery(
          "select p1.* from Purchase p1 "
        + " join ( "
        + "     select "
        + "         max(PURCHASE_DATE) as maxdate, "
        + "         purchaser_name, "
        + "         PRODUCT_NAME from Purchase "
        + "     where "
        + "         (product_name ='Notebook' or product_name = 'Pencil') " //this must be created dinamically based on your parameters
        + "         and purchaser_name in ('Bob','Steve') " //and this too
        + "     group by "
        + "         purchaser_name, "
        + "         PRODUCT_NAME) p2 "

        + " on p1.PURCHASE_DATE = p2.maxDate "
        + " and p1.PRODUCT_NAME = p2.PRODUCT_NAME "
        + " and p1.PURCHASER_NAME = p2.PURCHASER_NAME ")
        .setResultTransformer(Transformers.aliasToBean(Purchase.class))
        .list();

Ofc what is missing now is to pass your parameters like Notebook or Bob in the method where you will wrap this code into. I would build an helper method that writes the condition based on how big the parameters list is.

Since I've not an hibernate connected to a database I've freehanded the code, some fix could be required, but the general idea should do the trick.

Doing that without SQLQuery is, imho, much more difficult and harder to read: what you need is to keep intact the result into your bean, and this is what you will achieve with this.

Leviand
  • 2,745
  • 4
  • 29
  • 43
0

Update

To use hibernate Criteria, you can try subquery approach:

DetachedCriteria subQuery = DetachedCriteria.forClass(Purchase.class, "p2");

ProjectionList groupBy = Projections.projectionList();
groupBy.add(Projections.max("purchaseDate"));
groupBy.add(Projections.groupProperty("productName"));
groupBy.add(Projections.groupProperty("purchaserName"));
subQuery.setProjection(groupBy);

subQuery.add(Restrictions.in("productName", productNames));
subQuery.add(Restrictions.in("purchaserName", purchaserName));

Criteria purchase = session.createCriteria(Purchase.class, "p1");
purchase.add(Subqueries.propertiesIn(new String[] {"purchaseDate", "productName", "purchaserName"}, subQuery));
purchase.addOrder(Order.desc("purchaseDate"));

List<Purchase> p1 = purchase.list();

Another way is to use native SQL:

SELECT p1.*
FROM purchase p1 LEFT JOIN purchase p2
  ON (p1.purchaser_name = p2.purchaser_name 
      AND p1.product_name = p2.product_name 
      AND p1.purchase_date < p2.purchase_date)
WHERE p2.id IS NULL 
      AND p1.product_name IN ("Notebook", "Pencil") 
      AND p1.purchaser_name IN ("Bob", "Steve")
ORDER BY p1.product_name DESC

This SQL gives you siginificant performance advantage comparing to subquery approach.

However, it seems not translatable to Hibernate Criteria (since Criteria requires path / mapping between entities)

Mạnh Quyết Nguyễn
  • 17,677
  • 1
  • 23
  • 51
0

Try to use this code.

    SessionFactory sessFact = HibernateUtil.getSessionFactory();
    Session session = sessFact.openSession();
    Criteria criteria = session.createCriteria(Purchase.class);
    ProjectionList projList = Projections.projectionList();

    projList.add(Projections.groupProperty("purchaserName"));
    projList.add(Projections.groupProperty("productName"));
    projList.add(Projections.property("purchaseId"));
    projList.add(Projections.property("productName"));
    projList.add(Projections.max("purchaseDate"));
    projList.add(Projections.property("purchaserName"));

    criteria.setProjection(projList);

    List<String> productList = new ArrayList<String>() {
        {
            add("Notebook");
            add("Pencil");
        }
    };
    List<String> purchaserList = new ArrayList<String>() {
        {
            add("Bob");
            add("Steve");
        }
    };

    Disjunction prod = Restrictions.disjunction();
    prod.add(Restrictions.in("productName", productList));

    Disjunction purch = Restrictions.disjunction();
    purch.add(Restrictions.in("purchaserName", purchaserList));

    criteria.add(Restrictions.and(prod, purch));
    List resultList = criteria.list();

SQL as a result (<property name="show_sql">true</property>)

select this_.PURCHASER_NAME as y0_, this_.PRODUCT_NAME as y1_, this_.PURCHASE_ID as y2_, this_.PRODUCT_NAME as y3_, max(this_.PURCHASE_DATE) as y4_, this_.PURCHASER_NAME as y5_ from purchase this_ where ((this_.PRODUCT_NAME in (?, ?)) and (this_.PURCHASER_NAME in (?, ?))) group by this_.PURCHASER_NAME, this_.PRODUCT_NAME
Seymur Asadov
  • 612
  • 5
  • 19
0

In my opinion, the trick is to see that "give me the newest" is equivalent to "give the rows with no newer purchases". This translates to this kind of query:

-- This is SQL
-- Note that if two purchases have exactly the same date, this query will
-- return both; you can fine tune the condition inside the exists clause
-- to avoid this
select *
from purchases p1
where
p1.product_name in ('Notebook', 'Pencil') and
p1.purchaser_name in ('Bob', 'Steve') and
not exists (
   select p2.purchase_id
   from purchases p2
   where
   p2.product_name = p1.product_name and
   p2.purchaser_name = p1.purchaser_name and
   p2.purchase_date > p1.purchase_date
)
order by purchase_id;

Altough this is SQL, translating to HQL should be pretty straightforward, and that might be enough for you. It is been a long time since I used Hibernate Criteria (you tend to use the JPA API these days), but it should be something similar to this:

DetachedCriteria criteria = DetachedCriteria.forClass(Purchase.class, "p1");
// add here your filters to criteria
// criteria.add(purcharserName in (....));
// criteria.add(productName in (....));
// this appends the not exists clause
DetachedCriteria notExistsCriteria = DetachedCriteria.forClass(Purchase.class, "p2");
notExistsCriteria.add(Restrictions.eqProperty("p2.productName", "p1.productName"));
notExistsCriteria.add(Restrictions.eqProperty("p2.purchaserName", "p1.purchaserName"));
notExistsCriteria.add(Restrictions.gtProperty("p2.purchaseDate", "p1.purchaseDate"));

criteria.add(Subqueries.notExists(notExistsCriteria.setProjection(Projections.property("p1.id"))));

List<Purchase> results = // issue Criteria query

UPDATE:

I see that Hibernate Criteria has support for SQL ALL operator, so if your database supports it, you could also write this:

DetachedCriteria criteria = DetachedCriteria.forClass(Purchase.class, "p1");
// add here your filters to criteria
// criteria.add(purcharserName in (....));
// criteria.add(productName in (....));

// this appends the p1.purchaseDate > all (...) filter
DetachedCriteria allCriteria = DetachedCriteria.forClass(Purchase.class, "p2");
allCriteria.add(Restrictions.eqProperty("p2.productName", "p1.productName"));
allCriteria.add(Restrictions.eqProperty("p2.purchaserName", "p1.purchaserName"));

criteria.add(Subqueries.propertyGeAll("p1.purchaseDate", allCriteria.setProjection(Projections.property("p2.purchaseDate"))));

List<Purchase> results = // issue Criteria query

which reads a bit more clearly.

gpeche
  • 21,974
  • 5
  • 38
  • 51
0

Try the following HQL assuming that you have an auto incrementing id field.

FROM Purchase p WHERE p.id IN(SELECT MAX(p1.id) FROM Purchase p1 WHERE p1.productName IN('Notebook','Pencil') AND p1.purchaseName IN('Bob', 'Steve') GROUP BY p1.productName, p1.purchaseName)
AJA
  • 456
  • 3
  • 12
0

The solution is first to fetch the productName, purchaserName and max(purchaseDate) group by productName, purchaserName using detached criteria. It will result us to identify unique row using these three attributes. But, there is one catch here if same purchaser purchased same product more than one time on same day then we will not be able to identify unique row using above condition and it will result in multiple records getting fetched from DB. To resolve this, you need to use datetime or timestamp type for purchaseDate field in DB. Now, use these attributes from detached criteria in Criteria query to get the required result.

DetachedCriteria detachedCriteria = DetachedCriteria.forClass(Purchase.class, "inner");
    detachedCriteria.add(Restrictions.in("inner.productName", new String[] { "Notebook", "Pencil" }));
    detachedCriteria.add(Restrictions.in("inner.purchaserName", new String[] { "Bob", "Steve" }));
    detachedCriteria.setProjection(Projections.projectionList().add(Projections.max("inner.purchaseDate"))
            .add(Projections.groupProperty("inner.productName"))
            .add(Projections.groupProperty("inner.purchaserName")));
    Session session = this.getEntityManager().unwrap(Session.class);
    Criteria criteria = session.createCriteria(Purchase.class, "b");
    ProjectionList projectionList = Projections.projectionList();
    projectionList.add(Projections.property("b.purchaseId"));
    projectionList.add(Projections.property("b.productName"));
    projectionList.add(Projections.property("b.purchaseDate"));
    projectionList.add(Projections.property("b.purchaserName"));
    criteria.setProjection(projectionList);
    criteria.add(Subqueries.propertiesIn(new String[] { "b.purchaseDate", "b.productName", "b.purchaserName" },
            detachedCriteria));
    criteria.list();

This criteria query will fire below query in mysql

select this_.purchase_id as y0_, this_.product_name as y1_, this_.purchase_date as y2_, this_.purchaser_name as y3_ from purchase this_ where (this_.purchase_date, this_.product_name, this_.purchaser_name) in (select max(inner_.purchase_date) as y0_, inner_.product_name as y1_, inner_.purchaser_name as y2_ from purchase inner_ where inner_.product_name in (?, ?) and inner_.purchaser_name in (?, ?) group by inner_.product_name, inner_.purchaser_name)
Pavan
  • 121
  • 4
0
List<Purchase> findByProductNameInAndPurchaserNameInAndPurchaseDateBefore(List<String> productNames, List<String> purchaserNames, Date before);

I can not see Purchase entity, but of course if Product is mapped as entity this query must be updated accordingly.

NOTE: I would suggest you to read about full text search, or Hibernate search, because if you will have a lot of queries like this, it sounds like you will need full text search support in your project.

mommcilo
  • 956
  • 11
  • 28
-1

Hello I would offer you one very simple HQL based solution without too much magic around it. The solution is the following HQL query:

select p.id, max(p.date) from Purchase p where p.productName in('notebook','pencil') and p.purchaseName in ('ob', 'Steve') group by p.productName ,p.purchaseName

Once you have the Ids of the records you can select the Actual products by ID.

Now you would think that this query would return you the whole table. Not it will not. Most modern database that supports server side cursors will return you only this amount of records that you have specified.

Once you initialize the Query the next step is to tell it how many results you want it to return:

    Query query = query.setMaxResults(1)
    query.setFetchSize();
    query.scroll(ScrollMode.FORWARD_ONLY);
    // here is a hint for MySQL
    query.setMaxResults(100)

This used correctly this query will not return you the complete table! It will return as much as it is told.

Alexander Petrov
  • 9,204
  • 31
  • 70
  • Thanks, but I don't want to limit the number of results to a constant. I want to retrieve **all** of the values which match the query as described in my question. – Craig Otis Sep 17 '18 at 20:12
  • I think you don't understand here. This query is not limiting your results. This query is tellin to the backend except for the query.setMaxResults(100) that it wants to keep an open cursor and it is telling it how many records you want it to fetch at a time. – Alexander Petrov Sep 17 '18 at 20:13
  • If you remove the last line query.setMaxResults(100) which I have added only for MySQL. What will happen is thatif you have 10 000 records in the database and you start iterating on the collection returned by the query. For every FetchSize number of results it is going to scroll further. – Alexander Petrov Sep 17 '18 at 20:14
  • So if you don\t want to go to page 10 lets say you don't need to. Get it ? – Alexander Petrov Sep 17 '18 at 20:15
  • I don't think your query matches the requirements of the question. For example: If there are **two entries** in the table for Steve buying a pencil, your query will return both, sorted by the date. I only want the newest. – Craig Otis Sep 17 '18 at 20:15
  • @CraigOtis if you gave me the downvote thinking that it is returning the first 100 records I think you should retract it. I gave you half of the answer. If you figure out the SQL :) you will get all of it :) – Alexander Petrov Sep 17 '18 at 20:20
  • But I think we need to group the records in order to achieve it. – Alexander Petrov Sep 17 '18 at 20:21
  • select p.id, max(p.date) from Purchase p where p.productName in('notebook','pencil') and p.purchaseName in ('ob', 'Steve') group by p.productName ,p.purchaseName – Alexander Petrov Sep 17 '18 at 20:23