3

I have the following table in my AWS QLDB table:

INSERT INTO Testing << {
    'MyId': 1,
    'MyList': [ 'Item1', 'Item2', 'Item3']
},
{
    'MyId': 2,
    'MyList': [ 'Item2', 'Item3', 'Item4']
},
{
    'MyId': 3,
    'MyList': [ 'Item4', 'Item5', 'Item6']
}
>>

I need to be able to get all documents which contain a list of items (which do not come from another table), which I'm using the following query for:

SELECT *
FROM Testing AS t,
     t.MyList AS l
WHERE l IN ('Item1', 'Item2', 'Item4')

However, this gives the following output

+------+---------------------------+---------+
| MyId | MyList                    | _3      |
+------+---------------------------+---------+
| 3    | ["Item4","Item5","Item6"] | "Item4" |
+------+---------------------------+---------+
| 1    | ["Item1","Item2","Item3"] | "Item1" |
+------+---------------------------+---------+
| 1    | ["Item1","Item2","Item3"] | "Item2" |
+------+---------------------------+---------+
| 2    | ["Item2","Item3","Item4"] | "Item2" |
+------+---------------------------+---------+
| 2    | ["Item2","Item3","Item4"] | "Item4" |
+------+---------------------------+---------+

I want to be able to just get three distinct rows. It would appear based on the error I get when trying to use DISTINCT that QLDB doesn't support it, but I also would prefer using * in my SELECT, So I'm trying to use GROUP BY

SELECT *
FROM Testing AS t,
     t.MyList AS l
WHERE l IN ('Item1', 'Item2', 'Item4')
GROUP BY t.MyId

But this gives the following error:

Start query error
Semantic Error: at line , column : No such variable named '$__partiql__group_by_1_item_0'; No such variable named '$__partiql__group_by_1_item_0' (Service: AmazonQLDBSession; Status Code: 400; Error Code: BadRequestException; Request ID: 65vrQHytqHdEL3o9Ym9Xn4)

David
  • 4,744
  • 5
  • 33
  • 64

1 Answers1

4

Neither DISTINCT nor GROUP BY is not currently supported in QLDB, as indicated by omission from the SELECT reference. Unfortunately the error message is misleading in this case.

Assuming that your MyId column is unique itself, you can express what you want by filtering the lists by checking for inclusion of each item in the list, something like:

SELECT *
FROM Testing AS t
WHERE 'Item1' IN t.MyList OR 'Item2' IN t.MyList OR 'Item3' IN t.MyList

If you want to further filter the produced lists, you can add a nested SELECT with your original filter as described here:

SELECT t.MyId, (SELECT VALUE l FROM t.MyList WHERE l IN ('Item1', 'Item2', 'Item3'))
FROM Testing AS t
WHERE 'Item1' IN t.MyList OR 'Item2' IN t.MyList OR 'Item3' IN t.MyList

While this is a bit awkward, you likely wouldn't want to do a GROUP BY anyway, as that would group across the entire data set (requiring a complete aggregation), when the items for each t are contained inside of it.

Rob Marrowstone
  • 1,224
  • 8
  • 15