2

We try to use Full text search on Azure database and got performance problems on using CONTAINS search.

Our data has star schema, Fact table has clustered column store index enabled and around 40 million rows. Below is how we use CONTAINS on dimension and do aggregation on Fact table on different queries:

Query 1 using EXISTS:

SELECT f.[FK_DimCompanyCodeId], SUM(f.NetValueInUSD)
FROM [SPENDBY].[FactInvoiceDetail] f

WHERE EXISTS (
        SELECT * FROM [SPENDBY].[DimCompanyCode] d

        WHERE f.[FK_DimCompanyCodeId] = d.Id
        AND CONTAINS(d.*, 'Comcast'))

GROUP BY f.[FK_DimCompanyCodeId]

ORDER BY SUM(f.NetValueInUSD) DESC

This query seems run forever and never return the result.

There is non-clustered index on the foreign key FK_DimCompanyCodeId] and there is only one row returned when searching Comcast:

SELECT id  FROM [SPENDBY].[DimCompanyCode] d
WHERE CONTAINS(d.*, 'Comcast');
-- will return id = 5

And there are around 27 million rows of Fact table which has FK_DimCompanyCodeId = 5.

Query 2 using INNER JOIN:

SELECT f.[FK_DimCompanyCodeId], SUM(f.NetValueInUSD)
FROM [SPENDBY].[FactInvoiceDetail] f

INNER JOIN [SPENDBY].[DimCompanyCode] d ON (f.[FK_DimCompanyCodeId] = d.Id)
WHERE CONTAINS(d.*, 'Comcast')

GROUP BY f.[FK_DimCompanyCodeId]
ORDER BY SUM(f.NetValueInUSD) DESC

This query seems run forever and never return the result as well.

Query 3 using #temp table:

SELECT id INTO #temp FROM [SPENDBY].[DimCompanyCode] d
WHERE CONTAINS(d.*, 'Comcast');

SELECT f.[FK_DimCompanyCodeId], SUM(f.NetValueInUSD)
FROM [SPENDBY].[FactInvoiceDetail] f

WHERE EXISTS (
        SELECT * FROM #temp
        WHERE f.[FK_DimCompanyCodeId] = #temp.Id)

GROUP BY f.[FK_DimCompanyCodeId]

ORDER BY SUM(f.NetValueInUSD) DESC

Very fast, returns the result after 5 seconds.

Why full text search is so slow for in case 1 and case 2.

cuongle
  • 74,024
  • 28
  • 151
  • 206
  • Found related question: https://stackoverflow.com/questions/2750870/sql-serve-full-text-search-with-containstable-is-very-slow-when-used-in-join – cuongle Oct 08 '17 at 01:52
  • Are you able to add Actual Executions plans (the xml) from your queries? That would be genuinely useful. – wBob Oct 08 '17 at 13:38

2 Answers2

1

The problem is competing indexes -- one for the JOIN and one for the filter. Perhaps a subquery would convince SQL Server to use the text index first:

SELECT f.[FK_DimCompanyCodeId], SUM(f.NetValueInUSD)
FROM [SPENDBY].[FactInvoiceDetail] f JOIN
     (SELECT id
      FROM [SPENDBY].[DimCompanyCode] cc
      WHERE CONTAINS(cc.*, 'Comcast')
     ) cc
     ON cc.id = f.FK_DimCompanyCodeId
GROUP BY f.[FK_DimCompanyCodeId]
ORDER BY SUM(f.NetValueInUSD) DESC

It would probably also help if you have an index on FactInvoiceDetail(FK_DimCompanyCodeId).

Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
  • Thanks for your answer, I tried running your query and it seems the result never returns as well – cuongle Oct 08 '17 at 00:35
  • Hi Cuong - can you share the estimated plans for the examples that ran indefinitely - and then the actual plan for the fast execution? – Joe Sack Oct 26 '17 at 23:47
  • @JoeSack: Please take a look my answer, I found out using `CONTAINSTABLE` gets the best performance – cuongle Jan 02 '18 at 15:37
1

Eventually, I figured out CONTAINS works well on specific column (Description for example):

SELECT f.[FK_DimCompanyCodeId], SUM(f.NetValueInUSD)
FROM [SPENDBY].[FactInvoiceDetail] f
WHERE  f.[FK_DimCompanyCodeId] IN  (
        SELECT d.Id FROM [SPENDBY].[DimCompanyCode] d
        WHERE CONTAINS(d.[Description], 'Comcast')
)
GROUP BY f.[FK_DimCompanyCodeId]
ORDER BY SUM(f.NetValueInUSD) DESC

In order to search for the whole table, CONTAINSTABLE will have the best performance and avoid using #temp table:

SELECT f.[FK_DimCompanyCodeId], SUM(f.NetValueInUSD)
FROM [SPENDBY].[FactInvoiceDetail] f
LEFT OUTER JOIN CONTAINSTABLE([SPENDBY].[DimCompanyCode], *, '"Comcast"') ct 
ON f.[FK_DimCompanyCodeId] = ct.[Key]
WHERE ct.[Key] IS NOT NULL
GROUP BY f.[FK_DimCompanyCodeId]
ORDER BY SUM(f.NetValueInUSD) DESC
cuongle
  • 74,024
  • 28
  • 151
  • 206