EDIT - To be clear I will put the goal sentence at the top. The test and my question is whether there is a way to get the same performance as a temp table without using a temp table.
I feel this should be an easy question but I am stuck. I am experimenting with FileTables in SQL2014. I know of a few alternatives that would work well but the goal is to establish feasibility of extracting substrings of text from a filetable.
This test has 35,000 text files with one line of text as follows with each file having an average of 100 bytes of non-unicode text.
Aaa|Bbb|Ccc|Ddd|Eee|Fff|Ggg
The desired output is one row for each file and the delimited string to be split into seven columns.
I have found a quick string parser function, but running on the filestream has a significant performance impact compared to a varchar column.
This query takes 18 seconds to run. I am trying to have the conversion from filestream to varchar execute only once but I think calling the UDF may be causing it to happen for every row(file).
Create View vAddresses As
Select file_type, Convert(Varchar(8000),file_stream) TextData /* Into #Temp */ From InputFiles Where file_type = 'adr'
Go
Select --TextData,
dbo.udf_StringSplit(TextData, 1, '|'), dbo.udf_StringSplit(TextData, 2, '|'), dbo.udf_StringSplit(TextData, 3, '|'),
dbo.udf_StringSplit(TextData, 4, '|'), dbo.udf_StringSplit(TextData, 5, '|'), dbo.udf_StringSplit(TextData, 6, '|'),
dbo.udf_StringSplit(TextData, 7, '|')--, TextData
From vAddresses
I have tried it as a view, a cte and a subquery. The only thing that seems to help is creating a temp table. Creating the temp table takes 1 second and the query takes one second. So for 35k rows 2 seconds total query time vs. 18 seconds.
Drop Table #Temp
(Select file_type, Convert(Varchar(8000),file_stream) TextData Into #Temp From HumanaInputFiles Where file_type = 'adr')
Select --TextData,
dbo.udf_StringSplit(TextData, 1, '|'), dbo.udf_StringSplit(TextData, 2, '|'), dbo.udf_StringSplit(TextData, 3, '|'),
dbo.udf_StringSplit(TextData, 4, '|'), dbo.udf_StringSplit(TextData, 5, '|'), dbo.udf_StringSplit(TextData, 6, '|'),
dbo.udf_StringSplit(TextData, 7, '|')--, TextData
From #Temp
I have read many posts and blogs on both the topics of filetables and temp table vs single query performance but I cant seem to figure it out. It might have something to do with sargable or statistics? Any advice is much appreciated.
Here is the UDF, I found it on an MSDN blog / forum and is the best performer I have found so far.
ALTER FUNCTION [dbo].[udf_StringSplit](
@TEXT varchar(8000)
,@COLUMN tinyint
,@SEPARATOR char(1)
)RETURNS varchar(8000)
AS
BEGIN
DECLARE @POS_START int = 1
DECLARE @POS_END int = CHARINDEX(@SEPARATOR, @TEXT, @POS_START)
WHILE (@COLUMN >1 AND @POS_END> 0)
BEGIN
SET @POS_START = @POS_END + 1
SET @POS_END = CHARINDEX(@SEPARATOR, @TEXT, @POS_START)
SET @COLUMN = @COLUMN - 1
END
IF @COLUMN > 1 SET @POS_START = LEN(@TEXT) + 1
IF @POS_END = 0 SET @POS_END = LEN(@TEXT) + 1
RETURN SUBSTRING (@TEXT, @POS_START, @POS_END - @POS_START)
END
This is the execution plan for the temp table.
<?xml version="1.0" encoding="utf-16"?>
<ShowPlanXML xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema" Version="1.2" Build="12.0.4100.1" xmlns="http://schemas.microsoft.com/sqlserver/2004/07/showplan">
<BatchSequence>
<Batch>
<Statements>
<StmtSimple StatementCompId="1" StatementEstRows="17486" StatementId="1" StatementOptmLevel="TRIVIAL" CardinalityEstimationModelVersion="120" StatementSubTreeCost="0.166487" StatementText="Select --TextData,
 dbo.udf_StringSplit(TextData, 1, '|'), dbo.udf_StringSplit(TextData, 2, '|'), dbo.udf_StringSplit(TextData, 3, '|'),
 dbo.udf_StringSplit(TextData, 4, '|'), dbo.udf_StringSplit(TextData, 5, '|'), dbo.udf_StringSplit(TextData, 6, '|'),
 dbo.udf_StringSplit(TextData, 7, '|')--, TextData
 From #Temp" StatementType="SELECT" QueryHash="0xC4D6F0215D332F3D" QueryPlanHash="0xC50CFAF9494B5DBE" RetrievedFromCache="true">
<StatementSetOptions ANSI_NULLS="true" ANSI_PADDING="true" ANSI_WARNINGS="true" ARITHABORT="true" CONCAT_NULL_YIELDS_NULL="true" NUMERIC_ROUNDABORT="false" QUOTED_IDENTIFIER="true" />
<QueryPlan DegreeOfParallelism="0" NonParallelPlanReason="CouldNotGenerateValidParallelPlan" CachedPlanSize="24" CompileTime="1" CompileCPU="1" CompileMemory="168">
<MemoryGrantInfo SerialRequiredMemory="0" SerialDesiredMemory="0" />
<OptimizerHardwareDependentProperties EstimatedAvailableMemoryGrant="838735" EstimatedPagesCached="419367" EstimatedAvailableDegreeOfParallelism="4" />
<RelOp AvgRowSize="28023" EstimateCPU="0.0017486" EstimateIO="0" EstimateRebinds="0" EstimateRewinds="0" EstimatedExecutionMode="Row" EstimateRows="17486" LogicalOp="Compute Scalar" NodeId="0" Parallel="false" PhysicalOp="Compute Scalar" EstimatedTotalSubtreeCost="0.166487">
<OutputList>
<ColumnReference Column="Expr1003" />
<ColumnReference Column="Expr1004" />
<ColumnReference Column="Expr1005" />
<ColumnReference Column="Expr1006" />
<ColumnReference Column="Expr1007" />
<ColumnReference Column="Expr1008" />
<ColumnReference Column="Expr1009" />
</OutputList>
<RunTimeInformation>
<RunTimeCountersPerThread Thread="0" ActualRows="17486" ActualEndOfScans="1" ActualExecutions="1" />
</RunTimeInformation>
<ComputeScalar>
<DefinedValues>
<DefinedValue>
<ColumnReference Column="Expr1003" />
<ScalarOperator ScalarString="[DmProd01].[dbo].[udf_StringSplit]([tempdb].[dbo].[#Temp].[TextData],(1),'|')">
<UserDefinedFunction FunctionName="[DmProd01].[dbo].[udf_StringSplit]">
<ScalarOperator>
<Identifier>
<ColumnReference Database="[tempdb]" Schema="[dbo]" Table="[#Temp]" Column="TextData" />
</Identifier>
</ScalarOperator>
<ScalarOperator>
<Const ConstValue="(1)" />
</ScalarOperator>
<ScalarOperator>
<Const ConstValue="'|'" />
</ScalarOperator>
</UserDefinedFunction>
</ScalarOperator>
</DefinedValue>
<DefinedValue>
<ColumnReference Column="Expr1004" />
<ScalarOperator ScalarString="[DmProd01].[dbo].[udf_StringSplit]([tempdb].[dbo].[#Temp].[TextData],(2),'|')">
<UserDefinedFunction FunctionName="[DmProd01].[dbo].[udf_StringSplit]">
<ScalarOperator>
<Identifier>
<ColumnReference Database="[tempdb]" Schema="[dbo]" Table="[#Temp]" Column="TextData" />
</Identifier>
</ScalarOperator>
<ScalarOperator>
<Const ConstValue="(2)" />
</ScalarOperator>
<ScalarOperator>
<Const ConstValue="'|'" />
</ScalarOperator>
</UserDefinedFunction>
</ScalarOperator>
</DefinedValue>
<DefinedValue>
<ColumnReference Column="Expr1005" />
<ScalarOperator ScalarString="[DmProd01].[dbo].[udf_StringSplit]([tempdb].[dbo].[#Temp].[TextData],(3),'|')">
<UserDefinedFunction FunctionName="[DmProd01].[dbo].[udf_StringSplit]">
<ScalarOperator>
<Identifier>
<ColumnReference Database="[tempdb]" Schema="[dbo]" Table="[#Temp]" Column="TextData" />
</Identifier>
</ScalarOperator>
<ScalarOperator>
<Const ConstValue="(3)" />
</ScalarOperator>
<ScalarOperator>
<Const ConstValue="'|'" />
</ScalarOperator>
</UserDefinedFunction>
</ScalarOperator>
</DefinedValue>
<DefinedValue>
<ColumnReference Column="Expr1006" />
<ScalarOperator ScalarString="[DmProd01].[dbo].[udf_StringSplit]([tempdb].[dbo].[#Temp].[TextData],(4),'|')">
<UserDefinedFunction FunctionName="[DmProd01].[dbo].[udf_StringSplit]">
<ScalarOperator>
<Identifier>
<ColumnReference Database="[tempdb]" Schema="[dbo]" Table="[#Temp]" Column="TextData" />
</Identifier>
</ScalarOperator>
<ScalarOperator>
<Const ConstValue="(4)" />
</ScalarOperator>
<ScalarOperator>
<Const ConstValue="'|'" />
</ScalarOperator>
</UserDefinedFunction>
</ScalarOperator>
</DefinedValue>
<DefinedValue>
<ColumnReference Column="Expr1007" />
<ScalarOperator ScalarString="[DmProd01].[dbo].[udf_StringSplit]([tempdb].[dbo].[#Temp].[TextData],(5),'|')">
<UserDefinedFunction FunctionName="[DmProd01].[dbo].[udf_StringSplit]">
<ScalarOperator>
<Identifier>
<ColumnReference Database="[tempdb]" Schema="[dbo]" Table="[#Temp]" Column="TextData" />
</Identifier>
</ScalarOperator>
<ScalarOperator>
<Const ConstValue="(5)" />
</ScalarOperator>
<ScalarOperator>
<Const ConstValue="'|'" />
</ScalarOperator>
</UserDefinedFunction>
</ScalarOperator>
</DefinedValue>
<DefinedValue>
<ColumnReference Column="Expr1008" />
<ScalarOperator ScalarString="[DmProd01].[dbo].[udf_StringSplit]([tempdb].[dbo].[#Temp].[TextData],(6),'|')">
<UserDefinedFunction FunctionName="[DmProd01].[dbo].[udf_StringSplit]">
<ScalarOperator>
<Identifier>
<ColumnReference Database="[tempdb]" Schema="[dbo]" Table="[#Temp]" Column="TextData" />
</Identifier>
</ScalarOperator>
<ScalarOperator>
<Const ConstValue="(6)" />
</ScalarOperator>
<ScalarOperator>
<Const ConstValue="'|'" />
</ScalarOperator>
</UserDefinedFunction>
</ScalarOperator>
</DefinedValue>
<DefinedValue>
<ColumnReference Column="Expr1009" />
<ScalarOperator ScalarString="[DmProd01].[dbo].[udf_StringSplit]([tempdb].[dbo].[#Temp].[TextData],(7),'|')">
<UserDefinedFunction FunctionName="[DmProd01].[dbo].[udf_StringSplit]">
<ScalarOperator>
<Identifier>
<ColumnReference Database="[tempdb]" Schema="[dbo]" Table="[#Temp]" Column="TextData" />
</Identifier>
</ScalarOperator>
<ScalarOperator>
<Const ConstValue="(7)" />
</ScalarOperator>
<ScalarOperator>
<Const ConstValue="'|'" />
</ScalarOperator>
</UserDefinedFunction>
</ScalarOperator>
</DefinedValue>
</DefinedValues>
<RelOp AvgRowSize="4011" EstimateCPU="0.0193131" EstimateIO="0.145426" EstimateRebinds="0" EstimateRewinds="0" EstimatedExecutionMode="Row" EstimateRows="17486" LogicalOp="Table Scan" NodeId="1" Parallel="false" PhysicalOp="Table Scan" EstimatedTotalSubtreeCost="0.164739" TableCardinality="17486">
<OutputList>
<ColumnReference Database="[tempdb]" Schema="[dbo]" Table="[#Temp]" Column="TextData" />
</OutputList>
<RunTimeInformation>
<RunTimeCountersPerThread Thread="0" ActualRows="17486" ActualEndOfScans="1" ActualExecutions="1" />
</RunTimeInformation>
<TableScan Ordered="false" ForcedIndex="false" ForceScan="false" NoExpandHint="false" Storage="RowStore">
<DefinedValues>
<DefinedValue>
<ColumnReference Database="[tempdb]" Schema="[dbo]" Table="[#Temp]" Column="TextData" />
</DefinedValue>
</DefinedValues>
<Object Database="[tempdb]" Schema="[dbo]" Table="[#Temp]" IndexKind="Heap" Storage="RowStore" />
</TableScan>
</RelOp>
</ComputeScalar>
</RelOp>
</QueryPlan>
</StmtSimple>
</Statements>
</Batch>
</BatchSequence>
</ShowPlanXML>
This is the plan for the view.
<?xml version="1.0" encoding="utf-16"?>
<ShowPlanXML xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema" Version="1.2" Build="12.0.4100.1" xmlns="http://schemas.microsoft.com/sqlserver/2004/07/showplan">
<BatchSequence>
<Batch>
<Statements>
<StmtSimple StatementCompId="1" StatementEstRows="17486" StatementId="1" StatementOptmLevel="FULL" StatementOptmEarlyAbortReason="GoodEnoughPlanFound" CardinalityEstimationModelVersion="120" StatementSubTreeCost="0.905265" StatementText="Select --TextData,
 dbo.udf_StringSplit(TextData, 1, '|'), dbo.udf_StringSplit(TextData, 2, '|'), dbo.udf_StringSplit(TextData, 3, '|'),
 dbo.udf_StringSplit(TextData, 4, '|'), dbo.udf_StringSplit(TextData, 5, '|'), dbo.udf_StringSplit(TextData, 6, '|'),
 dbo.udf_StringSplit(TextData, 7, '|')--, TextData
 From vAddresses" StatementType="SELECT" QueryHash="0xB4F8A0B288802C4E" QueryPlanHash="0x28DA02D774B1AF53" RetrievedFromCache="true">
<StatementSetOptions ANSI_NULLS="true" ANSI_PADDING="true" ANSI_WARNINGS="true" ARITHABORT="true" CONCAT_NULL_YIELDS_NULL="true" NUMERIC_ROUNDABORT="false" QUOTED_IDENTIFIER="true" />
<QueryPlan DegreeOfParallelism="0" NonParallelPlanReason="CouldNotGenerateValidParallelPlan" CachedPlanSize="32" CompileTime="3" CompileCPU="3" CompileMemory="520">
<Warnings>
<PlanAffectingConvert ConvertIssue="Cardinality Estimate" Expression="CONVERT(varchar(8000),[DmProd01].[dbo].[HumanaInputFiles].[file_stream],0)" />
</Warnings>
<MemoryGrantInfo SerialRequiredMemory="0" SerialDesiredMemory="0" />
<OptimizerHardwareDependentProperties EstimatedAvailableMemoryGrant="838735" EstimatedPagesCached="419367" EstimatedAvailableDegreeOfParallelism="4" />
<RelOp AvgRowSize="28023" EstimateCPU="0.0017486" EstimateIO="0" EstimateRebinds="0" EstimateRewinds="0" EstimatedExecutionMode="Row" EstimateRows="17486" LogicalOp="Compute Scalar" NodeId="0" Parallel="false" PhysicalOp="Compute Scalar" EstimatedTotalSubtreeCost="0.905265">
<OutputList>
<ColumnReference Column="Expr1004" />
<ColumnReference Column="Expr1005" />
<ColumnReference Column="Expr1006" />
<ColumnReference Column="Expr1007" />
<ColumnReference Column="Expr1008" />
<ColumnReference Column="Expr1009" />
<ColumnReference Column="Expr1010" />
</OutputList>
<RunTimeInformation>
<RunTimeCountersPerThread Thread="0" ActualRows="17486" ActualEndOfScans="1" ActualExecutions="1" />
</RunTimeInformation>
<ComputeScalar>
<DefinedValues>
<DefinedValue>
<ColumnReference Column="Expr1004" />
<ScalarOperator ScalarString="[DmProd01].[dbo].[udf_StringSplit]([Expr1011],(1),'|')">
<UserDefinedFunction FunctionName="[DmProd01].[dbo].[udf_StringSplit]">
<ScalarOperator>
<Identifier>
<ColumnReference Column="Expr1011" />
</Identifier>
</ScalarOperator>
<ScalarOperator>
<Const ConstValue="(1)" />
</ScalarOperator>
<ScalarOperator>
<Const ConstValue="'|'" />
</ScalarOperator>
</UserDefinedFunction>
</ScalarOperator>
</DefinedValue>
<DefinedValue>
<ColumnReference Column="Expr1005" />
<ScalarOperator ScalarString="[DmProd01].[dbo].[udf_StringSplit]([Expr1011],(2),'|')">
<UserDefinedFunction FunctionName="[DmProd01].[dbo].[udf_StringSplit]">
<ScalarOperator>
<Identifier>
<ColumnReference Column="Expr1011" />
</Identifier>
</ScalarOperator>
<ScalarOperator>
<Const ConstValue="(2)" />
</ScalarOperator>
<ScalarOperator>
<Const ConstValue="'|'" />
</ScalarOperator>
</UserDefinedFunction>
</ScalarOperator>
</DefinedValue>
<DefinedValue>
<ColumnReference Column="Expr1006" />
<ScalarOperator ScalarString="[DmProd01].[dbo].[udf_StringSplit]([Expr1011],(3),'|')">
<UserDefinedFunction FunctionName="[DmProd01].[dbo].[udf_StringSplit]">
<ScalarOperator>
<Identifier>
<ColumnReference Column="Expr1011" />
</Identifier>
</ScalarOperator>
<ScalarOperator>
<Const ConstValue="(3)" />
</ScalarOperator>
<ScalarOperator>
<Const ConstValue="'|'" />
</ScalarOperator>
</UserDefinedFunction>
</ScalarOperator>
</DefinedValue>
<DefinedValue>
<ColumnReference Column="Expr1007" />
<ScalarOperator ScalarString="[DmProd01].[dbo].[udf_StringSplit]([Expr1011],(4),'|')">
<UserDefinedFunction FunctionName="[DmProd01].[dbo].[udf_StringSplit]">
<ScalarOperator>
<Identifier>
<ColumnReference Column="Expr1011" />
</Identifier>
</ScalarOperator>
<ScalarOperator>
<Const ConstValue="(4)" />
</ScalarOperator>
<ScalarOperator>
<Const ConstValue="'|'" />
</ScalarOperator>
</UserDefinedFunction>
</ScalarOperator>
</DefinedValue>
<DefinedValue>
<ColumnReference Column="Expr1008" />
<ScalarOperator ScalarString="[DmProd01].[dbo].[udf_StringSplit]([Expr1011],(5),'|')">
<UserDefinedFunction FunctionName="[DmProd01].[dbo].[udf_StringSplit]">
<ScalarOperator>
<Identifier>
<ColumnReference Column="Expr1011" />
</Identifier>
</ScalarOperator>
<ScalarOperator>
<Const ConstValue="(5)" />
</ScalarOperator>
<ScalarOperator>
<Const ConstValue="'|'" />
</ScalarOperator>
</UserDefinedFunction>
</ScalarOperator>
</DefinedValue>
<DefinedValue>
<ColumnReference Column="Expr1009" />
<ScalarOperator ScalarString="[DmProd01].[dbo].[udf_StringSplit]([Expr1011],(6),'|')">
<UserDefinedFunction FunctionName="[DmProd01].[dbo].[udf_StringSplit]">
<ScalarOperator>
<Identifier>
<ColumnReference Column="Expr1011" />
</Identifier>
</ScalarOperator>
<ScalarOperator>
<Const ConstValue="(6)" />
</ScalarOperator>
<ScalarOperator>
<Const ConstValue="'|'" />
</ScalarOperator>
</UserDefinedFunction>
</ScalarOperator>
</DefinedValue>
<DefinedValue>
<ColumnReference Column="Expr1010" />
<ScalarOperator ScalarString="[DmProd01].[dbo].[udf_StringSplit]([Expr1011],(7),'|')">
<UserDefinedFunction FunctionName="[DmProd01].[dbo].[udf_StringSplit]">
<ScalarOperator>
<Identifier>
<ColumnReference Column="Expr1011" />
</Identifier>
</ScalarOperator>
<ScalarOperator>
<Const ConstValue="(7)" />
</ScalarOperator>
<ScalarOperator>
<Const ConstValue="'|'" />
</ScalarOperator>
</UserDefinedFunction>
</ScalarOperator>
</DefinedValue>
</DefinedValues>
<RelOp AvgRowSize="4019" EstimateCPU="0.0034972" EstimateIO="0" EstimateRebinds="0" EstimateRewinds="0" EstimatedExecutionMode="Row" EstimateRows="17486" LogicalOp="Compute Scalar" NodeId="1" Parallel="false" PhysicalOp="Compute Scalar" EstimatedTotalSubtreeCost="0.88673">
<OutputList>
<ColumnReference Column="Expr1011" />
</OutputList>
<ComputeScalar>
<DefinedValues>
<DefinedValue>
<ColumnReference Column="Expr1011" />
<ScalarOperator ScalarString="CONVERT(varchar(8000),[DmProd01].[dbo].[HumanaInputFiles].[file_stream],0)">
<Convert DataType="varchar" Length="8000" Style="0" Implicit="false">
<ScalarOperator>
<Identifier>
<ColumnReference Database="[DmProd01]" Schema="[dbo]" Table="[HumanaInputFiles]" Column="file_stream" />
</Identifier>
</ScalarOperator>
</Convert>
</ScalarOperator>
</DefinedValue>
</DefinedValues>
<RelOp AvgRowSize="4043" EstimateCPU="0.0386262" EstimateIO="0.844606" EstimateRebinds="0" EstimateRewinds="0" EstimatedExecutionMode="Row" EstimateRows="17486" LogicalOp="Table Scan" NodeId="2" Parallel="false" PhysicalOp="Table Scan" EstimatedTotalSubtreeCost="0.883233" TableCardinality="34972">
<OutputList>
<ColumnReference Database="[DmProd01]" Schema="[dbo]" Table="[HumanaInputFiles]" Column="file_stream" />
</OutputList>
<RunTimeInformation>
<RunTimeCountersPerThread Thread="0" ActualRows="17486" ActualEndOfScans="1" ActualExecutions="1" />
</RunTimeInformation>
<TableScan Ordered="false" ForcedIndex="false" ForceScan="false" NoExpandHint="false" Storage="RowStore">
<DefinedValues>
<DefinedValue>
<ColumnReference Database="[DmProd01]" Schema="[dbo]" Table="[HumanaInputFiles]" Column="file_stream" />
</DefinedValue>
</DefinedValues>
<Object Database="[DmProd01]" Schema="[dbo]" Table="[HumanaInputFiles]" IndexKind="Heap" Storage="RowStore" />
<Predicate>
<ScalarOperator ScalarString="[DmProd01].[dbo].[HumanaInputFiles].[file_type]=N'adr'">
<Compare CompareOp="EQ">
<ScalarOperator>
<Identifier>
<ColumnReference Database="[DmProd01]" Schema="[dbo]" Table="[HumanaInputFiles]" Column="file_type" ComputedColumn="true" />
</Identifier>
</ScalarOperator>
<ScalarOperator>
<Const ConstValue="N'adr'" />
</ScalarOperator>
</Compare>
</ScalarOperator>
</Predicate>
</TableScan>
</RelOp>
</ComputeScalar>
</RelOp>
</ComputeScalar>
</RelOp>
</QueryPlan>
</StmtSimple>
</Statements>
</Batch>
</BatchSequence>
</ShowPlanXML>
EDIT: Searched on this differently and found an answer is to use top and order by. This got it down to 4 seconds. Seems kind of hokey and still dont explain how looking at query plans helps figure this out, so not going to answer this myself, instead leave it open.