Data Lineage Analysis of the SQL Server OUTPUT Clause

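The SQL Server OUTPUT clause affects the data lineage of a SQL statement. A lineage analysis that ignores it misses key lineage relationships, which reduces the accuracy of the analysis and, in turn, the quality of an organization's data governance.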

Gudu SQLFlow provides complete data lineage analysis for the OUTPUT clause in SQL Server.

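Below is the description of the OUTPUT clause from the official Microsoft SQL Server documentation. It tells us that the OUTPUT clause returns the rows affected by an INSERT, UPDATE, DELETE, or MERGE statement. The returned rows can be processed further, for example by inserting them into another target table, which links data across tables and creates data lineage relationships.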

Returns information from, or expressions based on, each row affected by an INSERT, UPDATE, DELETE, or MERGE statement. These results can be returned to the processing application for use in such things as confirmation messages, archiving, and other such application requirements. The results can also be inserted into a table or table variable. Additionally, you can capture the results of an OUTPUT clause in a nested INSERT, UPDATE, DELETE, or MERGE statement, and insert those results into a target table or view.
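
To make the quoted description concrete, here is a minimal sketch of an UPDATE whose OUTPUT clause writes the affected rows into a second table. The tables dbo.Orders and dbo.OrderStatusAudit and their columns are hypothetical and exist only for this illustration; they are not part of the stored procedure discussed later.

```sql
-- Hypothetical tables, for illustration only.
CREATE TABLE dbo.Orders (
    OrderId INT PRIMARY KEY,
    Status  VARCHAR(20) NOT NULL
);

CREATE TABLE dbo.OrderStatusAudit (
    OrderId   INT         NOT NULL,
    OldStatus VARCHAR(20) NOT NULL,
    NewStatus VARCHAR(20) NOT NULL,
    ChangedAt DATETIME    NOT NULL
);

INSERT INTO dbo.Orders (OrderId, Status)
VALUES (1, 'open'), (2, 'open');

-- deleted.* exposes the values before the update, inserted.* the values after it.
-- OUTPUT ... INTO copies one row per affected row into the audit table.
UPDATE dbo.Orders
SET Status = 'shipped'
OUTPUT deleted.OrderId, deleted.Status, inserted.Status, GETDATE()
INTO dbo.OrderStatusAudit (OrderId, OldStatus, NewStatus, ChangedAt)
WHERE OrderId = 1;
```

Although the statement is an UPDATE on dbo.Orders, it also writes to dbo.OrderStatusAudit, so a lineage analysis should record a flow from dbo.Orders.OrderId and dbo.Orders.Status into the audit table. An analyzer that skips the OUTPUT clause would see no such relationship.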

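Let us take a SQL Server stored procedure as an example. Its body is a MERGE statement nested inside an INSERT statement. The MERGE works as follows: a record that is new is inserted into dbo.Basel3, and a record that already exists is updated (CurrentIndicator is set to 0 and EffectiveToDate to the current date). In addition, the OUTPUT clause together with the outer INSERT statement inserts the updated record into dbo.Basel3 once more. This re-inserted row is meant to differ from a row inserted directly by the MERGE only in its EffectiveFromDate field, which takes the current date; the listing below does not set that column explicitly, so the value presumably comes from a column default.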

CREATE PROCEDURE [dbo].[sampleProcedure] (@Period DATETIME)
AS
SET NOCOUNT ON;

INSERT INTO dbo.Basel3
(
	AccountNumber
	,PeriodKey
	,ExposureAmount
)
SELECT
	AccountNumber
	,PeriodKey
	,ExposureAmount
FROM
(
	MERGE INTO [dbo].[Basel3] AS MergeTarget
	USING
	(
		SELECT DISTINCT
			tmp.AccountNumber
			,tmp.PeriodKey
			,tmp.ExposureAmount
		FROM dbo.TmpBasel3 tmp (NOLOCK)
		LEFT JOIN dbo.Basel3 olb (NOLOCK)
			ON tmp.AccountNumber = olb.AccountNumber
				AND olb.CurrentIndicator = 1
		WHERE olb.Basel3Indicator <> tmp.Basel3Indicator	
	) AS MergeSource
		ON MergeTarget.[AccountNumber] = MergeSource.[AccountNumber]
			 AND MergeTarget.[CurrentIndicator] = 1
	WHEN NOT MATCHED
	THEN INSERT
	(	
		AccountNumber
		,PeriodKey
		,ExposureAmount
	)
	VALUES
	(
		MergeSource.AccountNumber
		,MergeSource.PeriodKey
		,MergeSource.ExposureAmount
	)
	WHEN MATCHED
	THEN UPDATE
	SET MergeTarget.[CurrentIndicator] = 0
		,MergeTarget.[EffectiveToDate] = GETDATE()
	OUTPUT $Action AS [ActionOut]	
		,MergeSource.AccountNumber
		,MergeSource.PeriodKey
		,MergeSource.ExposureAmount
	) AS MergeOut
WHERE MergeOut.[ActionOut] = 'UPDATE';

After analyzing the procedure, Gudu SQLFlow accurately reports its data lineage.

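We can see that data reaches dbo.Basel3 along two paths: rows inserted directly by the INSERT branch of the MERGE, and rows returned by the MERGE's OUTPUT clause and then inserted by the outer INSERT statement.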
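
The second path is easier to see when the nested statement is unrolled into two steps. The following is a rough sketch, not the author's procedure: the table variable @MergeOut and the column types are assumptions, and the MERGE source is simplified to the staging table without the DISTINCT and the filter used in the original.

```sql
-- Sketch only. @MergeOut and the column types below are assumed for illustration.
DECLARE @MergeOut TABLE (
    ActionOut      NVARCHAR(10),
    AccountNumber  VARCHAR(50),
    PeriodKey      INT,
    ExposureAmount DECIMAL(18, 2)
);

-- Step 1: the MERGE writes to dbo.Basel3 (path 1) and captures
-- every affected source row in @MergeOut through the OUTPUT clause.
MERGE INTO dbo.Basel3 AS MergeTarget
USING dbo.TmpBasel3 AS MergeSource
    ON MergeTarget.AccountNumber = MergeSource.AccountNumber
        AND MergeTarget.CurrentIndicator = 1
WHEN NOT MATCHED THEN
    INSERT (AccountNumber, PeriodKey, ExposureAmount)
    VALUES (MergeSource.AccountNumber, MergeSource.PeriodKey, MergeSource.ExposureAmount)
WHEN MATCHED THEN
    UPDATE SET MergeTarget.CurrentIndicator = 0
        ,MergeTarget.EffectiveToDate = GETDATE()
OUTPUT $action, MergeSource.AccountNumber, MergeSource.PeriodKey, MergeSource.ExposureAmount
INTO @MergeOut (ActionOut, AccountNumber, PeriodKey, ExposureAmount);

-- Step 2: rows that were updated are inserted again as new rows (path 2).
INSERT INTO dbo.Basel3 (AccountNumber, PeriodKey, ExposureAmount)
SELECT AccountNumber, PeriodKey, ExposureAmount
FROM @MergeOut
WHERE ActionOut = 'UPDATE';
```

In this form the flow dbo.TmpBasel3 -> @MergeOut -> dbo.Basel3 is spelled out by an intermediate object. In the nested form used by the procedure, the same flow passes through the derived table MergeOut, which is why the OUTPUT clause has to be analyzed to recover it.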

The visualized result is shown below:

Data lineage of the MERGE OUTPUT clause in a SQL Server stored procedure