«

»

Nov 29

Data Type Results from UNION Queries

Sometimes when we are exporting data from one database to another we need to combine data sets at the source. One example of this is moving data in a data warehouse from a landing area where all of the tables are replicas of the source tables to a staging area where the tables are combined into a single data model.

One way to extract the data is through a view or stored procedure which uses UNION operators. If the source tables have different data types, the question then becomes what data types come out of the UNION query. This information is necessary so that we can build our target table appropriately.

When there are different data types for the same column between two or more UNION’d queries, the assignment of data type follows SQL Server Data Type Precedence. Data types at the top of this list have the highest precedence and data types at the bottom have the lowest. So, for example, if one column of a query is a NVARCHAR and the same corresponding column of a UNION’d query is a VARCHAR, the column will become an NVARCHAR since that data type has higher precedence.

Let’s look at a few examples of this. We’re going to create a temp table from the UNION query so that we can see which of the data types are selected. First we’ll create a UNION query that combines two base queries and has two columns which are both different data types.

USE Sandbox
GO

SELECT
 src.Col1
 ,src.Col2
INTO
 dbo.TestDataTypes
FROM
(
    SELECT
      CAST('1' AS NVARCHAR(10)) AS Col1
      ,CAST(2 AS INT) AS Col2
   UNION
   SELECT
      CAST(1 AS INT) AS Col1
      ,CAST(2 AS DATETIME) AS Col2
) AS src

Let’s look at the results of the script:

TestDataTypes Columns

The data types in the table are as we expected. INT is a higher precedence than NVARCHAR, so Col1 turns out to be type INT. However for Col2, DATETIME has a higher precedence than INT so that’s the data type that prevails.

One final question is what will happen when data lengths conflict. For example, what if we have a high precedence NVARCHAR of length 10 matched up with a lower precedence VARCHAR of length 20? Will the VARCHAR be truncated or will the lower precedence type win because of data length?

USE Sandbox
GO

SELECT
	src.Col3
INTO
	dbo.TestStringLengths
FROM
(
	SELECT
		CAST('teststring' AS NVARCHAR(10)) AS Col3
	UNION
	SELECT
		CAST('longerteststring' AS VARCHAR(20)) AS Col3
) AS src

TestStringLengths Columns

It looks like we get the best of both worlds. The higher precedence data type is selected, but the column is lengthened to accommodate the lower precedence data type.

In conclusion, SQL Server follows the data type precedence list when determining the data type of a column in a UNION query, but makes smart decisions about the data size to accommodate larger data fields with lower precedence.