«

»

Oct 27

Correcting Excel Data Types in the Import Wizard

A common task that business intelligence professionals have to perform is to import data from Microsoft Excel into SQL Server.  This is usually a unnecessarily challenging project with many pitfalls.  One of the pitfalls is getting the SQL Import Export Wizard to correctly determine the data types.

One pattern that you occasionally see in data files is a column which is a sequential list of whole numbers, followed at the bottom with some strings.  This pattern can form when the data file was created in sorted order which will put all the alphanumeric strings at the bottom.

010 - Data List 1

020 - Data List 2

The way that the SQL Import Export Wizard determines data types of the Excel columns is to take a sample of the values.  Unfortunately, this sample is taken from a limited number of rows in the worksheet.  In this sample data file you can see that there are approximately 100 rows of numeric data followed by a string in the NaturalKey column.  The Import Wizard identifies both of the input columns as floats, which is obviously wrong.
030 - Incorrect Column Mapping 1

Now we have the clever idea of moving the record with the string up to the top of the file.  This is so that the Import Wizard will see the string in the sample of rows it takes.

040 - String at top

However, now the Import Wizard decides that there aren’t enough values of type Varchar to set the data type to Varchar.  Instead it only imports the data that matches the majority.  The string value ‘ABC123′ is omitted from the import.
030 - Incorrect Column Mapping 1 050 - Missing Data

This is one choice that SQL Server makes that is definitely wrong.  As a data professional, I would much rather have no data and an error than most of the data without getting any warning.  This is the kind of event that will cause much hand-wringing and forehead-scratching as you trace the missing data back from the final destination to the source.

One way to avoid this is to insert some dummy records at the top of the Excel file.  After some experimentation, it seems that Excel requires at least five rows with the correct data type to automatically map the data type.

060 - Dummy Variables Added

070 - Correct Column Mapping

After the import you’ll have to delete the extra rows from the destination table.

In conclusion, you sometimes have to use black magic to get the Import Wizard to correctly set data types on the import of an Excel file.