Thursday, July 07, 2005

Bulk loading data in SQL Server

This example combines dynamic SQL, BULK INSERT and the proper handling of double-quotes to solve a client's problem with loading various text file formats into a database.
One of my clients contacted me recently and said they needed some help creating a stored procedure that imported data from a text file. They wanted the procedure to accept three parameters: PathFileName, OrderID, and FileType. The PathFileName is simply the name and physical location of the source file on the hard drive, the OrderID is generated in the program that calls the procedure and the FileType indicates the format of the data in the source file. The two possible formats for the source data are shown here:

FileType=1 (TxtFile1.txt)

"Kelly","Reynold","kelly@reynold.com"
"John","Smith","bill@smith.com"
"Sara","Parker","sara@parker.com"

FileType=2 (TxtFile2.txt)

Kelly,Reynold,kelly@reynold.com
John,Smith,bill@smith.com
Sara,Parker,sara@parker.com

BULK INSERT

I decided to use BULK INSERT to implement the solution. The BULK INSERT statement was introduced in SQL Server 7 and allows you to interact with bcp (bulk copy program) via a script. In pre-7 versions the only way you could access bcp functionality was from a command prompt. I am not going to list the full syntax of BULK INSERT here (but you can find it here), because it is a little long and most of it does not apply to the problem I am solving. Instead, I will show the valid BULK INSERT statements used to load the data shown above.

BULK INSERT TmpStList FROM 'c:\TxtFile1.txt' WITH (FIELDTERMINATOR = '","')

TmpStList is the target table and TxtFile1.txt is the source data file. The source file is located in the root of the C drive. The FIELDTERMINATOR argument allows you to specify the delimeter used to discern column values.

The valid statement for FileType=2 is shown here:

BULK INSERT tmpStList FROM 'c:\TxtFile2.txt' WITH (FIELDTERMINATOR = ',')

The only difference is the value of the FIELDTERMINATOR argument.

The Solution

The stored procedure used to implement the solution is fairly straigtforward once you master the BULK INSERT statement. The only real trick is loading the data that comes to you in FileType=1 format. Because a double-quote starts and ends a data row, it too is loaded in the table. The FIELDTERMINATOR works between columns, not at the beginning or end of a row. To workaround this I simply load the data into a temporary table and then use a CASE statement and the SUBSTRING and DATALENGTH functions to load the correct data in the final table. The FileType=2 data will load as-is, but I still put in the temporary table for consistency (easier programming).

The SQL statements that create the temporary and final table are shown here.

CREATE TABLE StudentList
(
StID int IDENTITY NOT NULL,
StFName varchar(50) NOT NULL,
StLName varchar(50) NOT NULL,
StEmail varchar(100) NOT NULL,
OrderID int NOT NULL
)
go
CREATE TABLE TmpStList
(
stFName varchar (50) NOT NULL,
stLName varchar (50) NOT NULL,
stEmail varchar (100) NOT NULL
)
go

No comments: