A project I’m working on requires importing data from hundreds of megabytes of delimited text files into a staging database, then converting and migrating the data into an operations database. (Forgive the naming conventions. The “staging” database is used as an intermediary between the text files and the final database. The “operations” database is the one the application uses.)
What’s the best way to automate this? Ideally, I need to be able to re-run the import at any time, preferably with a single command. The process should purge the staging database, rebuild tables (since schemas may change), reload all data, and run the necessary migration scripts. Since the final product is merely a web interface to the imported data, there is no harm in wiping and recreating everything from scratch, as there is no “user data” of note. (Actually there is, but it is isolated from this process.)
The solution I came up with uses a series of SQL scripts executed with sqlcmd (part of MS SQL 2005; use osql for older versions) and data imports done with bcp (also part of MS SQL), all tied together with batch files.
The script to build the staging database (which requires both sqlcmd and bcp) breaks down into four steps:
- Copy files from a source directory to a temporary working directory.
- Clean up file formats.
- Build tables from SQL scripts.
- Import data using bcp.
Step 1: The first step sets working variables, ensures the working directory exists, and copies the files.
```
setlocal
set src=C:\dev\project\data
set dest=C:\dev\project\trunk\Data\_import
set scripts=C:\dev\project\trunk\Data
```
```
@echo Preparing import directory...
if not exist %dest% md %dest%
del /q %dest%\*.dat

@echo Copying files for import...
copy %src%\players*.dat %dest% /y
copy %src%\games*.dat %dest% /y
...
```
Step 2: The files received need some tweaking to make importing easier, so I use the gsar utility for search-and-replace. Note the use of the for command, which repeats a command for every matching file in a directory (in this case, all *.dat files in the %dest% directory).
```
@echo Cleaning up files for import...
for %%f in (%dest%\*.dat) do gsar -o %%f -s"NULL" -r""
```
Steps 3 and 4: Again, I use for loops, this time iterating over the script and data files to create tables and import data.
```
@echo Creating tables...
for %%f in (%root%\tables\*.sql) do sqlcmd -U user -P pass -S (local) -d database -i %%f
...

@echo Importing data...
for %%f in (%dest%\games*.dat) do bcp db1.dbo.games in %%f -f games.xml -T -F 1 -m 1 -h "TABLOCK"
...
```
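The migration scripts mentioned earlier run the same way, just against the operations database instead of staging. This fragment is only a sketch under assumptions: the "migrations" folder and the "operations" database name are hypothetical placeholders, since the actual migration scripts aren't shown above.

```
@echo Running migration scripts...
rem Hypothetical: assumes numbered scripts (e.g. 001_move_players.sql) in a
rem "migrations" folder, executed in name order against the operations database.
for %%f in (%scripts%\migrations\*.sql) do sqlcmd -U user -P pass -S (local) -d operations -i %%f
```

Numbering the scripts keeps the for loop's alphabetical ordering consistent with the order the migrations must run in.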
That’s pretty much it. It’s nice to be able to click one batch file and automatically build and load data into databases.
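That one batch file is just a wrapper that calls the pieces in order. The file names below are hypothetical placeholders for the scripts described above, not the project's actual names; it's a minimal sketch of how the steps might be chained so a failure in one step aborts the rest.

```
@echo off
rem Hypothetical top-level wrapper; each step aborts the run on failure.
call 1_copy_files.bat   || goto :failed
call 2_clean_files.bat  || goto :failed
call 3_build_tables.bat || goto :failed
call 4_import_data.bat  || goto :failed
call 5_migrate.bat      || goto :failed
@echo Import complete.
goto :eof

:failed
@echo Import failed -- see output above.
exit /b 1
```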