Automatically generate (partial) XML format files for BCP

I’ve been working with a lot of raw data files lately — data files that come in some delimited format which I must move into a SQL database. Most of the files are CSV files, and some of them have over 100 columns.

Using BCP to import CSV files into SQL is a pleasure. The only annoyance is writing those XML format files. Actually, the annoyance is not writing them, it’s going through the CSV files (which, of course, are undocumented) to determine what delimiters are used in each row.

Here’s a sample of what I mean:

2934,128321,2782,"2007-04-32","Excluded",2321,,22

Fortunately, most CSV files are consistent in their use of quotes, but going through dozens of columns to determine the terminators is a pain. The terminators in the above example aren’t just commas; they could also be quotes. Column three, for example, ends with a comma followed by a quote.

To generate the <record> section of my format files, I wrote the following script, which reads the first line of a text file, finds each comma, determines if it is preceded or followed by a quote, and generates the appropriate XML.

//get filename
var args = WScript.Arguments;
if (args.length == 0)
{
	WScript.Echo('Usage: getdelims.vbs <filename>');
	WScript.Quit();
}

var filename = args(0);

//read file
var fso = new ActiveXObject("Scripting.FileSystemObject");
var file = fso.OpenTextFile(filename,1);
var contents = file.ReadLine();
file.Close();
file = null;
fso = null;

//find commas
var cnt = 0;
for (var i = 1; i < contents.length; i++)
{
	if ( contents.substr(i,1) != ',' ) continue;
	cnt++;
	delim = ",";
	if ( contents.substr(i-1,1) == '"' )
		delim = '&quot;,';
	if ( i+1 < contents.length && contents.substr(i+1,1) == '"' )
		delim += '&quot;';
	WScript.Echo('\t<FIELD ID="' + cnt + '" xsi:type="CharTerm" TERMINATOR="' + delim + '" />');
}

The output can be copy/pasted right into your format file. The example content above would generate the following.

        <field ID="1" xsi:type="CharTerm" TERMINATOR="," />
        <field ID="2" xsi:type="CharTerm" TERMINATOR="," />
        <field ID="3" xsi:type="CharTerm" TERMINATOR=",&quot;" />
        <field ID="4" xsi:type="CharTerm" TERMINATOR="&quot;,&quot;" />
        <field ID="5" xsi:type="CharTerm" TERMINATOR="&quot;," />
        <field ID="6" xsi:type="CharTerm" TERMINATOR="," />
        <field ID="7" xsi:type="CharTerm" TERMINATOR="," />

A few minutes writing a script, and I won’t be looking at CSV files with too many columns ever again (at least, not for the reason of writing XML format files).

Leave a Reply

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes:

<a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <s> <strike> <strong>