I am working on a project where I have to import data from thousands of text files. These text files vary in formats: they use different delimiters; they use quoted identifiers never, sometimes, or always; they sometimes have missing end-of-row columns. Importing these took a combination of custom scripts (to clean up and reformat) and XML format files for SQL’s BCP (bulk copy) command… until I found FileHelpers.
FileHelpers by Marcos Meli is a .Net library which provides services to import and export text in delimited and fixed-length file formats. It takes a unique approach to the file import problem:
- You write a class, giving it properties to store the data in the text file.
- You apply attributes to the class and properties, describing the format of the file and applying import rules.
- You import the file using a single command, which creates an array of objects populated with the data from the text file.
With support for custom converters, null types, optional fields, before/after import events, extensive parsing rules, and so much more — not to mention a web site with gobs of examples and documentation — FileHelpers has single-handedly driven me away from the script and batch file approach to a much more powerful .Net application to import my files.
The fact that FileHelpers uses regular .Net classes allows you to use the same class to both import data from a file and persist it to a database. For example, consider the following sample text file:
1,Chris,Smith
2,Bob,Jones
3,"Mary Jane",Brown
Clearly we have three fields in this comma-delimited file, so we write the following class to hold the data. Attributes provide instructions to FileHelpers — specifying that it is a comma-delimited file, and that the quotes in the first and last name fields are optional. (Note that you can use fields or properties; I use fields below for brevity.)
[DelimitedRecord(",")]
public class Person
{
public int PersonID;
[FieldQuoted('"', QuoteMode.OptionalForRead)]
public string FirstName;
[FieldQuoted('"', QuoteMode.OptionalForRead)]
public string LastName;
}
Importing this file now becomes as easy as writing two lines of code.
FileHelperEngine<Person> engine = new FileHelperEngine<Person>();
Person[] records = engine.ReadFile(@"c:\myfile.csv");
You now have an array of Person
objects loaded with data from the text file. Now, let’s say you wanted to persist these objects to a database. You can use the same Person
object to persist to your database with an O/R mapper. If you were using Paul Wilson’s O/R Mapper and the WilsonORWrapper, you would decorate your class as follows:
[DelimitedRecord(","), IgnoreInheritedClass]
public class Person : WilsonORWrapper.Entities.EntityBase<Person>
{
public int PersonID;
[FieldQuoted('"', QuoteMode.OptionalForRead)]
public string FirstName;
[FieldQuoted('"', QuoteMode.OptionalForRead)]
public string LastName;
}
Note I added the base class and the IgnoreInheritedClass
attribute, which tells FileHelpers to ignore any inherited fields and properties (otherwise, the IsReadOnly property from EntityBase
would be included).
With that, I can now persist all the records I read using FileHelpers back to my database using the O/R mapper.
FileHelperEngine<Person> engine = new FileHelperEngine<Person>();
Person[] records = engine.ReadFile(@"c:\myfile.csv");
foreach ( Person p in records )
{
Data<Person>.Track(p);
}
Data<Person>.Save(records);
Incredible! We can even add supporting logic. Let’s say I wanted to only insert new persons into the database. I could do this.
FileHelperEngine<Person> engine = new FileHelperEngine<Person>();
Person[] records = engine.ReadFile(@"c:\myfile.csv");
foreach ( Person p in records )
{
Person pers = Data<Person>.Retrieve("PersonID = " + p.PersonID.ToString());
if (pers == null)
{
Data<Person>.Track(p);
Data<Person>.Insert(p);
}
}
FileHelpers makes this and so much more possible. It’s the backbone of a project I’m writing, and it’s making some heavy data-parsing much easier. To top it all off, FileHelpers is open source — you can download the code from their Subversion repository, and post patches on their support forums. Patches are well-received — I’ve already had two (1, 2) I’ve sent and had applied.
Thanks again to Marcos (and recently-added co-developer, Matt Campbell) for providing this fantastic library. For regular updates, be sure to read the FileHelpers Library Blog.