C# - Parsing a CSV file

In this article, I’ll show how to parse a CSV file manually and with a parser library (CsvHelper).

Let’s say you have the following CSV file:

Title,Year
Inception,2010
Pulp Fiction,1994
Jurassic Park,1993
Code language: plaintext (plaintext)

To manually parse this, read the file line by line and split each line with a comma. This gives you a string array containing the fields that you can process as needed. If there’s a header row, skip it (headers are necessary for manually editing, and also for third party parsers). The following code shows how to do this:

bool isHeader = true;
foreach (var line in File.ReadLines(@"C:\movies.csv"))
{
	if (isHeader)
	{
		isHeader = false;
		continue;
	}

	var fields = line.Split(",");

	//Process the fields
	Console.WriteLine($"Processed movie {fields[0]} ({fields[1]})");
}
Code language: C# (cs)

Note: You can also use ReadLines().Skip(1) to skip the header row. I do it with a bool flag because it’s more obvious that it’s skipping the header, and it also leaves it open for looking at the header values (inside the if block) if necessary.

This code parses the CSV file and outputs the following:

Processed movie Inception (2010)
Processed movie Pulp Fiction (1994)
Processed movie Jurassic Park (1993)Code language: plaintext (plaintext)

Manually parsing CSV is fine in simple scenarios like this. It’s fast and the code is straightforward. But once you start adding complexity (i.e. error detection, lots of columns, quoted values potentially with commas), then I’d suggest using a parser library instead. I’ll show how to use the CsvHelper parser library below.

Note: .NET has a built-in parser called TextFieldParser, but I wouldn’t suggest using it. TextFieldParser abstracts away *some* of the complexity of parsing CSV, but not all of it. Furthermore, it’s relatively slow. If you’re going to use a parser, go all the way and use a good, fast parser like CsvHelper.

Use CsvHelper

I’ll show an example of how to use CsvHelper by using it to parse the following CSV file into Movie objects:

Title,Year,Director,BoxOffice
Inception,2010,Christopher Nolan,836.8
Pulp Fiction,1994,Quentin Tarantino,213.9
Jurassic Park,1993,Steven Spielberg,1046
"I, Tonya",2017,Craig Gillespie,53.9
Code language: plaintext (plaintext)

First, install the CsvHelper package (this is using Views > Other Windows > Package Manager Console):

Install-Package CsvHelper
Code language: PowerShell (powershell)

CsvHelper maps fields to properties by using the column names in the header row. So add a class with properties matching the fields in the CSV file. Make it easy on yourself by making the property names match the column names exactly (including casing), otherwise you’ll need to do some mapping configuration. The following Movie class matches the CSV fields shown up above:

public class Movie
{
    public string Title { get; set; }
    public int Year { get; set; }
    public string Director { get; set; }
    public decimal BoxOffice { get; set; }
}
Code language: C# (cs)

Note: You can use CsvHelper to parse a CSV file even if it doesn’t have a header row.

Finally, use CsvHelper to parse the CSV file into Movie objects so you can process them one at a time:

using CsvHelper;
using System.Globalization;

using (var reader = new StreamReader(@"C:\movies.csv"))
using (var csv = new CsvReader(reader, CultureInfo.InvariantCulture))
{
    foreach (var movie in csv.GetRecords<Movie>())
    {
        //Process deserialized records

        Console.WriteLine($"Movie: {movie.Title} ({movie.Year}, {movie.Director}) made ${movie.BoxOffice} million");
    }
}
Code language: C# (cs)

This outputs the following:

Movie: Inception (2010, Christopher Nolan) made $836.8 million
Movie: Pulp Fiction (1994, Quentin Tarantino) made $213.9 million
Movie: Jurassic Park (1993, Steven Spielberg) made $1046 million
Movie: I, Tonya (2017, Craig Gillespie) made $53.9 millionCode language: plaintext (plaintext)

Loading the records into a collection

Sometimes you’ll want to process the CSV records one at a time (as shown above). Other times you’ll want to load all of the records into a collection (such as a list) for later use.

The simplest way to get all of the CSV records into a list is to use GetRecords().ToList(). However, I suggest using the following wrapper method using yield return to make this flexible for the calling code:

using CsvHelper;
using System.Globalization;

IEnumerable<T> ParseCsv<T>(string csvFilePath)
{
	using var reader = new StreamReader(csvFilePath);
	using var csv = new CsvReader(reader, CultureInfo.InvariantCulture);

	foreach (var record in csv.GetRecords<T>())
		yield return record;
}
Code language: C# (cs)

Note: You can’t just return csv.GetRecords<T>() from this and try to use it because that results in an ObjectDisposedException. Why? Because the IEnumerable is created in the using block, and returning makes it exit the using block, hence the disposed exception.

Now you can convert it to whatever collection you want in the calling code (nice and flexible!). Here’s an example of converting to a list with ToList() and converting to a dictionary with ToDictionary():

using System.Linq;

//To a list
var moviesList = ParseCsv<Movie>(@"C:\temp\movies.csv").ToList();

//Or to a dictionary
var moviesByTitleMap = ParseCsv<Movie>(@"C:\temp\movies.csv").ToDictionary(m => m.Title);
Code language: C# (cs)

Just return GetRecords().ToList() instead if you don’t want this flexibility.

Errors

CsvHelper has incredibly detailed error messages. For example, I put “a” for a decimal field in the data and got the following exception:

Unhandled exception. CsvHelper.TypeConversion.TypeConverterException: The conversion cannot be performed.
    Text: 'a'
    MemberType: System.Decimal
    TypeConverter: 'CsvHelper.TypeConversion.DecimalConverter'
IReader state:
   ColumnCount: 0
   CurrentIndex: 3
   HeaderRecord:
["Title","Year","Director","BoxOffice"]
IParser state:
   ByteCount: 0
   CharCount: 189
   Row: 5
   RawRow: 5
   Count: 4
   RawRecord:
"I, Tonya",2017,Craig Gillespie,aCode language: plaintext (plaintext)

This basically tells you everything you need to know to troubleshoot the problem in the data.

Performance comparison: CsvHelper vs TextFileParser vs manual parsing

One more thing: according to this article about CSV parsing performance, CsvHelper is one of the top performing parsers. Here’s how long it took to parse a CSV file with 1 million records:

CsvHelper: 2.4 seconds
Manual parsing (string.Split(): 2.9 seconds
TextFileParser: 18.3 seconds

So CsvHelper and manual parsing are about the same, and both of them are about 7x faster than using TextFileParser.

Note: There are a few faster parsers, but I haven’t used them, so I can’t really comment on them.

C# – Parsing a CSV file

Use CsvHelper

Loading the records into a collection

Errors

Performance comparison: CsvHelper vs TextFileParser vs manual parsing

Related Articles

Leave a Comment Cancel reply