C# – Parsing a CSV file

In this article, I’ll show how to parse a CSV file manually and with a parser library (CsvHelper).

Let’s say you have the following CSV file:

Title,Year Inception,2010 Pulp Fiction,1994 Jurassic Park,1993
Code language: plaintext (plaintext)

To manually parse this, split each line with a comma and process the fields in the resulting string[] however you want. If there’s a header row, skip it (headers are necessary for manually editing, and also for third party parsers). The following code shows how to do this:

bool isHeader = true; foreach (var line in File.ReadLines(@"C:\movies.csv")) { if (isHeader) { isHeader = false; continue; } var fields = line.Split(","); //Process the fields Console.WriteLine($"Processed movie {fields[0]} ({fields[1]})"); }
Code language: C# (cs)

Note: You can also use ReadLines().Skip(1) to skip the header row. I do it with a bool flag because it’s more obvious that it’s skipping the header, and it also leaves it open for looking at the header values (inside the if block) if necessary.

This code parses the CSV file and outputs the following:

Processed movie Inception (2010) Processed movie Pulp Fiction (1994) Processed movie Jurassic Park (1993)
Code language: plaintext (plaintext)

Manually parsing CSV is fine in simple scenarios like this. It’s fast and the code is straightforward. But once you start adding complexity (i.e. error detection, lots of columns, quoted values potentially with commas), then I’d suggest using a parser library instead. I’ll show how to use the CsvHelper parser library below.

Note: .NET has a built-in parser called TextFieldParser, but I wouldn’t suggest using it. TextFieldParser abstracts away *some* of the complexity of parsing CSV, but not all of it. Furthermore, it’s relatively slow. If you’re going to use a parser, go all the way and use a good, fast parser like CsvHelper.

Use CsvHelper

I’ll show an example of how to use CsvHelper by using it to parse the following CSV file into Movie objects:

Title,Year,Director,BoxOffice Inception,2010,Christopher Nolan,836.8 Pulp Fiction,1994,Quentin Tarantino,213.9 Jurassic Park,1993,Steven Spielberg,1046 "I, Tonya",2017,Craig Gillespie,53.9
Code language: plaintext (plaintext)

First, install the CsvHelper package (this is using Views > Other Windows > Package Manager Console):

Install-Package CsvHelper
Code language: PowerShell (powershell)

Add a Movie class with properties that match the header column names:

public class Movie { public string Title { get; set; } public int Year { get; set; } public string Director { get; set; } public decimal BoxOffice { get; set; } }
Code language: C# (cs)

Note: Make it easy on yourself by making the property names match the column names exactly (including casing), otherwise you’ll have to do some configuration.

Finally, here’s how to use CsvHelper to parse the CSV file into Movie objects so you can process them one at a time:

using CsvHelper; using System.Globalization; using (var reader = new StreamReader(@"C:\movies.csv")) using (var csv = new CsvReader(reader, CultureInfo.InvariantCulture)) { foreach (var movie in csv.GetRecords<Movie>()) { //Process deserialized records Console.WriteLine($"Movie: {movie.Title} ({movie.Year}, {movie.Director}) made ${movie.BoxOffice} million"); } }
Code language: C# (cs)

This outputs the following:

Movie: Inception (2010, Christopher Nolan) made $836.8 million Movie: Pulp Fiction (1994, Quentin Tarantino) made $213.9 million Movie: Jurassic Park (1993, Steven Spielberg) made $1046 million Movie: I, Tonya (2017, Craig Gillespie) made $53.9 million
Code language: plaintext (plaintext)

Loading the records into a collection

Sometimes you’ll want to process the CSV records one at a time (as shown above). Other times you’ll want to load all of the records into a collection (such as a list) for later use.

The simplest way to get all of the CSV records into a list is to use GetRecords().ToList(). However, I suggest using the following wrapper method to make this flexible for the calling code:

using CsvHelper; using System.Globalization; IEnumerable<T> ParseCsv<T>(string csvFilePath) { using var reader = new StreamReader(csvFilePath); using var csv = new CsvReader(reader, CultureInfo.InvariantCulture); foreach (var record in csv.GetRecords<T>()) yield return record; }
Code language: C# (cs)

Note: You can’t just return csv.GetRecords<T>() from this and try to use it because that results in an ObjectDisposedException. Why? Because the IEnumerable is created in the using block, and returning makes it exit the using block, hence the disposed exception.

Now you can convert it to whatever collection you want in the calling code (nice and flexible!):

using System.Linq; //To a list var moviesList = ParseCsv<Movie>(@"C:\temp\movies.csv").ToList(); //Or to a dictionary var moviesByTitleMap = ParseCsv<Movie>(@"C:\temp\movies.csv").ToDictionary(m => m.Title);
Code language: C# (cs)

Just return GetRecords().ToList() instead if you don’t want this flexibility.

Errors

CsvHelper has incredibly detailed error messages. For example, I put “a” for a decimal field in the data and got the following exception:

Unhandled exception. CsvHelper.TypeConversion.TypeConverterException: The conversion cannot be performed. Text: 'a' MemberType: System.Decimal TypeConverter: 'CsvHelper.TypeConversion.DecimalConverter' IReader state: ColumnCount: 0 CurrentIndex: 3 HeaderRecord: ["Title","Year","Director","BoxOffice"] IParser state: ByteCount: 0 CharCount: 189 Row: 5 RawRow: 5 Count: 4 RawRecord: "I, Tonya",2017,Craig Gillespie,a
Code language: plaintext (plaintext)

This basically tells you everything you need to know to troubleshoot the problem in the data.

Performance comparison: CsvHelper vs TextFileParser vs manual parsing

One more thing: according to this article about CSV parsing performance, CsvHelper is one of the top performing parsers. Here’s how long it took to parse a CSV file with 1 million records:

  • CsvHelper: 2.4 seconds
  • Manual parsing (string.Split(): 2.9 seconds
  • TextFileParser: 18.3 seconds

So CsvHelper and manual parsing are about the same, and both of them are about 7x faster than using TextFileParser.

Note: There are a few faster parsers, but I haven’t used them, so I can’t really comment on them.

Leave a Comment