C# – Parsing CSV data when a field has commas

When you have commas in your CSV fields, it creates a conflict with the field delimiting commas. In other words, you can’t tell which data belongs to which field. How you deal with this will depend on one question: is the field with the comma enclosed in quotes?

Comma is enclosed in quotes

Spreadsheet programs and good CSV generators will automatically detect commas in fields and enclose the field with quotes. CSV with quote-enclosed commas look like this:

Title,Year "I, Tonya",2017 Inception,2010
Code language: plaintext (plaintext)

If you’re running into this situation, don’t try to parse the CSV manually. Use a CSV parser library instead, such as CsvHelper. This automatically handles quote-enclosed commas.

To use CsvHelper, first install the package:

Install-Package CsvHelper
Code language: PowerShell (powershell)

Note: This is using Package Manager Console (View > Other Windows).

And here’s an example of using it to parse the CSV data shown above (contained in a file):

using CsvHelper; using System.Globalization; using (var reader = new StreamReader(@"C:\movies.csv")) using (var csv = new CsvReader(reader, CultureInfo.InvariantCulture)) { foreach (var movie in csv.GetRecords<Movie>()) { Console.WriteLine($"{movie.Title} was released in {movie.Year}"); } }
Code language: C# (cs)

This outputs the following, showing that it correctly handled the field with the comma in it:

I, Tonya was released in 2017 Inception was released in 2010
Code language: plaintext (plaintext)

Comma is not enclosed in quotes

Here’s what it looks like when you have commas in your fields and they aren’t enclosed in quotes:

Title,Year I, Tonya,2017 Inception,2010
Code language: plaintext (plaintext)

This is a harder problem to deal with, but it’s not impossible.

This is malformed CSV data, because it doesn’t follow the standards of quote-enclosing commas. If possible, fix the data. If you are in control of generating the CSV data in the first place, then fix it by quote-enclosing the commas (or using a different delimiter, such as a tab). Then you can use a parser to deal with it (as shown in the previous section).

If you can’t fix the data, and really must deal with it as is, then one solution is to merge the slots that belong to the string field. For example:

  • Split “I, Tonya,2017” with a comma. This gives you [I][Tonya][2017].
  • Check if there are more fields than expected. We expect 2 columns (Title,Year), and there are 3, so we know the string field had a comma in it.
  • Merge the string field’s parts back together – [I] and [Tonya] – resulting in [I, Tonya][2017].

Here’s code that does this:

using System.IO; using System.Linq; var csvFilePath = @"C:\movies.csv"; var expectedFieldCount = 2; foreach (var line in File.ReadLines(csvFilePath).Skip(1)) //skip the header row { var fields = line.Split(',').ToList(); //Check if the string field had commas and then merge its parts together if (fields.Count > expectedFieldCount) { int mergeCount = 1 + (fields.Count - expectedFieldCount); var mergedString = string.Join(",", fields.Take(mergeCount)); fields.RemoveRange(0, mergeCount); fields.Insert(0, mergedString); } Console.WriteLine($"{fields[0]} was released in {fields[1]}"); }
Code language: C# (cs)

This outputs the following, showing that it correctly handled the field with the comma in it:

I, Tonya was released in 2017 Inception was released in 2010
Code language: plaintext (plaintext)

To repeat: this approach requires you to know that a string field *might* have commas in it. If you are dealing with problematic CSV data like this, this is just a hoop you have to jump through to parse it.

Leave a Comment