C# – Remove a set of characters from a string

Of the approaches I compared, the fastest and simplest way to remove a set of characters from a string is to use a StringBuilder with a List<char>, like this:

using System.Collections.Generic;
using System.Text;

public static string RemoveChars(string input, List<char> charsToRemove)
{
    if (string.IsNullOrEmpty(input))
        return input;

    var sb = new StringBuilder();
    foreach (var c in input)
    {
        // Keep only the characters that are not in the removal list
        if (!charsToRemove.Contains(c))
            sb.Append(c);
    }

    return sb.ToString();
}
Code language: C# (cs)

I compared this with three other approaches, running 100,000 iterations on a 2,500-character string with a list of 15 characters to remove. The StringBuilder approach is about 1.7x faster than the second-fastest approach.

Here is the summary of performance stats for all of the approaches:

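| Approach | Total (ms) | Average (ms) | Min (ms) | Max (ms) |
|---|---|---|---|---|
| StringBuilder | 4251.91 | 0.042 | 0.036 | 0.42 |
| LINQ + new string() + ToArray() | 7176.47 | 0.071 | 0.047 | 0.74 |
| LINQ + string.Concat() | 8485.75 | 0.085 | 0.059 | 1.64 |
| Regex | 31368.22 | 0.31 | 0.25 | 2.45 |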

One surprising result is that List<char> is faster than HashSet<char> in every approach I compared. However, in every case, I used a list of only 15 characters. With so few characters, the benefits of HashSet<char> don’t outweigh its overhead costs. As the number of characters increases, I would expect HashSet<char> to eventually outperform List<char>.
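For reference, here is a minimal sketch of what the HashSet<char> variant of the StringBuilder approach could look like. The class name CharRemover is only a placeholder for illustration, and this is not necessarily the exact code used in the comparison. The only difference from the List<char> version is the collection type passed in:

```csharp
using System.Collections.Generic;
using System.Text;

public static class CharRemover
{
    // Hypothetical variant: same loop as the List<char> version,
    // but lookups go through HashSet<char>.Contains() instead.
    public static string RemoveChars(string input, HashSet<char> charsToRemove)
    {
        if (string.IsNullOrEmpty(input))
            return input;

        var sb = new StringBuilder();
        foreach (var c in input)
        {
            if (!charsToRemove.Contains(c))
                sb.Append(c);
        }

        return sb.ToString();
    }
}
```

For example, CharRemover.RemoveChars("a<b>c", new HashSet<char> { '<', '>' }) returns "abc". Whether this beats the List<char> version depends on how many characters are in the set, so it is worth measuring with your own data.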

In the rest of this article, I’ll show the code for the other approaches I compared and show how I measured and compared performance.

Other approaches

The following approaches are slower than the StringBuilder approach. The LINQ approaches may be considered subjectively simpler than the StringBuilder approach (if you prefer LINQ over foreach loops).

LINQ + new string() + ToArray()

This uses LINQ to filter out characters, then uses new string() + ToArray() to convert the result to a string:

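using System.Collections.Generic;
using System.Linq;

public static string RemoveChars(string input, List<char> charsToRemove)
{
    if (string.IsNullOrEmpty(input))
        return input;

    // Filter the characters, then build a string from the resulting array
    return new string(input.Where(c => !charsToRemove.Contains(c)).ToArray());
}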
Code language: C# (cs)

The performance stats:

Total Time: 7176.47ms Avg=0.071ms Min=0.047ms Max=0.74ms
Code language: plaintext (plaintext)

LINQ + string.Concat()

This uses LINQ to filter the characters and then uses Concat() to convert the result to a string:

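using System.Collections.Generic;
using System.Linq;

public static string RemoveChars(string input, List<char> charsToRemove)
{
    if (string.IsNullOrEmpty(input))
        return input;

    // Filter the characters, then concatenate them into a string
    return string.Concat(input.Where(c => !charsToRemove.Contains(c)));
}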
Code language: C# (cs)

The performance stats:

Total Time: 8485.75ms Avg=0.085ms Min=0.059ms Max=1.64ms
Code language: plaintext (plaintext)

Regex

Using regex for this problem is not a good idea. It’s the slowest and least simple approach:

using System.Text.RegularExpressions;

// Character class containing the 15 characters to remove
static Regex charsToRemoveRegex = new Regex("[<>?;&*=~^+|:,/m]", RegexOptions.Compiled);

public static string RemoveChars(string input)
{
    if (string.IsNullOrEmpty(input))
        return input;

    return charsToRemoveRegex.Replace(input, "");
}
Code language: C# (cs)

The performance stats:

Total Time: 31368.22ms Avg=0.31ms Min=0.25ms Max=2.45ms
Code language: plaintext (plaintext)

Ouch, that’s slow.
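Part of what makes the regex approach less simple is that the character class is hardcoded, and characters such as ], - and \ have special meaning inside it. If you did need to build the pattern from a list of characters, one option is to emit each character as a \uXXXX escape. The following is only a sketch under that assumption; the names RegexCharRemover and BuildRegex are made up for illustration and were not part of the comparison:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class RegexCharRemover
{
    // Hypothetical helper: builds a character class such as [\u003C\u003E]
    // so that no character in the list is interpreted as regex syntax.
    public static Regex BuildRegex(IEnumerable<char> charsToRemove)
    {
        var chars = charsToRemove.ToList();
        if (chars.Count == 0)
            throw new ArgumentException("At least one character is required.", nameof(charsToRemove));

        // Format each character as a four-digit hex escape, e.g. '<' becomes \u003C
        var escaped = string.Concat(chars.Select(c => $"\\u{(int)c:X4}"));
        return new Regex("[" + escaped + "]", RegexOptions.Compiled);
    }

    public static string RemoveChars(string input, Regex charsToRemoveRegex)
    {
        if (string.IsNullOrEmpty(input))
            return input;

        return charsToRemoveRegex.Replace(input, "");
    }
}
```

As with the hardcoded version, the Regex should be built once and reused across calls. This does not make the approach any faster; it only avoids escaping mistakes.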

Performance comparison approach

For each approach, I did 100,000 iterations and used a string of length 2500 with a list of 15 characters to remove.

When comparing performance, it’s a good idea to check the total, average, min, and max times, rather than relying only on the total and average. The min and max tell you the width of the distribution of execution times. The tighter the distribution, the better. If you look at the performance summary table, notice that the StringBuilder approach has the best average time and also the tightest distribution of execution times.

The first execution of a piece of code is usually much slower than subsequent executions (in .NET, largely because the method has to be JIT-compiled on first use). So when comparing performance, it’s a good idea to “warm up” the code, or to discard the first execution result so it doesn’t skew the results. I’m logging the first execution (to show that it is always the max), and then discarding it.

Here is the code I used to test the performance of each approach:

using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
using System.Text;

static void Main(string[] args)
{
    List<char> charsToRemove = new List<char>
    {
        '<', '>', '?', ';', '&', '*', '=', '~', '^', '+', '|', ':', ',', '/', 'm'
    };

    // Build a 2,500-character test string (25 characters x 100)
    var testSb = new StringBuilder();
    for (int i = 0; i < 100; i++)
    {
        testSb.Append("<>?hello;&*=~world^+|:,/m");
    }
    var testString = testSb.ToString();
    Console.WriteLine(testString.Length);

    // Time each call individually
    List<double> elapsedMS = new List<double>();
    Stopwatch sw = Stopwatch.StartNew();
    for (int i = 0; i < 100_000; i++)
    {
        var cleanedString = RemoveChars(testString, charsToRemove);
        elapsedMS.Add(sw.Elapsed.TotalMilliseconds);
        sw.Restart();
    }
    sw.Stop();

    // First() is always much larger and skews the Sum() and Average().
    // Print it here, but then remove it for the other aggregates.
    Console.WriteLine($"First={elapsedMS.First()}ms Max={elapsedMS.First()}ms");
    elapsedMS.RemoveAt(0);

    Console.WriteLine($"Total Time: {elapsedMS.Sum()}ms Avg={elapsedMS.Average()}ms Min={elapsedMS.Min()}ms Max={elapsedMS.Max()}ms");
}
Code language: C# (cs)
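The test code above discards the first measurement. The other option mentioned earlier, warming up the code before timing starts, could be sketched roughly like this. Benchmark, Measure, and the warmupIterations parameter are hypothetical names for illustration, not something I used for the numbers in this article:

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

public static class Benchmark
{
    // Hypothetical helper: runs the action untimed a few times first,
    // then records the elapsed milliseconds of each timed run.
    public static List<double> Measure(Action action, int iterations, int warmupIterations = 1)
    {
        // Warm-up runs are executed but never measured
        for (int i = 0; i < warmupIterations; i++)
            action();

        var elapsedMs = new List<double>(iterations);
        var sw = new Stopwatch();
        for (int i = 0; i < iterations; i++)
        {
            sw.Restart();
            action();
            sw.Stop();
            elapsedMs.Add(sw.Elapsed.TotalMilliseconds);
        }

        return elapsedMs;
    }
}
```

You would call it with something like Benchmark.Measure(() => RemoveChars(testString, charsToRemove), 100_000) and then compute Sum(), Average(), Min(), and Max() on the returned list as before. Because the warm-up runs are never recorded, nothing has to be removed from the list afterwards.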
