When you want to remove a list of characters from a string, loop through the list and use string.Replace():
var removalList = new List<char> { 'e', 'o' };
var input = "Hello World";
var cleanedInput = input;
foreach (char c in removalList)
{
    cleanedInput = cleanedInput.Replace(c.ToString(), string.Empty);
}
Console.WriteLine($"Before: {input}");
Console.WriteLine($"After: {cleanedInput}");
Note that string.Replace() returns a new string (because strings are immutable).
Running this outputs the following:
Before: Hello World
After: Hll Wrld
In my tests, this is the fastest approach in .NET 6 and later (see the performance comparison results further down).
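Because string.Replace() returns a new string, an easy mistake is to call it without assigning the result. Here's a minimal sketch of that pitfall (the variable names are just for illustration):

```csharp
using System;

var input = "Hello World";

// Replace() returns a new string. The result is discarded here,
// so input is left unchanged.
input.Replace("o", string.Empty);
Console.WriteLine(input);   // Hello World

// Assign the result to keep the change.
var cleaned = input.Replace("o", string.Empty);
Console.WriteLine(cleaned); // Hell Wrld
```

This is why the loop above reassigns cleanedInput on every iteration.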
Linq approach: Where() + ToArray() + new string()
Another option for removing a list of characters is to use a Linq one-liner:
- Use Where() on the string to remove the characters you don’t want. This gives you an IEnumerable<char>.
- Use ToArray() to convert this to a char array.
- Use new string() to convert this to a string.
Here’s the code:
using System.Linq;
var removalList = new List<char> { 'e', 'o' };
var input = "Hello World";
var cleanedInput = new string(input.Where(c => !removalList.Contains(c)).ToArray());
Console.WriteLine($"Before: {input}");
Console.WriteLine($"After: {cleanedInput}");
This outputs the following:
Before: Hello World
After: Hll Wrld
In my tests, this is about 2x slower than the fastest approach (0.06 ms vs. 0.03 ms on average), but it's a one-liner, which is appealing in some cases.
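A possible variation on the one-liner is to pass the filtered characters to string.Concat() instead of calling ToArray() and new string(). This is only a sketch. It wasn't part of the performance comparison, so I can't say whether it's faster or slower:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var removalList = new List<char> { 'e', 'o' };
var input = "Hello World";

// string.Concat<T>(IEnumerable<T>) joins the filtered characters into a string,
// so there is no explicit ToArray() call.
var cleanedInput = string.Concat(input.Where(c => !removalList.Contains(c)));

Console.WriteLine($"Before: {input}");
Console.WriteLine($"After: {cleanedInput}"); // After: Hll Wrld
```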
StringBuilder + loop (fastest before .NET 6)
Before .NET 6, the fastest option was to loop through the string and add the characters you want to keep (i.e. those not in the removal list) to a StringBuilder. So if you're on a version before .NET 6, use this approach.
Here’s an example of how to do that:
using System.Text;
var removalList = new List<char> { 'e', 'o' };
string input = "Hello World";
var sb = new StringBuilder();
foreach (var c in input)
{
    if (!removalList.Contains(c))
        sb.Append(c);
}
string cleanedInput = sb.ToString();
Console.WriteLine($"Before: {input}");
Console.WriteLine($"After: {cleanedInput}");
This outputs the following:
Before: Hello World
After: Hll Wrld
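One optional tweak to the StringBuilder approach is to pass an initial capacity to the constructor. The cleaned string can never be longer than the input, so input.Length is a safe upper bound. This is a sketch of the idea, not something I measured in the comparison:

```csharp
using System;
using System.Collections.Generic;
using System.Text;

var removalList = new List<char> { 'e', 'o' };
string input = "Hello World";

// Pre-size the buffer so it doesn't have to grow while appending.
var sb = new StringBuilder(input.Length);
foreach (var c in input)
{
    if (!removalList.Contains(c))
        sb.Append(c);
}
string cleanedInput = sb.ToString();

Console.WriteLine($"After: {cleanedInput}"); // After: Hll Wrld
```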
Performance comparison results
I showed three options for removing a list of characters from a string. I didn't show how to do this with regex because it's the slowest approach. To compare the performance, I ran 100,000 iterations of each approach, removing a list of 15 characters from a string containing 2,500 characters. The following table summarizes the performance comparison:
Approach | Avg (ms) | Min (ms) | Max (ms) |
string.Replace() in a loop | 0.03 | 0.02 | 1.32 |
StringBuilder in a loop | 0.04 | 0.03 | 4.35 |
Linq Where() + ToArray() + new string() | 0.06 | 0.04 | 4.19 |
Regex | 0.09 | 0.06 | 16.58 |
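For reference, here's a rough sketch of what a regex version could look like. The exact regex code used in the comparison isn't shown in this article, so treat the pattern-building below as an assumption rather than the code that was benchmarked:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

var removalList = new List<char> { 'e', 'o' };
var input = "Hello World";

// Build an alternation such as "e|o". Regex.Escape() guards characters
// like '.' or '*' that have special meaning in a pattern.
var pattern = string.Join("|", removalList.Select(c => Regex.Escape(c.ToString())));

var cleanedInput = Regex.Replace(input, pattern, string.Empty);

Console.WriteLine($"Before: {input}");
Console.WriteLine($"After: {cleanedInput}"); // After: Hll Wrld
```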
List<char> is faster than using HashSet<char>
One surprising result is that List<char> is faster than HashSet<char> in every approach I compared. However, in every case, I used a list of only 15 characters. With so few characters, the benefits of HashSet<char> (constant-time lookups) don't outweigh its overhead costs. As the number of characters increases, I would expect HashSet<char> to eventually outperform List<char>.
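If you want to check this with your own data, a simple Stopwatch harness like the following is one way to compare the two collections. This is a minimal sketch: the RemoveChars() and AverageMs() helpers, the 15-character removal set, and the test string are all made up for illustration, and calling Contains() through ICollection<char> may itself affect the timings.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
using System.Text;

// Hypothetical helper: the StringBuilder loop, written against ICollection<char>
// so the same code runs with a List<char> or a HashSet<char>.
static string RemoveChars(string input, ICollection<char> removalList)
{
    var sb = new StringBuilder();
    foreach (var c in input)
    {
        if (!removalList.Contains(c))
            sb.Append(c);
    }
    return sb.ToString();
}

// Hypothetical helper: average time per call in milliseconds.
static double AverageMs(Func<string> action, int iterations)
{
    var sw = Stopwatch.StartNew();
    for (int i = 0; i < iterations; i++)
        action();
    sw.Stop();
    return sw.Elapsed.TotalMilliseconds / iterations;
}

var chars = "aeiouAEIOU12345"; // 15 characters to remove (made-up example)
var list = new List<char>(chars);
var set = new HashSet<char>(chars);
var input = string.Concat(Enumerable.Repeat("Hello World ", 200)); // about 2.4k characters

Console.WriteLine($"List<char>:    {AverageMs(() => RemoveChars(input, list), 1000):F4} ms");
Console.WriteLine($"HashSet<char>: {AverageMs(() => RemoveChars(input, set), 1000):F4} ms");
```

Try increasing the number of characters in the removal set to see where (or whether) HashSet<char> starts to win on your machine.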