When you need to insert multiple rows into the database, consider doing a Bulk Insert instead of inserting one row at a time. A Bulk Insert can be up to 20x faster than executing individual SQL INSERT statements repeatedly.
The simplest way to do a SQL Bulk Insert is by using the built-in SqlBulkCopy (from System.Data.SqlClient) with a DataTable. To use this:
- Create a DataTable and add columns/types that match the database table.
- Add rows to the DataTable.
- Initialize SqlBulkCopy with the connection string and set DestinationTableName to the target table's name.
- Pass the DataTable to SqlBulkCopy.WriteToServer().
Here’s an example:
using System.Data;
using System.Data.SqlClient;

void BulkInsert(List<Person> people)
{
    //Create a DataTable with columns/types matching the People table
    DataTable table = new DataTable();
    table.TableName = "People";

    table.Columns.Add(nameof(Person.Id), typeof(string));
    table.Columns.Add(nameof(Person.Name), typeof(string));
    table.Columns.Add(nameof(Person.BirthDate), typeof(DateTime));
    table.Columns.Add(nameof(Person.FavoriteMovie), typeof(string));
    table.Columns.Add(nameof(Person.FavoriteNumber), typeof(int));

    foreach (var person in people)
    {
        var row = table.NewRow();

        row[nameof(Person.Id)] = person.Id.ToString();
        row[nameof(Person.Name)] = person.Name;
        row[nameof(Person.BirthDate)] = GetDBValue(person.BirthDate);
        row[nameof(Person.FavoriteMovie)] = GetDBValue(person.FavoriteMovie);
        row[nameof(Person.FavoriteNumber)] = GetDBValue(person.FavoriteNumber);

        table.Rows.Add(row);
    }

    using (var bulkInsert = new SqlBulkCopy(ConnectionString))
    {
        bulkInsert.DestinationTableName = table.TableName;
        bulkInsert.WriteToServer(table);
    }
}

//Null values have to be passed in as DBNull.Value
object GetDBValue(object o)
{
    return o ?? (object)DBNull.Value;
}
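For reference, here's a Person class that matches this mapping (my sketch; the property types are inferred from the code above):

public class Person
{
    public Guid Id { get; set; }
    public string Name { get; set; }
    public DateTime? BirthDate { get; set; }
    public string FavoriteMovie { get; set; }
    public int? FavoriteNumber { get; set; }
}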
Yeah, needing to map the DataTable column names/types is a bit tedious, but the performance gains of bulk inserting are well worth the effort. If you want, you can generalize this mapping code by getting the properties with reflection, as sketched below. However, in practice, I like to keep the bulk insertion logic as simple and direct as possible.
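For example, a generalized mapper could look something like this (a sketch, not code from this article; note that it wouldn't handle special cases, like the Guid Id being stored as a string in the nvarchar column):

DataTable ToDataTable<T>(IEnumerable<T> items, string tableName)
{
    var table = new DataTable() { TableName = tableName };
    var props = typeof(T).GetProperties();

    foreach (var prop in props)
    {
        //DataTable columns can't be nullable value types, so unwrap Nullable<T>
        var type = Nullable.GetUnderlyingType(prop.PropertyType) ?? prop.PropertyType;
        table.Columns.Add(prop.Name, type);
    }

    foreach (var item in items)
    {
        var row = table.NewRow();
        foreach (var prop in props)
        {
            //Nulls have to be passed in as DBNull.Value
            row[prop.Name] = prop.GetValue(item) ?? (object)DBNull.Value;
        }
        table.Rows.Add(row);
    }

    return table;
}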
Two things to watch out for
- Nulls: If a value is null, you have to pass in DBNull.Value instead of a regular null. That's why the examples use this helper method:
object GetDBValue(object o)
{
    return o ?? (object)DBNull.Value;
}
- Min DateTime: You can't pass in DateTime.MinValue, otherwise you'll get SqlTypeException: SqlDateTime overflow. That's because SQL Server's datetime type can't store dates before 1753-01-01, while .NET's DateTime.MinValue is 0001-01-01. Instead of DateTime.MinValue, pass in System.Data.SqlTypes.SqlDateTime.MinValue.Value (see the sketch after this list).
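If your data might contain DateTime.MinValue, one option is a DateTime-specific overload of the helper that clamps out-of-range values (a sketch; whether clamping, throwing, or using null is right depends on your data):

object GetDBValue(DateTime? dateTime)
{
    if (dateTime == null)
        return DBNull.Value;

    //SQL Server's datetime can't hold dates before 1753-01-01,
    //so clamp anything below that (like DateTime.MinValue) to SqlDateTime.MinValue
    if (dateTime.Value < System.Data.SqlTypes.SqlDateTime.MinValue.Value)
        return System.Data.SqlTypes.SqlDateTime.MinValue.Value;

    return dateTime.Value;
}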
Performance comparison: Multiple Inserts vs one Bulk Insert
To compare the performance, I’m going to insert 100,000 records into the People table. This table is defined in the section below. It has a primary key and two indexes.
I’ll use a Stopwatch in the code, and SQL Profiler to compare the CPU time and the number of IO reads/writes.
In between each test run, I’ll clear out the table with DELETE FROM.
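For reference, that cleanup is a one-liner (shown here with Dapper, which the insert test below uses anyway):

using (var con = new SqlConnection(ConnectionString))
{
    con.Execute("DELETE FROM [dbo].[People]");
}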
People table definition
Here is the People table definition:
CREATE TABLE [dbo].[People](
    [Id] [nvarchar](50) NOT NULL,
    [Name] [nvarchar](50) NOT NULL,
    [BirthDate] [datetime] NULL,
    [FavoriteMovie] [nvarchar](50) NULL,
    [FavoriteNumber] [int] NULL,
    CONSTRAINT [PK_People] PRIMARY KEY CLUSTERED
    (
        [Id] ASC
    ) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
It has the following two indexes:
CREATE NONCLUSTERED INDEX [IX_BirthDate] ON [dbo].[People]
(
    [BirthDate] ASC
)
INCLUDE([Name],[Id]) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]

CREATE NONCLUSTERED INDEX [IX_FavMovie] ON [dbo].[People]
(
    [FavoriteMovie] ASC
)
INCLUDE([Id],[Name]) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
Generating the test data
I’m using the following method to generate the list of 100,000 people:
List<Person> GenerateRecords()
{
    var people = new List<Person>();

    for (int i = 0; i < 100_000; i++)
    {
        DateTime? birthDate = DateTime.Now;
        string favoriteMovie = $"Movie{i}";
        int? favoriteNumber = i;

        //Give every 10th record null values
        if (i % 10 == 0)
        {
            birthDate = null;
            favoriteMovie = null;
            favoriteNumber = null;
        }

        people.Add(new Person()
        {
            BirthDate = birthDate,
            FavoriteMovie = favoriteMovie,
            FavoriteNumber = favoriteNumber,
            Id = Guid.NewGuid(),
            Name = $"Person{i}"
        });
    }

    return people;
}
Performance Test – Multiple Inserts
This is inserting 100,000 records into the People table. It’s using Dapper to insert the list of people.
using System.Data;
using System.Data.SqlClient;
using System.Diagnostics;
using Dapper;

List<Person> people = GenerateRecords();

Stopwatch sw = new Stopwatch();
sw.Start();

MultipleInserts(people);

sw.Stop();

Console.WriteLine($"Elapsed={sw.Elapsed}. TotalMS={sw.Elapsed.TotalMilliseconds}");

void MultipleInserts(List<Person> people)
{
    using (var con = new SqlConnection(ConnectionString))
    {
        //Dapper executes this INSERT once per object in the list
        con.Execute(@"INSERT INTO [dbo].[People]
                ([Id]
                ,[Name]
                ,[BirthDate]
                ,[FavoriteMovie]
                ,[FavoriteNumber])
            VALUES
                (@Id
                ,@Name
                ,@BirthDate
                ,@FavoriteMovie
                ,@FavoriteNumber)",
            people);
    }
}
This executes 100,000 individual INSERT statements. Passing the people list to Dapper is the same as looping over the list yourself and calling Execute() for each person object, as the sketch below illustrates. You can confirm this by running SQL Profiler and observing that it executes 100,000 individual inserts.
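In other words, the Dapper call above behaves roughly like this manual loop (an illustrative sketch; con and insertSql stand in for the connection and INSERT statement from the code above):

foreach (var person in people)
{
    //One round trip and one INSERT statement per person
    con.Execute(insertSql, person);
}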
Results
I measured the performance with a Stopwatch and with SQL Profiler.
Stopwatch results:
Elapsed=00:00:33.6690405. TotalMS=33669.0405
SQL Profiler results:
| CPU time (MS) | Page Reads (IO) | Page Writes (IO) | Duration (MS) |
|---------------|-----------------|------------------|---------------|
| 11,600        | 1,362,901       | 10,371           | 33,463        |
Performance Test – Bulk Insert
This is using SqlBulkCopy to execute a single Bulk Insert. It’s inserting all 100,000 records at once.
using System.Data;
using System.Data.SqlClient;
using System.Diagnostics;

List<Person> people = GenerateRecords();

Stopwatch sw = new Stopwatch();
sw.Start();

BulkInsert(people);

sw.Stop();

Console.WriteLine($"Elapsed={sw.Elapsed}. TotalMS={sw.Elapsed.TotalMilliseconds}");

void BulkInsert(List<Person> people)
{
    DataTable table = new DataTable();
    table.TableName = "People";

    table.Columns.Add(nameof(Person.Id), typeof(string));
    table.Columns.Add(nameof(Person.Name), typeof(string));
    table.Columns.Add(nameof(Person.BirthDate), typeof(DateTime));
    table.Columns.Add(nameof(Person.FavoriteMovie), typeof(string));
    table.Columns.Add(nameof(Person.FavoriteNumber), typeof(int));

    foreach (var person in people)
    {
        var row = table.NewRow();

        row[nameof(Person.Id)] = person.Id.ToString();
        row[nameof(Person.Name)] = person.Name;
        row[nameof(Person.BirthDate)] = GetDBValue(person.BirthDate);
        row[nameof(Person.FavoriteMovie)] = GetDBValue(person.FavoriteMovie);
        row[nameof(Person.FavoriteNumber)] = GetDBValue(person.FavoriteNumber);

        table.Rows.Add(row);
    }

    using (var bulkInsert = new SqlBulkCopy(ConnectionString))
    {
        bulkInsert.DestinationTableName = table.TableName;
        bulkInsert.WriteToServer(table);
    }
}

object GetDBValue(object o)
{
    return o ?? (object)DBNull.Value;
}
Results
I measured the performance with a Stopwatch and with SQL Profiler.
Stopwatch results:
Elapsed=00:00:02.0443607. TotalMS=2044.3607
SQL Profiler results:
| CPU time (MS) | Page Reads (IO) | Page Writes (IO) | Duration (MS) |
|---------------|-----------------|------------------|---------------|
| 1,156         | 669,996         | 6,805            | 1,603         |
Conclusion
Bulk Insert performed much, much better than multiple inserts, as the following summary table shows:
|                  | Stopwatch (MS) | CPU time (MS) | Page Reads (IO) | Page Writes (IO) | Duration (MS) |
|------------------|----------------|---------------|-----------------|------------------|---------------|
| Multiple Inserts | 33,669         | 11,600        | 1,362,901       | 10,371           | 33,463        |
| Bulk Insert      | 2,044          | 1,156         | 669,996         | 6,805            | 1,603         |
Bulk Insert was 15-20 times faster, used 10x less CPU time, and did ~50% fewer IO operations.
The more records you’re trying to insert, the better Bulk Insert performs compared to multiple inserts. I ran another performance test using just 1,000 records, and Bulk Insert was still 2x faster.
And this is only comparing the performance of the insert operation itself. When you consider the system as a whole, an approach that does twice the IO and 10x the CPU work also degrades the performance of everything else running against the database. This is where Bulk Insert really shines.