Enhancing Transaction Performance by Adding Another Log File
I recently read an article about a database that has multiple transaction log files. This post is my full answer to the question: "When does it make sense to add another log file to improve transaction log performance?"
Let me start by describing the scenarios I'm talking about. There are two different ways to "add a log file" to a database:
- Add another database, and with it another log file.
- Add an additional log file to the same database.
Both methods can be used to boost performance, but the first is far more common. I'll start with it and get it out of the way:
Adding a Second Database
Your I/O subsystem's latency sets a speed limit on the log file. Even if the writes land in DRAM-backed caches, latency increases as the I/O path grows more complex (for instance, with EMC SRDF for geo-replicated clusters, or with VMware and Hyper-V replicated disks).
Because SQL Server flushes the log in blocks of at most 60KB, and because each flush is issued as a single outstanding I/O, your total transaction log throughput is roughly 1 / [latency in seconds] * 60KB. For a reasonably configured system with a log latency of 1ms, that caps the database's write throughput at 60MB/sec.
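Working through the arithmetic behind that ceiling (using the 60KB flush size and 1ms latency from above):

```latex
\text{throughput} \approx \frac{\text{flush size}}{\text{latency}}
                 = \frac{60\,\text{KB}}{1\,\text{ms}}
                 = 60{,}000\,\text{KB/s}
                 = 60\,\text{MB/s}
```

Halving the latency to 0.5ms doubles the ceiling to 120MB/sec; nothing else about the workload changes it.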
To scale transaction log throughput beyond the limit your latency imposes, you must split the data into multiple databases and, consequently, multiple log files. This technique is known as sharding. Note that this is not just a trick for scaling across multiple machines; it also makes sense on a single machine. It works especially well in environments where high log latency is expected (for example, Azure).
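A minimal sketch of what this looks like on a single server. The database names, file paths, and the modulo routing rule are all illustrative assumptions, not a prescription:

```sql
-- Two shard databases, each with its own log file on its own LUN,
-- so the two logs can flush independently and concurrently.
CREATE DATABASE OrdersShard0
    ON  (NAME = shard0_data, FILENAME = 'D:\Data\shard0.mdf')
    LOG ON (NAME = shard0_log, FILENAME = 'E:\Log\shard0.ldf');

CREATE DATABASE OrdersShard1
    ON  (NAME = shard1_data, FILENAME = 'D:\Data\shard1.mdf')
    LOG ON (NAME = shard1_log, FILENAME = 'F:\Log\shard1.ldf');

-- The application then routes each write by key, e.g. OrderID % 2,
-- to OrdersShard0 or OrdersShard1. With two independent log flushes
-- in flight, the latency-imposed throughput ceiling roughly doubles.
```

The cost, of course, is that cross-shard transactions are no longer a single local commit, which is why this is a deliberate design decision rather than a tuning knob.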
Adding yet another transaction log file to the same database
As I mentioned in the discussion above, there is a second scenario where adding another transaction log file to the same database can also make sense for performance.
To be clear: in SQL Server, one database can only write to one log file at a time (this is not true of all databases). Unlike with data files, there is no concurrency to be gained by adding multiple log files to the same database; for concurrency you need the sharding trick above. That does not mean, however, that there is no performance advantage in some specialised situations. Such cases are rare, but they do exist.
Think about cache and disk topologies. In a SAN, or on a disk shelf backed by spindles, a cache that is given enough time to flush will deliver consistent throughput the next time you write to it. The same is true for SSD/flash: if you give the drive some time to recover from the "write cliff" (which, on a modern drive, isn't really a cliff at all), it will deliver more consistent performance on the next write. In general, disks of all types handle bursty writes better than sustained high write rates.
Now assume you have multiple LUNs, each backed by a cache (in the case of spindle-backed disks) or by available write cycles (in the case of SSD). As long as the log files you add are small enough, the circular mechanics of the log will cycle writes between those LUNs.
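A hedged sketch of the setup. The database name, paths, and sizes are illustrative; the point is that the files are small, fixed-size, and placed on separate LUNs so the log wraps from one to the next, giving each LUN's cache time to drain between bursts:

```sql
-- Add small, non-growing log files on separate LUNs (F: and G: here).
-- Because the log fills files in sequence, writes rotate between LUNs,
-- and each LUN sees bursty rather than sustained write pressure.
ALTER DATABASE MyDb
    ADD LOG FILE (NAME = mydb_log2,
                  FILENAME = 'F:\Log\mydb_log2.ldf',
                  SIZE = 512MB, FILEGROWTH = 0);

ALTER DATABASE MyDb
    ADD LOG FILE (NAME = mydb_log3,
                  FILENAME = 'G:\Log\mydb_log3.ldf',
                  SIZE = 512MB, FILEGROWTH = 0);
```

Setting FILEGROWTH to 0 matters here: if a file can grow, the log lingers on one LUN instead of cycling, and the burst pattern is lost.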
This method can deliver extremely consistent log performance. However, it is very difficult to configure and troubleshoot.
These are the only circumstances I'm aware of where adding more than one log file makes sense for performance.
The second scenario, adding additional log files to a single database to improve performance, is extremely rare. But I know of at least two systems that are, or have been, in production with this configuration, and only the DBA's expert tuning made it possible. I have also used the technique myself to obtain reliable benchmark results.
In the vast majority of cases, a single log file is all you need for performance. But you have now learned a new trick, and I hope I've piqued your interest.