Sometimes the most maddening bugs aren’t syntax errors or missing indexes — they’re the ones that look completely fine until you stare at a clock at 11:45 PM wondering why your job is still running.
The Problem
I have recently been working on tuning a job that purges data older than 13 months from various tables, using GETDATE() to compute the cutoff. Pretty straightforward, right? Well, originally the job consisted of six statements like so:
```sql
DELETE FROM tablename
WHERE column_with_date <= DATEADD(MONTH, -13, GETDATE());
```

Except the tables had become so large that the generic deletes were filling up tempdb (SQL Server's shared workspace for temporary operations like sorting, spooling, and row versioning). No bueno. So I decided to rewrite the job as six separate stored procedures instead. While redundant, this lets us modify the behavior of each delete independently should we ever need to. More importantly, we could batch the deletes to keep tempdb from filling up.
So I went about creating a generic template to run a delete in batches. I think I based the code on a script on DBA Stack Exchange.
```sql
SET NOCOUNT ON;

DECLARE @rowcount INT;
SET @rowcount = 1;

WHILE @rowcount > 0
BEGIN
    BEGIN TRANSACTION;

    DELETE TOP (500000) FROM database.dbo.tablename
    WHERE column <= DATEADD(MONTH, -13, GETDATE())
    OPTION (MAXDOP 1);

    SET @rowcount = @@ROWCOUNT;

    COMMIT TRANSACTION;
END
```

One of the stored procedures began using parallelism and causing high CPU usage, so I added the MAXDOP 1 query hint. MAXDOP 1 forces the query to use a single CPU thread, preventing it from spawning parallel workers and hammering the server. All of the tables have a timestamp column, so GETDATE() works great as a filter. Except not all of the timestamp columns were formatted the same way.
All of the tables except one used the following format:
```
2018-01-15
```

Whereas the one table used the following format:

```
2018-01-15 10:30:45.000
```

The difference is that one table stores date and time, not just the date. Why does that matter? When the job step executed against that table, it got stuck in a loop. The code was written to grab 500,000 records based on the date and TIME 13 months ago. With every iteration, GETDATE() would return a slightly later time, the cutoff would shift forward by milliseconds, and the batch would never fully drain; the job would run forever (well, until midnight anyway, when the date rolled over and the loop finally resolved itself).
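To make the moving target concrete, here is a minimal sketch (table and column names are placeholders): each evaluation of GETDATE() inside the loop produces a slightly later cutoff, so rows written just behind the boundary keep sliding into scope.

```sql
-- The cutoff recomputed inside the loop drifts forward with the clock.
SELECT DATEADD(MONTH, -13, GETDATE()) AS cutoff_now;
WAITFOR DELAY '00:00:01';
SELECT DATEADD(MONTH, -13, GETDATE()) AS cutoff_one_second_later;

-- One way to remove the drift entirely: capture the cutoff once,
-- before the loop, so every batch compares against the same boundary.
DECLARE @cutoff DATETIME = DATEADD(MONTH, -13, GETDATE());
```

Pinning the cutoff in a variable is an alternative guard; the cast-to-DATE fix described in the next section is what I actually shipped.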
Disclaimer: it took me about a week to finally realize what was happening.
The Fix
The root cause is a non-sargable comparison. Wrapping a column in CONVERT(VARCHAR(10), column_timestamp, 110) prevents SQL Server from using an index on that column — the engine has to evaluate the function for every row before it can compare. The correct fix is to cast both sides to DATE, which is sargable and index-friendly:
```sql
DELETE TOP (500000) FROM database.dbo.tablename
WHERE CAST(column_timestamp AS DATE) <= CAST(DATEADD(MONTH, -13, GETDATE()) AS DATE)
OPTION (MAXDOP 1);
```

Casting to DATE strips the time component from both the column value and the cutoff, so every record from the target day is captured in a single pass. The loop drains cleanly, the job finishes, and nobody gets a 1 AM alert.
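A quick sanity check of what the cast actually does (the literal value is illustrative):

```sql
-- CAST to DATE drops the time portion, so both sides compare at day granularity.
SELECT CAST('2018-01-15 10:30:45.000' AS DATETIME) AS with_time,
       CAST(CAST('2018-01-15 10:30:45.000' AS DATETIME) AS DATE) AS date_only;
-- date_only comes back as 2018-01-15 regardless of the stored time of day.
```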
Key Takeaways
- Date vs. datetime matters for delete loops. If your cutoff includes a time component, each iteration shifts the target window and the batch may never catch up. Cast to DATE on both sides.
- Non-sargable WHERE clauses hurt more than readability. Wrapping a column in a function like CONVERT() blocks index seeks. Use CAST(column AS DATE) on the column, or better yet, cast both sides as shown above.
- MAXDOP 1 is a useful safety valve for maintenance jobs. Batch deletes running in parallel can hammer CPU; pinning them to a single thread keeps the impact predictable.
- Batch your large deletes. A single unbounded DELETE on a massive table will balloon tempdb and potentially block everything else. Looping in chunks of 500k (tuned to your environment) is the safer pattern.
I probably should have caught the datetime mismatch on day one. I didn’t. But I can guarantee I’ll check the column data type before writing the next delete loop — and now so can you.
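For what it's worth, that check is one metadata query away. A sketch, with hypothetical table and column names:

```sql
-- Inspect the filter column's data type before writing the delete loop.
SELECT COLUMN_NAME, DATA_TYPE
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_NAME = 'tablename'           -- hypothetical table name
  AND COLUMN_NAME = 'column_with_date';  -- hypothetical column name
-- A DATA_TYPE of 'date' vs 'datetime' tells you up front whether
-- a time component is lurking in your cutoff comparison.
```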