I had previously blogged about working on a SQL job that deleted data older than 13 months. Today I thought I would follow up with the database and check to see how much space was available in the database. So I ran a disk usage report out of SSMS and got the following result:

[Screenshot: SSMS disk usage report showing the difference in space]

Wait, what?

I was looking at the report and trying to decipher the difference between unused space and unallocated space. It's not very intuitive, is it? My OCD wasn't going to let this go, so I started Googling. Mostly I found confusion over the matter, until I came across this forum post on MSDN.

Essentially the difference between unused and unallocated is this:

Unallocated – free space inside the data file (what we typically think of as free space).

Unused – empty space in pages/extents that are reserved for an object but not yet holding data.
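You can pull the same numbers without the report. A minimal sketch using the standard sp_spaceused system procedure, run in the context of the database in question (the table name is a placeholder, not one of my actual tables):

```sql
-- Summary for the whole database: database_size and unallocated space
EXEC sp_spaceused;

-- Per-object breakdown: reserved, data, index_size, and unused
EXEC sp_spaceused @objname = N'dbo.table_x';  -- table_x is a placeholder name
```

The database-level call reports unallocated space, while the per-object call is where the "unused" number comes from.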

I couldn't understand how I would have empty space in an object after I had removed data. Then it dawned on me: it is most likely space still reserved by the indexes, which will be released when the indexes are rebuilt.
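If that theory holds, rebuilding the indexes should release the reserved-but-empty space. A hedged sketch (the table name is a placeholder):

```sql
-- Rebuild every index on the table, compacting pages and releasing unused space
ALTER INDEX ALL ON dbo.table_x REBUILD;

-- Or, a lighter-weight online alternative that compacts the leaf level in place
ALTER INDEX ALL ON dbo.table_x REORGANIZE;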

Ok, all in all, not that great of a revelation, but when you're looking at the report it doesn't make a lot of sense at a glance. Hey Microsoft, some kind of pie chart legend would be good here. Just saying.

I have recently been working on tuning a job that purges data older than 13 months out of various tables using GETDATE(). Pretty straightforward, right? Well, originally the job consisted of 6 different statements like so:

delete from table1
   where column_with_date <= DATEADD(mm,-13,GETDATE());

Except the tables have become so large that the generic deletes were filling up tempdb. No bueno. So I decided to rewrite the job as six separate stored procedures. While redundant, this let us modify the delete behavior for each table independently. More importantly, we could batch the deletes, preventing tempdb from filling.

So I went about creating a generic template to run the delete in batches. I think I based the code on a script I found on DBA Stack Exchange.

SET NOCOUNT ON;

DECLARE @rowcount INT = 1;

WHILE @rowcount > 0
BEGIN
    BEGIN TRANSACTION;

    -- Delete in batches of 500k rows so each transaction stays small
    DELETE TOP (500000)
    FROM database.dbo.table_x  -- placeholder table name
    WHERE column_with_date <= DATEADD(mm, -13, GETDATE())
    OPTION (MAXDOP 1);

    -- Loop ends once a batch deletes zero rows
    SET @rowcount = @@ROWCOUNT;

    COMMIT TRANSACTION;
END

One of the stored procs began using parallelism and causing high CPU usage, so I added the MAXDOP option. All of the tables have a timestamp column, so GETDATE() works great as a filter. Except not all of the timestamp columns stored values the same way.

All of the tables except one stored date-only values (e.g. 2019-01-15 00:00:00.000), whereas the one outlier stored a full date and time (e.g. 2019-01-15 13:42:07.531).

The Fix

The difference is that the one table uses date and time, not just date. Why is this important? Well, when the job step executed against this table, it got stuck in kind of a loop. The code was written to grab 500k records based on the date and TIME 13 months ago. So with every iteration, the cutoff time would shift ever so slightly and the job would run forever (well, until midnight anyway).
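You can see the moving target directly: because GETDATE() carries the current time of day, the cutoff computed inside the loop is slightly later on every iteration. A quick illustration:

```sql
-- The cutoff includes the time of day, so it drifts with every call
SELECT DATEADD(mm, -13, GETDATE());

-- Converting to date pins the cutoff to midnight for the whole day
SELECT CONVERT(date, DATEADD(mm, -13, GETDATE()));
```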

Disclaimer: it took me about a week to finally realize what was happening. To correct the behavior, I changed the formatting of the WHERE clause so the query would get all of the records for the day instead of chasing the slowly moving cutoff.

SELECT * FROM dbo.table_x  -- placeholder table name
    WHERE CONVERT(VARCHAR(10), column_timestamp, 110) <= DATEADD(mm,-13,GETDATE());
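One caveat worth noting: wrapping the column in CONVERT means the function has to run against every row, so the predicate can't seek on an index over the timestamp column. A sketch of an equivalent, index-friendly form (names are placeholders, not the original tables):

```sql
-- Compute the date-only cutoff once, then compare the raw column against
-- the first moment of the following day; this keeps the predicate sargable.
DECLARE @cutoff datetime = CONVERT(date, DATEADD(mm, -13, GETDATE()));

SELECT *
FROM dbo.table_x
WHERE column_timestamp < DATEADD(dd, 1, @cutoff);
```

This matches the intent of the CONVERT version (include the entire cutoff day) while leaving the column untouched on the left side of the comparison.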

I feel pretty dumb for not catching this sooner. Although I doubt it will ever take me this long to catch it again.