Friday, September 12, 2014

Creative Date Filters to Meet Business Needs

Recently, I was tasked with providing an ad hoc query to a group of users. These users wanted this set of data to be delivered to them every year on July 1st. I co-opted our Jenkins box to meet this demand. If you unaware, Jenkins is a tool that assists you in deployment automation and monitoring. Learn more about Jenkins, I'll wait.

Okay, so now that you are familiar with this tool, you might realize that I'm using it to fill a need that it wasn't particularly designed to satisfy, but it is surprisingly useful at filling this need (and the precedent was set by others so if they didn't want me to use it for this, they shouldn't have opened that can of worms in the first place). But, I didn't come here to write about Jenkins, at least, not primarily. I wanted to tell you how to use SQL to manage date parameters in a oddly repetitive task.

So, initially the users wanted this data once a year so it was a simple thing to calculate the date range. Since we would be running this job on the first of July and the cutoff date would be midnight, we easily get that value and since the start date needs to be July 1st of last year, the date parameters are easy to achieve. Let's figure out the end date:

select trunc(sysdate) as end_date from dual;

Pretty damn simple right. You may remember that when the Trunc function is called on a date without any other parameters, it simply drops the time component and sets that to 00:00:00. Now, let's figure out the start date:

select add_months(trunc(sysdate),-12) as start_date from dual;

This is still not very complicated, really. You might remember us talking about the add_months function in the past. To recap, add_months does exactly what it sounds like, it adds months to the date that is passed and returns a date value. So, lets says sysdate is July 1st, 2014 10:07:35 AM when we run these two statements we get July 1st, 2013 00:00:00 and July 1st 2014 00:00:00. Problem solved, right?

Wrong. As users so often do, they changed the scope on this. In total, they want this report delivered 4 times per year. Here's the breakdown:

Fiscal Year Preview Report: Jul 1, LY (Last Year) - Jun 1, CY (Current Year) delivered on Jun 1, CY
Fiscal Year Full Report: Jul 1, Last Year - Jul 1, CY delivered on Jul 1, CY
Service Year Preview Report: Oct 1, LY - Sep 1, CY delivered on Sep 1, CY
Service Year Full Report: Oct 1, LY - Oct 1, CY delivered on Oct 1, CY

It looks a little sketchy. The add months won't work all of the time now because two of the reports are for 11 month periods but two are for 12 month periods. I should probably mention that the end dates are exclusive so the query will be greater than or equal to start date and less than end date. Well, I could probably change the number I pass to add months but there's no guarantee that they won't push back and change the schedule even further.

As a measure to allow for future expansion, I decided to use a series of sub-selects to get to what I wanted. Let's continue with the previous examples but try using sub-selects to get both of the values:

select add_months(end_date, -12) as start_date from (
    select trunc(sysdate) as end_date from dual);

This highlights that we are using the current day as the method to get the date to start last year. But, what happens if there are issues on the first and the report doesn't get executed? If the email server tips over and the report doesn't get delivered, the Jenkins work space will still have the file so we can export it and manually send. But if the Jenkins job doesn't run to create the file on the 1st and we run it on the second, what happens? Yep, that's right! Give yourself a cookie. The parameters will skew one day into the future and our report loses its integrity. The users will not be happy that the report generated on the 2nd grabbed the records from the 1st in the wrong year. Bogus! What do? Never fear. I got you.
(BTW, I wish I could say that the above concerns are fatalistic in nature, but every place I've ever worked has had brittle systems and shit happened - servers have tipped over, email has gone down, all hell has broken loose - we need to be aware of that and give our code as much bulletproofing as we can.)

I decided to write the queries such that they could be ran on any day of the month and they would calculate the appropriate days. If so much shit happens that we can't get this report ran and delivered within a month, we've got way bigger fish that we need to be frying. And resumes to be updating. So, here goes, let's get creative!

We need to get the current month, the current year, determine which month we are in, and calculate the appropriate start and end dates.

First, let's get the numeric value for current month and year.

select
    to_number(to_char(sysdate, 'mm')) as currmo,
    to_number(to_char(sysdate, 'yyyy')) as curryear
  from dual;

Muy bueno! Second, let's determine what the start month should be based on current day, get the start day (in this case this is the first but if they suddenly decide that this should run from the 15th to the 15th we just change one spot in the query), and select the current month and year variables like above:

select
    case
        when currmo in (6,7) then 'J'
        when currmo in (9,10) then 'O'
        else 'N'
    end startflag, -- 'J' means start july 1, 'O' means start oct 1, 'N' means use sysdate,
    currmo,
    curryear,
    to_char(curryear - 1) lastyear,
    '01' startday
  from (
    select
        to_number(to_char(sysdate, 'mm')) as currmo,
        to_number(to_char(sysdate, 'yyyy')) as curryear
      from dual);

Alright, that looks good. We should have enough here to calculate the start date and end date appropriately for our given criteria and if they schedule changes a bit, we should have done this in such a way that we'll be able to quickly update it to what we need. Let's put this altogether now.

Let's calculate the appropriate start date and end date.

select
    case
        when startflag = 'J' then to_date(startday || '-JUL-' || lastyear, 'dd-mon-yyyy')
        when startflag = 'O' then to_date(startday || '-OCT-' || lastyear, 'dd-mon-yyyy')
        when startflag = 'N' then trunc(sysdate)
    end startdate,
    to_date(startday||lpad(currmo, 2, '0')||curryear, 'ddmmyyyy') enddate
  from (
    select
        case
            when currmo in (6,7) then 'J'
            when currmo in (9,10) then 'O'
            else 'N'
        end startflag, -- 'J' means start july 1, 'O' means start oct 1, 'N' means use sysdate,
        currmo,
        curryear,
        to_char(curryear - 1) lastyear,
        '01' startday
      from (
        select
            to_number(to_char(sysdate, 'mm')) as currmo,
            to_number(to_char(sysdate, 'yyyy')) as curryear
          from dual));

This works beautifully! If we run this on any day in September, if figures out that the start date should be October 1st, LY, etc. To test this out, you can set aside four months and run this every day for those four months, or you can change your system date and rerun, but what I did was change which numbers were in the IN clauses in the start flag CASE statement. This will nicely simulate the different dates and you don't have to waste four months or change your system date.

Now you might be thinking, this sounds good and all but how do we utilize these dates in our report query. Good point. Let's consider an example where we have a table of transactions and they have a column called TRANS_DATE and that is going to be what we compare against our values above. Let's image that the users want a report of everything in the TRANS table within those date parameters (the actual report is a bit more complicated than this and I don't want to muddy the waters any further than they already are). Our report query will look like:

select
    *
  from trans t
 inner join (
  select
    case
        when startflag = 'J' then to_date(startday || '-JUL-' || lastyear, 'dd-mon-yyyy')
        when startflag = 'O' then to_date(startday || '-OCT-' || lastyear, 'dd-mon-yyyy')
        when startflag = 'N' then trunc(sysdate)
    end startdate,
    to_date(startday||lpad(currmo, 2, '0')||curryear, 'ddmmyyyy') enddate
  from (
    select
        case
            when currmo in (6,7) then 'J'
            when currmo in (9,10) then 'O'
            else 'N'
        end startflag, -- 'J' means start july 1, 'O' means start oct 1, 'N' means use sysdate,
        currmo,
        curryear,
        to_char(curryear - 1) lastyear,
        '01' startday
      from (
        select
            to_number(to_char(sysdate, 'mm')) as currmo,
            to_number(to_char(sysdate, 'yyyy')) as curryear
          from dual))
  ) datefilter
on 1=1
where t.trans_date >= (datefilter.start_date)
    and t.trans_date <   (datefilter.end_date)
order by 1;

Alright, there you have it. There are other ways we could have handled this, for instance, we could have change the add months, used trunc (sysdate, 'MM') to get to the first day of the month but I like what I have here. It seems to be easily expandable. If they decide that they want to do this quarterly, all we have to do is add those quarters to the start flag case statement and add the cases to handle those new start flag values. It shouldn't be heavy lifting to make those changes.

So endeth the lesson. Give this a shot of you feel like it or the need arises. Extra credit if you take the time to figure out another way to do this.

Until next time, Happy Querying.

No comments:

Post a Comment