Friday, September 12, 2014

Creative Date Filters to Meet Business Needs

Recently, I was tasked with providing an ad hoc query to a group of users. They wanted this set of data delivered to them every year on July 1st. I co-opted our Jenkins box to meet this demand. If you are unaware, Jenkins is a tool that assists you in deployment automation and monitoring. Go learn more about Jenkins; I'll wait.

Okay, now that you are familiar with the tool, you might realize that I'm using it to fill a need it wasn't particularly designed to satisfy, but it is surprisingly good at it (and the precedent was set by others, so if they didn't want me using it for this, they shouldn't have opened that can of worms in the first place). But I didn't come here to write about Jenkins, at least not primarily. I wanted to tell you how to use SQL to manage date parameters in an oddly repetitive task.

Initially, the users wanted this data once a year, so calculating the date range was simple. The job runs on the first of July with a cutoff of midnight, so the end date is easy to get, and since the start date needs to be July 1st of last year, that one is easy too. Let's figure out the end date:

select trunc(sysdate) as end_date from dual;

Pretty damn simple, right? You may remember that when the trunc function is called on a date without any other parameters, it simply drops the time component, setting it to 00:00:00. Now, let's figure out the start date:

select add_months(trunc(sysdate),-12) as start_date from dual;

This is still not very complicated. You might remember us talking about the add_months function in the past. To recap, add_months does exactly what it sounds like: it adds months to the date that is passed in and returns a date value. So, let's say sysdate is July 1st, 2014 10:07:35 AM. When we run these two statements we get July 1st, 2013 00:00:00 and July 1st, 2014 00:00:00. Problem solved, right?

Wrong. As users so often do, they changed the scope on this. In total, they want this report delivered 4 times per year. Here's the breakdown:

Fiscal Year Preview Report: Jul 1, LY (Last Year) - Jun 1, CY (Current Year), delivered Jun 1, CY
Fiscal Year Full Report: Jul 1, LY - Jul 1, CY, delivered Jul 1, CY
Service Year Preview Report: Oct 1, LY - Sep 1, CY, delivered Sep 1, CY
Service Year Full Report: Oct 1, LY - Oct 1, CY, delivered Oct 1, CY

It looks a little sketchy. The add_months(-12) trick won't work all of the time now, because two of the reports cover 11-month periods while the other two cover 12-month periods. I should mention that the end dates are exclusive, so the query will be greater than or equal to the start date and less than the end date. I could just change the number I pass to add_months, but there's no guarantee they won't push back and change the schedule even further.
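While we're poking at add_months, there's one more behavior worth knowing if this schedule ever drifts off the 1st: when the source day doesn't exist in the result month, add_months snaps to the last day of that month. A quick sketch against dual:

```sql
-- add_months returns the last day of the result month when the
-- source day doesn't exist there (or when the source is month-end).
select
    add_months(to_date('31-MAR-2014', 'dd-mon-yyyy'), -1)  as feb_end,  -- 28-FEB-2014
    add_months(to_date('01-JUL-2014', 'dd-mon-yyyy'), -12) as year_back -- 01-JUL-2013
  from dual;
```

Running on the 1st we never hit this, but it's the kind of thing that bites you if the users ever move the window to the 29th, 30th, or 31st.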

As a measure to allow for future expansion, I decided to use a series of sub-selects to get to what I wanted. Let's continue with the previous examples but try using sub-selects to get both of the values:

select add_months(end_date, -12) as start_date from (
    select trunc(sysdate) as end_date from dual);

This highlights that we are using the current day as the basis for the start date a year back. But what happens if there are issues on the first and the report doesn't get executed? If the email server tips over and the report doesn't get delivered, the Jenkins workspace will still have the file, so we can export it and send it manually. But if the Jenkins job doesn't run to create the file on the 1st and we run it on the 2nd, what happens? Yep, that's right! Give yourself a cookie. The parameters skew one day forward and our report loses its integrity. The users will not be happy that the report generated on the 2nd grabbed the records from the 1st in the wrong year. Bogus! What to do? Never fear. I got you.
(BTW, I wish I could say that the above concerns are fatalistic in nature, but every place I've ever worked has had brittle systems and shit happened - servers have tipped over, email has gone down, all hell has broken loose - we need to be aware of that and give our code as much bulletproofing as we can.)

I decided to write the queries so that they could be run on any day of the month and still calculate the appropriate dates. If so much shit happens that we can't get this report run and delivered within a month, we've got way bigger fish to fry. And résumés to update. So, here goes, let's get creative!

We need to get the current month, the current year, determine which month we are in, and calculate the appropriate start and end dates.

First, let's get the numeric value for current month and year.

select
    to_number(to_char(sysdate, 'mm')) as currmo,
    to_number(to_char(sysdate, 'yyyy')) as curryear
  from dual;

Muy bueno! Second, let's determine what the start month should be based on the current month, grab the start day (in this case the first, but if they suddenly decide this should run from the 15th to the 15th, we only change one spot in the query), and select the current month and year values like above:

select
    case
        when currmo in (6,7) then 'J'
        when currmo in (9,10) then 'O'
        else 'N'
    end startflag, -- 'J' means start Jul 1, 'O' means start Oct 1, 'N' means use sysdate
    currmo,
    curryear,
    to_char(curryear - 1) lastyear,
    '01' startday
  from (
    select
        to_number(to_char(sysdate, 'mm')) as currmo,
        to_number(to_char(sysdate, 'yyyy')) as curryear
      from dual);

Alright, that looks good. We should have enough here to calculate the start and end dates appropriately for our given criteria, and if the schedule changes a bit, we've done this in such a way that we'll be able to quickly update it. Let's put this all together now.

Let's calculate the appropriate start date and end date.

select
    case
        when startflag = 'J' then to_date(startday || '-JUL-' || lastyear, 'dd-mon-yyyy')
        when startflag = 'O' then to_date(startday || '-OCT-' || lastyear, 'dd-mon-yyyy')
        when startflag = 'N' then trunc(sysdate)
    end startdate,
    to_date(startday||lpad(currmo, 2, '0')||curryear, 'ddmmyyyy') enddate
  from (
    select
        case
            when currmo in (6,7) then 'J'
            when currmo in (9,10) then 'O'
            else 'N'
        end startflag, -- 'J' means start Jul 1, 'O' means start Oct 1, 'N' means use sysdate
        currmo,
        curryear,
        to_char(curryear - 1) lastyear,
        '01' startday
      from (
        select
            to_number(to_char(sysdate, 'mm')) as currmo,
            to_number(to_char(sysdate, 'yyyy')) as curryear
          from dual));

This works beautifully! If we run this on any day in September, it figures out that the start date should be October 1st of last year, and so on. To test it, you could set aside four months and run it every day, or change your system date and rerun, but what I did was change which numbers were in the IN clauses of the start flag CASE expression. That nicely simulates the different dates without wasting four months or touching your system date.
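Another way to fake the clock, if you'd rather not fiddle with the IN clauses: swap sysdate in the innermost query for a literal. This is just a testing sketch; put sysdate back before you schedule it.

```sql
-- Pretend today is Sep 5th, 2014: only the innermost select changes,
-- everything layered on top of it behaves exactly as in production.
select
    to_number(to_char(fakedate, 'mm')) as currmo,
    to_number(to_char(fakedate, 'yyyy')) as curryear
  from (select to_date('05-SEP-2014', 'dd-mon-yyyy') as fakedate from dual);
```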

Now you might be thinking: this sounds good and all, but how do we utilize these dates in our report query? Good point. Let's consider an example where we have a table of transactions with a column called TRANS_DATE, and that is what we'll compare against the values above. Let's imagine the users want a report of everything in the TRANS table within those date parameters (the actual report is a bit more complicated than this and I don't want to muddy the waters any further than they already are). Our report query will look like:

select
    *
  from trans t
 inner join (
  select
    case
        when startflag = 'J' then to_date(startday || '-JUL-' || lastyear, 'dd-mon-yyyy')
        when startflag = 'O' then to_date(startday || '-OCT-' || lastyear, 'dd-mon-yyyy')
        when startflag = 'N' then trunc(sysdate)
    end startdate,
    to_date(startday||lpad(currmo, 2, '0')||curryear, 'ddmmyyyy') enddate
  from (
    select
        case
            when currmo in (6,7) then 'J'
            when currmo in (9,10) then 'O'
            else 'N'
        end startflag, -- 'J' means start Jul 1, 'O' means start Oct 1, 'N' means use sysdate
        currmo,
        curryear,
        to_char(curryear - 1) lastyear,
        '01' startday
      from (
        select
            to_number(to_char(sysdate, 'mm')) as currmo,
            to_number(to_char(sysdate, 'yyyy')) as curryear
          from dual))
  ) datefilter
on 1=1
 where t.trans_date >= datefilter.startdate
   and t.trans_date <  datefilter.enddate
order by 1;

Alright, there you have it. There are other ways we could have handled this; for instance, we could have changed the add_months offset, or used trunc(sysdate, 'MM') to get to the first day of the current month, but I like what I have here. It seems easily expandable. If they decide they want this quarterly, all we have to do is add those months to the start flag CASE expression and add the cases to handle the new flag values. It shouldn't be heavy lifting to make those changes.
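For the curious, here's roughly what that alternative would look like. I didn't go this route, so treat it as a sketch:

```sql
-- trunc(sysdate, 'MM') zeroes the time AND snaps back to the 1st of
-- the current month, so a late run on the 2nd or 3rd still produces
-- the same end date.
select
    trunc(sysdate, 'MM') as end_date,
    add_months(trunc(sysdate, 'MM'), -12) as start_full,    -- 12-month full reports
    add_months(trunc(sysdate, 'MM'), -11) as start_preview  -- 11-month previews
  from dual;
```

You'd still need something like the start flag to pick between the -12 and -11 offsets, which is part of why I stuck with the CASE approach.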

So endeth the lesson. Give this a shot if you feel like it or the need arises. Extra credit if you take the time to figure out another way to do this.

Until next time, Happy Querying.

Monday, July 21, 2014

A problem with Decode and a problem with people

A problem with Decode

I ran into an interesting behavior of decode today. The security team wanted a report to verify that data retention jobs were working properly. I wrote a query to find the oldest overall record, the oldest record still retaining the data the retention jobs would erase, and the counts of each. Well, the data looked pretty good at first glance and it seemed accurate when it highlighted a few instances where the retention jobs weren't scheduled correctly. However, one date kept surfacing over and over. I didn't think much of it, but the security team wanted an explanation. So, I dug in and they were right. The resurfacing date didn't match the data. My query wasn't correct. But why?

Well, first off, I'm a bit of an old dog. I've been at this a while and I might be a tad stuck in my ways. Because of that, I have habits, and one of those habits is relying on the decode function and eschewing the CASE expression for most uses. This time it bit me. My decode looked something like this:

min(decode(retention_column, null, add_months(sysdate, 1), transaction_date))

So, if the retention column is null, use a dummy date, else use the date of the transaction, and once all that is done, get the earliest one. Seems relatively straightforward, right? Yet the results didn't match the data. I just wasn't finding the issue, so I called in a new set of eyes. He confirmed that I wasn't crazy and that it looked like it should work. He dismantled the statement a little more and tried some changes to find out what was going on. It turns out that decode was implicitly converting its results to string values, so the min was happening on the string, not the date. That is why dates on the first of a month were coming out before the real minimum date. Problem solved. What I had to do was convert the result of the decode back to a date and then find the minimum. It looks like this:

min(to_date(decode(retention_column, null, add_months(sysdate, 1), transaction_date), 'dd-mon-yyyy'))
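If you want to see the string-versus-date ordering problem in isolation, here's a little sketch against dual:

```sql
-- When dates are compared as text, '01-aug-2014' sorts before
-- '15-jan-2013' because '0' < '1'; compared as dates, 2013 wins.
select
    min(to_char(dt, 'dd-mon-yyyy')) as string_min, -- text comparison: wrong
    min(dt) as date_min                            -- date comparison: right
  from (
    select to_date('01-AUG-2014', 'dd-mon-yyyy') as dt from dual
    union all
    select to_date('15-JAN-2013', 'dd-mon-yyyy') as dt from dual);
```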


A problem with people

So, I really just need to vent about something. There was a bit of a fiasco at work a couple weeks ago. Someone was tasked with clearing a test database and ended up being in the wrong one when they ran their scripts. I found out about it a few days after it happened and I was tasked with adding a few users in so that they could get in the system and do what they needed to do. Someone also mentioned that the latest backup was 6 months old. It only took me a couple of minutes to add in the users.

I took some time and figured out a way to pick a random number of customers, a random number of days, and a random number of transactions for those days. I was able to generate six months' worth of test transactions and effectively replace the data that had been lost. I've been super excited about this, telling everyone I run into. However, I haven't heard jack shit from the people directly involved with this system. They couldn't give a shit less about my breakthrough.

Now, there is a general malaise of apathy that has infected probably 90% of the people at my work. Maybe that is a little harsh. It is probably more like 60% apathy and 50% incompetence. A few of you math wizards out there might have figured out that that covers more than the sum total of people. And I say nay. Those aren't mutually exclusive. There's 15 to 20% of people that both don't know shit and don't give a fuck about it.

It is difficult to keep a head of steam up when beset with the lazy and the stupid on almost all sides. Probably more than half of the few competent people that are left have one foot out the door. The others are bravely trying to stem the tide of meh, with varying success. I just don't get how people do not see the potential in the randomized data generator I created. In my mind's eye, I picture a system that can use docker and puppet to spin up a new test instance, kick off my script, and simulate half a year's worth of activity all in a matter of moments. So, you are showing the application off to a new set of customers? How about a personalized demo in a personalized test environment spun up at a moment's notice? Maybe I should lay off the paint thinner. Or maybe I'm onto something here. Hell, I've had several of the brighter stars at work singing the praises of docker and puppet for months now. One of my illustrious colleagues just accepted an offer to go work for PuppetLabs in Portland. In a world where a foul-mouthed, halfwit database developer can see massive potential in this stuff, why can't everyone?

Alright, I get it, my feelings got hurt. For a guy who looks like I look: Sons of Anarchy meets Trailer Park Boys meets Cops meets Lizard Lick Towing, I can be rather sensitive. But for realz, maybe try giving a shit. Let's just start with one shit given per week. Baby steps, alright? Rome wasn't built in a day. Or perhaps I'm just pissy because my work wasn't appreciated enough. Perhaps I'm being too hard on everyone and I'm lashing out in anger and painting with too wide and negative of a brush. Perhaps everyone is so in awe of my randomizer that they've been struck dumb and are screaming praise silently in their own heads. Or perhaps not. Perhaps, the "meh, good enough" crowd really are fucking it up for the people that are on the streets.

Either way, I feel properly vented. I'm going to bed. See you all next time. In the meantime, be careful when using the Decode function and be extra careful when pissing excellence. The mediocre and unenlightened might just treat it as regular piss. Toodles.

Sunday, June 15, 2014

Schema looping in PL/SQL

I was asked by a co-worker to look at a task that had been given to her to see if I could write some SQL to make it easier. A lot of the one-off tasks around here can be simple but very time consuming, and it just so happens that I like writing SQL, so I look forward to these little diversions. The task was to add a property key/value pair for each record that matched a certain set of criteria in every database instance we have for a certain application we support. We have three financial database service names in Oracle at the moment that divide the country up mostly by geographic location. We'll call these East, West and Yeehaw. I needed a block of code that could be executed once on each of these services but update every schema housed on that service name that needed the new property.

I had done some recent investigation regarding aliasing a schema name and found that Oracle doesn't directly support this. My bright idea was to use the data dictionary and see if I could work something out. I wrote a quick block to loop through a cursor and print out the schema names that I would be interested in that exist on that service name. That block looks something like this:

Declare

  CURSOR schemacur
  IS
    SELECT owner
    FROM all_tables
    WHERE owner LIKE '%MYAPP%'
    AND table_name = 'PROPERTIES'
    ORDER BY 1;
  schemarec schemacur%rowtype;
  schemaval   VARCHAR2(64);

begin

   open schemacur;
   loop
     fetch schemacur into schemarec;
     exit when schemacur%notfound;
 
     schemaval := schemarec.owner;
     dbms_output.put_line(schemaval);
 
   end loop;
   close schemacur;

end;
Refresher:
A cursor is basically a named handle on the results of a select statement. Generally, you interact with a cursor by opening it, fetching it into a record, and eventually closing it. A PL/SQL loop is similar to a loop in any other language, and the exit statement is how you get out of it.

In this case, I have a variable declared as a record matching the %ROWTYPE of the cursor. In this particular case, I could have fetched directly into a varchar2 variable since the select statement only gets one column, but I wanted to show the more complex structure. First, I open the cursor. This executes the cursor's select statement and positions it before the first record. I go into a loop, fetch the current record into my record variable, and specify that when no more records are found I should exit the loop. I assign the record value to the variable and print it out in the dbms output window. Of course, you need to turn on dbms output within SQL Developer or whatever tool you are using. GUI tools will have a button to click or some such thing, while sql*plus has the SET SERVEROUTPUT ON; command.

So, the output in the window is an alphabetically sorted list of schemas on the current service name that are like '%MYAPP%' and own a table called PROPERTIES.
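As an aside, PL/SQL also offers a cursor FOR loop that does the open/fetch/exit/close bookkeeping for you. The first block could be shrunk to something like this:

```sql
begin
  -- the loop variable schemarec is declared implicitly, and the
  -- cursor is opened, fetched, and closed behind the scenes
  for schemarec in (select owner
                      from all_tables
                     where owner like '%MYAPP%'
                       and table_name = 'PROPERTIES'
                     order by 1)
  loop
    dbms_output.put_line(schemarec.owner);
  end loop;
end;
```

I used the explicit form above partly out of habit and partly because the explicit open/fetch makes the mechanics easier to talk about.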

Now, I can tweak this block a little by adding another cursor and loop to check records within each of those schemas. What we have now is:

Declare

  CURSOR schemacur
  IS
    SELECT owner
    FROM all_tables
    WHERE owner LIKE '%MYAPP%'
    AND table_name = 'PROPERTIES'
    ORDER BY 1;

  schemarec schemacur%rowtype;
  schemaval   VARCHAR2(64);
  schemacount INTEGER;
  i           INTEGER;

  CURSOR proccur
  IS
    SELECT processor_id
    FROM processor
    WHERE description LIKE '%STYLE1%'
    AND description LIKE '%STYLE2%';

  procrec proccur%rowtype;
  procid    NUMBER;
  proccount INTEGER;
  j         INTEGER;
  prockey   CONSTANT VARCHAR2(16) := 'KEY';
  procval   CONSTANT VARCHAR2(64) := 'VALUE';

  sqlstmt   VARCHAR2(4000);

BEGIN

  i := 0;
  select count(*) into schemacount
    from all_tables
   where owner LIKE '%MYAPP%'
     and table_name = 'PROPERTIES';
  dbms_output.put_line(schemacount || ' schemas to loop through');
 
   open schemacur;
   loop
     fetch schemacur into schemarec;
     exit when schemacur%notfound;
       i := i + 1;
    schemaval := schemarec.owner;
    --dbms_output.put_line(schemaval);
    sqlstmt := 'alter session set current_schema=' || schemaval;
    EXECUTE immediate sqlstmt;
 
    select count(*) into proccount from processor
    WHERE description LIKE '%STYLE1%'
    AND description LIKE '%STYLE2%';
    dbms_output.put_line(proccount || ' processors to loop through');
    j := 0;
 
    OPEN proccur;
    LOOP
      FETCH proccur INTO procrec;
      EXIT WHEN proccur%notfound;
      j := j + 1;
      procid := procrec.processor_id;
      dbms_output.put_line(procid);
   
    END LOOP; -- proccur
    dbms_output.put_line('looped through ' || j || ' processors');
 
    CLOSE proccur;
  END LOOP; -- schemacur
  dbms_output.put_line('looped through ' || i || ' schemas');

  CLOSE schemacur;
  sqlstmt := 'alter session set current_schema=myschema';
  EXECUTE immediate sqlstmt;

EXCEPTION
WHEN OTHERS THEN

  raise;
END;


Now, this one adds a lot more meat, but in essence it just adds another loop and a few more display items. I wanted to confirm that I looped through the expected number of records for both cursors, so I get the count from the query, count each iteration, and visually compare the results. This also serves to show how the schema redirect works. In the first block, I just output the schema names. In this block, I alter my session to change the current schema to each of these targets in turn and then execute the proccur loop as if I were connected as those schemas. Well, not exactly: Oracle interprets the lack of a schema qualifier as "this object must be owned by the current schema, let me check what that is," and I repointed the current schema to a new target over and over, setting it back at the very end.

Now, all I need to do is insert my new property instead of printing each id in the inner cursor loop. This worked like a charm. I executed the block three times, once on each of the aforementioned service names, and every affected schema received the new property. I saved my coworker a lot of time and work. It felt really nice.
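For reference, the inner loop body ended up shaped something like this. The PROPERTIES column names here are made up for illustration; the real table's columns differ:

```sql
-- Hypothetical columns (processor_id, prop_key, prop_value). Because
-- current_schema has been repointed, the unqualified table name
-- resolves to whichever target schema we're currently looping over.
insert into properties (processor_id, prop_key, prop_value)
values (procid, prockey, procval);
```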

Well, that does it for this installment. See you all next time around.

Sunday, June 8, 2014

Oracle - you so crazy!

One of the tasks I was given recently was to check the credit card retention data and see how it looks, and in the process I discovered a very interesting thing about the CASE expression and NULL in Oracle. My initial query seemed to indicate that the retention job was not running. Here's the statement I ran:

select case card_number when null then 'NULL' else 'NOT NULL' end as card, count(*)
from credit_card
group by case card_number when null then 'NULL' else 'NOT NULL' end
order by 1;

This looks pretty straightforward. The case is on card_number: when the value is null, output the string 'NULL'; else output 'NOT NULL'; then count the occurrences of each.
Every time I ran this, in every database, it showed all records as 'NOT NULL', which couldn't be correct. I circled back and changed the case to a decode to see what it would look like. That statement was:

select decode(card_number,null,'NULL','NOT NULL') as card, count(*)
from credit_card
group by decode(card_number,null,'NULL','NOT NULL')
order by 1;

The logic is the same, but we use the Oracle decode function instead of the CASE expression, and our results look more like we'd expect: some are null and some are not null. So, what went wrong with the CASE expression? Some of you might have already picked up on it. The simple form of CASE compares with equality, so the null in the ...case card_number when null... part wasn't checking whether card_number IS NULL; it was asking whether card_number = NULL, and in Oracle nothing is ever equal to NULL, not even NULL itself, so that branch can never match. The NULL keyword also shows up as a statement in PL/SQL, which is where my head went first:

create or replace procedure validnullproc as
Begin
Null;
End;

This is a valid procedure that does fuck all. So I figured I had basically inserted a null chunk of code into my Case expression; in truth, the simple CASE's equality test just never succeeds against NULL. Here is a statement that shows a couple of other options:

SELECT
  CASE card_number
    WHEN NULL
    THEN 'NULL'
    ELSE 'NOT NULL'
  END AS card,
  DECODE(card_number,NULL,'NULL','NOT NULL') as decodecard,
  nvl2(card_number, 'NOT NULL', 'NULL') as nvl2card
FROM credit_card
WHERE card_number IS NULL
AND rownum         < 11
UNION
SELECT
  CASE card_number
    WHEN NULL
    THEN 'NULL'
    ELSE 'NOT NULL'
  END AS card,
  DECODE(card_number,NULL,'NULL','NOT NULL'),
  nvl2(card_number, 'NOT NULL', 'NULL')
FROM credit_card
WHERE card_number IS NOT NULL
AND rownum         < 11;

Results:
"CARD"                        "DECODECARD""NVL2CARD"
"NOT NULL"                    "NOT NULL"                    "NOT NULL"                  
"NOT NULL"                    "NULL"                        "NULL"                      

So, my fancy CASE expression pulling the CARD column above is basically just saying always print 'NOT NULL'. However, both the DECODE and the NVL2 functions resolve to the correct values, and I think NVL2 is the better option in this case. I must admit that NVL2 was new to me, personally.
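For completeness, the CASE expression can be salvaged too. You just have to use the searched form, which lets you say IS NULL explicitly instead of relying on equality:

```sql
-- The searched CASE tests card_number IS NULL directly, so it agrees
-- with the DECODE and NVL2 versions.
select
    case when card_number is null then 'NULL' else 'NOT NULL' end as card,
    count(*)
  from credit_card
 group by case when card_number is null then 'NULL' else 'NOT NULL' end
 order by 1;
```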

In my investigations into the odd behavior of CASE and NULL, I uncovered this lovely function called NVL2. It is basically DECODE for NULLs. Most of you are probably familiar with the standard NVL function. You might have seen something like NVL(status_code, 'X'), which resolves to the status_code value unless it is NULL, in which case it replaces the NULL with 'X'. You can do the same with DECODE, but it is more typing: DECODE(status_code, null, 'X', status_code).

DECODE can come in real handy when you are displaying coded data, especially in older, somewhat denormalized or poorly designed enterprise applications. Say you have a status column on some table that stores single-letter status codes, but the corresponding word values were never stored in a reference table. You can use DECODE in your report SQL to do the translation for you: DECODE(status, 'A', 'Active', 'I', 'Inactive', 'C', 'Closed', 'W', 'Whiskey', 'T', 'Tango', 'F', 'Foxtrot'). As you can see, you can handle NULL/NOT NULL with DECODE as well, but NVL2 is designed specifically for branching on null-ness. The syntax: the first parameter is the field or value you are checking, the second is the value you want if the first parameter is not null, and the third is the value you want if it is null.
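To put NVL, NVL2, and DECODE side by side on data you can actually see, here's a quick sketch against dual with one null row and one real row:

```sql
select
    code,
    nvl(code, 'X') as nvl_out,                   -- substitute when null
    nvl2(code, 'NOT NULL', 'NULL') as nvl2_out,  -- branch on null-ness
    decode(code, null, 'NULL', 'NOT NULL') as decode_out
  from (select cast(null as varchar2(1)) as code from dual
        union all
        select 'A' from dual);
```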

Well, thanks for tuning in to my adventures with Oracle. Now, go forth and use Oracle wisely.

Wednesday, April 30, 2014

April foolishness


So, I've been working on unit tests in my schema. I have copied over some test data, built out the structure, and written some code to set up and tear down as I have discussed previously. Well, as often happens at work, I had some shifting priorities and the unit tests fell to the back burner for a bit. I was able to get back to them in full force today. The problem I ran into after several hours of writing code, running tests, etc. is that I had totally jacked up a couple of sequences. Sequences are number generators commonly used for primary keys in tables. If your table is TAB and your primary key is TAB_ID, then usually you will have a sequence called either SEQ_TAB_ID or TAB_ID_SEQ, depending on your naming convention. My unit tests use the same method the app does to enter records. In this case, it is a shit ton of stored procedures. Almost every table, but not every table because that would be fucking stupid, has a procedure to insert a row and another procedure to delete a row. I imagine the app does not use the delete procedures much, or at all. But I sure can use them to clean up after myself. Curtis knows what I'm talking about. Lulz.

Anyway, since this is all in my schema and I pulled some real test data, but not all, and recreated the objects and whatnot, I did a silly thing. A normal, well-adjusted developer might have come up with a cohesive plan, but I was feeling frisky and decided to hit the ground running, so to speak. I jumped right in, pulled over the data, then manually created a couple of the sequences that I need for the testing. Here is the appropriate way to create the above-mentioned sequence in a pristine environment:
CREATE SEQUENCE SEQ_TAB_ID START WITH 1 INCREMENT BY 1;
There are some other options, but nobody gives a fuck about them, so I'm moving on. These two are a must: you tell the sequence its first value, and you tell it how much to add to get the next value. This was a crucial mistake. What I should have done was find the max id in the data I had pulled over and start the sequence just past it. Let's say the maximum value of my TAB_ID is 698226. I may have pulled over the last month's worth of test transactions to play with, or something like that. But my sequence started at 1. So, the first few tests worked because those low values were still free in the table. (A big aside here: we have talked about this in the past and you may already know it, but a primary key with a primary key constraint must be unique. What happened was I pulled over records with ids from, say, 1,000 to 600,000 and set my sequence to start at 1. As soon as the sequence had been run 999 times, I was going to have a bad time.) That is what happened: I began to get unique constraint errors. Well, I figured out my mistake, but I again failed to act appropriately. I'm not sure if I'm just a big giant fucktard or if I've been around them too much, but either way, I need to step my fucking game up. If sequences were blocking a shot, I'd be that Ware kid.
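What I should have done, roughly, looks like this (TAB and TAB_ID standing in for the real names):

```sql
-- Find the high-water mark of the copied data first...
select max(tab_id) from tab;  -- say it comes back 698226

-- ...then start the sequence just past it, so generated keys can
-- never collide with the rows already in the table.
create sequence seq_tab_id start with 698227 increment by 1;
```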

An appropriate action at this point would have been to drop the existing sequence and recreate it as I should have done in the beginning. Ah, fuck that. I'm a bro-grammer, right? I do this shit my way. I'm from the streets, bitch. Fuck you and your 'industry standards'. I'll have none of it. I'm keeping it real. My solution might be a legitimate move if you are working in a database that might be getting live action. You know, testing in MOTHERFUCKING PROD, as we so often do. I bumped the increment up to 13000 and I selected some motherfucking nextvals. Soon enough, I got to where I needed to be, and all was right with the world. Got my mind on my DB and my DB on my mind, yo. Not really, I had getting the fuck up out of here on my mind. I had shit to do. I bolted and left it the way it was. That's right... Left. It. The. Way. It. Was.

So, the next day, I didn't even get logged in before my priorities changed yet again. Shit breaks. Other shit comes up from time to time and the shit just gets stirred around. It's like a big shitty pot of shitty spaghetti sauce that you have to stir in a shitty manner with your shitty little wooden spoon every few shitty minutes. It's - well, it's shitty, I guess, is what I'm trying to say. I worked unhappily on my new priority for a week or more. Shit didn't go well. (Is anyone else sensing a theme here?) I eventually got that priority as far along as I could and got back to this one.

So, today, I'm testing merrily away. I'm like the little Unit Tester that could. I got this. Sadly, no. I got shit. (There it is again - it's like Bad Wolf but shittier.) I ran several tests and wrote some code, and ran some tests and wrote some code. Eventually I added a new insert, and when I tested that one, it shit all over the place. TOAD was like the bathroom in Trainspotting. Ick.
After a bit more than a bit of investigation - I'd say it was a squared bit - I discovered that my recently added table had its TAB_ID declared as NUMBER(9,0), which means the largest available TAB_ID is 999,999,999, whereas my main TAB table had TAB_ID declared as NUMBER, which has no practical upper bound (technically there is one, but let's not get all tangential). And I had apparently been testing this thing like a MOFO, because my MAX(TAB_ID) was 389,548,277,499. My sequence had been adding 13,000 to the id every time a test was run. Did I mention that I test several records during each test? Why, that is a test within a test. Yo Dawg, I heard you like tests... It is TESTCEPTION. Every test I ran ended up inflating the TAB_ID by 65,000.

MFW I realized what I had done. I chuckled a little. If I didn't laugh at myself I'd be all emo, and that wouldn't work out well at all. I have no hair to dye jet black, can't afford a complete wardrobe overhaul, and if I'm going to be cutting anybody it is going to be other people, because fuck them, amiright? I have found that other people are three of the top three things wrong with the world today. Snooki, Mindy Kaling, and Dianne Feinstein are all other people. Thanks, Obama.

Well, I hope my pain may have served to amuse and perhaps enlighten you all a little bit. And to be honest, sometimes off the beaten path is where you want to go. You might find a change of perspective beneficial. Of course, you might find a ton of shit. You know what, I just realized why the nuns at my grade school were such ball-busters. Because the painful lessons are the ones that teach you the most. Of course, they couldn't make every lesson the most painful of your life and reasonably expect it to work. If all you know is pain, it loses its meaning, its poignancy, and its teaching ability. There has to be a little downtime from the punishment. But, I digress. Again. I digress all over the place.

Next time or at least coming soon, I'll cover some basic Oracle functions to make life easier. Apparently, I'm a lousy friend and kind of a dick for not doing that sooner. The REPLACE function would have been useful knowledge for someone.

Thursday, February 6, 2014

Yo Dawg, I heard you like writing SQL & Goddamnit, that didn't go as planned.

Today was a roller coaster of a day. I had an interesting success and a stupid yet interesting rookie mistake. It started off with my needing to make a change to someone else's script that I have to run during the next deploy.

Interesting Success

As a little background here, we are deploying changes to one of our multitenant applications that relies on one of our old silo'd applications. So we would be generating a few hundred update statements from one database and then applying them to several different databases. So, I would have to either gain access to each of these individual databases and parse out the output of the script and run only the applicable statements OR manually edit the results to add the appropriate schema designation to the resulting scripts, which is tedious and prone to error. Or, I could figure out a way to fix the statement at the time of generation.

Most of my SQL Writing SQL efforts rely upon the data dictionary and its massive library of tables and views to make magic happen. The script that I was given needed to be run in a somewhat alien application but have the resulting SQL run in several instances of one of the applications I am intimately familiar with (and by 'intimately familiar' I mean it has hurt me in ways I don't care to discuss).

The problem was that I was expecting to be able to pull the OWNER from the data dictionary ALL_TABLES view or some such thing, but the script that I was given was just using application data tables and was not relying on the data dictionary. This threw a wrench into my plans, and the heat was getting turned up because the deploy is this Sunday at the un-FSM-ly hour of 5 AM. So, in a panic, I started investigating and gaining familiarity with the quasi-alien database, running some select statements and looking very closely at the ones being used in the script I was given. The gentleman that provided the script was nice enough to add in a comment to break up the statements into groups based on their location, such as --Portal1. However, I needed the schema designation for Portal1, not the words Portal1. So, during my investigation I found that the table that holds the portal name (Portal1, etc.) also holds a URL for the API. Sure enough, the schema designation was mostly included in the URL.

For example: Portal1's URL would be https://foo-p1.xyz.com/blah/blah/woof/woof... and the schema designation would be foop1. So, I needed to parse the URL and pull out foo-p1...foo-pn, n times. I combined the REPLACE, SUBSTR and INSTR functions to do my work. The resulting column in the select statement turned out to be something like this:

replace( substr(portal_table.portal_url, 9, instr(portal_table.portal_url, '.') -9), '-') AS schema_designation
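To see the arithmetic, here's the same expression run against a literal (assuming an https:// URL, so the host starts at position 9; with a plain http:// URL the two 9s would need to be 8s):

```sql
-- instr finds the first '.', substr grabs from position 9 (just past
-- 'https://') up to that dot, and the two-argument replace strips the
-- hyphen entirely (a null replacement removes all occurrences).
select replace(
         substr('https://foo-p1.xyz.com/blah/blah', 9,
                instr('https://foo-p1.xyz.com/blah/blah', '.') - 9),
         '-') as schema_designation
  from dual;

-- SCHEMA_DESIGNATION
-- foop1
```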

I was then able to modify the output to include || schema_designation || '.' || right before each table name is specified in the output. For example:

dbms_output.put_line('update ' || schema_designation || '.foo set bar = 1;');

so the resulting statement becomes

update foop1.foo set bar = 1;

which can be executed from any schema on the same SID or Service Name as foop1 as long as the appropriate permissions are in place. This worked out really well and saved me a bunch of terrible data entry or logging into too many schemas to run the scripts.
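Put together, the generator looked roughly like this (table, column, and value names are hypothetical stand-ins for the real ones):

```sql
-- For each portal row, emit an update statement prefixed with its schema,
-- derived from the portal's API URL.
begin
  for p in (select replace(substr(pt.portal_url, 9,
                                  instr(pt.portal_url, '.') - 9),
                           '-') as schema_designation
              from portal_table pt)
  loop
    dbms_output.put_line('update ' || p.schema_designation
                         || '.foo set bar = 1;');
  end loop;
end;
/
```

The emitted script can then be handed off and run as-is from a single connection, which was the whole point.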

Stupid but Interesting Rookie Mistake

So, did you know that you can join to the same table twice in a single statement? Let's say we have a simple employee table called emp that has three columns: id, name, and manager_id. Manager_id refers to the id column of the emp record for that employee's manager. Now, let's say we want to run a report that lists each employee and their manager. We would have to join to emp twice, right? It would look something like this:

select employee.name as employee, manager.name as manager
from emp employee
join emp manager on employee.manager_id = manager.id;

If our table has data such as:

ID  NAME   MANAGER_ID
1   Tom    NULL
2   Dick   1
3   Harry  2
4   Peter  2
5   Paul   3
6   Mary   4

The results would look something like:

EMPLOYEE  MANAGER
Dick      Tom
Harry     Dick
Peter     Dick
Paul      Harry
Mary      Peter

So, that is a good simple example of how to join to the same table twice. Interesting huh?
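One wrinkle worth noting: Tom has no manager, so the inner join silently drops him from the results. If the report needed every employee, top dogs included, a left join keeps him with a NULL manager:

```sql
-- Same query, but an outer join preserves rows with no matching manager.
select employee.name as employee, manager.name as manager
from emp employee
left join emp manager on employee.manager_id = manager.id;

-- EMPLOYEE  MANAGER
-- Tom       (null)
-- Dick      Tom
-- ...
```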

Did you know that you can join to the same table twice in different ways while aliasing the table the same way? Neither did I. Why would you ever want to do that? It just sounds bad.

So, anyway, I was recently tasked with fixing a report. It was joining to the table foo on bar_id when it should have joined on blah_id. Also, blah should have joined to bar and not woof. So, the statement looked like:

select * from woof
join bar on woof.woof_id = bar.woof_id
join foo on foo.bar_id = bar.bar_id
join blah on woof.woof_id = blah.woof_id;

What it needed to look like was:

select * from woof
join bar on woof.woof_id = bar.woof_id
join blah on woof.bar_id = blah.bar_id
join foo on foo.blah_id = blah.blah_id;

What I accidentally wrote was:

select * from woof
join bar on woof.woof_id = bar.woof_id
join foo foo on foo.bar_id = bar.bar_id
join blah on woof.bar_id = blah.bar_id
join foo foo on foo.blah_id = blah.blah_id;

So, notice what I did there? It is pretty easy to spot in this simple example, but in the convoluted query I was writing, it was a little harder to spot. I kept the bad join and added the good join. The results were subtly wrong: it just started showing twice the records that it should have. When you run it for a large time frame, you get a healthy set of data back, and it is easy to not notice that there are dupes in there. I mean, the bad join had flown in production since the dawn of time, or at least since the report was initially built.

I screwed the pooch on this one. The deploy is happening Sunday so either broken stuff is getting pulled out of the deploy or broken stuff is getting deployed and we have to fix it later. Either way, feelsbadman.jpg.

I now have to come up with a method to trap for this insanity in a unit test. The weird thing is that if a record is doubled, its values would all be legit values, so it passed my current test. The QA people assigned to test this were a bit inexperienced, so the bug almost made it all the way in unnoticed. The real kicker is that the only reason I found it was because some user who was supposed to be testing this in UAT was testing in production instead, said it wasn't fixed, and pushed back. I was so happy when I found out that she was testing in prod that I started to gloat for a second, but then I noticed the output of the report I ran trying to recreate her results. Oh noes!
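As for trapping this in a test: since every column in a doubled row holds legit values, value checks won't catch it. One sketch of an approach, assuming woof_id should appear at most once in the result (names hypothetical, as before), is to compare the total row count to the distinct key count:

```sql
-- Zero means no dupes; anything else means some join is fanning out rows.
select count(*) - count(distinct woof.woof_id) as dupe_count
from woof
join bar on woof.woof_id = bar.woof_id
join blah on woof.bar_id = blah.bar_id
join foo on foo.blah_id = blah.blah_id;
```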

The good news is that the fix is super quick. The testing will be a little more involved from both the Unit Test perspective and from the QA perspective.

So, I guess it was a push today? My awesomeness is perfectly equaled by my derpity derp. And that concludes this installment. I just realized I didn't say fuck very much. I wonder if I'm coming down with something? Laters.