Thursday, February 6, 2014

Yo Dawg, I heard you like writing SQL & Goddamnit, that didn't go as planned.

Today was a roller coaster of a day. I had an interesting success and a stupid yet interesting rookie mistake. It started off my needing to make a change to someone else's script that I have to run during the next deploy.

Interesting Success

As a little background here, we are deploying changes to one of our multitenant applications that relies on one of our old silo'd applications. So we would be generating a few hundred update statements from one database and then applying them to several different databases. So, I would have to either gain access to each of these individual databases and parse out the output of the script and run only the applicable statements OR manually edit the results to add the appropriate schema designation to the resulting scripts, which is tedious and prone to error. Or, I could figure out a way to fix the statement at the time of generation.

 Most of my SQL Writing SQL efforts rely upon the data dictionary and its massive library of tables and views to make magic happen. The script that I was given needed to be run in a somewhat alien application but have the resulting SQL ran in several instances of one of the applications I am intimately familiar with (assuming you consider dry anal rape being intimately familiar).

The problem was that I was expecting to be able to pull the OWNER from the data dictionary ALL_TABLES view or some such thing but the script that I was given was just using Application data tables and was not relying on the data dictionary. This threw a wrench into my plans and the heat was getting turned up also because the deploy is this Sunday at the un-FSM-ly hour of 5 AM - did I mention the dry anal rape, yeah. So, in a panic I started investigating and gaining familiarity with the quasi-alien database. Running some select statements and looking very closely at the ones that were being used in the script I was given. The gentleman that provided the script was nice enough to add in a comment to break up the statements into groups based on their location such as --Portal1. However, I need the schema designation for Portal1 not the words Portal1. So, during my investigation I found that the table that holds the Portal Name Portal1, etc. also holds a URL for the API. Sure enough, the schema designation was mostly included in the URL.

For example: Portal1's URL would be http://foo-p1.xyz.com/blah/blah/woof/woof... and the schema designation would be foop1. So, I needed to parse the URL and pull out foo-p1...foo-pn, n times. I combined the REPLACE, SUBSTR and INSTR functions to do my work. The resulting column in the select statement turned out to be something like this:

replace( substr(portal_table.portal_url, 9, instr(portal_table.portal_url, '.') -9), '-') AS schema_designation

I was then able to modify the output to include || schema_designation || '.' || right before each table name is specified in the output. For example:

dbms_output('update ' || schema_designation ||'.foo set bar = 1;');

so the resulting statement becomes

update foop1.foo set bar = 1;

which can be executed from any schema on the same SID or Service Name as foop1 as long as the appropriate permissions are in place. This worked out really well and saved me a bunch of terrible data entry or logging into too many schemas to run the scripts.

Stupid but Interesting Rookie Mistake

So, did you know that you can join to the same table twice in a single statement? So, let's say we have a simple employee table called emp that has 3 columns: id, name, manager_id. Manager_id refers to the ID column of the emp record for that employee's manager. Now, lets say we want to run a report that lists each employee and their manager. We would have to join to emp twice, right? It would look something like this:

select employee.name as employee, manager.name as manager
from emp employee
join emp manager on employee.manager_id = manager.id;

If our table has data such as:

ID   NAME   MANAGER_ID
1     Tom       NULL
2     Dick       1
3     Harry      2
4     Peter       2
5     Paul        3
6     Mary      4

The results would look something like:

EMPLOYEE    MANAGER
Dick                  Tom
Harry                 Dick
Peter                  Dick
Paul                    Harry
Mary                  Peter

So, that is a good simple example of how to join to the same table twice. Interesting huh?

Did you know that you can join to the same table twice in different ways while aliasing the table the same way? Neither did I. Why would you ever want to do that? It just sounds bad.

So, anyway, I was recently tasked with fixing a report. It was joining to the table foo on Bar_id and it should have joined on Blah_id. Also, blah should have joined to bar and not woof. So, the statement looked like

select * from woof
join bar on woof.woof_id = bar.woof_id
join foo on foo.bar_id = bar.bar_id
join blah on woof.woof_id = blah.woof_id;

What it needed to look like was:
select * from woof
join bar on woof.woof_id = bar.woof_id
join blah on woof.bar_id = blah.bar_id;
join foo on foo.blah_id = blah.blah_id;

What I accidentally wrote was:

select * from woof
join bar on woof.woof_id = bar.woof_id
join foo foo on foo.bar_id = bar.bar_id
join blah on woof.bar_id = blah.bar_id;
join foo foo on foo.blah_id = blah.blah_id;

So, notice what I did there? It is pretty easy to spot in this simple example but in the convoluted query I was writing, it was a little harder to spot. So, I kept the bad join and added the good join. The results were subtly wrong. It just started showing twice the records that it should have. When you run for a large time frame, you get a healthy set of data back. It is easy to not notice that there are dupes in there. I mean, the bad join flew in production since the dawn of time, or at least since the report was initially built.

I screwed the pooch on this one. The deploy is happening Sunday so either broken stuff is getting pulled out of the deploy or broken stuff is getting deployed and we have to fix it later. Either way, feelsbadman.jpg.

I now, have to come up with a method to trap for this insanity in a Unit Test. The weird thing is that if a record is doubled, its values would all be legit values so, it passed my current test. The QA people assigned to test this were a bit inexperienced so the bug almost made all the way in unnoticed. The real kicker is that the only reason I found it was because some user was supposed to be testing this in UAT but was testing in production so said it wasn't fixed and pushed back. I was so happy when I found out that she was testing in prod that I started to gloat for a second, but then I noticed the output of the report I ran trying to recreate her results. Oh noes!

The good news is that the fix is super quick. The testing will be a little more involved from both the Unit Test perspective and from the QA perspective.

So, I guess it was a push today? My awesomeness is perfectly equaled by my derpity derp. And that concludes this installment. I just realized I didn't say fuck or cunt very much. I wonder if I'm coming down with something? Laters.