DBDeveloper: December 2012

How many of you are familiar with K.I.S.S? Lick it up? God of Thunder? Cold gin? Any of these sound familiar? If so, I applaud your good taste in bad music. But, I am talking about the Keep It Simple Stupid mantra many of those in IT have forgotten or eschew in favor of overly complex systems that they can gloat about designing and generally fuck up in one way or another. I am reminded of a guy I used to work with that loved to design and implement overly complex systems because he sounded smart. His white board designs invariably resembled His Holiness The Flying Spaghetti Monster. This is the guy that didn't trust the native windows management stuff so wrote his own which resulted in an application that docked in the upper left corner and could not be moved. Yeah, that is way better than a regular Windows window dude! Not.

The new systems we write and the old systems we maintain and improve grow in complexity naturally. Some of them are needfully complex from day one, but some can start quite simple and grow complex over time as features are added and new systems come up that need to be utilized. For the love of everything holy, do not artificially create complexity. Make stuff as stupid simple as you can from the beginning. In all that you do, this will pay dividends in the future.

I was tasked recently with comparing databases for an application we have with a couple dozen separate instances. Ideally, each of these instances would be identical in structure (or nearly so) and differ on data mainly (Customer 1 in Database A is not the same as Customer 1 in Database B). Unfortunately, we have only recently gotten into structured deployments using Hudson, not to mention a rogue DB Developer we had on staff back in the day (holla!). Over the years, there has been divergence. Some of them are easy to identify as useless or no longer useful items that we can clean up, but some, oh, some are hard to identify as such. In mature systems in mature organizations, the keys to the databases are heavily guarded and formal processes are in place to keep most people out. Some start ups even into their adolescent phase have thrown the doors wide open. How this starts is small businesses or business units that have either no formal DBA or only a couple of coders and a DBA that is wearing a few different hats. In strange cases you may have a boss that is making decisions that are poorly thought out. Whichever happens, what you end up with is objects that you have no idea who created, why they were created, if they are still in use, or what other objects might depend on those. What you end up with is a crazy Jenga game where the wrong table drop can mean long night spent bailing the system out, grumbles and finger pointing from your peers and maybe even the loss of gainful employment. Tread carefully.

This little statement can be a lifesaver in this situation:

SELECT name, line, text
FROM dba_source
WHERE upper(text) like upper('%<Object name to search>%');

Replace <Object name to search> with the table or procedure name you want to check. If you do this and get a table or view does not exist error, you need to get your DBA involved to grant you the appropriate data dictionary privileges. This statement will search stored procedures, functions, packages and the like for a reference to the searched text. If the table is one-off for a specific reason, probably due diligence was not used during its creation so the dependencies tab in a DB browsing tool or the dependency data dictionary table would not have any information to assist you in making a decision regarding the object's worth. You may be able to deduce after some inspection whether it is a useful and still actively used object. But you may not, and individual inspection of items doesn't scale. What if there are 50 objects that do not match? Individual inspection would be a month long task if not longer.

Now, consider table constraints for a moment. Let's say when we built our recipe table from earlier posts we made sure the recipe id could never be null. One way, and a very popular way of doing this is directly in the table definition:
Create Table RECIPE (
RECIPE_ID NUMBER NOT NULL
.
.
.
);

Now, let's say you want to monetize your recipe application. Every new subscriber gets their own database. A year down the road you have 50 subscribers so you have 50 database schemas with roughly identical copies of the recipe database structure. Now, you hire a neighborhood kid to do some light tech work that is beneath you as the King of the new Recipe Database Empire FaceCookBook or CookFaceBook or CookBook? You are too busy with discussions of your impending IPO and talks with Bono and what not to compare your databases. Your new hire compares everything, including constraints. What would you say if I told you that your 50 databases have 50 different constraints for each original constraint? Surprised? When you think about it, it makes sense but I never took the time to think about it until it was too late. My comparison of 27 databases have 120,000+ collisions on constraints. Every table that had an in line NOT NULL declaration on a column or a DEFAULT value declaration had a different name in each database. Oracle has an automatic sequence that is behind the scenes and when you declare a column to be NOT NULL or DEFAULT 0 or whatever, the way Oracle implements and enforces that is by creating a system named constraint in the form of SYS_C00000000001 or something like that where the 1 is the value from the system constraint sequence. Since you are letting each individual system name the constraints whatever it wants, there is no guarantee that SCHEMA1.SYS_C0000000001 is identical to SCHEMA2.SYS_C0000000001. Basically, all hell breaks loose. You get these false positives or probable false positives that muddy the waters so you cannot see immediately that TABLE C in SCHEMA 48 is missing a constraint on COLUMN X.

Let's consider an alternative. Try this instead:

CREATE TABLE recipe
(
recipe_id NUMBER CONSTRAINT recipe_recipe_id_nn NOT NULL
.
.
.
);

Now, the naming convention lacks something but IMHO it is far superior to SYS_C0001869645. If you ever tried to insert a recipe with a recipe id, instead of an error message that says SYS_C000689345 violated you would see RECIPE_RECIPE_ID_NN violated. You immediately know that the recipe id column in the recipe table should be not null. And, when the neighborhood kid does the compare for you, he doesn't interrupt your IPO meeting with Jon Bon Jovi and Curt Schilling to tell you that there are 120,000+ discrepancies between databases and WTF is going on!?!!?! It could get ugly.

For more on constraints, check out docs.oracle.com, www.oraclefaq.com, http://jonathanlewis.wordpress.com (I like this guy), http://asktom.oracle.com, or Burleson at http://www.dba-oracle.com/. If you find a page with Cowboy Burleson on it you have to take a shot but if you find a page with Uncle Touchy Pedo Burleson on it you have to put yourself on a list and seek therapy - extra points if you take three shots, do the tuck and exclaim, "It puts the lotion on its skin, or it gets the hose again!"

Oracle has a function called decode that allows you to decode coded values or even encode values depending on your needs. The function takes a column, a series of if-then pairs and a default value. An example is in order. Let's think back to our Recipe database example from earlier posts. You have a Table called Recipe and it has a column called Type to show if the recipe is an Appetizer or a Soup or a Dessert for example. And let's say that in Version 1 of our recipe database we just made RECIPE.TYPE a CHAR(1) column. So, we store 'A' for Appetizer, 'S' for Soup and 'D' for Dessert, etc. Now, let's say your Mom/SO/Daughter/Client is confused by the 'S' on the screen and why Salads show an 'L' (for obvious reasons we cannot have both Soup and Salad designated with an S unless we want to combine them so the 'S' designates a Soup or a Salad. For this example, we want to keep them separate, and 'A' was taken for Appetizer so the next available letter in Salad is the 'L', but you don't want to explain every one of them. You can write a SQL statement with a Decode and list out all of them one time and use that where you need it. Our statement would look something like:

SELECT NAME, DECODE(TYPE, 'A', 'Appetizer', 'S', 'Soup', 'D', 'Dessert', 'L', 'Salad', 'Other') AS Decoded_Type FROM RECIPE;

This is pretty handy. Now, let's say you want to run a report and see how many different type recipes you have. One possible query, using multiple decodes within an aggregate would be:

SELECT
SUM(DECODE(TYPE, 'A', 1, 0)) as NumberOfAppetizers,

SUM(DECODE(TYPE, 'S', 1, 0)) as NumberOfSoups,

SUM(DECODE(TYPE, 'D', 1, 0)) as NumberOfDesserts,

SUM(DECODE(TYPE, 'L', 1, 0)) as NumberOfSalads,

SUM(DECODE(TYPE, 'A', 0, 'S', 0, 'D', 0, 'L', 0, 1)) as NumberOfOthers
FROM RECIPE;

Now, the way SUM(DECODE(TYPE, 'A', 1, 0)) as NumberOfAppetizers, reads is: for every record, I want to select 1 if the TYPE = 'A' and 0 Otherwise, Oh, and, please sum up the total of that for me and call it NumberOfAppetizers. This might not be the best way to do this for our example, but I ran across a real world case where someone had written a huge SQL statement where the calculated those values in sub-selects, and then joined all of those sub-selects together to display the desired information. Although the above is a much simplified version it could be expanded upon to utilize DECODE, alleviate the multiple sub-select join, and improve readability and performance.

A couple of other things we hit upon this time that might not have seen before is the nested function calls and the column aliasing. Functions, whether SQL Native or After-Market, depending on what they do and your use-case or needs, can be nested where the inner function executes and returns result to the outer function which then executes. We see that with the decode statements and the sum functions. The decode converts the type to either a 1 or 0 for each row and then the results are summed for the entire table. Pretty neat stuff. I have used this a lot with dates - ADD_MONTHS(TRUNC(SYSDATE), 1) or something like that. The column aliasing lets you rename columns in a select statement. SELECT TYPE AS MY_TYPE, SUM(AMOUNT) AS PROFIT FROM RECIPE GROUP BY TYPE; etc.

I feel this might be a slippery slope here because my explanations keep introducing new things. ADD_MONTHS is a function that takes a date and returns the passed value moved n months into the future or past depending on sign. Yes, ADD_MONTHS('01-DEC-2012', -12) will return '01-DEC-2011'. Sysdate, I believe we have covered before but just in case we haven't that is the system date from the oracle server. TRUNC is a function that truncates part of the date value that you pass depending on what you pass. TRUNC(SYSDATE) drops the time portion and returns the current day month and year. Other options exist, like TRUNC(SYSDATE, 'MM') returns the first day of the current month, etc. Another new item is the GROUP BY statement. If you notice the absence of GROUP BY in the earlier example, that is because I did not select anything that was not being aggregated. That is the rule with GROUP BY in Oracle. You must group by everything that you are not aggregating. If you are aggregating everything you do not need the GROUP BY. So, in the PROFIT example, I am selecting TYPE and the SUM of AMOUNT so I must GROUP BY TYPE or Oracle tells me TYPE is not a group by expression.

Check out the Oracle Function Docs for more information.

I'll dig into more interesting Oracle SQL statements, problems and/or solutions in the future. And, I might even lose my shit and go off on a rant. Toodles.

DBDeveloper

Friday, December 14, 2012

Use Good Naming Conventions When Creating Oracle Objects

Friday, December 7, 2012

The Decode function in Oracle