Development: Database Creation Dos and Don'ts
After you've been a software programmer for a certain number of years, you often take the little things for granted. You get so used to working within a set pattern that you don't realize how dependent you've become on that pattern until something (or someone) comes along and makes your day-to-day programming tasks just a little harder than they need to be.
Database programming is one of these areas that often gets muddled. Unless you work for a large scale company, chances are there aren't any people on your staff whose sole purpose is to interact with the database on a programming level. You may have people that run reports or create minimalistic stored procedures, but oftentimes the hardcore programming falls on your everyday computer programmer.
There is a problem with this. Many times your average software programmer has minimal understanding of database structure and optimization. Most of what he/she knows when it comes to database programming has come out of necessity and not out of full time study and work. This often leaves the idea of structure and convention opent to interpretation - which is fine if that programmer is remaining with you for the duration of his career; but it can often leave you in a troubled position should that person move on to another job.
What often arises in situations like these is that databases are created and maintained by programmers who do the necessary work to get the database out of the way in order to start back on their programming tasks. In small or medium sized companies, this is often desired since budgets are usually tight and time is valuable. Unfortunately, if the programmer doesn't spent enough time planning out the database, you can end up having to struggle with changes and migrations later down the rode.
As an example, Overmortal recently inherited a legacy database from a client and found itself wrestling with many omissions while trying to upgrade a lot of the client's administration functionality. Tables meant to be join tables had additional information stuffed in them; date columns were missing on important tables that could have benefited from date sorting; active bits weren't being used properly; and some tables were even missing ID columns (the previous programmer thought it adequate to use a VARCHAR column as a primary key). All of these small items racked up to many, many hours of database and code changes, as well as some data migration in order to successfully provide the client with the functionality that they needed. Many hours were lost that could have been better spent on other items. This is especially frustrating when you realize that it would have taken a little over a couple hours to plan out the database more properly and add these fields into them.
As a result of the many times we've run into issues like those above, we follow a few standard conventions when putting together a database for the first time.
Always have an ID column. Most databases provide you with the option of creating an INT column in the database that autoincrements. This means that it will automatically increase the value of an INT column each time a new record is stored. Use this functionality to create your primary key. Never set a VARCHAR or anything else as a table's primary key.
Use a GUID column on your important objects. A GUID (or UUID in some scripting languages) is a globally unique identifier of 4 sets of 4 alphanumeric characters (separated by dashes). More often than not, we'll make sure the most important tables have a GUID column. For example, if we were creating a social network for writers, the user table, the book table and the unpublished works table would definitely have GUID's in them. Why? To prevent URL and cookie hacking. Many more developers than should will take the ID of an important table and cookie the user with that ID as a value. For example, some people will user the user's database ID as the value of an authentication cookie. This can easily be hacked and modified giving access to places that user shouldn't have access to. Using a GUID as the value of the cookie in this instance make it's very, very difficult to hack. We don't use GUID's in all tables as many do not have this user interaction, but the important tables will always get this column.
Don't forget the active bit. All non-join tables should have an active bit (boolean) in them (we use the naming convention is_active for this). Most robust applications will have a need to deactivate certain database entries rather than delete them. In addition to having the bit, don't forget to use it. Recently we were working on a legacy database that had several different boolean columns that were never being used, despite the fact that using them would have made things a lot easier for the programmers.
Use date columns! Two very important columns that should usually always be used are the creation date and modification date of the record. We use the naming convention date_created and date_modified. When dealing with database records, inserts and updates modify the database. It's important to have a handle on when certain transactions take place, and even if you never use these fields, it's better to have them and not need them than to suddenly need them and not have them. They also make sorting records a lot easier too!
These are just a few small things you can do while planning out a database that will help programming tremendously in the future. There are plenty of other database optimizations that one can make, but these conventions will help speed along the coding process of any application and prevent unnecessary problems from popping up in the future.
