Saturday, July 4, 2009

Primary Keys Must Be Independent

A Cautionary Tale About Primary Keys:

Once upon a time . . . there was a developer named "Bryan", who was assigned the task of creating a relational database application that tracked the relationships between buildings and the people assigned to them. It was an easy enough application, with only two tables-- People and Sites. When creating the Sites table, "Bryan" thought it would be okay to use the payroll code for each site as his primary key. After all, each site had its own unique payroll code. Why not just repurpose that code for the database key as well?

Several months later, one of the sites decided to open a satellite office. Since it was a different physical location, it needed its own record in the Sites table-- but since it was a satellite office, they used the same payroll code to pay their staff as the parent office. When "Bryan" tried to add the new office to his database table, he received an error message because his primary key wasn't unique. But, he couldn't just modify the entry by appending a suffix either, because that would mess up queries and reports in his application.

The moral of the story is:

Make sure your primary key(s) are not entirely dependent on a single piece of data with which other human beings can screw. If you absolutely must use something like payroll code in the example above, make it a compound primary key that combines the payroll code field with a unique, independent field.