Some Different Types of Primary Keys You Can Use for Your Database
I was recently given a programming assignment where I had to create a API server that would be connected to a database. That’s easy enough. But one of the deliverables was that the primary key be a UUID instead of the standard integer Primary Key we’re all used to. It was honestly my first time hearing about them. When I was learning MySQL, I remember a passage from a book saying something along the lines of
When you design your database, you will find that sometimes you can let the database auto-increment a number for you as a key, but sometimes, you will want to use one of your fields. If you can decide that it is unique enough, this will provide a built in constraint you can take advantage of.
When I was taught how to use Rails, I remember asking this question about using something else as a primary key, and ultimately I was shut down. It really got me thinking. There are a lot of ways you can do primary keys in a database, but what is the benefit and cost of each method. So below, I’m going to list a few that I know of, and talk about a major benefit and cost of each. Starting with….
Auto-Incrementing Integers
Here is our bread-and-butter way of doing primary keys. Chances are, your table is currently using this right now. These are a basic field that is added to your table, where with every record is given an integer that is the previous records +1. Nice and simple. If you have an API, you can just send this information out, it will probably be small and just looking at it, you have an idea where the record is in relation to the others
CONs
- Just by looking at it, you can tell a lot about the table. And I mean A LOT. By just seeing a id of 64, you can automatically infer that there are at least 63 records before you. If you’re brave, you can venture that 65 and up, are coming soon too. A lot of tokens, urls, and general API calls have us giving the id of something over as a response. We can try to sign the way we send it over, and try to encrypt it beforehand in some kind of way, but that is extra work added to protect yourself. If you don’t do it correctly, it really just takes someone guessing to be able to do something you may not have intended them to do.
- Depending on the situation, it may look bad for you. Let’s say you have a store, or some kind of business. And let’s say you need to send an instance of an item for sale, or a record of a client based on your database. They look at the object and see id:24, or see id:2 . Though they may not know how many records, you have beyond that, seeing a low number suggests you don’t have that many clients, or you don’t have that many items for sale. In a previous article, I spoke about using :slugs instead of id for your routes, so this can help alleviate a few things. But you now have more information that can be inferred about your and your system, that may not be good in a regular light
PROs
- Speed. Cheap Speed. I will speak more about it in the section about using UUIDs, but the quick version is, your database will perform a bit faster by using primary integers compared to using some kind of long string or other datatypes. Here is an article by Yves Trudeau speaking about the difference in performance. Basically, it takes up more storage and more processing power to store and sort through records with integer primary keys than other characters.
- It becomes easier to make and manage records. On a very basic level, you never have to worry about your ability to create a new record. If you are using an ORM, you can set up certain validations to make sure certain fields are present, or that they are unique, or provide a few other benefits that the other primary key types do. It will take more work, but it is possible. Though this can be a double-edged sword, you basically don’t have to worry about your record conflicting with another if you do not want to. Unless you programmed something to stop you, if you record it, it will make a new one.
UUID (Universally Unique IDentifier)
This is the new type of primary key that I recently learned about and its pretty fun. Per IETF.org,
A UUID is an identifier that is unique across both space and time,
with respect to the space of all UUIDs. Since a UUID is a fixed
size and contains a time field, it is possible for values to
rollover (around A.D. 3400, depending on the specific algorithm
used).
Wow, unique across time and space(until 3400 AD. That is intense. Basically, they are a 128 bit long string that is usually store as a 32-character hexadecimal digits. Think of something like “223e4567-c89b-32d3-a456–426611174000” . Compared to an auto-incrementing integer, the next record generated isn’t necessarily “223e4567-c89b-32d3-a456–426611174001” . There are different algorithms to generate them, and truthfully, they look intense.
CONs
- It isn’t so easy to set it up. Chances are, if you’ve never heard of UUIDs before, you wouldn’t have any idea how to set them up. A lot of modern databased have ways to generate them on their own, but unless you are comfortable with them, you’re going to have to do some research to find out how. Compared to the other primary key methods, the process just to generate them takes more effort than a “natural” primary key, or an integer the database just does for you.
- They can hurt performance in the long run. Just by design, they are typically store as 32-character hexadecimal (usually as a string). Comparing that to a id:2, that’s a big different. According to the article by Yves Trudeau, when the database searches and compares records based on UUID, they go character by character. Doing these comparisons, end up taking a lot of memory and time, and will get worse with a bigger table. Here are a few more links going further in-depth about them.
https://news.ycombinator.com/item?id=14523523
https://medium.com/@Mareks_082/auto-increment-keys-vs-uuid-a74d81f7476a
PROs
- Where the auto-incrementing integer gave extra information about your tables that you may not have wanted to share, UUIDs do the opposite. No one can look at a something like this “72e4122–0625–11ea-9f44–8c16456798f1” and infer anything from it, other than that you know UUIDs. A person with bad intentions, can not just change the UUID, and trust that they now have access to another record that you have. Like I mentioned before, “… unique across both space and time...” Though they are necessarily friendly to the human eyes, you don’t have to go through the effort of using slugs if you use this in routes. You can have a fresh new company, and nobody would be any the wiser.
- By their design, they are “ unique across both space and time,
with respect to the space of all UUIDs.” Which means they should be unique across databases. If you ever have to merge tables or databases or information. You now have one less thing to worry about. If you compare this to auto-incrementing integers as primary keys, It is highly possible that 2 different tables have a record with the id of 1. In a current algorithm used to generate UUIDs, the UUID is generated using a combination of time and the MAC address. MAC addresses are meant to be unique in themselves. So combining them with time, means that the chances that there will be a conflict in your primary keys of 2 different sources are close to impossible.
Natural Keys
These are basic keys that come directly from the data you are storing. They are an attribute of the data you are storing that is naturally unique, and make a good primary key. A current example of this (though please don’t do this on any public data you give out) would be using person’s social security number. By default, they are unique, already collected, and easy to compare.
CONs
- They are incredibly limiting. Though its nice to save an extra field and not have to bother much with them, there aren’t that many naturally unique sources of information you can use for a field. You may end up thinking you have a unique source, but if you plan for the long run, you may find that isn’t the case. So you may end up changing up your database.
- You now have to be very, very mindful of the data you are putting in, making sure that they are unique. After you reach a certain limit, you may decide to expand your primary key from 1 field to multiple fields. Usually with the database, you can make a composite key (a primary key made up of multiple fields), but you are now using more power to go through this. This won’t hurt much in a small table, but the bigger it gets the slower it becomes.
PROs
- In the short run, you save a lot of space. You don’t have to worry about coming up with the data, because your data will keep itself in check. No extra column, or coming up with some way to generate a key, because it’s right there.
- The keys have meaning to anyone seeing them. An auto-incrementing integer can tell you about how many records you have, or the order they were done, but you can’t infer any information about the actual record. If you are trying to store information on countries and use their name as a key, you can immediately infer what kind of data the record has. A query you create will have more meaning to you than searching for id: 1. And will have even more meaning than searching for id:”0020a216–0626–11ea-9f44–8c16456798f1"
Going through these has shown me that there needs to be a lot of thought that goes into how you set up your database. All of these so far, have a really good benefit to them, but always at the cost of something else. It’s fun to come up with that decision and see what can come of it. (PS. I’m currently leaning towards Integers. I’ve been primarily doing APIs and there are things you can do to hide it, and the performance is great.)