Earlier today I was perusing Stack Overflow, when I ran across a question about the benefits of declaring a MySQL AUTO_INCREMENT column as nullable:
In a MySQL table, you can obviously set an INT as both NOT NULL and AUTO_INCREMENT, but while they both work together, is the former really necessary when you have the latter?
That aside, is there any performance, memory, storage, or speed bonus for dropping NOT NULL?
It turns out that the database column in question was also a primary key, which is automatically defined as NOT NULL by MySQL, making the question moot.
But the underlying question is interesting: When is it appropriate to remove the NOT NULL constraint from a MySQL column?
My basic take is this:
As a matter of good design, if a column requires a value in order to properly function, it should be declared NOT NULL.

Elegance > Efficiency
The idea of a null MySQL column is appealing on a number of levels: It saves space, and it’s expedient to use null values as a proxy for error conditions and / or to get two bites of the error apple.
Take, as example, this block of code, lifted and slightly modified from the PHP documentation:
$query = "SELECT Name, CountryCode FROM City ORDER by ID DESC LIMIT 150,5"; if ($stmt = $mysqli->prepare($query)) { $stmt->execute(); $stmt->bind_result($name, $code); while ($stmt->fetch()) { //don't print if CountryCode is null if(!is_null($code)) { printf ("%s (%s)\n", $name, $code); } } $stmt->close(); }
By expecting null values for CountryCode to possibly come back from the database query, I’m basically cheating two problems:
I’m enforcing data integrity at the presentation layer, rather than the data model. In other words, I’m being sloppy about ensuring my data is good, and fixing that sloppiness when I output the data.
Unfortunately, sweeping things under the carpet is not a programming pattern. At least, not a good one.
I’m not creating a data model that is resilient / amenable to changes. A program’s design should be minimal, and it seems, at first blush, that passing along null values for data that may not be available is a minimalist approach.
For example, it’s convenient to use a LEFT JOIN to extract relational data from other tables, and deal with nulls as they come up.
Suppose I have a table that contains a list of people’s names. In another table, I have a list of email addresses, keyed to the ID of each record in the people table.
It is very easy to simply use a LEFT JOIN to unite the tables, and deal with missing email addresses as I process the data:
SELECT p.firstname, p.lastname, m.emailaddress FROM people AS p LEFT JOIN metadata AS m ON m.personid = p.id
But this is sloppy for all kinds of reasons, not the least being that unless I have enforced a UNIQUE on the personid column of the metadata table, I will get the same person record several times if he has more than one email address in the system.
In other words, my reliance on nulls has made my data model weak; to the point where I am going to spend way more time and effort dealing with edge cases than had I simply designed a solution that didn’t require the use of null data in the first place.

Good Minimalism Happens At The Model
So it’s my opinion that the place where you save time and make things efficient is when you are modeling your data, not in skimping on server resources.
I’d much rather spend my time and effort trying to find a way to make my data model as clean and minimal as necessary to get the job done properly. That may well mean breaking a complex object into a lot of different, smaller objects, and using an interface or really robust controller to manufacture a bigger object that doesn’t need nulls to properly operate.
But that said, yes, there are times when a null column makes sense: namely, when the nature of the data object is in and of itself quite small.
Let’s go back to our example of a person database.
If we only ever intend to collect one email address for a person, then it does make sense to put that in as a column of the person table, and to make it nullable. Because either we have an email address for him, or we don’t; and we can properly plan our model for that condition.
Specifically, we would build a class for the person, which includes a method for retrieving him from the database and dealing with a null email address.
Here’s a quick, overly specific example of such a class:
class Person { public $firstname; public $lastname; public $email; public function __construct($lname) { $mysqli = new mysqli("localhost", "my_user", "my_password", "world"); if (mysqli_connect_errno()) { throw new Exception("Connect failed: %s\n", mysqli_connect_error()); } $query = "SELECT firstname, email FROM person WHERE lastname = '$lname'"; if ($result = $mysqli->query($query)) { $row = $result->fetch_assoc(); if(!is_null($row['email'])) { $this->firstname = $row['firstname']; $this->lastname = $lname; $this->email = $row['email']; } else { throw new Exception("null found in email for $lname"); } $result->free(); } $mysqli->close(); } } $person = new Person("Jones");
This example is, of course, wildly curt. We’d need a better way of getting people out of the database than just using a last name and we’d certainly want more flexibility in how we go about constructing this class.
The point is that rather than dealing with nulls on the fly, we’re going to specifically build our solution to never surface a null to any other part of the program’s logic, outside of the data model.
In other words, a Person will never have a null value for a field. Ever. We’ll make sure of it when we create the object. That way, we can make limited use of database nulls, and not have to over-engineer the MySQL database, and the data models that feed it, just to achieve data integrity.
Which, of course, leads me to the underlying point of all of this:
If you find yourself needing null database columns, you probably haven’t designed your program properly.
Again, this isn’t always the case; in limited cases where it’s more work to avoid nulls than to deal with them in the data model, it’s OK to use a null column or two.
But any time you substitute a null column for a proper error handling routine, your design is just plain bad. Go back and try again.

Performance < Process
The question on Stack Overflow asked if there is a performance benefit from declaring columns nullable. That was his real goal; not so much to avoid proper error handling, but to save resources. After all, since his AUTO_INCREMENT column would always have a value, why enforce a NOT NULL constraint?
And sure, that’s true as far as it goes. If the database server doesn’t need to enforce a constraint, or store a chunk of data, then it is smaller and runs faster.
The issue is, does the performance improvement of a nullable column outweigh the practical benefit of enforcing a resilient data model?
In other words, even when we have a very high certainty that a constraint condition will be met, is it worth it to not enforce that constraint?
I say no; even if the performance hit was higher, I’d still opt to keep a column NOT NULL. Because a good program ensures all of its operations perform as designed and exit gracefully when they don’t.
Given how little work it is for a MySQL server to enforce a NOT NULL constraint (it’s pretty much doing that already on every INSERT and UPDATE query for a table with a primary key) and that even when using a null value is acceptable, a good program is going to make such records sparingly, the storage and performance benefits of nullable columns are a Faustian bargain, at best.
All links in this post on delicious: https://delicious.com/dougvdotcom/efficiency-vs-good-design-when-null-is-not-a-wise-option