Sunday, October 7, 2018

Safe Operations For High Volume PostgreSQL

Add a new column (safe)

In general, adding a new column is a safe operation that doesn't lock the table. However, there are cases where certain options of the command lock the table. These cases are quite common when working on an evolving product.

Add a new column with a non-nullable constraint / default value (unsafe)

This requires each row of the table to be updated so the default column value can be stored. The entire table is locked during this operation, depending on the size of table, it can take a while and everything else comes to a halt. The solution is to add the new nullable column without a default value, backfill the desired value in batches, then add the non-nullable constraint / default value with an alter statement.

Add a default value to an existing column. If the column is nullable previously, the default value won't alter those existing null rows. If the column is non-nullable, all rows are guaranteed to store some value. Either way, the operation does not block table and safe to execute.

Change the type of a column. Strictly speaking, this operation locks the table and therefore unsafe. If the underlining datatype is not changed, like increasing the length of a varchar, then the table isn't locked. But if the change requires a rewrite/re-cast, each row of the table is to be updated and the table is locked for the same reason as adding a new column with default value. The solution is to add a new column, backfill the new column, and get rid of the old one.

One big good news is that from PostgreSQL 11, adding a new column with default value will be painless, as it should have been :)

Add a Foreign Key (unsafe)

To protect data integrity, an AccessExclusiveLock is placed on both tables, again grinds every read and write to a halt (PostgreSQL 9.6+ reportedly allows read). The solution is to take advantage of Postgres' ability to introduce invalid constraint. The procedure is to first creating an invalid Foreign Key constraint by specifying NOT VALID in the ALTER TABLE statement, and then validate it in a separate VALIDATE CONSTRAINT statement. The second validation only requires a RowShareLock that doesn't block reads nor writes. Do note that if there is reference to non-existing rows, the operation won't be completed and you have to take care of integrity on your own.

Add an index (unsafe)

By nature, the table being indexed is locked against writes and the entire index is built in a single scan of the table. Read transactions can still be done meanwhile.

For production environment, it is always better to build the index without locking writes. CREATE INDEX comes with CONCURRENTLY option for this purpose. Though, building index is still an extra work for the database and this is reflected on extra CPU and I/O load. This might still slow other operations, we notice an increased queue depth when we add index concurrently on one of the biggest tables in the system. Because our infrastructure is in the cloud, with a minimal budget, we can upsize the database a couple of size larger than normal. The extra power makes adding indices (still with CONCURRENTLY option) lot more comfortable and productive.

Add a column with unique constraint (unsafe)

This operation will lock the table because it requires a scan for uniqueness. The solution is to add the column, add unique index concurrently, and then add the constraint onto the table (ADD CONSTRAINT UNIQUE USING INDEX).

Drop a column / a constraint (safe)

This is safe and quick. If the operation appears to take time, the cause is more likely because the table/index was in use rather than the operation time itself. A quick check on pg_stat_activity should confirm that.

Rename an entity (safe)

The entity can be a table (or an index, sequence, or view) or a column in a table. This operation has no effect on stored data and therefore also isn't affected by the size of data.