The surprising joy of programming in PL/pgSQL

Building a mandatory access control system for PostgreSQL

2018-11-16

Finding a reason

I’ve always enjoyed working with PostgreSQL, and ever since Row-Level Security (RLS) was introduced, I’ve been waiting for a use case to arrive which would allow me to use it. Over the last two years, in speaking to research projects, I’ve encountered different requirements, all of which can be served by using RLS. Then recently the GDPR was implemented in Europe, which brought a new wave of data management requirements to research projects. Together with the requirements from before I had now collected many interesting problems to solve:

In August this year I finally had the opportunity to implement a system which solves all of these problems. The resulting solution is a set of database functions, tables, and views written in PL/pgSQL, designed to be used with the open source REST API tool called postgrest - a component that automatically turns your PostgreSQL database into a REST API.

Programming in PL/pgSQL

PL/pgSQL is a procedural language that extends SQL with control structures (boolean logic, loops, exceptions, assertions), can be used to create functions and triggers, and allows one to use all the data types, operators and functions of SQL. In addition, operations are executed by the server in blocks without marshalling data betweeen the database client and server, which often gives a performance boost.

If you already know SQL, and another C-like language, then it is easy to learn. It feels quite different from other programming languages though. For one thing, the runtime environment is a database server. This means that your code will always be executing in the context of a database transaction. This alone makes for a different programming experience.

Another big difference from most general purpose programming languages is that the only object that you can construct to group your computations and data structures together is a function. There are no classes, prototypes or loose statements available. At this point it is probably useful to see a small code sample:

create or replace function table_describe(table_name text,
                                          table_description text)
    returns boolean as $$
    declare trusted_table_name text;
    declare trusted_table_description text;
    begin
        assert (select ntk.is_user_defined_table(table_name) = true);
        trusted_table_name := quote_ident(table_name);
        trusted_table_description := quote_literal(table_description);
        execute format('comment on table %I is %s',
            trusted_table_name, trusted_table_description);
        return true;
    end;
$$ language plpgsql;
revoke all privileges on function table_describe(text, text) from public;
grant execute on function table_describe(text, text) to admin_user;

Some brief comments are in order. PL/pgSQL is typed, so all parameters, return values and variables must be declared with types. It provides useful helper functions to sanitise input used in dynamic SQL generation: quote_ident prepares input to be used as a SQL identifier, quote_literal simply escapes any existing quotations making the value safe to use, and format allows string formatting so one can generate SQL dynamically in a safe way.

What is also clear from the example is that all functions are database objects. As such, one can use the SQL privilege system to restrict access to the function. This is extremely useful to lock down an API. This is exactly why it is a surprisingly good experience programming in PL/pgSQL: it gives you seamless access to SQL, the privilege system (roles, grants), and all other PostgreSQL server objects.

It is also suprising in another way: database programming has a bad reputation. This is mostly based on anectodal evidence, but I have often heard people speak of stored procedures with fear and loathing. Perhaps they were just being used in a bad way, or to solve the wrong problems. Anyways, I think database server programming is highly underrated.

A simple architecture

Programming with the database server as the runtime leads to a different architecture than the usual object-relational mapping based web app. In this case the client-server relationship, and the location of the so-called business logic is as follows:

HTTP client -> (webapp) -> (logic (PL/pgSQL), DB)

More commonly it is organised as such:

HTTP client -> (webapp, logic (ORM)) -> (DB)

In my implementation, the logic determining access control is located in the database, implemented as database objects. In the more common ORM-based model, the access control logic would be separate from the database. Having the access control measures implemented inside the database system leads to a very simple webapp which only needs to take care of user authentication, request routing and data transfer. Building APIs in this way is very refreshing, and in this sense programming in PL/pgSQL has been an unexpected joy. If curious, you can read more about the MAC implementation on github.