Type-safe JSON queries for Esqueleto

By Phil Freeman · September 18, 2019

Over two years ago, we migrated from RethinkDB to Postgres, and in a blog post at the time, I said this about our use of JSONB:

We don’t store many JSON documents in the database, but for some cases, it is definitely preferable to storing flattened data over many columns. To support these cases, we write custom SQL functions which are then imported into Haskell as typed functions on esqueleto expressions.

At the time, we loved having the ability to write queries against JSON data stored in Postgres, but it had one major drawback: every operator which queries JSON data can fail if the required data does not exist.

For example, we can select a property from an object when it exists:

# select ('{ "foo": 42, "bar": "test" }' :: jsonb) -> 'foo';

 ?column? 
----------
 42
(1 row)

But we receive NULL (that's a SQL NULL, not a JSON null!) from Postgres if we ask for a field which is not present:

# select ('{ "foo": 42, "bar": "test" }' :: jsonb) -> 'baz';

 ?column? 
----------
 [null]
(1 row)

This isn't surprising, but it means that many JSONB operators have to be given types which include an additional Maybe. For example, in our code base, we had given the -> operator (not to be confused with the ->> operator!) the following type definition:

(->.) :: SqlExpr (Value (Maybe (Jsonb a))) -> SqlExpr (Value Text) -> SqlExpr (Value (Maybe (Jsonb b)))

Jsonb is our own wrapper type for JSONB data in Postgres, although excitingly, esqueleto recently got a JSONB type of its own.

Wouldn't it be nice if we could use the type information we already have to verify that we only use these operators correctly, and remove the unnecessary Maybes? Indeed, we should be able to use information about the type of the first argument to -> to determine the valid inputs in the second argument.

In this short post, I'll show how we can do exactly that with a little help from Template Haskell.

Warming up with record types

We actually use the technique I'm about to describe for both record types and sum types stored in JSON in Postgres, but for simplicity's sake, I'll start with the former. I'll show a preview of the approach for sum types at the end of the article.

In the case of record fields, we want to take a type and generate a family of accessor functions for it which will be compatible with the generated SQL. In order to do this, I define the following newtype to describe safe accessors from a big data structure to a small data structure:

newtype Accessor big small = Accessor { getAccessor :: Text }
  deriving (Show, Eq)

The big and small type arguments here are phantom type arguments - they are not used in the body of the type declaration - but they are essential for making things type-safe.

With this definition, we can refine the type of the -> operator as follows:

(->.) :: SqlExpr (Value (Jsonb a)) -> Accessor a b -> SqlExpr (Value (Jsonb b))

As long as we hide the Accessor data constructor in our API, this definition will now be safe. The only question is, how can we create a valid collection of Accessors for our type? We could write them out by hand based on the type definition, but that would be very error prone. Instead, this is a perfect job for Template Haskell, which can inspect our type definition and generate compatible code from it.

The Haskell wiki says this about Template Haskell:

Template Haskell (TH) is the standard framework for doing type-safe, compile-time meta programming in the Glasgow Haskell Compiler (GHC). It allows writing Haskell meta programs, which are evaluated at compile-time, and which produce Haskell programs as the results of their execution.

These "meta programs" are written in the Q monad, which is provided by the Language.Haskell.TH module in the template-haskell package. The Q monad provides several key pieces of functionality, such as

  • Looking up a declaration by name
  • Creating names for variables and types
  • Creating new types, expressions and declarations

Once we have written a program in the Q monad, we can invoke it using a Template Haskell splice, which will effectively be replaced with the result of executing that program at compile time.

Without further ado, here is the essential piece of Template Haskell code in its entirety:

unsafeMakeAccessors :: Name -> Q [Dec]
unsafeMakeAccessors nm = do
    fields <- TH.reify nm >>= \case
      TH.TyConI (TH.DataD [] _ _ _ [TH.RecC _ fields] _) -> pure fields
      _ -> fail "unsafeMakeAccessors: not a record type"
    concat <$> traverse (\(fieldNm, _, fieldTy) -> toField fieldNm fieldTy) fields
  where
    -- Create an accessor function for a single record field.
    toField :: Name -> TH.Type -> Q [Dec]
    toField fieldNm fieldTy = do
      let accessorNm = TH.mkName ("_" <> TH.nameBase nm <> "_" <> TH.nameBase fieldNm)
      accessorTy <- [t| Accessor $(TH.conT nm) $(pure fieldTy) |]
      accessorVal <- [e| Accessor (pack $(pure . TH.LitE . TH.StringL . TH.nameBase $ fieldNm)) |]
      pure [ TH.SigD accessorNm accessorTy
           , TH.FunD accessorNm
               [ TH.Clause [] (TH.NormalB accessorVal) [] ]
           ]

This function receives the Name of the record type we're interested in, and yields a collection of new declarations to emit in the Q monad.

Its implementation is simple:

  • First, it uses reify to access the definition of the type declaration.
  • If it is not a data declaration, it fails using error, which will cause an error during compilation.
  • Next, it traverses the list of record fields, and generates one Accessor for each, using the toField helper function:
    • For each field, we generate a new Name for our Accessor using mkName
    • We use the [t| ... |] quasiquoter to create the type of our accessor, antiquoting ($(...)) the big and small types (the type of the record and the type of the field) into their correct places
    • We use the [e| ... |] quasiquoter to create the implementation, again using antiquotation to include the field name itself as a string literal.
    • Finally, we pack all of this up into a pair of declarations: the type declaration and the value declaration.

(You might be wondering why the name includes the prefix unsafe. That is because the correct usage of this function requires a compatible pair of ToJSON and FromJSON instances in order that the JSON itself is serialized correctly as a record using the unmodified field names. This precondition has to be verified by the caller, but fortunately, it's easy to meet this condition by deriving those instances using their default implementations.)

To use this function, we can simply pass it the name of a record type:

data MyRecord = MyRecord
  { foo :: Int
  , bar :: String
  } deriving anyclass (ToJSON, FromJSON)
    deriving stock (Show, Generic)
    
unsafeMakeAccessors ''MyRecord

which will derive two new Accessors for us - one for foo and one for bar! If we turn on the --dump-splices compiler option, we can see the generated code:

_MyRecord_foo :: Lumi.Database.Persist.Json.Accessor MyRecord Int
_MyRecord_foo = Lumi.Database.Persist.Json.Accessor (Data.Text.pack "foo")
  
_MyRecord_bar :: Lumi.Database.Persist.Json.Accessor MyRecord String
_MyRecord_bar = Lumi.Database.Persist.Json.Accessor (Data.Text.pack "bar")

A little verbose, but exactly what we'd expect.

What's particularly nice is that GHC will now tell us the list of available accessors if we use a typed hole, thanks to the relatively recent addition of the typed hole fits feature:

Prelude> :{
Prelude| test :: SqlExpr (Value (Jsonb MyRecord)) -> SqlExpr (Value (Jsonb Int))
Prelude| test rec = rec ->. _acc
Prelude| :}

error:
    • Found hole: _acc :: Accessor MyRecord Int
      ...
      Valid hole fits include
        _MyRecord_foo :: Accessor MyRecord Int

More challenging: sum types

The case of sum types is more interesting and more challenging. I won't go into the same level of details this time, but I will show an example and the generated code.

We start with a simple sum type, and use another Template Haskell function, unsafeMakeFold, to create the appropriate esqueleto code:

data MySum
  = Foo Int
  | Bar String
  deriving anyclass (ToJSON, FromJSON)
  deriving stock (Show, Generic)

unsafeMakeFold ''MySum

Note that, once again, we are using the default aeson instances here.

Our Template Haskell splice generates two interesting pieces of code. The first is a record of function types which can be used to express a pattern match against a sum type in esqueleto:

data FoldMySum r
  = FoldMySum 
    { foldFoo :: SqlExpr (Value (Maybe (Jsonb Int)))
              -> SqlExpr (Value r)
    , foldBar :: SqlExpr (Value (Maybe (Jsonb String)))
              -> SqlExpr (Value r)
    }

The second piece of generated code is a function which can be used to consume such a data structure and turn it into a pattern match against a sum type:

foldMySum 
  :: forall r
   . PersistField r
  => FoldMySum r
  -> SqlExpr (Value (Maybe (Jsonb MySum)))
  -> SqlExpr (Value (Maybe r))
foldMySum (FoldMySum f g) x
  = case_
      [ ( (x ->>? val "tag") ==. val (Just "Foo")
        , veryUnsafeCoerceSqlExprValue (f (x ->? val "contents"))
        )
      , ( (x ->>? val "tag") ==. val (Just "Bar")
        , veryUnsafeCoerceSqlExprValue (g (x ->? val "contents"))
        )
      ]
      nothing

As you can see, we compile a pattern match down to a CASE statement in SQL, and we use veryUnsafeCoerceSqlExprValue under the hood, once we have determined the tag in the JSON representation of the value.

There is more we could do here, such as generating SQL code corresponding to the data constructor functions themselves, but for our purposes, this is enough right now.

Lumi is accelerating the world's transition to responsible manufacturing. Want to work with us? We're hiring.

RSS · Github