Over two years ago, we migrated from RethinkDB to Postgres, and in a blog post at the time, I said this about our use of JSONB:
We don’t store many JSON documents in the database, but for some cases, it is definitely preferable to storing flattened data over many columns. To support these cases, we write custom SQL functions which are then imported into Haskell as typed functions on esqueleto expressions.
At the time, we loved having the ability to write queries against JSON data stored in Postgres, but it had one major drawback: every operator which queries JSON data can fail if the required data does not exist.
For example, we can select a property from an object when it exists:
# select ('{ "foo": 42, "bar": "test" }' :: jsonb) -> 'foo';
?column?
----------
42
(1 row)
But we receive NULL (that's a SQL NULL, not a JSON null!) from Postgres if we ask for a field which is not present:
# select ('{ "foo": 42, "bar": "test" }' :: jsonb) -> 'baz';
?column?
----------
[null]
(1 row)
This isn't surprising, but it means that many JSONB operators have to be given types which include an additional Maybe. For example, in our code base, we had given the -> operator (not to be confused with the ->> operator!) the following type definition:
(->.) :: SqlExpr (Value (Maybe (Jsonb a))) -> SqlExpr (Value Text) -> SqlExpr (Value (Maybe (Jsonb b)))
Jsonb is our own wrapper type for JSONB data in Postgres, although excitingly, esqueleto recently got a JSONB type of its own.
Wouldn't it be nice if we could use the type information we already have to verify that we only use these operators correctly, and remove the unnecessary Maybes? Indeed, we should be able to use information about the type of the first argument to -> to determine the valid inputs in the second argument.
In this short post, I'll show how we can do exactly that with a little help from Template Haskell.
Warming up with record types
We actually use the technique I'm about to describe for both record types and sum types stored in JSON in Postgres, but for simplicity's sake, I'll start with the former. I'll show a preview of the approach for sum types at the end of the article.
In the case of record fields, we want to take a type and generate a family of accessor functions for it which will be compatible with the generated SQL. In order to do this, I define the following newtype to describe safe accessors from a big data structure to a small data structure:
newtype Accessor big small = Accessor { getAccessor :: Text }
  deriving (Show, Eq)
The big and small type arguments here are phantom type arguments - they are not used in the body of the type declaration - but they are essential for making things type-safe.
With this definition, we can refine the type of the -> operator as follows:
(->.) :: SqlExpr (Value (Jsonb a)) -> Accessor a b -> SqlExpr (Value (Jsonb b))
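To see what the phantom parameters buy us, here is a minimal, self-contained sketch of the idea, repeating the Accessor definition from above. No esqueleto is involved, and renderAccess is a hypothetical renderer standing in for the SQL fragment an accessor would compile to:

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.Text (Text)
import qualified Data.Text as T

-- Phantom-typed accessor: 'big' and 'small' never appear on the right-hand side.
newtype Accessor big small = Accessor { getAccessor :: Text }
  deriving (Show, Eq)

data MyRecord = MyRecord { foo :: Int, bar :: String }

-- Hand-written accessors (these are what the Template Haskell splice generates).
_MyRecord_foo :: Accessor MyRecord Int
_MyRecord_foo = Accessor "foo"

_MyRecord_bar :: Accessor MyRecord String
_MyRecord_bar = Accessor "bar"

-- Hypothetical renderer: the SQL fragment an accessor corresponds to.
renderAccess :: Text -> Accessor big small -> Text
renderAccess column acc = column <> " -> '" <> getAccessor acc <> "'"

main :: IO ()
main = putStrLn (T.unpack (renderAccess "data" _MyRecord_foo))
-- prints: data -> 'foo'
```

Because the record type and field type are carried in the phantom parameters, passing _MyRecord_foo where an Accessor MyRecord String is expected is a compile-time type error, even though every accessor is just a Text underneath.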
As long as we hide the Accessor data constructor in our API, this definition will now be safe. The only question is: how can we create a valid collection of Accessors for our type? We could write them out by hand based on the type definition, but that would be very error-prone. Instead, this is a perfect job for Template Haskell, which can inspect our type definition and generate compatible code from it.
The Haskell wiki says this about Template Haskell:
Template Haskell (TH) is the standard framework for doing type-safe, compile-time meta programming in the Glasgow Haskell Compiler (GHC). It allows writing Haskell meta programs, which are evaluated at compile-time, and which produce Haskell programs as the results of their execution.
These "meta programs" are written in the Q monad, which is provided by the Language.Haskell.TH module in the template-haskell package. The Q monad provides several key pieces of functionality, such as:
- Looking up a declaration by name
- Creating names for variables and types
- Creating new types, expressions and declarations
Once we have written a program in the Q monad, we can invoke it using a Template Haskell splice, which will effectively be replaced with the result of executing that program at compile time.
Without further ado, here is the essential piece of Template Haskell code in its entirety:
unsafeMakeAccessors :: Name -> Q [Dec]
unsafeMakeAccessors nm = do
    fields <- TH.reify nm >>= \case
      TH.TyConI (TH.DataD [] _ _ _ [TH.RecC _ fields] _) -> pure fields
      _ -> fail "unsafeMakeAccessors: not a record type"
    concat <$> traverse (\(fieldNm, _, fieldTy) -> toField fieldNm fieldTy) fields
  where
    -- Create an accessor function for a single record field.
    toField :: Name -> TH.Type -> Q [Dec]
    toField fieldNm fieldTy = do
      let accessorNm = TH.mkName ("_" <> TH.nameBase nm <> "_" <> TH.nameBase fieldNm)
      accessorTy <- [t| Accessor $(TH.conT nm) $(pure fieldTy) |]
      accessorVal <- [e| Accessor (pack $(pure . TH.LitE . TH.StringL . TH.nameBase $ fieldNm)) |]
      pure
        [ TH.SigD accessorNm accessorTy
        , TH.FunD accessorNm [TH.Clause [] (TH.NormalB accessorVal) []]
        ]
This function receives the Name of the record type we're interested in, and yields a collection of new declarations to emit in the Q monad. Its implementation is simple:
- First, it uses reify to access the definition of the type declaration.
- If that is not a data declaration with a single record constructor, it fails using fail, which will cause an error during compilation.
- Next, it traverses the list of record fields, and generates one Accessor for each, using the toField helper function:
  - For each field, we generate a new Name for our Accessor using mkName.
  - We use the [t| ... |] quasiquoter to create the type of our accessor, antiquoting ($(...)) the big and small types (the type of the record and the type of the field) into their correct places.
  - We use the [e| ... |] quasiquoter to create the implementation, again using antiquotation to include the field name itself as a string literal.
  - Finally, we pack all of this up into a pair of declarations: the type signature and the value declaration.
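These quasiquoting and antiquoting steps can also be played with outside a splice, by running a Q computation with runQ in ordinary IO. This is only a sketch for inspection purposes: reify is unavailable outside a real compile-time splice, but mkName, quotes, and antiquotes all work, and pprint shows the generated syntax (its output uses fully qualified names, which vary by GHC version):

```haskell
{-# LANGUAGE TemplateHaskell #-}

import Language.Haskell.TH

main :: IO ()
main = do
  -- mkName builds a Name from a string, as in the accessor generator
  print (mkName ("_" <> "MyRecord" <> "_" <> "foo"))
  -- [t| ... |] builds a Type; $(...) antiquotes a Q Type into it
  ty <- runQ [t| Maybe $(conT ''Int) |]
  putStrLn (pprint ty)
  -- [e| ... |] builds an Exp; here we antiquote a string literal, as toField does
  ex <- runQ [e| Just $(pure (LitE (StringL "foo"))) |]
  putStrLn (pprint ex)
```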
(You might be wondering why the name includes the prefix unsafe. That is because correct usage of this function requires a compatible pair of ToJSON and FromJSON instances, so that the JSON itself is serialized as a record using the unmodified field names. This precondition has to be verified by the caller, but fortunately, it's easy to meet by deriving those instances using their default implementations.)
To use this function, we can simply pass it the name of a record type:
data MyRecord = MyRecord
  { foo :: Int
  , bar :: String
  }
  deriving stock (Show, Generic)
  deriving anyclass (ToJSON, FromJSON)

unsafeMakeAccessors ''MyRecord
which will derive two new Accessors for us - one for foo and one for bar! If we turn on the -ddump-splices compiler option, we can see the generated code:
_MyRecord_foo :: Lumi.Database.Persist.Json.Accessor MyRecord Int
_MyRecord_foo = Lumi.Database.Persist.Json.Accessor (Data.Text.pack "foo")
_MyRecord_bar :: Lumi.Database.Persist.Json.Accessor MyRecord String
_MyRecord_bar = Lumi.Database.Persist.Json.Accessor (Data.Text.pack "bar")
A little verbose, but exactly what we'd expect.
What's particularly nice is that GHC will now tell us the list of available accessors if we use a typed hole, thanks to the relatively recent addition of the typed hole fits feature:
Prelude> :{
Prelude| test :: SqlExpr (Value (Jsonb MyRecord)) -> SqlExpr (Value (Jsonb Int))
Prelude| test rec = rec ->. _acc
Prelude| :}
error:
• Found hole: _acc :: Accessor MyRecord Int
...
Valid hole fits include
_MyRecord_foo :: Accessor MyRecord Int
More challenging: sum types
The case of sum types is more interesting and more challenging. I won't go into the same level of detail this time, but I will show an example and the generated code.
We start with a simple sum type, and use another Template Haskell function, unsafeMakeFold, to create the appropriate esqueleto code:
data MySum
  = Foo Int
  | Bar String
  deriving stock (Show, Generic)
  deriving anyclass (ToJSON, FromJSON)

unsafeMakeFold ''MySum
Note that, once again, we are using the default aeson instances here.

Our Template Haskell splice generates two interesting pieces of code. The first is a record of function types which can be used to express a pattern match against a sum type in esqueleto:
data FoldMySum r = FoldMySum
  { foldFoo :: SqlExpr (Value (Maybe (Jsonb Int))) -> SqlExpr (Value r)
  , foldBar :: SqlExpr (Value (Maybe (Jsonb String))) -> SqlExpr (Value r)
  }
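This record is the SQL counterpart of an ordinary fold (Church-style case analysis) for the sum type: one field per constructor, each consuming that constructor's payload. As a plain-Haskell sketch of the same pattern, with the SqlExpr machinery stripped away:

```haskell
-- The fold-record pattern without any SQL: one field per constructor.
data MySum
  = Foo Int
  | Bar String

data FoldMySum r = FoldMySum
  { foldFoo :: Int -> r
  , foldBar :: String -> r
  }

-- Consuming a FoldMySum is exactly a pattern match on MySum.
foldMySum :: FoldMySum r -> MySum -> r
foldMySum f (Foo n) = foldFoo f n
foldMySum f (Bar s) = foldBar f s

main :: IO ()
main = do
  putStrLn (foldMySum (FoldMySum show id) (Bar "hello")) -- prints: hello
  putStrLn (foldMySum (FoldMySum show id) (Foo 42))      -- prints: 42
```

The generated esqueleto version performs the same case analysis, except that the "pattern match" has to happen inside the database, which is where the CASE expression below comes from.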
The second piece of generated code is a function which can be used to consume such a data structure and turn it into a pattern match against a sum type:
foldMySum
  :: forall r
   . PersistField r
  => FoldMySum r
  -> SqlExpr (Value (Maybe (Jsonb MySum)))
  -> SqlExpr (Value (Maybe r))
foldMySum (FoldMySum f g) x =
  case_
    [ ( (x ->>? val "tag") ==. val (Just "Foo")
      , veryUnsafeCoerceSqlExprValue (f (x ->? val "contents"))
      )
    , ( (x ->>? val "tag") ==. val (Just "Bar")
      , veryUnsafeCoerceSqlExprValue (g (x ->? val "contents"))
      )
    ]
    nothing
As you can see, we compile a pattern match down to a CASE expression in SQL, and we use veryUnsafeCoerceSqlExprValue under the hood, once we have determined the tag in the JSON representation of the value.
There is more we could do here, such as generating SQL code corresponding to the data constructor functions themselves, but for our purposes, this is enough right now.