Higher order abstract syntax
using dependency tracking in CBPV

Higher-order abstract syntax (HOAS) uses binding in the ambient metalanguage to represent binding in the object language. For example, if we define Terms of the object language as a type in the metalanguage, we might give functions of the lambda calculus (our object language) the type Lam : (Term → Term) → Term. The identity function in the object language is then represented by this constructor using the identity function in the metalanguage, Lam (λx. x) : Term. We could represent all Terms of the lambda calculus as the following data type.

data Term : Type where
  Lam : (Term  Term)  Term
  App : Term  Term  Term

The problem with this representation is that there exist Terms that don’t correspond to any lambda calculus terms because the metalanguage is free to inspect the syntax of a Term. As a concrete example, the body of Lam (λx. case x of Lam ⇒ t₁ | App ⇒ t₂) may return different Terms depending on the argument x, which isn’t possible in the lambda calculus. We want to restrict Term so that only valid lambda calculus terms are representable.

Existing solutions tend to use some form of type polymorphism to enforce parametricity of the Term argument and to prevent it from being inspected. Could we use dependency tracking levels to achieve the same effects?

Dependency tracking

Dependency tracking type systems assign a dependency level to each term along with its type. These levels form a lattice; for our purposes, we only need the two-point lattice L<HL < H, read as low and high levels. In the context of information flow, they represent low- and high-security clearance, where a low-security observer (term) can’t inspect (use) a high-security variable. For instance, a function of type and level τ@Hτ@Lτ@L\tau @ H \to \tau @ L \to \tau @ L may not use its first high-level variable, since it has to return something low level. λx.λy.y\lambda x. \lambda y. y is a valid term of this type, while λx.λy.x\lambda x. \lambda y. x is not. There is often also a boxing construct that allows passing high-level terms around in a low-level environment, and its unboxing eliminator prevents inspecting what’s in the box when the observer doesn’t have clearance.

Γt ⁣:τ@Γboxt ⁣:Tτ@Γt ⁣:Tτ@Γunboxt ⁣:τ@\frac{ \Gamma \vdash t \colon \tau @ \ell }{ \Gamma \vdash \mathsf{box} \: t \colon \mathsf{T}^{\ell} \tau @ \ell' } \qquad \frac{ \Gamma \vdash t \colon \mathsf{T}^{\ell} \tau @ \ell' \quad \ell' \leq \ell }{ \Gamma \vdash \mathsf{unbox} \: t \colon \tau @ \ell }

The level restriction generalizes to (almost) all eliminators, where the eliminator’s observer level must be at least as high as what it’s inspecting. Coming back to Term, we could try to prevent the function argument from being inspected by marking it as high and marking the function body itself as low.

Lam : (Term @ H → Term) @ L → Term @ L

Now we can’t type any Term of the form Lam (λx. case x of ...). Great! But we can’t type the identity function Lam (λx. x) either, because we can’t return a high-level term in a low-level function. Maybe instead of returning a bare Term, we return a boxed term, which would permit Lam (λx. box x).

Lam : (Term @ H → Tᴴ Term) @ L → Term @ L

But this is insufficient, because we can just push the case expression into the box, as in Lam (λx. box (case x of ...)). The observer level inside the box is high, which is what permits the inspection, while the observer level outside the box in the function body remains low. Such a term again doesn’t represent any lambda calculus term.

What we really want is to prohibit high-level computations, which includes inspection, but permit high-level values, which lets us pass arguments around untouched. We therefore turn to dependency tracking on call-by-push-value (CBPV), which is a type system that distinguishes values from computations.

CBPV with dependency levels

In CBPV, there are value types AA and computation types BB. Values embed into computations via the returner type operator F\mathsf{F}, which represents returning some value. Computations embed into values via the thunk type operator U\mathsf{U}, which represents suspending a computation. The only other type we define for now is the function type, which is a computation type that takes some value and produces some computation.

To this standard presentation of CBPV, we annotate returner and thunk types with a level \ell representing the level of the value being returned or the computation being thunked, respectively, along with annotating function arguments with a level. The syntax of types, contexts, and terms is given below; usually terms also take some level annotations, but for this presentation we won’t need them.

A=UBB=FAA@BΓ=Γ,x ⁣:A@v,w=x{m}m,n=retvletxninmλx.mmv\begin{align} A &\Coloneqq \mathsf{U}^{\ell} B \\ B &\Coloneqq \mathsf{F}^{\ell} A \mid A @ \ell \to B \\ \Gamma &\Coloneqq \cdot \mid \Gamma, x \colon A @ \ell \\ v, w &\Coloneqq x \mid \{m\} \\ m, n &\Coloneqq \mathsf{ret} \: v \mid \mathsf{let} \: x \leftarrow n \: \mathsf{in} \: m \mid \lambda x. m \mid m \: v \end{align}

The rules for the typing judgements on values Γv ⁣:A\Gamma \vdash v \colon A and computations Γm ⁣:B\Gamma \vdash m \colon B are given below. In terms of the level annotations and constraints, returners/thunks and forcing resemble the earlier boxing and unboxing constructs. Functions and applications keep track of the levels and ensure they match but the only place where level constraints are enforced is in the variable rule, where an observer may use a variable only if its level is at least as high as the variable’s.

x ⁣:A@ΓΓx ⁣:A@VarΓ,x ⁣:A@m ⁣:B@Γλx.m ⁣:A@B@LamΓm ⁣:A@B@Γv ⁣:A@Γmv ⁣:B@App\frac{ x \colon A @ \ell' \in \Gamma \quad \ell' \leq ℓ }{ \Gamma \vdash x \colon A @ \ell } \textsf{Var} \qquad \frac{ \Gamma, x \colon A @ \ell' \vdash m \colon B @ \ell }{ \Gamma \vdash \lambda x. m \colon A @ \ell' \to B @ \ell } \textsf{Lam} \qquad \frac{ \Gamma \vdash m \colon A @ \ell' \to B @ \ell \quad \Gamma \vdash v \colon A @ \ell' }{ \Gamma \vdash m \: v \colon B @ \ell } \textsf{App} Γv ⁣:A@Γretv ⁣:FA@RetΓn ⁣:FA@1Γ,x ⁣:A@m ⁣:B@212Γletxninm ⁣:B@2Let\frac{ \Gamma \vdash v \colon A @ \ell }{ \Gamma \vdash \mathsf{ret} \: v \colon \mathsf{F}^{\ell} A @ \ell' } \textsf{Ret} \qquad \frac{ \Gamma \vdash n \colon \mathsf{F}^{\ell} A @ \ell_1 \quad \Gamma, x \colon A @ \ell \vdash m \colon B @ \ell_2 \quad \ell_1 \leq \ell_2 }{ \Gamma \vdash \mathsf{let} \: x \leftarrow n \: \mathsf{in} \: m \colon B @ \ell_2 } \textsf{Let} Γm ⁣:B@Γ{m} ⁣:UB@ThunkΓv ⁣:UB@Γv! ⁣:B@Force\frac{ \Gamma \vdash m \colon B @ \ell }{ \Gamma \vdash \{m\} \colon \mathsf{U}^{\ell} B @ \ell' } \textsf{Thunk} \qquad \frac{ \Gamma \vdash v \colon \mathsf{U}^{\ell} B @ \ell' \quad \ell' \leq \ell }{ \Gamma \vdash v! \colon B @ \ell } \textsf{Force}

The level constraint in the Let rule follows the same principles as the other eliminators: an observer at level l2l_2 can only inspect a returned value at level l1l_1 if its clearance is high enough. The level of the actual returned value is used in the context when typing the body of the let expression, again using the variable rule to determine whether the bound value can actually be used in the body.

HOAS in CBPV with levels

All of this to say that we can then define Lam in HOAS to take a high-level argument and permit it to return a high-level Term, but not compute a high-level Term. The data type might then look like the following.

data Term : Type where
  Lam : Uᴸ (Term @ H  Fᴴ Term) @   Term @ 
  App : Term @   Term @   Term @ 

The constructors are level-polymorphic where possible for greater flexibility. Now Lam {λx. case x of ...} can’t be typed due to level mismatch, as we expect, and neither can Lam {λx. ret {case x of ...}}, as it has the wrong type. Meanwhile, we are able to express valid lambda calculus terms.

Lambda term Representation Term
I=λx.xI = \lambda x. x Lam {λx. ret x}
K=λx.λy.xK = \lambda x. \lambda y. x Lam {λx. ret (Lam {λy. ret x})}
S=λx.λy.λz.xz(yz)S = \lambda x. \lambda y. \lambda z. x \: z \: (y \: z) Lam {λx. ret (Lam {λy. ret (Lam {λz. ret (App (App x z) (App y z))})})}
Ω=(λx.xx)(λx.xx)\Omega = (\lambda x. x \: x) \: (\lambda x. x \: x) App (Lam {λx. ret (App x x)}) (Lam {λx. ret (App x x)})

We can also write functions over Terms, such as an evaluation function for the lambda calculus (this one in particular is call-by-name). The levels involved are now high, since we need to inspect the high Term returned by lambdas.

data Result : Type where
  Lam : Uᴸ (Term @ H  Fᴴ Term) @   Term @ 

eval : Term @ H  Fᴸ Result @ H
eval (Lam h) = ret (Lam h)
eval (App b a) =
  let x  eval b in
  case x of
    Result h  let y  h! a in eval y