Higher-order abstract syntax (HOAS) uses binding in the ambient metalanguage
to represent binding in the object language.
For example, if we define Terms of the object language as a type in the metalanguage,
we might give functions of the lambda calculus (our object language) the type
Lam : (Term → Term) → Term.
The identity function in the object language is then represented
by this constructor using the identity function in the metalanguage,
Lam (λx. x) : Term.
We could represent all Terms of the lambda calculus as the following data type.
data Term : Type where
Lam : (Term → Term) → Term
App : Term → Term → Term
The problem with this representation is that there exist Terms
that don’t correspond to any lambda calculus terms
because the metalanguage is free to inspect the syntax of a Term.
As a concrete example, the body of Lam (λx. case x of Lam ⇒ t₁ | App ⇒ t₂)
may return different Terms depending on the argument x,
which isn’t possible in the lambda calculus.
We want to restrict Term so that only valid lambda calculus terms are representable.
Existing solutions tend to use some form of type polymorphism
to enforce parametricity of the Term argument
and to prevent it from being inspected.
Could we use dependency tracking levels to achieve the same effects?
Dependency tracking
Dependency tracking type systems assign a dependency level to each term along with its type. These levels form a lattice; for our purposes, we only need the two-point lattice , read as low and high levels. In the context of information flow, they represent low- and high-security clearance, where a low-security observer (term) can’t inspect (use) a high-security variable. For instance, a function of type and level may not use its first high-level variable, since it has to return something low level. is a valid term of this type, while is not. There is often also a boxing construct that allows passing high-level terms around in a low-level environment, and its unboxing eliminator prevents inspecting what’s in the box when the observer doesn’t have clearance.
The level restriction generalizes to (almost†) all eliminators,
where the eliminator’s observer level must be at least as high as what it’s inspecting.
Coming back to Term, we could try to prevent the function argument from being inspected
by marking it as high and marking the function body itself as low.
Lam : (Term @ H → Term) @ L → Term @ L
Now we can’t type any Term of the form Lam (λx. case x of ...). Great!
But we can’t type the identity function Lam (λx. x) either,
because we can’t return a high-level term in a low-level function.
Maybe instead of returning a bare Term, we return a boxed term,
which would permit Lam (λx. box x).
Lam : (Term @ H → Tᴴ Term) @ L → Term @ L
But this is insufficient, because we can just push the case expression into the box,
as in Lam (λx. box (case x of ...)).
The observer level inside the box is high, which is what permits the inspection,
while the observer level outside the box in the function body remains low.
Such a term again doesn’t represent any lambda calculus term.
What we really want is to prohibit high-level computations, which includes inspection, but permit high-level values, which lets us pass arguments around untouched. We therefore turn to dependency tracking on call-by-push-value (CBPV), which is a type system that distinguishes values from computations.
CBPV with dependency levels
In CBPV, there are value types and computation types . Values embed into computations via the returner type operator , which represents returning some value. Computations embed into values via the thunk type operator , which represents suspending a computation. The only other type we define for now is the function type, which is a computation type that takes some value and produces some computation.
To this standard presentation of CBPV, we annotate returner and thunk types with a level representing the level of the value being returned or the computation being thunked, respectively, along with annotating function arguments with a level. The syntax of types, contexts, and terms is given below; usually terms also take some level annotations, but for this presentation we won’t need them.
The rules for the typing judgements on values and computations are given below. In terms of the level annotations and constraints, returners/thunks and forcing resemble the earlier boxing and unboxing constructs. Functions and applications keep track of the levels and ensure they match but the only place where level constraints are enforced is in the variable rule, where an observer may use a variable only if its level is at least as high as the variable’s.
The level constraint in the Let rule follows the same principles as the other eliminators: an observer at level can only inspect a returned value at level if its clearance is high enough. The level of the actual returned value is used in the context when typing the body of the let expression, again using the variable rule to determine whether the bound value can actually be used in the body.
HOAS in CBPV with levels
All of this to say that we can then define Lam in HOAS
to take a high-level argument and permit it to return a high-level Term,
but not compute a high-level Term.
The data type might then look like the following.
data Term : Type where
Lam : Uᴸ (Term @ H → Fᴴ Term) @ ℓ → Term @ ℓ
App : Term @ ℓ → Term @ ℓ → Term @ ℓ
The constructors are level-polymorphic where possible for greater flexibility.
Now Lam {λx. case x of ...} can’t be typed due to level mismatch, as we expect,
and neither can Lam {λx. ret {case x of ...}}, as it has the wrong type.
Meanwhile, we are able to express valid lambda calculus terms.
| Lambda term | Representation Term |
|---|---|
Lam {λx. ret x} |
|
Lam {λx. ret (Lam {λy. ret x})} |
|
Lam {λx. ret (Lam {λy. ret (Lam {λz. ret (App (App x z) (App y z))})})} |
|
App (Lam {λx. ret (App x x)}) (Lam {λx. ret (App x x)}) |
We can also write functions over Terms,
such as an evaluation function for the lambda calculus
(this one in particular is call-by-name).
The levels involved are now high,
since we need to inspect the high Term returned by lambdas.
data Result : Type where
Lam : Uᴸ (Term @ H → Fᴴ Term) @ ℓ → Term @ ℓ
eval : Term @ H → Fᴸ Result @ H
eval (Lam h) = ret (Lam h)
eval (App b a) =
let x ← eval b in
case x of
Result h ⇒ let y ← h! a in eval y