Designing Stratified Type Theory

I’ve wanted to write an informal (but technical!) post¹ about my current research in progress on Stratified Type Theory (StraTT), but we’ve been occupied with writing up a paper draft for submission to CPP 2024, then writing a talk for NJPLS Nov 2023, then being rejected from CPP, then I’ve just been randomly distracted for two weeks but I’m so back now.

That paper draft along with the supplementary material will have all the details, but I’ve decided that what I want to focus on in this post is all the other variations on the system we’ve tried that are either inconsistent or less expressive. This means I won’t cover a lot of motivation or examples (still a work in progress), or mention any metatheory unless where relevant; those can be found in the paper. Admittedly, these are mostly notes for myself, and I go at a pace that sort of assumes enough familiarity with the system to be able to verify well-typedness mentally, but this might be interesting to someone else too.

Contents

Type in type
Cumulativity
Syntactic level annotations
Displacement
Floating (nondependent) functions
- But why floating functions?
- When do functions float?
Restriction lemma
Relaxed levels
- Expressivity and inexpressivity
Level polymorphism
What's in the paper?
Summary

The purpose of StraTT is to introduce a different way of syntactically dealing with type universes, and the premise is to take a predicative type theory with a universe hierarchy, and instead of stratifying universes by levels, you instead stratify typing judgements themselves, a strategy inspired by Stratified System F.² This means level annotations appear in the shape of the judgement $\Gamma \vdash a \mathbin{:}^{j} A$ , where $a$ is a term, $A$ is a type, $j$ is a level, and $\Gamma$ is a list of declarations $x \mathbin{:}^{j} A$ .

Here are the rules for functions.

\frac{ \Gamma \vdash A \mathbin{:}^{j} \star \quad \Gamma, x \mathbin{:}^{j} A \vdash B \mathbin{:}^{k} \star \quad j < k }{ \Gamma \vdash \Pi x \mathbin{:}^{j} A \mathpunct{.} B \mathbin{:}^{k} \star } \quad \frac{ \Gamma, x \mathbin{:}^{j} A \vdash b \mathbin{:}^{k} B \quad j < k }{ \Gamma \vdash \lambda x \mathpunct{.} b \mathbin{:}^{k} \Pi x \mathbin{:}^{j} A \mathpunct{.} B } \quad \frac{ \Gamma \vdash b \mathbin{:}^{k} \Pi x \mathbin{:}^{j} A \mathpunct{.} B \quad \Gamma \vdash a \mathbin{:}^{j} A }{ \Gamma \vdash b \; a \mathbin{:}^{k} B[x \mapsto a] }

Function introduction and elimination are as expected, just with level annotations sprinkled in; the main difference is in the dependent function type. Just as the stratification of universes into a hierarchy serves to predicativize function types, so too does stratification of judgements, here done explicitly through the constraint $j < k$ . This means that if you have a function of type $\Pi x \mathbin{:}^{j} \star \mathpunct{.} B$ at level $k$ , you can’t pass that same type to the function as the type argument, since its level is too big.

Type in type

Because universes no longer have levels, we do in fact have a type-in-type rule $\Gamma \vdash \star \mathbin{:}^{j} \star$ for any $j$ . But this is okay!³ The judgement stratification prevents the kind of self-referential tricks that type-theoretic paradoxes typically take advantage of. The simplest such paradox is Hurkens’ paradox, which is still quite complicated, but fundamentally involves the following type.

\mathsf{\textcolor{#bf616a}{U}} \mathbin{:} \star^{1} \coloneqq \Pi X \mathbin{:}^{0} \star \mathpunct{.} ((X \to \star) \to \star) \to X) \to ((X \to \star) \to \star)

For the paradox to work, the type argument of a function of type $\mathsf{\textcolor{#bf616a}{U}}$ needs to be instantiated with $\mathsf{\textcolor{#bf616a}{U}}$ itself, but stratification prevents us from doing that, since $1 \nleq 0$ .

Cumulativity

Cumulativity is both normal to want and possible to achieve. There are two possible variations to achieve it: one adds cumulativity to the variable rule and leaves conversion alone.

\frac{ x \mathbin{:}^{j} A \in \Gamma \quad j \le k }{ \Gamma \vdash x \mathbin{:}^{k} A } \quad \frac{ \Gamma \vdash a \mathbin{:}^{k} A \quad A \equiv B }{ \Gamma \vdash a \mathbin{:}^{k} B }

Alternatively, the variable rule can be left alone, and cumulativity integrated into the conversion rule.

\frac{ x \mathbin{:}^{j} A \in \Gamma }{ \Gamma \vdash x \mathbin{:}^{j} A } \quad \frac{ \Gamma \vdash a \mathbin{:}^{j} A \quad A \equiv B \quad j \leq k }{ \Gamma \vdash a \mathbin{:}^{k} B }

Either set is admissible in terms of the other. I’m not going to tell you which one I’ve picked.

Syntactic level annotations

Level annotations are tedious and bothersome. Can we omit them from function types? The answer is no. Doing so allows us to derive exactly the inconsistency we set out to avoid. Suppose our function type domains aren’t annotated with levels, and let $u \mathbin{:}^{1} \mathsf{\textcolor{#bf616a}{U}}$ . Then by cumulativity, we can raise its level, then apply it to $\mathsf{\textcolor{#bf616a}{U}}$ .

\frac{ u \mathbin{:}^{1} \mathsf{\textcolor{#bf616a}{U}} \vdash u \mathbin{:}^{2} \Pi X \mathbin{:} \star \mathpunct{.} ((X \to \star) \to \star) \to X) \to ((X \to \star) \to \star) \quad \vdash \mathsf{\textcolor{#bf616a}{U}} \mathbin{:}^{1} \star }{ u \mathbin{:}^{1} \mathsf{\textcolor{#bf616a}{U}} \vdash u \; \mathsf{\textcolor{#bf616a}{U}} \mathbin{:}^{2} (((\mathsf{\textcolor{#bf616a}{U}} \to \star) \to \star) \to \mathsf{\textcolor{#bf616a}{U}}) \to ((\mathsf{\textcolor{#bf616a}{U}} \to \star) \to \star) }

The strict level constraint is still there, since the level of $\mathsf{\textcolor{#bf616a}{U}}$ remains strictly smaller than that of $u$ . But without the annotation, the allowed level of the domain can rise as high as possible, yet still within the constraint, via cumulativity.

Displacement

The formalism of StraTT also supports a global context $\Delta$ which consists of a list of global definitions $x \mathbin{:}^{j} A \coloneqq a$ , where $x$ is a constant of type $A$ and body $a$ at level $j$ . Separately from cumulativity, we have displacement, which is based on Conor McBride’s notion of “crude-but-effective stratification”:⁴ global definitions are defined with fixed, constant levels, and then uniformly incremented as needed. This provides a minimalist degree of code reusability across levels without actually having to introduce any sort of level polymorphism. Formally, we have the following rules for displaced constants and their reduction behaviour that integrates both displacement and cumulativity.

\frac{ x \mathbin{:}^{j} A \coloneqq a \in \Delta \quad i + j \leq k }{ \Delta; \Gamma \vdash x^{i} \mathbin{:}^{k} A^{+i} } \quad \frac{ x \mathbin{:}^{j} A \coloneqq a \in \Delta }{ \Delta \vdash x^{i} \rightsquigarrow a^{+i} }

The metafunction ${\cdot}^{+i}$ recursively adds $i$ to all levels and displacements in a term; below are the two cases where the increment actually has an effect.

\begin{align*} (\Pi x \mathbin{:}^{j} A. B)^{+i} &≝ \Pi x \mathbin{:}^{i + j} A^{+i}. B^{+i} \\ (x^{j})^{+i} &≝ x^{i + j} \end{align*}

As an example of where displacement is useful, we can define a function that takes an argument $u$ of the displaced type $\mathsf{\textcolor{#bf616a}{U}}^{1}$ , which now takes a type argument at level $1$ , so $u \; \mathsf{\textcolor{#bf616a}{U}}$ is then well typed.

Floating (nondependent) functions

You’ll notice that I’ve been writing nondependent functions with arrows but without any level annotations. This is because in StraTT, there’s a separate syntactic construct for nondependent functions which behaves just like functions but no longer has the strict stratification restriction.

\frac{ \Gamma \vdash A \mathbin{:}^{k} \star \quad \Gamma \vdash B \mathbin{:}^{k} \star }{ \Gamma \vdash A \to B \mathbin{:}^{k} \star } \quad \frac{ \Gamma, x \mathbin{:}^{k} A \vdash b \mathbin{:}^{k} B }{ \Gamma \vdash \lambda x \mathpunct{.} b \mathbin{:}^{k} A \to B } \quad \frac{ \Gamma \vdash b \mathbin{:}^{k} A \to B \quad \Gamma \vdash a \mathbin{:}^{k} A }{ \Gamma \vdash b \; a \mathbin{:}^{k} B }

Removing the restriction is okay for nondependent function types, because there’s no risk of dependently instantiating a type variable in the type with the type itself.

These are called floating functions because rather than being fixed, the level of the domain of the function type “floats” with the level of overall function type (imagine a buoy bobbing along as the water rises and falls). Specifically, cumulativity lets us derive the following judgement.

\frac{ f \mathbin{:}^{j} A \to B \vdash f \mathbin{:}^{j} A \to B \quad j \leq k }{ f \mathbin{:}^{j} A \to B \vdash f \mathbin{:}^{k} A \to B }

Unusually, if we start with a function $f$ at level $j$ that takes some $A$ at level $j$ , we can use $f$ at level $k$ as if it now takes some $A$ at a higher level $k$ . The floating function $f$ is covariant in the domain with respect to the levels! Typically functions are contravariant or invariant in their domains with respect to the ambient ordering on types, universes, levels, etc.⁵

But why floating functions?

Designing StraTT means not just designing for consistency but for expressivity too: fundamentally revamping how we think about universes is hardly useful if we can’t also express the same things we could in the first place.

First, here’s an example of some Agda code involving self-applications of identity functions in ways such that universe levels are forced to go up to in a nontrivial manner (i.e. not just applying bigger and bigger universes to something). Although I use universe polymorphism to define $\mathsf{\textcolor{#bf616a}{ID}}$ , it isn’t strictly necessary, since all of its uses are inlineable without issue.

{-# OPTIONS --cumulativity #-}

open import Level

ID : ∀ ℓ → Set (suc ℓ)
ID ℓ = (X : Set ℓ) → X → X

𝟘 = zero
𝟙 = suc zero
𝟚 = suc 𝟙
𝟛 = suc 𝟚

-- id id
idid1 : ID 𝟙 → ID 𝟘
idid1 id = id (ID 𝟘) (λ x → id x)

-- id (λid. id id) id
idid2 : ID 𝟚 → ID 𝟘
idid2 id = id (ID 𝟙 → ID 𝟘) idid1 (λ x → id x)

-- id (λid. id (λid. id id) id) id
idid3 : ID 𝟛 → ID 𝟘
idid3 id = id (ID 𝟚 → ID 𝟘) idid2 (λ x → id x)

In $\mathsf{\textcolor{#bf616a}{idid1}}$ , to apply $\mathit{id}$ to itself, its type argument needs to be instantiated with the type of $\mathit{id}$ , meaning that the level of the $\mathit{id}$ on the left needs to be one larger, and the $\mathit{id}$ on the right needs to be eta-expanded to fit the resulting type with the smaller level. Repeatedly applying self-applications increases the level by one each time. I don’t think anything else I can say will be useful; this is one of those things where you have to stare at the code until it type checks in your brain (or just dump it into the type checker).

In the name of expressivity, we would like these same definitions to be typeable in StraTT. Suppose we didn’t have floating functions, so every function type needs to be dependent. This is the first definition that we hope to type check.

\begin{aligned} idid1 & :^{3} Π i d :^{2} (Π X :^{1} ⋆ . Π x :^{1} X . X) . (Π X :^{0} ⋆ . Π x :^{0} X . X) \\ ≔ λ i d . i d (Π X :^{0} ⋆ . Π x :^{0} X . X) (λ X x . i d X x) \end{aligned}

The problem is that while $\mathit{id}$ expects its second argument to be at level $1$ , the actual second argument contains $\mathit{id}$ itself, which is at level $2$ , so such a self-application could never fit! And by stratification, the level of the argument has to be strictly smaller than the level of the overall function, so there’s no annotation we could fiddle about with to make $\mathit{id}$ fit in itself.

What floating functions grant us is the ability to do away with stratification when we don’t need it. The level of a nondependent argument to a function can be as large as the level of the function itself, which is exactly what we need to type check this first definition that now uses floating functions.

\begin{align*} \mathsf{\textcolor{#bf616a}{idid1}} &\mathbin{:}^{2} (\Pi X \mathbin{:}^{1} \star \mathpunct{.} X \to X) \to (\Pi X \mathbin{:}^{0} \star \mathpunct{.} X \to X) \\ &\coloneqq \lambda \mathit{id} \mathpunct{.} \mathit{id} \; (\Pi X \mathbin{:}^{0} \mathpunct{.} \star X \to X) \; (\lambda X \mathpunct{.} \mathit{id} \; X) \end{align*}

The corresponding definitions for $\mathsf{\textcolor{#bf616a}{idid2}}$ and $\mathsf{\textcolor{#bf616a}{idid3}}$ will successfully type check too; exercise for the reader.

When do functions float?

I’ve made it sound like all nondependent functions can be made to float, but unfortunately this isn’t always true. If the function argument itself is used in a nondependent function domain, then that argument is forced to be fixed, too. Consider for example the following predicate that states that the given type is a mere proposition, i.e. that all of its inhabitants are indistinguishable.

\begin{align*} \mathsf{\textcolor{#bf616a}{isProp}} &\mathbin{:}^1 \Pi X \mathbin{:}^0 \star \mathpunct{.} \star \\ &\coloneqq \lambda X \mathpunct{.} \Pi x \mathbin{:}^0 X \mathpunct{.} \Pi y \mathbin{:}^0 X \mathpunct{.} \Pi P \mathbin{:}^0 X \to \star \mathpunct{.} P \; x \to P \; y \end{align*}

If $\mathsf{\textcolor{#bf616a}{isProp}}$ were instead assigned the floating function type $\star \to \star$ at level $1$ , the type argument $X$ being at level $1$ would force the level of $P$ to also be $1$ , which would force the overall definition to be at level $2$ , which would then make $X$ be at level $2$ , and so on.

Not only are not all nondependent functions necessarily floating, not all functions that could be floating necessarily need to be nondependent, either. I’m unsure of how to verify this, but I don’t see why function types like the identity function type $(X \mathbin{:} \star) \to X \to X$ can’t float; I can’t imagine that being exploited to derive an inconsistency. The fixed/floating dichotomy appears to be independent of the dependent/nondependent dichotomy, and matching them up only an approximation.

Restriction lemma

Before I move on to the next design decision, I want to state and briefly discuss an important lemma used in our metatheory. We’ve proven type safety (that is, progress and preservation lemmas) for StraTT, and the proof relies on a restriction lemma.

Definition: Given a context $\Gamma$ and a level $j$ , the restriction $\lceil\Gamma\rceil_{j}$ discards all declarations in $\Gamma$ of level strictly greater than $j$ .

Lemma [Restriction]: If $\vdash \Gamma$ and $\Gamma \vdash a \mathbin{:}^{j} A$ , then $\vdash \lceil\Gamma\rceil_{j}$ and $\lceil\Gamma\rceil_{j} \vdash a \mathbin{:}^{j} A$ .

For the lemma to hold, no derivation of $\Gamma \vdash a \mathbin{:}^{j} A$ can have premises whose level is strictly greater than $j$ ; otherwise, restriction will discard those necessary pieces. This lemma is crucial in proving specifically the floating function case of preservation; if we didn’t have floating functions, the lemma isn’t necessary.

Relaxed levels

As it stands, StraTT enforces that the level of a term’s type is exactly the level of the term. In other words, the following regularity lemma is provable.

Lemma [Regularity]: If $\Gamma \vdash a \mathbin{:}^{j} A$ , then $\Gamma \vdash A \mathbin{:}^{j} \star$ .

But what if we relaxed this requirement?

Lemma [Regularity (relaxed)]: If $\Gamma \vdash a \mathbin{:}^{j} A$ , then there is some $k \geq j$ such that $\Gamma \vdash A \mathbin{:}^{k} \star$ .

In such a new relaxed StraTT (RaTT), the restriction lemma is immediately violated, since $\Gamma ≝ A \mathbin{:}^{1} \star, x \mathbin{:}^{0} A$ is a well-formed context, but $\lceil\Gamma\rceil_{0} = x \mathbin{:}^{0} A$ isn’t even well scoped. So a system with relaxed levels that actually makes use of the relaxation can no longer accommodate floating functions because type safety won’t hold.

However, the dependent functions of RaTT are more expressive: the $\mathit{id}$ self-application can be typed using dependent functions alone!⁶ First, let’s pin down the new rules for our functions.

\frac{ \Gamma \vdash A \mathbin{:}^{k} \star \quad \Gamma, x \mathbin{:}^{j} A \vdash B \mathbin{:}^{k} \star \quad j < k }{ \Gamma \vdash \Pi x \mathbin{:}^{j} A \mathpunct{.} B \mathbin{:}^{k} \star } \quad \frac{ \Gamma, x \mathbin{:}^{j} A \vdash b \mathbin{:}^{k} B }{ \Gamma \vdash \lambda x \mathpunct{.} b \mathbin{:}^{k} \Pi x \mathbin{:}^{j} A \mathpunct{.} B } \quad \frac{ \Gamma \vdash b \mathbin{:}^{k} \Pi x \mathbin{:}^{j} A \mathpunct{.} B \quad \Gamma \vdash a \mathbin{:}^{j} A }{ \Gamma \vdash b \; a \mathbin{:}^{k} B[x \mapsto a] }

Two things have changed:

The domain type $A$ can now be typed at the larger level $k$ rather than at the argument level $j$ ; and
There’s no longer a restriction $j < k$ between the argument level $j$ and the function body level $k$ .

Since the premise $j < k$ still exists in the type formation rule, we can be assured that a term of type $\mathsf{\textcolor{#bf616a}{U}}$ still can’t be applied to $\mathsf{\textcolor{#bf616a}{U}}$ itself and that we still probably have consistency.⁷ Meanwhile, the absence of $j < k$ in the function introduction rule allows us to inhabit, for instance, the identity function type $\Pi X \mathbin{:}^{1} \star \mathpunct{.} \Pi x \mathbin{:}^{0} X \mathpunct{.} X$ , where the level annotations no longer match.

Expressivity and inexpressivity

These rules let us assign levels to the $\mathit{id}$ self-application as follows.

\begin{align*} \mathsf{\textcolor{#bf616a}{idid1}} &\mathbin{:}^{0} \Pi \mathit{id} \mathbin{:}^{0} (\Pi X \mathbin{:}^{1} \star \mathpunct{.} \Pi x \mathbin{:}^{0} X \mathpunct{.} X) \mathpunct{.} (\Pi X \mathbin{:}^{0} \star \mathpunct{.} \Pi x \mathbin{:}^{0} X \mathpunct{.} X) \\ &\coloneqq \lambda \mathit{id} \mathpunct{.} \mathit{id} \; (\Pi X \mathbin{:}^{0} \star \mathpunct{.} \Pi x \mathbin{:}^{0} X \mathpunct{.} X) \; (\lambda X \; x \mathpunct{.} \mathit{id} \; X \; x) \end{align*}

Although the type of $\mathit{id}$ must live at level $2$ , the $\mathit{id}$ term can be assigned a lower level $0$ . This permits the self-application to go through, since the second argument of $\mathit{id}$ demands a term at level $0$ . Unusually, despite the second $\mathit{id}$ being applied to a type at level $1$ , the overall level of the second argument is still $0$ because quantification over arguments at higher levels is permitted.

Despite the apparently added expressivity of these new dependent functions, unfortunately they’re still not as expressive as actually having floating functions. This was something I discovered while trying to type check Hurkens’ paradox and found that fewer definitions were typeable. I’ve attempted to isolate the issue a bit by revealing a similar problem when working with CPS types. Let’s first look at some definitions in the original StraTT with floating functions.

\begin{array}{r l l} \mathsf{\textcolor{#bf616a}{CPS}} &\mathbin{:}^{1} \star \to \star &\coloneqq \lambda A \mathpunct{.} \Pi X \mathbin{:}^{0} \star. (A \to X) \to X \\ \mathsf{\textcolor{#bf616a}{return}} &\mathbin{:}^{1} \Pi A \mathbin{:}^{0} \star \mathpunct{.} A \to \mathsf{\textcolor{#bf616a}{CPS}} \; A &\coloneqq \lambda A \; a \; X \; f. f \; a \\ \mathsf{\textcolor{#bf616a}{run}} &\mathbin{:}^{1} \Pi A \mathbin{:}^{0} \star \mathpunct{.} \mathsf{\textcolor{#bf616a}{CPS}} \; A \to A &\coloneqq \lambda A \; f \mathpunct{.} f \; A \; (\lambda a \mathpunct{.} a) \\ \mathsf{\textcolor{#bf616a}{idCPS}} &\mathbin{:}^{2} \Pi A \mathbin{:}^{0} \star \mathpunct{.} \mathsf{\textcolor{#bf616a}{CPS}}^{1} \; A \to \mathsf{\textcolor{#bf616a}{CPS}} \; A &\coloneqq \lambda A \; f. f \; (\mathsf{\textcolor{#bf616a}{CPS}} \; A) \; (\mathsf{\textcolor{#bf616a}{return}} \; A) \\ \mathsf{\textcolor{#bf616a}{runReturn}} &\mathbin{:}^{2} \Pi A \mathbin{:}^{0} \star \mathpunct{.} \mathsf{\textcolor{#bf616a}{CPS}}^{1} \; A \to \star &\coloneqq \lambda A \; f \mathpunct{.} \mathsf{\textcolor{#bf616a}{run}} \; A \; (\mathsf{\textcolor{#bf616a}{idCPS}} \; A \; f) = \mathsf{\textcolor{#bf616a}{run}}^{1} \; A \; f \end{array}

$\mathsf{\textcolor{#bf616a}{CPS}}$ translates a type to its answer-polymorphic CPS form, and $\mathsf{\textcolor{#bf616a}{return}}$ translates a term into CPS. $\mathsf{\textcolor{#bf616a}{run}}$ does the opposite and runs the computation with the identity continuation to yield a term of the original type. $\mathsf{\textcolor{#bf616a}{runReturn}}$ is a proposition we might want to prove about a given type and a computation of that type in CPS: passing $\mathsf{\textcolor{#bf616a}{return}}$ as the continuation for our computation (as in $\mathsf{\textcolor{#bf616a}{idCPS}}$ ) to get another computation should, when run, yield the exact same result as running the computation directly.

By careful inspection,⁸ everything type checks. Displacements needed are in $\mathsf{\textcolor{#bf616a}{idCPS}}$ and $\mathsf{\textcolor{#bf616a}{runReturn}}$ on the type of the computation $f$ . This displacement raises the level of the answer type argument of $f$ so that the type can be instantiated with $\mathsf{\textcolor{#bf616a}{CPS}} \; A$ . Consequently, we also need to displace the $\mathsf{\textcolor{#bf616a}{run}}^{1}$ that runs the displaced computation on the right-hand side of $\mathsf{\textcolor{#bf616a}{runReturn}}$ . Meanwhile, left-hand $\mathsf{\textcolor{#bf616a}{run}}$ shouldn’t be displaced because it runs the undisplaced computation returned by $\mathsf{\textcolor{#bf616a}{idCPS}}$ .

Now let’s try to do the same in the new RaTT without floating functions. I’ll only write down the definition of $\mathsf{\textcolor{#bf616a}{CPS}}$ and $\mathsf{\textcolor{#bf616a}{idCPS}}$ , and inline $\mathsf{\textcolor{#bf616a}{return}}$ in the latter.

\begin{aligned} CPS & :^{1} Π A :^{0} ⋆ . ⋆ & ≔ λ A . Π X :^{0} ⋆ . Π f :^{0} (Π a :^{0} A . X) . X \\ idCPS & :^{0} Π A :^{0} ⋆ . Π f :^{0} {CPS}^{1} A . CPS A & ≔ λ A f . f (CPS A) (λ a X g . g a) \end{aligned}

This is a little more difficult to decipher, so let’s look at what the types of the different pieces of $\mathsf{\textcolor{#bf616a}{idCPS}}$ are.

$f$ has type $\Pi X \mathbin{:}^{1} \star \mathpunct{.} \Pi f \mathbin{:}^{1} (\Pi a \mathbin{:}^{1} A \mathpunct{.} X) \mathpunct{.} X$ from displacing $\mathsf{\textcolor{#bf616a}{CPS}}$ .
The second argument of $f$ must have type $\Pi a \mathbin{:}^{1} A \mathpunct{.} \mathsf{\textcolor{#bf616a}{CPS}} \; A$ from instantiating $X$ with $\mathsf{\textcolor{#bf616a}{CPS}} \; A$ . This is also the type of $\mathsf{\textcolor{#bf616a}{return}}$ .
The types of the arguments $a$ , $X$ , and $g$ of the second argument $\lambda a \; X \; g \mathpunct{.} g \; a$ are $a \mathbin{:}^{1} A$ , $X \mathbin{:}^{0} \star$ , and $g \mathbin{:}^{0} \Pi a \mathbin{:}^{0} A \mathpunct{.} X$ from expanding $\mathsf{\textcolor{#bf616a}{CPS}} \; A$ .
$g \; a$ is ill typed because $a$ has level $1$ while $g$ takes an $A$ at level $0$ .

To fix this, we might try to increase the level annotation on $a$ in the definition of $\mathsf{\textcolor{#bf616a}{CPS}}$ so that $g$ can accommodate $a$ . But doing so would also increase the level annotation on $a$ in the displaced type of $f$ to $2$ , meaning that $a$ is still too large to fit into $g$ . The uniformity of the displacement mechanism means that if everything is annotated with a level, then everything moves up at the same time. Previously, floating functions allowed us to essentially ignore increasing levels due to displacement, but also independently increase them via cumulativity.

Of course, we could simply define a second $\mathsf{\textcolor{#bf616a}{CPS}}$ definition with a different level, i.e. $\Pi X \mathbin{:}^{j} \star \mathpunct{.} \Pi f \mathbin{:}^{0} (\Pi a \mathbin{:}^{0} A \mathpunct{.} X) \mathpunct{.} X$ for the level $j$ that we need, but then to avoid code duplication, we’re back at needing some form of level polymorphism over $j$ .

Level polymorphism

So far, I’ve treated level polymorphism as if it were something unpleasant to deal with and difficult to handle, and this is because level polymorphism is unpleasant and difficult. On the useability side, I’ve found that level polymorphism in Agda muddies the intent of the code I want to write and produces incomprehensible type errors while I’m writing it, and I hear that universe polymorphism in Coq is a similar beast. Of course, StraTT is still far away from being a useable tool, but in the absence of more complex level polymorphism, we can plan and design for more friendly elaboration-side features such as level inference.

On the technical side, it’s unclear how you might assign a type and level to something like $\forall k \mathpunct{.} \Pi x \mathbin{:}^{k} A \mathpunct{.} B$ , since $x$ essentially quantifies over derivations at all levels. We would also need bounded quantification from both ends for level instantiations to be valid. For instance, $\Pi x \mathbin{:}^{k} (\Pi y \mathbin{:}^{2} A \mathpunct{.} B \; y) \mathpunct{.} C \; x$ requires that the level $k$ of the type of the domain be strictly greater than $2$ , so the quantification is $\forall k > 2$ ; and flipping the levels, $\Pi x \mathbin{:}^{2} (\Pi y \mathbin{:}^{k} A \mathpunct{.} B \; y) \mathpunct{.} C \; x$ would require $\forall k < 2$ . While assigning a level to a quantification bounded above is easy (the level of the second type above can be $3$ ), assigning a level to a quantification bounded below is as unclear as assigning one to an unbounded quantification.

At this point, we could either start using ordinals in general for our levels and always require upper-bounded quantification whose upper bound could be a limit ordinal, or we can restrict level quantification to prenex polymorphism at definitions, which is roughly how Coq’s universe polymorphism works, only implicitly, and to me the more reasonable option:

We believe there is limited need for […] higher-ranked universe polymorphism for a cumulative universe hierarchy.
― Favonia et al. (2023)⁹

With the second option, we have to be careful not to repeat the same mistakes as Coq’s universe polymorphism. There, by default, every single (implicit) universe level in a definition is quantified over, and every use of a definition generates fresh level variables for each quantified level, so it’s very easy (albeit somewhat artificial) to end up with exponentially-many level variables to handle relative to the number of definitions. On the other hand, explicitly quantifying over level variables is tedious if you must instantiate them yourself, and it’s tricky to predict which ones you really do want to quantify over.

Level polymorphism is clearly more expressive, since in a definition of type $\forall i, j \mathpunct{.} \Pi x \mathbin{:}^{i} A \mathpunct{.} \Pi y \mathbin{:}^{j} B \mathpunct{.} C \; x \; y$ for instance, you can instantiate $i$ and $j$ independently, whereas displacement forces you to always displace both by the same amount. But so far, it’s unclear to me in what scenarios this feature would be absolutely necessary and not manageable with just floating functions and futzing about with the level annotations a little.

What’s in the paper?

The variant of StraTT that the paper covers is the one with stratified dependent functions, nondependent floating functions, and displacement. We’ve proven type safety, mechanized a model of StraTT without floating functions (but not the interpretation from the syntax to the semantics), and implemented StraTT extended with datatypes and level/displacement inference.

The paper doesn’t yet cover any discussion of RaTT with relaxed level constraints. I’m still deliberating whether it would be worthwhile to update my mechanized model first before writing about it, just to show that it is a variant that deserves serious consideration, even if I’ll likely discard it at the end. The paper doesn’t cover any discussion on level polymorphism either, and if I don’t have any more concrete results, I probably won’t go on to include it.

It doesn’t have to be for this current paper draft, but I’d like to have a bit more on level inference in terms of proving soundness and completeness. Soundness should be evident (we toss all of the constraints into an SMT solver), but completeness would probably require some ordering on different level annotations of the same term such that well-typedness of a “smaller” annotation set implies well-typedness of a “larger” annotation set, formalizing the inference algorithm, then showing that the algorithm always produces the “smallest” annotation set.

Summary

Typing derivations are stratified by levels
Dependent functions quantify over strictly lower types
Floating functions use cumulativity to raise domain levels
Displacement enables code reuse through uniformly incrementing levels
Allowing type levels to be larger than term levels breaks type safety for floating functions
Dependent functions and displacement alone are less expressive than also having floating functions
Level polymorphism is still an open question that I’d rather not include

A draft of this post was first published on Cohost. ↩
Daniel Leivant. Stratified polymorphism. LICS 1989. doi:10.1109/LICS.1989.39157. ↩
For the semantically-inclined, it is possible to construct a model of these universes, as I’ve done in this Agda model, which also sketches out how the rest of consistency might be proven. ↩
Conor McBride. Crude but Effective Stratification. 2011. https://mazzo.li/epilogue/index.html%3Fp=857.html. ↩
This domain covariance prevents us from extending the above model with floating function types, because this behaviour semantically… well, kind of doesn’t make sense. So consistency with floating functions is an open problem. ↩
Thanks to Yiyun Liu for pointing this out about the system. ↩
The Agda model could probably be tweaked to accommodate this new system; I just haven’t done it. ↩
Or by using the implementation to type check for me, which is what I did. ↩
Favonia, Carlo Angiuli, Reed Mullanix. An Order-Theoretic Analysis of Universe Polymorphism. POPL 2023. doi:10.1145/3571250. ↩

type theory universe levels