Designing Stratified Type Theory

I’ve wanted to write an informal (but technical!) post1 about my current research in progress on Stratified Type Theory (StraTT), but we’ve been occupied with writing up a paper draft for submission to CPP 2024, then writing a talk for NJPLS Nov 2023, then being rejected from CPP, then I’ve just been randomly distracted for two weeks but I’m so back now.

That paper draft along with the supplementary material will have all the details, but I’ve decided that what I want to focus on in this post is all the other variations on the system we’ve tried that are either inconsistent or less expressive. This means I won’t cover a lot of motivation or examples (still a work in progress), or mention any metatheory except where relevant; those can be found in the paper. Admittedly, these are mostly notes for myself, and I go at a pace that sort of assumes enough familiarity with the system to verify well-typedness mentally, but this might be interesting to someone else too.


The purpose of StraTT is to introduce a different way of syntactically dealing with type universes. The premise is to take a predicative type theory with a universe hierarchy and, instead of stratifying universes by levels, stratify the typing judgements themselves, a strategy inspired by Stratified System F.2 This means level annotations appear in the shape of the judgement $\Gamma \vdash a \mathbin{:}^{j} A$, where $a$ is a term, $A$ is a type, $j$ is a level, and $\Gamma$ is a list of declarations $x \mathbin{:}^{j} A$.

Here are the rules for functions.

$$\frac{ \Gamma \vdash A \mathbin{:}^{j} \star \quad \Gamma, x \mathbin{:}^{j} A \vdash B \mathbin{:}^{k} \star \quad j < k }{ \Gamma \vdash \Pi x \mathbin{:}^{j} A \mathpunct{.} B \mathbin{:}^{k} \star } \quad \frac{ \Gamma, x \mathbin{:}^{j} A \vdash b \mathbin{:}^{k} B \quad j < k }{ \Gamma \vdash \lambda x \mathpunct{.} b \mathbin{:}^{k} \Pi x \mathbin{:}^{j} A \mathpunct{.} B } \quad \frac{ \Gamma \vdash b \mathbin{:}^{k} \Pi x \mathbin{:}^{j} A \mathpunct{.} B \quad \Gamma \vdash a \mathbin{:}^{j} A }{ \Gamma \vdash b \; a \mathbin{:}^{k} B[x \mapsto a] }$$

Function introduction and elimination are as expected, just with level annotations sprinkled in; the main difference is in the dependent function type. Just as the stratification of universes into a hierarchy serves to predicativize function types, so too does the stratification of judgements, here done explicitly through the constraint $j < k$. This means that if you have a function of type $\Pi x \mathbin{:}^{j} \star \mathpunct{.} B$ at level $k$, you can’t pass that same type to the function as its type argument, since its level is too big.
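To spell out that last point as a hypothetical instance of the rules above: given $f \mathbin{:}^{k} \Pi x \mathbin{:}^{j} \star \mathpunct{.} B$, applying $f$ to its own type would require

```latex
\Gamma \vdash \Pi x \mathbin{:}^{j} \star \mathpunct{.} B \mathbin{:}^{j} \star
```

but the formation rule forces any such $\Pi$ type to a level strictly greater than $j$, never $j$ itself.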

Type in type

Because universes no longer have levels, we do in fact have a type-in-type rule $\Gamma \vdash \star \mathbin{:}^{j} \star$ for any $j$. But this is okay!3 The judgement stratification prevents the kind of self-referential tricks that type-theoretic paradoxes typically take advantage of. The simplest such paradox is Hurkens’ paradox, which is still quite complicated, but fundamentally involves the following type.

$$\mathsf{\textcolor{#bf616a}{U}} \mathbin{:}^{1} \star \coloneqq \Pi X \mathbin{:}^{0} \star \mathpunct{.} (((X \to \star) \to \star) \to X) \to ((X \to \star) \to \star)$$

For the paradox to work, the type argument of a function of type $\mathsf{\textcolor{#bf616a}{U}}$ needs to be instantiated with $\mathsf{\textcolor{#bf616a}{U}}$ itself, but stratification prevents us from doing that, since $1 \nleq 0$.


Cumulativity

Cumulativity is both natural to want and possible to achieve. There are two variations that achieve it: the first adds cumulativity to the variable rule and leaves conversion alone.

$$\frac{ x \mathbin{:}^{j} A \in \Gamma \quad j \le k }{ \Gamma \vdash x \mathbin{:}^{k} A } \quad \frac{ \Gamma \vdash a \mathbin{:}^{k} A \quad A \equiv B }{ \Gamma \vdash a \mathbin{:}^{k} B }$$

Alternatively, the variable rule can be left alone, and cumulativity integrated into the conversion rule.

$$\frac{ x \mathbin{:}^{j} A \in \Gamma }{ \Gamma \vdash x \mathbin{:}^{j} A } \quad \frac{ \Gamma \vdash a \mathbin{:}^{j} A \quad A \equiv B \quad j \leq k }{ \Gamma \vdash a \mathbin{:}^{k} B }$$

Either set is admissible in terms of the other. I’m not going to tell you which one I’ve picked.
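As a quick illustration of one direction of that admissibility, using reflexivity of conversion, the cumulative variable rule of the first set is derivable from the second set:

```latex
\frac{
  \dfrac{ x \mathbin{:}^{j} A \in \Gamma }{ \Gamma \vdash x \mathbin{:}^{j} A }
  \qquad A \equiv A \qquad j \leq k
}{
  \Gamma \vdash x \mathbin{:}^{k} A
}
```

The converse direction is less immediate: it amounts to showing that cumulativity of arbitrary judgements is admissible, by induction over derivations.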

Syntactic level annotations

Level annotations are tedious and bothersome. Can we omit them from function types? The answer is no. Doing so allows us to derive exactly the inconsistency we set out to avoid. Suppose our function type domains aren’t annotated with levels, and let $u \mathbin{:}^{1} \mathsf{\textcolor{#bf616a}{U}}$. Then by cumulativity, we can raise its level, then apply it to $\mathsf{\textcolor{#bf616a}{U}}$.

$$\frac{ u \mathbin{:}^{1} \mathsf{\textcolor{#bf616a}{U}} \vdash u \mathbin{:}^{2} \Pi X \mathbin{:} \star \mathpunct{.} (((X \to \star) \to \star) \to X) \to ((X \to \star) \to \star) \quad \vdash \mathsf{\textcolor{#bf616a}{U}} \mathbin{:}^{1} \star }{ u \mathbin{:}^{1} \mathsf{\textcolor{#bf616a}{U}} \vdash u \; \mathsf{\textcolor{#bf616a}{U}} \mathbin{:}^{2} (((\mathsf{\textcolor{#bf616a}{U}} \to \star) \to \star) \to \mathsf{\textcolor{#bf616a}{U}}) \to ((\mathsf{\textcolor{#bf616a}{U}} \to \star) \to \star) }$$

The strict level constraint is still there, since the level of $\mathsf{\textcolor{#bf616a}{U}}$ remains strictly smaller than that of $u$. But without the annotation, the allowed level of the domain can rise as high as needed, still within the constraint, via cumulativity.


Displacement

The formalism of StraTT also supports a global context $\Delta$, which consists of a list of global definitions $x \mathbin{:}^{j} A \coloneqq a$, where $x$ is a constant with type $A$ and body $a$ at level $j$. Separately from cumulativity, we have displacement, which is based on Conor McBride’s notion of “crude-but-effective stratification”:4 global definitions are defined with fixed, constant levels, which are then uniformly incremented as needed. This provides a minimalist degree of code reusability across levels without actually having to introduce any sort of level polymorphism. Formally, we have the following rules for displaced constants and their reduction behaviour, which integrate both displacement and cumulativity.

$$\frac{ x \mathbin{:}^{j} A \coloneqq a \in \Delta \quad i + j \leq k }{ \Delta; \Gamma \vdash x^{i} \mathbin{:}^{k} A^{+i} } \quad \frac{ x \mathbin{:}^{j} A \coloneqq a \in \Delta }{ \Delta \vdash x^{i} \rightsquigarrow a^{+i} }$$

The metafunction ${\cdot}^{+i}$ recursively adds $i$ to all levels and displacements in a term; below are the two cases where the increment actually has an effect.

$$\begin{align*} (\Pi x \mathbin{:}^{j} A \mathpunct{.} B)^{+i} &≝ \Pi x \mathbin{:}^{i + j} A^{+i} \mathpunct{.} B^{+i} \\ (x^{j})^{+i} &≝ x^{i + j} \end{align*}$$

As an example of where displacement is useful, we can define a function that takes an argument $u$ of the displaced type $\mathsf{\textcolor{#bf616a}{U}}^{1}$, which now takes a type argument at level $1$, so $u \; \mathsf{\textcolor{#bf616a}{U}}$ is then well typed.
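To make the metafunction concrete, here's a minimal sketch in Python of ${\cdot}^{+i}$ over a toy term syntax (the AST and all names here are my own invention, not part of the formalism):

```python
from dataclasses import dataclass

# A tiny fragment of the term syntax: variables, displaced constants x^j,
# and dependent function types with a level-annotated domain Πx :^j A. B.
@dataclass
class Var:
    name: str

@dataclass
class Const:
    name: str
    disp: int   # the displacement superscript j in x^j

@dataclass
class Pi:
    var: str
    level: int  # the domain annotation j in Πx :^j A. B
    dom: object
    cod: object

def displace(term, i):
    """The metafunction ·^{+i}: recursively add i to every level
    annotation and every displacement in the term."""
    if isinstance(term, Var):
        return term                              # variables carry no level
    if isinstance(term, Const):
        return Const(term.name, term.disp + i)   # (x^j)^{+i} = x^{i+j}
    if isinstance(term, Pi):                     # (Πx :^j A. B)^{+i}
        return Pi(term.var, term.level + i,
                  displace(term.dom, i), displace(term.cod, i))
    raise TypeError(f"unknown term: {term!r}")

# Displacing Πx :^0 U^0. x by 1 yields Πx :^1 U^1. x, mirroring how an
# argument of type U^1 takes its type argument at level 1.
t = displace(Pi("x", 0, Const("U", 0), Var("x")), 1)
assert (t.level, t.dom.disp) == (1, 1)
```

Only displaced constants and domain annotations carry levels here, so variables pass through untouched, matching the two effective cases above.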

Floating (nondependent) functions

You’ll notice that I’ve been writing nondependent functions with arrows but without any level annotations. This is because in StraTT, there’s a separate syntactic construct for nondependent functions which behaves just like functions but no longer has the strict stratification restriction.

$$\frac{ \Gamma \vdash A \mathbin{:}^{k} \star \quad \Gamma \vdash B \mathbin{:}^{k} \star }{ \Gamma \vdash A \to B \mathbin{:}^{k} \star } \quad \frac{ \Gamma, x \mathbin{:}^{k} A \vdash b \mathbin{:}^{k} B }{ \Gamma \vdash \lambda x \mathpunct{.} b \mathbin{:}^{k} A \to B } \quad \frac{ \Gamma \vdash b \mathbin{:}^{k} A \to B \quad \Gamma \vdash a \mathbin{:}^{k} A }{ \Gamma \vdash b \; a \mathbin{:}^{k} B }$$

Removing the restriction is okay for nondependent function types because there’s no risk of self-reference: with no dependency, there’s no bound type variable that could be instantiated with the function type itself.

These are called floating functions because rather than being fixed, the level of the domain of the function type “floats” with the level of the overall function type (imagine a buoy bobbing along as the water rises and falls). Specifically, cumulativity lets us derive the following judgement.

$$\frac{ f \mathbin{:}^{j} A \to B \vdash f \mathbin{:}^{j} A \to B \quad j \leq k }{ f \mathbin{:}^{j} A \to B \vdash f \mathbin{:}^{k} A \to B }$$

Unusually, if we start with a function $f$ at level $j$ that takes some $A$ at level $j$, we can use $f$ at level $k$ as if it now takes some $A$ at the higher level $k$. The floating function $f$ is covariant in the domain with respect to the levels! Typically, functions are contravariant or invariant in their domains with respect to the ambient ordering on types, universes, levels, etc.5

But why floating functions?

Designing StraTT means not just designing for consistency but for expressivity too: fundamentally revamping how we think about universes is hardly useful if we can’t also express the same things we could in the first place.

First, here’s an example of some Agda code involving self-applications of identity functions, in which universe levels are forced to go up in a nontrivial manner (i.e. not just applying bigger and bigger universes to something). Although I use universe polymorphism to define $\mathsf{\textcolor{#bf616a}{ID}}$, it isn’t strictly necessary, since all of its uses are inlineable without issue.

{-# OPTIONS --cumulativity #-}

open import Level

ID : ∀ ℓ → Set (suc ℓ)
ID ℓ = (X : Set ℓ) → X → X

𝟘 = zero
𝟙 = suc zero
𝟚 = suc 𝟙
𝟛 = suc 𝟚

-- id id
idid1 : ID 𝟙 → ID 𝟘
idid1 id = id (ID 𝟘) (λ x → id x)

-- id (λid. id id) id
idid2 : ID 𝟚 → ID 𝟘
idid2 id = id (ID 𝟙 → ID 𝟘) idid1 (λ x → id x)

-- id (λid. id (λid. id id) id) id
idid3 : ID 𝟛 → ID 𝟘
idid3 id = id (ID 𝟚 → ID 𝟘) idid2 (λ x → id x)

In $\mathsf{\textcolor{#bf616a}{idid1}}$, to apply $\mathit{id}$ to itself, its type argument needs to be instantiated with the type of $\mathit{id}$, meaning that the level of the $\mathit{id}$ on the left needs to be one larger, and the $\mathit{id}$ on the right needs to be eta-expanded to fit the resulting type with the smaller level. Repeatedly applying self-applications increases the level by one each time. I don’t think anything else I can say will be useful; this is one of those things where you have to stare at the code until it type checks in your brain (or just dump it into the type checker).

In the name of expressivity, we would like these same definitions to be typeable in StraTT. Suppose we didn’t have floating functions, so every function type needs to be dependent. This is the first definition that we hope to type check.

$$\begin{align*} \mathsf{\textcolor{#bf616a}{idid1}} &\mathbin{:}^{3} \Pi \mathit{id} \mathbin{:}^{2} (\Pi X \mathbin{:}^{1} \star \mathpunct{.} \Pi x \mathbin{:}^{1} X \mathpunct{.} X) \mathpunct{.} (\Pi X \mathbin{:}^{0} \star \mathpunct{.} \Pi x \mathbin{:}^{0} X \mathpunct{.} X) \\ &\coloneqq \lambda \mathit{id} \mathpunct{.} \mathit{id} \; (\Pi X \mathbin{:}^{0} \star \mathpunct{.} \Pi x \mathbin{:}^{0} X \mathpunct{.} X) \; (\lambda X \; x \mathpunct{.} \mathit{id} \; X \; x) \end{align*}$$

The problem is that while $\mathit{id}$ expects its second argument to be at level $1$, the actual second argument contains $\mathit{id}$ itself, which is at level $2$, so such a self-application could never fit! And by stratification, the level of the argument has to be strictly smaller than the level of the overall function, so there’s no annotation we could fiddle about with to make $\mathit{id}$ fit in itself.

What floating functions grant us is the ability to do away with stratification when we don’t need it. The level of a nondependent argument to a function can be as large as the level of the function itself, which is exactly what we need to type check this first definition that now uses floating functions.

$$\begin{align*} \mathsf{\textcolor{#bf616a}{idid1}} &\mathbin{:}^{2} (\Pi X \mathbin{:}^{1} \star \mathpunct{.} X \to X) \to (\Pi X \mathbin{:}^{0} \star \mathpunct{.} X \to X) \\ &\coloneqq \lambda \mathit{id} \mathpunct{.} \mathit{id} \; (\Pi X \mathbin{:}^{0} \star \mathpunct{.} X \to X) \; (\lambda X \mathpunct{.} \mathit{id} \; X) \end{align*}$$

The corresponding definitions for $\mathsf{\textcolor{#bf616a}{idid2}}$ and $\mathsf{\textcolor{#bf616a}{idid3}}$ will successfully type check too; this is left as an exercise for the reader.

When do functions float?

I’ve made it sound like all nondependent functions can be made to float, but unfortunately this isn’t always true. If the function argument itself is used in a nondependent function domain, then that argument is forced to be fixed, too. Consider for example the following predicate that states that the given type is a mere proposition, i.e. that all of its inhabitants are indistinguishable.

$$\begin{align*} \mathsf{\textcolor{#bf616a}{isProp}} &\mathbin{:}^1 \Pi X \mathbin{:}^0 \star \mathpunct{.} \star \\ &\coloneqq \lambda X \mathpunct{.} \Pi x \mathbin{:}^0 X \mathpunct{.} \Pi y \mathbin{:}^0 X \mathpunct{.} \Pi P \mathbin{:}^0 X \to \star \mathpunct{.} P \; x \to P \; y \end{align*}$$

If $\mathsf{\textcolor{#bf616a}{isProp}}$ were instead assigned the floating function type $\star \to \star$ at level $1$, the type argument $X$ being at level $1$ would force the level of $P$ to also be $1$, which would force the overall definition to be at level $2$, which would then force $X$ to be at level $2$, and so on.

Just as not all nondependent functions can necessarily float, not all functions that could float necessarily need to be nondependent, either. I’m unsure of how to verify this, but I don’t see why function types like the identity function type $(X \mathbin{:} \star) \to X \to X$ can’t float; I can’t imagine that being exploited to derive an inconsistency. The fixed/floating dichotomy appears to be independent of the dependent/nondependent dichotomy, and matching them up is only an approximation.

Restriction lemma

Before I move on to the next design decision, I want to state and briefly discuss an important lemma used in our metatheory. We’ve proven type safety (that is, progress and preservation lemmas) for StraTT, and the proof relies on a restriction lemma.

Definition: Given a context $\Gamma$ and a level $j$, the restriction $\lceil\Gamma\rceil_{j}$ discards all declarations in $\Gamma$ of level strictly greater than $j$.

Lemma [Restriction]: If $\vdash \Gamma$ and $\Gamma \vdash a \mathbin{:}^{j} A$, then $\vdash \lceil\Gamma\rceil_{j}$ and $\lceil\Gamma\rceil_{j} \vdash a \mathbin{:}^{j} A$.

For the lemma to hold, no derivation of $\Gamma \vdash a \mathbin{:}^{j} A$ can have premises whose level is strictly greater than $j$; otherwise, restriction would discard those necessary pieces. This lemma is crucial in proving specifically the floating function case of preservation; if we didn’t have floating functions, the lemma wouldn’t be necessary.

Relaxed levels

As it stands, StraTT enforces that the level of a term’s type is exactly the level of the term. In other words, the following regularity lemma is provable.

Lemma [Regularity]: If $\Gamma \vdash a \mathbin{:}^{j} A$, then $\Gamma \vdash A \mathbin{:}^{j} \star$.

But what if we relaxed this requirement?

Lemma [Regularity (relaxed)]: If $\Gamma \vdash a \mathbin{:}^{j} A$, then there is some $k \geq j$ such that $\Gamma \vdash A \mathbin{:}^{k} \star$.

In such a new relaxed StraTT (RaTT), the restriction lemma is immediately violated, since $\Gamma ≝ A \mathbin{:}^{1} \star, x \mathbin{:}^{0} A$ is a well-formed context, but $\lceil\Gamma\rceil_{0} = x \mathbin{:}^{0} A$ isn’t even well scoped. So a system with relaxed levels that actually makes use of the relaxation can no longer accommodate floating functions, because type safety won’t hold.
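To see the failure concretely, here's a toy Python sketch (the encoding of contexts as (name, level, free-variables-of-type) triples is mine, purely for illustration):

```python
# A toy context: each declaration is (name, level, fvs), where fvs is the
# set of variables occurring in that declaration's type.
def restrict(ctx, j):
    """The restriction ⌈Γ⌉_j: discard declarations of level strictly
    greater than j."""
    return [(x, lvl, fvs) for (x, lvl, fvs) in ctx if lvl <= j]

def well_scoped(ctx):
    """Every variable mentioned in a type must be declared earlier."""
    seen = set()
    for (x, _lvl, fvs) in ctx:
        if not fvs <= seen:
            return False
        seen.add(x)
    return True

# Under strict regularity, a type lives at its declaration's own level,
# so Γ = A :^0 ⋆, x :^0 A restricts cleanly at level 0.
stratt_ctx = [("A", 0, set()), ("x", 0, {"A"})]
assert well_scoped(restrict(stratt_ctx, 0))

# The RaTT context Γ = A :^1 ⋆, x :^0 A: restricting at 0 drops A but
# keeps x, whose type still mentions A, so ⌈Γ⌉_0 is ill scoped.
ratt_ctx = [("A", 1, set()), ("x", 0, {"A"})]
assert not well_scoped(restrict(ratt_ctx, 0))
```

The point is exactly the one above: relaxation lets a low-level declaration depend on a higher-level one, and restriction then severs that dependency.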

However, the dependent functions of RaTT are more expressive: the $\mathit{id}$ self-application can be typed using dependent functions alone!6 First, let’s pin down the new rules for our functions.

$$\frac{ \Gamma \vdash A \mathbin{:}^{k} \star \quad \Gamma, x \mathbin{:}^{j} A \vdash B \mathbin{:}^{k} \star \quad j < k }{ \Gamma \vdash \Pi x \mathbin{:}^{j} A \mathpunct{.} B \mathbin{:}^{k} \star } \quad \frac{ \Gamma, x \mathbin{:}^{j} A \vdash b \mathbin{:}^{k} B }{ \Gamma \vdash \lambda x \mathpunct{.} b \mathbin{:}^{k} \Pi x \mathbin{:}^{j} A \mathpunct{.} B } \quad \frac{ \Gamma \vdash b \mathbin{:}^{k} \Pi x \mathbin{:}^{j} A \mathpunct{.} B \quad \Gamma \vdash a \mathbin{:}^{j} A }{ \Gamma \vdash b \; a \mathbin{:}^{k} B[x \mapsto a] }$$

Two things have changed:

  • The domain type $A$ can now be typed at the larger level $k$ rather than at the argument level $j$; and
  • There’s no longer a restriction $j < k$ between the argument level $j$ and the function body level $k$ in the introduction rule.

Since the premise $j < k$ still exists in the type formation rule, we can be assured that a term of type $\mathsf{\textcolor{#bf616a}{U}}$ still can’t be applied to $\mathsf{\textcolor{#bf616a}{U}}$ itself, and that we still probably have consistency.7 Meanwhile, the absence of $j < k$ in the function introduction rule allows us to inhabit, for instance, the identity function type $\Pi X \mathbin{:}^{1} \star \mathpunct{.} \Pi x \mathbin{:}^{0} X \mathpunct{.} X$, where the level annotations no longer match.

Expressivity and inexpressivity

These rules let us assign levels to the $\mathit{id}$ self-application as follows.

$$\begin{align*} \mathsf{\textcolor{#bf616a}{idid1}} &\mathbin{:}^{0} \Pi \mathit{id} \mathbin{:}^{0} (\Pi X \mathbin{:}^{1} \star \mathpunct{.} \Pi x \mathbin{:}^{0} X \mathpunct{.} X) \mathpunct{.} (\Pi X \mathbin{:}^{0} \star \mathpunct{.} \Pi x \mathbin{:}^{0} X \mathpunct{.} X) \\ &\coloneqq \lambda \mathit{id} \mathpunct{.} \mathit{id} \; (\Pi X \mathbin{:}^{0} \star \mathpunct{.} \Pi x \mathbin{:}^{0} X \mathpunct{.} X) \; (\lambda X \; x \mathpunct{.} \mathit{id} \; X \; x) \end{align*}$$

Although the type of $\mathit{id}$ must live at level $2$, the $\mathit{id}$ term can be assigned a lower level $0$. This permits the self-application to go through, since the second argument of $\mathit{id}$ demands a term at level $0$. Unusually, despite the second $\mathit{id}$ being applied to a type at level $1$, the overall level of the second argument is still $0$, because quantification over arguments at higher levels is permitted.

Despite the apparently added expressivity of these new dependent functions, they’re unfortunately still not as expressive as actually having floating functions. This was something I discovered while trying to type check Hurkens’ paradox and finding that fewer definitions were typeable. I’ve attempted to isolate the issue a bit by revealing a similar problem when working with CPS types. Let’s first look at some definitions in the original StraTT with floating functions.

$$\begin{array}{r l l} \mathsf{\textcolor{#bf616a}{CPS}} &\mathbin{:}^{1} \star \to \star &\coloneqq \lambda A \mathpunct{.} \Pi X \mathbin{:}^{0} \star \mathpunct{.} (A \to X) \to X \\ \mathsf{\textcolor{#bf616a}{return}} &\mathbin{:}^{1} \Pi A \mathbin{:}^{0} \star \mathpunct{.} A \to \mathsf{\textcolor{#bf616a}{CPS}} \; A &\coloneqq \lambda A \; a \; X \; f \mathpunct{.} f \; a \\ \mathsf{\textcolor{#bf616a}{run}} &\mathbin{:}^{1} \Pi A \mathbin{:}^{0} \star \mathpunct{.} \mathsf{\textcolor{#bf616a}{CPS}} \; A \to A &\coloneqq \lambda A \; f \mathpunct{.} f \; A \; (\lambda a \mathpunct{.} a) \\ \mathsf{\textcolor{#bf616a}{idCPS}} &\mathbin{:}^{2} \Pi A \mathbin{:}^{0} \star \mathpunct{.} \mathsf{\textcolor{#bf616a}{CPS}}^{1} \; A \to \mathsf{\textcolor{#bf616a}{CPS}} \; A &\coloneqq \lambda A \; f \mathpunct{.} f \; (\mathsf{\textcolor{#bf616a}{CPS}} \; A) \; (\mathsf{\textcolor{#bf616a}{return}} \; A) \\ \mathsf{\textcolor{#bf616a}{runReturn}} &\mathbin{:}^{2} \Pi A \mathbin{:}^{0} \star \mathpunct{.} \mathsf{\textcolor{#bf616a}{CPS}}^{1} \; A \to \star &\coloneqq \lambda A \; f \mathpunct{.} \mathsf{\textcolor{#bf616a}{run}} \; A \; (\mathsf{\textcolor{#bf616a}{idCPS}} \; A \; f) = \mathsf{\textcolor{#bf616a}{run}}^{1} \; A \; f \end{array}$$

$\mathsf{\textcolor{#bf616a}{CPS}}$ translates a type to its answer-polymorphic CPS form, and $\mathsf{\textcolor{#bf616a}{return}}$ translates a term into CPS. $\mathsf{\textcolor{#bf616a}{run}}$ does the opposite and runs the computation with the identity continuation to yield a term of the original type. $\mathsf{\textcolor{#bf616a}{runReturn}}$ is a proposition we might want to prove about a given type and a computation of that type in CPS: passing $\mathsf{\textcolor{#bf616a}{return}}$ as the continuation for our computation (as in $\mathsf{\textcolor{#bf616a}{idCPS}}$) to get another computation should, when run, yield the exact same result as running the computation directly.

By careful inspection,8 everything type checks. The displacements needed are in $\mathsf{\textcolor{#bf616a}{idCPS}}$ and $\mathsf{\textcolor{#bf616a}{runReturn}}$, on the type of the computation $f$. This displacement raises the level of the answer type argument of $f$ so that it can be instantiated with $\mathsf{\textcolor{#bf616a}{CPS}} \; A$. Consequently, we also need to displace the $\mathsf{\textcolor{#bf616a}{run}}^{1}$ that runs the displaced computation on the right-hand side of $\mathsf{\textcolor{#bf616a}{runReturn}}$. Meanwhile, the left-hand $\mathsf{\textcolor{#bf616a}{run}}$ shouldn’t be displaced, because it runs the undisplaced computation returned by $\mathsf{\textcolor{#bf616a}{idCPS}}$.

Now let’s try to do the same in the new RaTT without floating functions. I’ll only write down the definitions of $\mathsf{\textcolor{#bf616a}{CPS}}$ and $\mathsf{\textcolor{#bf616a}{idCPS}}$, and inline $\mathsf{\textcolor{#bf616a}{return}}$ in the latter.

$$\begin{array}{r l l} \mathsf{\textcolor{#bf616a}{CPS}} &\mathbin{:}^{1} \Pi A \mathbin{:}^{0} \star \mathpunct{.} \star &\coloneqq \lambda A \mathpunct{.} \Pi X \mathbin{:}^{0} \star \mathpunct{.} \Pi f \mathbin{:}^{0} (\Pi a \mathbin{:}^{0} A \mathpunct{.} X) \mathpunct{.} X \\ \mathsf{\textcolor{#bf616a}{idCPS}} &\mathbin{:}^{0} \Pi A \mathbin{:}^{0} \star \mathpunct{.} \Pi f \mathbin{:}^{0} \mathsf{\textcolor{#bf616a}{CPS}}^{1} \; A \mathpunct{.} \mathsf{\textcolor{#bf616a}{CPS}} \; A &\coloneqq \lambda A \; f \mathpunct{.} f \; (\mathsf{\textcolor{#bf616a}{CPS}} \; A) \; (\lambda a \; X \; g \mathpunct{.} g \; a) \end{array}$$

This is a little more difficult to decipher, so let’s look at what the types of the different pieces of $\mathsf{\textcolor{#bf616a}{idCPS}}$ are.

  • $f$ has type $\Pi X \mathbin{:}^{1} \star \mathpunct{.} \Pi f \mathbin{:}^{1} (\Pi a \mathbin{:}^{1} A \mathpunct{.} X) \mathpunct{.} X$ from displacing $\mathsf{\textcolor{#bf616a}{CPS}}$.
  • The second argument of $f$ must have type $\Pi a \mathbin{:}^{1} A \mathpunct{.} \mathsf{\textcolor{#bf616a}{CPS}} \; A$ from instantiating $X$ with $\mathsf{\textcolor{#bf616a}{CPS}} \; A$. This is also the type of $\mathsf{\textcolor{#bf616a}{return}}$.
  • The types of the arguments $a$, $X$, and $g$ of the second argument $\lambda a \; X \; g \mathpunct{.} g \; a$ are $a \mathbin{:}^{1} A$, $X \mathbin{:}^{0} \star$, and $g \mathbin{:}^{0} \Pi a \mathbin{:}^{0} A \mathpunct{.} X$, from expanding $\mathsf{\textcolor{#bf616a}{CPS}} \; A$.
  • $g \; a$ is ill typed, because $a$ has level $1$ while $g$ takes an $A$ at level $0$.

To fix this, we might try to increase the level annotation on $a$ in the definition of $\mathsf{\textcolor{#bf616a}{CPS}}$ so that $g$ can accommodate $a$. But doing so would also increase the level annotation on $a$ in the displaced type of $f$ to $2$, meaning that $a$ is still too large to fit into $g$. The uniformity of the displacement mechanism means that if everything is annotated with a level, then everything moves up at the same time. Previously, floating functions allowed us to essentially ignore level increases due to displacement, while still independently increasing levels via cumulativity.

Of course, we could simply define a second $\mathsf{\textcolor{#bf616a}{CPS}}$ definition with a different level, i.e. $\Pi X \mathbin{:}^{j} \star \mathpunct{.} \Pi f \mathbin{:}^{0} (\Pi a \mathbin{:}^{0} A \mathpunct{.} X) \mathpunct{.} X$ for the level $j$ that we need, but then, to avoid code duplication, we’re back to needing some form of level polymorphism over $j$.

Level polymorphism

So far, I’ve treated level polymorphism as if it were something unpleasant to deal with and difficult to handle, and this is because level polymorphism is unpleasant and difficult. On the usability side, I’ve found that level polymorphism in Agda muddies the intent of the code I want to write and produces incomprehensible type errors while I’m writing it, and I hear that universe polymorphism in Coq is a similar beast. Of course, StraTT is still far away from being a useable tool, but in the absence of more complex level polymorphism, we can plan and design for more friendly elaboration-side features such as level inference.

On the technical side, it’s unclear how you might assign a type and level to something like $\forall k \mathpunct{.} \Pi x \mathbin{:}^{k} A \mathpunct{.} B$, since $x$ essentially quantifies over derivations at all levels. We would also need bounded quantification from both ends for level instantiations to be valid. For instance, $\Pi x \mathbin{:}^{k} (\Pi y \mathbin{:}^{2} A \mathpunct{.} B \; y) \mathpunct{.} C \; x$ requires that the level $k$ of the type of the domain be strictly greater than $2$, so the quantification is $\forall k > 2$; and flipping the levels, $\Pi x \mathbin{:}^{2} (\Pi y \mathbin{:}^{k} A \mathpunct{.} B \; y) \mathpunct{.} C \; x$ would require $\forall k < 2$. While assigning a level to a quantification bounded above is easy (the level of the second type above can be $3$), assigning a level to a quantification bounded below is as unclear as assigning one to an unbounded quantification.

At this point, we could either start using ordinals in general for our levels, always requiring upper-bounded quantification whose upper bound could be a limit ordinal, or restrict level quantification to prenex polymorphism at definitions, which is roughly how Coq’s universe polymorphism works, only implicitly, and to me the more reasonable option:

We believe there is limited need for […] higher-ranked universe polymorphism for a cumulative universe hierarchy.
Favonia et al. (2023)9

With the second option, we have to be careful not to repeat the same mistakes as Coq’s universe polymorphism. There, by default, every single (implicit) universe level in a definition is quantified over, and every use of a definition generates fresh level variables for each quantified level, so it’s very easy (albeit somewhat artificial) to end up with exponentially-many level variables to handle relative to the number of definitions. On the other hand, explicitly quantifying over level variables is tedious if you must instantiate them yourself, and it’s tricky to predict which ones you really do want to quantify over.

Level polymorphism is clearly more expressive, since in a definition of type $\forall i, j \mathpunct{.} \Pi x \mathbin{:}^{i} A \mathpunct{.} \Pi y \mathbin{:}^{j} B \mathpunct{.} C \; x \; y$, for instance, you can instantiate $i$ and $j$ independently, whereas displacement forces you to always displace both by the same amount. But so far, it’s unclear to me in what scenarios this feature would be absolutely necessary and not manageable with just floating functions and a little futzing about with the level annotations.

What’s in the paper?

The variant of StraTT that the paper covers is the one with stratified dependent functions, nondependent floating functions, and displacement. We’ve proven type safety, mechanized a model of StraTT without floating functions (but not the interpretation from the syntax to the semantics), and implemented StraTT extended with datatypes and level/displacement inference.

The paper doesn’t yet cover any discussion of RaTT with relaxed level constraints. I’m still deliberating whether it would be worthwhile to update my mechanized model first before writing about it, just to show that it is a variant that deserves serious consideration, even if I’ll likely discard it at the end. The paper doesn’t cover any discussion on level polymorphism either, and if I don’t have any more concrete results, I probably won’t go on to include it.

It doesn’t have to be for this current paper draft, but I’d like to have a bit more on level inference in terms of proving soundness and completeness. Soundness should be evident (we toss all of the constraints into an SMT solver), but completeness would probably require some ordering on different level annotations of the same term such that well-typedness of a “smaller” annotation set implies well-typedness of a “larger” annotation set, formalizing the inference algorithm, then showing that the algorithm always produces the “smallest” annotation set.
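For intuition on why a "smallest" annotation set should exist at all, here's a naive Python sketch (my own toy, not the implementation's actual SMT-based approach): the constraints stratification generates all have the shape $j + c \leq k$ with $c \in \{0, 1\}$, so a fixed-point iteration from all-zero levels finds the least assignment whenever one exists.

```python
def least_levels(variables, constraints, max_rounds=1000):
    """Compute the least level assignment satisfying constraints of the
    form lhs + c <= rhs, where c = 1 encodes the strict j < k from
    stratification and c = 0 encodes cumulativity's j <= k."""
    level = {v: 0 for v in variables}
    for _ in range(max_rounds):
        changed = False
        for (lhs, c, rhs) in constraints:
            if level[lhs] + c > level[rhs]:
                level[rhs] = level[lhs] + c  # raise rhs just enough
                changed = True
        if not changed:
            return level
    raise ValueError("no solution: a cycle of strict constraints")

# Forming ΠX :^j ⋆. X → X at level k needs j < k, and using the function
# at level m needs k <= m; the least solution is j = 0, k = m = 1.
levels = least_levels(["j", "k", "m"], [("j", 1, "k"), ("k", 0, "m")])
assert levels == {"j": 0, "k": 1, "m": 1}
```

Since every step only raises a level to the minimum forced by a violated constraint, the fixed point is below any other solution, which is the kind of ordering a completeness argument would need.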


Summary

  • Typing derivations are stratified by levels
  • Dependent functions quantify over strictly lower types
  • Floating functions use cumulativity to raise domain levels
  • Displacement enables code reuse through uniformly incrementing levels
  • Allowing type levels to be larger than term levels breaks type safety for floating functions
  • Dependent functions and displacement alone are less expressive than also having floating functions
  • Level polymorphism is still an open question that I’d rather not include

  1. A draft of this post was first published on Cohost. 

  2. Daniel Leivant. Stratified polymorphism. LICS 1989. doi:10.1109/LICS.1989.39157

  3. For the semantically-inclined, it is possible to construct a model of these universes, as I’ve done in this Agda model, which also sketches out how the rest of consistency might be proven. 

  4. Conor McBride. Crude but Effective Stratification. 2011.

  5. This domain covariance prevents us from extending the above model with floating function types, because this behaviour semantically… well, kind of doesn’t make sense. So consistency with floating functions is an open problem. 

  6. Thanks to Yiyun Liu for pointing this out about the system. 

  7. The Agda model could probably be tweaked to accommodate this new system; I just haven’t done it. 

  8. Or by using the implementation to type check for me, which is what I did. 

  9. Favonia, Carlo Angiuli, Reed Mullanix. An Order-Theoretic Analysis of Universe Polymorphism. POPL 2023. doi:10.1145/3571250