In contrast, the story for impredicative type theory is not so clear. Incorporating different features alongside an impredicative Prop may require substantially different proof methods. This post catalogues these various models, what type theories they model, and what proof technique is used. Most proofs fall into one of three techniques: proof-irrelevant set-theoretic models, reducibility methods, and realizabililty semantics.
Work | Theory | Proof method | Universes | Inductives | Equality |
---|---|---|---|---|---|
Coquand (1985) | CC | ? | Prop, Type | none | untyped |
Pottinger (1987) | CC | ? | Prop, Type | none | untyped |
Ehrhard (1988) | CC | ω-Set | Prop, Type | none | none |
Coquand and Gallier (1990) | CC | reducibility | Prop, Type | none | untyped |
Luo (1990) | ECC | reducibility; ω-Set | Prop ⊆ Type{i} | dependent pairs | untyped |
Terlouw (1993) | CC | reducibility | Prop, Type | none | untyped |
Altenkirch (1993) | CC | Λ-Set | Prop, (Type) | impredicative | typed |
Goguen (1994) | UTT | set-theoretic | Prop, Type | predicative | typed |
Geuvers (1995) | CC | reducibility | Prop, Type | none | untyped |
Melliès and Werner (1998) | PTS | Λ-Set | Prop ⊈ Type{i} | none | untyped |
Miquel (2001) | CCω | set-theoretic; ω-Set | Prop ⊆ Type{i} | none | untyped |
Miquel (2001) | ICC | ? | Prop ⊆ Type{i} | none | untyped |
Miquel and Werner (2003) | CC | set-theoretic | Prop, Type | none | untyped |
Lee and Werner (2011) | pCIC | set-theoretic | Prop ⊆ Type{i} | predicative | typed |
Sacchini (2011) | CIC^- | Λ-Set | Prop, Type{i} | predicative, sized | untyped |
Barras (2012) | CCω | set-theoretic | Prop ⊆ Type{i} | naturals | untyped |
Barras (2012) | CC | Λ-Set | Prop, Type | naturals | untyped |
Timany and Sozeau (2018) | pCuIC | set-theoretic | Prop ⊆ Type{i} | predicative | typed |
An early set-theoretic model for an impredicative type theory is given by Goguen (1994) for UTT, a type theory with an impredicative Prop type and a predicative Type universe, typed equality, and predicative inductives with large elimination. It must be emphasized that Prop is merely a type and not a universe: predicates (that is, impredicative quantifications into Prop) have introduction and elimination forms that are distinct from ordinary functions and applications, and there’s an explicit inclusion from Prop into Type. The model is proof-irrelevant, meaning that Prop is interpreted as a two-element set, and its inhabitants (propositions) are interpreted as one of these two elements (true or false). It appears that the syntactic distinction between predicates and functions is what makes the proof-irrelevant model unproblematic.
Miquel (2001) describes a proof-irrelevant set-theoretic model for the Calculus of Constructions (CC) with untyped equality, an impredicative Prop, and a predicative hierarchy of type universes, known as CCω. The hierarchy is fully cumulative, so there is an inclusion of Prop in Type₀ and of each type universe in the next one up. Predicates are ordinary dependent functions, so to accommodate both functions into Prop and Type, he appeals to a trace encoding by Peter Aczel. This thesis, however, doesn’t go into the details of proving soundness of typing with respect to this model.
Miquel and Werner (2003) then go into the details of the proof for the original CC, which only has an impredicative Prop universe, a predicative Type universe, and no inclusion. They find that they require variables to be syntactically annotated with the sort of their type. Interestingly, they settle for a simpler encoding of functions rather than the trace encoding.
The primary issue with Miquel and Werner’s system is the lack of an inclusion of Prop in Type₀, featured in Rocq. Lee and Werner (2011) resolve this exactly by using the trace encoding together with a typed equality judgement rather than an untyped one to eliminate the need for sort-annotated variables. The system they model includes a fully cumulative predicative universe hierarchy atop an impredicative Prop, predicative inductives, guarded fixpoints, and large elimination, and is referred to in the literature as pCIC.
Timany and Sozeau (2018) then augment this system to pCuIC with cumulative predicative inductives, which are featured in Rocq, and adapt the same proof-irrelevant set-theoretic model with trace encodings, albeit using eliminators rather than fixpoints.
Barras (2012) in his habilitation thesis mechanizes in Rocq a proof-irrelevant set-theoretic model, with the same trace encoding, of a system with untyped equality, impredicative Prop and a fully cumulative predicative universe hierarchy, and naturals with an induction principle and large elimination. Inductive types are also discussed in a later chapter.
One key feature all of these type theories are missing that’s common in proof assistants with an impredicative Prop is inductive propositions and predicates: inductive definitions that live in Prop.
Coquand and Gallier (1990) use a Kripke logical relation to define their notion of reducibility as sets of terms to prove strong normalization for the Calculus of Constructions. In Section 7, they list other proofs of SN for CC, notably the PhD thesis of Coquand (1985) which apparently contains an error, and a private manuscript circulated by Coquand in 1987 correcting this error. They also cite Pottinger (1987) for a published proof of strong normalization for the Calculus of Constructions.
Geuvers (1995) again proves strong normalization for the Calculus of Constructions by a reducibility method, but instead using saturated sets of terms, which he claims to be a more flexible and extensible method. In the introduction, many other proofs of SN for CC are cited, including one by Terlouw (1993) which also uses saturated sets and is inspired from Coquand’s proof:
The specific proof presented [by Coquand] is also in some sense of a model theoretical nature (it refers to Kripke-like interpretations), but as to its elaboration it differs considerably from the proof hereafter: it does not assign a key role to a formal theory T (of the kind as explained in the general introduction) and in the present situation the notion of “interpretation” is simpler and the explicit role of contexts has been strongly reduced.
The Extended Calculus of Constructions (ECC) by Luo (1990) extends CC with dependent pairs and a fully cumulative universe hierarchy, and strong normalization is proven by reducibility via quasi-normalization to define a complexity measure for types. However, it’s unclear to me whether any sort of large elimination would be compatible. It certainly seems that going through quasi-normalization isn’t very popular, as I haven’t seen any subsequent work follow this technique.
Ehrhard (1988) gives an ω-Set model of the Calculus of Constructions, and Luo (1990) in the same thesis sketches out an ω-Set model for ECC, but notes that:
We shall not give a model semantics in full detail. There is a known problem about defining a model semantics of rich type theories like the calculus of constructions; that is, since there may be more than one derivation of a derivable judgement, a direct inductive definition by induction on derivations is questionable.
Altenkirch (1993) modifies the ω-Set model and introduces a Λ-Set model for proving strong normalization. A proof is worked out for a Calculus of Constructions with typed equality and impredicative inductives with large elimination. The system is presented à la Tarski rather than à la Russell.
Melliès and Werner (1998) then generalize this technique to all PTSes whose sorts satisfy a certain property. To demonstrate, they apply the method to prove strong normalization of ECC (without dependent pairs). However, it appears that it does not cover inclusion of Prop in Type. Sacchini (2011), who uses a Λ-Set model based on Melliès and Werner’s to show strong normalization for a variant of CIC with sized types (CIC^-), notes in Section 4.3.1 that:
It is important to separate proofs from computation. Therefore, it is not trivial to extend this model in the presence of a subtyping rule of the form Prop ≤ Type₀. This is the reason we do not consider this rule in our proof (cf. Sect. 5.1).
and in Section 5.1 notes that:
We cannot easily adapt our Λ-set model to handle rule [Prop ≤ Type₀]. In the presence of this rule, the separation property of proofs and computations is no longer valid.
Miquel (2001) sketches out an ω-Set model for CCω, but doesn’t go into proving soundness of typing with respect to the model. The proof method that he does use for the Implicit Calculus of Constructions (ICC) is a simplified version of the Λ-set model using coherent 𝒦-spaces. ICC has the distinction of being the only language cited here that is Curry-style (unannotated lambdas) rather than Church-style (domain-annotated lambdas). It’s unclear to me how he handles a cumulative impredicative Prop.
Barras (2012) mechanizes in Rocq a Λ-set model for a Calculus of Constructions with naturals and large elimination. It appears there are no larger universes, nor cumulativity.
[Altenkirch 1993] Thorsten Altenkirch. Constructions, Inductive Types, and Strong Normalization. (PhD thesis). ᴜʀʟ: https://www.cs.nott.ac.uk/~psztxa/publ/phd93.pdf.
[Barras (2012)] Bruno Barras. Semantical Investigations in Intuitionistic Set Theory and Type Theories with Inductive Families. (Habilitation thesis). ᴜʀʟ: http://www.lsv.fr/~barras/habilitation/barras-habilitation.pdf.
[Casinghino (2010)] Chris Casinghino. Strong Normalization for the Calculus of Constructions. ᴅᴏɪ: 10.48550/arXiv.2210.11240.
[Coquand (1985)] Thierry Coquand. Une Théorie Des Constructions. (PhD thesis).
[Coquand and Gallier (1990)] Thierry Coquand, Jean Gallier. A Proof Of Strong Normalization For The Theory Of Constructions Using A Kripke-Like Interpretation. (1990). ᴜʀʟ: https://repository.upenn.edu/handle/20.500.14332/7509.
[Ehrhard (1988)] Thomas Ehrhard. A categorical semantics of constructions. (LICS 1988). ᴅᴏɪ: 10.1109/LICS.1988.5125.
[Geuvers and Nederhof (1991)] Herman Geuvers and Mark-Jan Nederhof. Modular proof of strong normalization for the calculus of constructions. (JFP 1991). ᴅᴏɪ: 10.1017/S0956796800020037.
[Geuvers (1995)] Herman Geuvers. A short and flexible proof of strong normalization for the calculus of constructions. (TYPES 1994). ᴅᴏɪ: 10.1007/3-540-60579-7_2.
[Goguen (1994)] Healfdene Goguen. A Typed Operational Semantics for Type Theory. (PhD thesis). ᴜʀʟ: http://hdl.handle.net/1842/405.
[Lee and Werner (2011)] Gyesik Lee, Benjamin Werner. Proof-Irrelevant Model of CC with Predicative Induction and Judgemental Equality. (LMCS 7(4)). ᴅᴏɪ: 10.2168/LMCS-7(4:5)2011.
[Luo (1990)] Zhaohui Luo. An Extended Calculus of Constructions. (PhD thesis). ᴜʀʟ: https://www.lfcs.inf.ed.ac.uk/reports/90/ECS-LFCS-90-118/.
[Melliès and Werner (1998)] Paul-André Melliès and Benjamin Werner. A Generic Normalisation Proof for Pure Type Systems. (RR-3548, INRIA). ᴜʀʟ: https://inria.hal.science/inria-00073135.
[Miquel (2001)] Alexandre Miquel. Le calcul des constructions implicite: syntaxe et sémantique. (PhD thesis). ᴜʀʟ: https://www.fing.edu.uy/~amiquel/publis/these.pdf.
[Miquel and Werner (2003)] Alexandre Miquel, Benjamin Werner. The Not So Simple Proof-Irrelevant Model of CC. (TYPES 2002). ᴅᴏɪ: 10.1007/3-540-39185-1_14.
[Pottinger (1987)] Garrel Pottinger. Strong Normalization for Terms of the Theory of Constructions. (TR 11-7, Odyssey Research Associates). ᴅᴏɪ: 10.5281/zenodo.11455038.
[Sacchini (2011)] Jorge Luis Sacchini. On Type-Based Termination and Dependent Pattern Matching in the Calculus of Inductive Constructions. (PhD thesis). ᴜʀʟ: https://pastel.hal.science/pastel-00622429.
[Terlouw (1993)] Jan Terlouw. Strong normalization in type systems: A model theoretical approach. (Annals of Pure and Applied Logic 73(1)). ᴅᴏɪ: 10.1016/0168-0072(94)00040-A.
[Timany and Sozeau (2018)] Amin Timany, Mattieu Sozeau. Consistency of the Predicative Calculus of Cumulative Constructions. (FSCD 2018). ᴅᴏɪ: 10.4230/LIPIcs.FSCD.2018.29.
These aren’t in any particular order, and I’m reluctant to do any sort of ranking, since I think they’re all pretty great, and I haven’t had enough coffee from each of them to be able to assess them against one another.
Location: 11th and Spruce
Hours: until 18:00
They usually have a single-origin on batch brew until they run out, and they do run out. Back in autumn 2023 they also had a spot at the Tuesday Rittenhouse farmers’ markets where they offered batch brew and pourovers (!), but they don’t seem to do that anymore. I’m biased towards them because their farmers’ market pourovers was how I started getting into coffee :)
Locations: 47th and Pine | Susquehanna and Norris (Fishtown) | others
Hours: until 15:00
This might be a seasonal thing, but I recently got a single-origin brew at the Fishtown location. I haven’t noticed if the West Philly location ever did, but I have gotten a pourover there. Interestingly, I’ve also seen their beans at Revolver in Vancouver!
Location: 3rd and South
Hours: until 13:00 (weekdays)/14:00 (weekends)
They have a single-origin on batch brew until they run out, but I’ve always been able to get a cup when I’ve gone, although they do close pretty early, so you might miss them if you’re an afternoon coffee person like me. I don’t think they offer pourovers though. A few other coffee shops in Philly carry and brew their beans, such as K’Far.
Location: Rittenhouse Square and Locust
Hours: until 16:00
They only have batch-brewed single-origin on the weekends, and they get pretty busy during the weekends! Understandably, they don’t do pourovers, and their physical location is a pretty tiny place.
Location: 24th and Lombard | Passyunk and Tasker
Hours: until 18:00
They usually have around three selections for pourovers, and the best part is the happy hour 14:00 – 16:00 where they’re a dollar off.
Location: 21st and Spring (behind the Franklin Institute)
Hours: until 17:00
They pretty much have the same four roasts all year round, and I believe typically offer two or three of them as pourovers. (They also have a really good spicy chai latte made in-house.)
Locations: 37th and Market | b/w 15th & 16th and Walnut | others
Hours: varies by location
This is perhaps the most well-known specialty coffee roaster in Philly, and many very nice coffeeshops will carry and brew their beans.
Location: Main and Jamestown (Manayunk)
Hours: until 16:00
Relative to Centre City, they’re pretty far away in Manayunk, but it’s a great specialty coffeeshop if you’re in the area. I had a single-origin brew during the Manayunk StrEAT Food Festival, and I assume they also normally have it, along with pourovers.
Location: 20th and Locust | 22nd and Catharine | 15th and Mifflin
Hours: until 18:00
Location: 15th and Federal (near Ellsworth-Federal station)
Hours: until 14:00
Location: 32nd and Walnut
Hours: until 15:00 (weekdays)/16:00 (weekends)
Location: 20th and Chestnut
Hours: until 15:30 (weekdays)/13:00 (Saturday)
Their roasting setup is physically in this shop, and they offer an enormous selection of different beans and roasts. They also do have the specialty coffee by-the-cup, but I haven’t seen how exactly it is they brew it, because there’s no pourover setup and it’s an enormous 16 oz cup. Their light roasts have always tasted kind of burnt to me… I feel bad for saying it because the two people running the shop are really nice and I really want to like their coffee!
Location: Frankford and Girard
Hours: until 15:00 (weekdays)/16:00 (weekends)
I believe they only ever have one roast of beans at a time, so by default their batch brew would be that coffee. A bit pricey, but the specialty drinks are really tasty.
Location: 12th and Sansom
Hours: permanently closed :(
This is apparently a Croatian coffee roaster with five locations in Croatia, one in Dubai (??), and formerly one in Philadelphia (??). I unfortunately never got to get a pourover from them before they closed…
]]>I won’t go too much into detail on the definition of ECIC, but I’ll touch on some key rules and examples. In short, ECIC is CIC, but:
They claim that erasure of large arguments is what permits unrestricted inductive elimination, but I believe that this is still inconsistent.
To establish some syntax, here’s the typing rule for function types and functions with erased arguments, where $|\cdot|$ is the erasure of a term, which erases type annotations as well as erased arguments entirely.
$\frac{ \Gamma \vdash \tau_1 : s_1 \qquad \Gamma, x : \tau_1 \vdash \tau_2 : s_2 \qquad (k, s_1, s_2, s_3) \in \mathcal{R} }{ \Gamma \vdash (x :^k \tau_1) \to \tau_2 : s_3 } \qquad \frac{ \Gamma, x : \tau_1 \vdash e : \tau_2 \qquad x \notin \mathsf{fv}(| e |) }{ \Gamma \vdash \lambda x :^{\mathbf{\mathsf{e}}} \tau_1. e : (x :^{\mathbf{\mathsf{e}}} \tau_1) \to \tau_2 }$The PTS rules $\mathcal{R}$ are $(k, \mathsf{\textcolor{#069}{Prop}}, s, s)$, $(k, \mathsf{\textcolor{#069}{Type}}_{\ell_1}, \mathsf{\textcolor{#069}{Type}}_{\ell_2}, \mathsf{\textcolor{#069}{Type}}_{\ell_1 \sqcup \ell_2})$, and most importantly, $(\mathsf{e}, \mathsf{\textcolor{#069}{Type}}_\ell, \mathsf{\textcolor{#069}{Prop}}, \mathsf{\textcolor{#069}{Prop}})$. The last rule says that if you want to use something large for a proposition, it must be erasable, and therefore only used in type annotations. A key theorem states that if a term is typeable is CIC, then it’s typeable in ECIC with additional erasure annotations, meaning that CIC inherently never uses impredicativity in computationally relevant ways.
The typing rules for inductives and especially their case expressions, as usual, are rather involved, and I won’t repeat them here. Instead, to establish its syntax, here’s an example defining propositional equality as an inductive, and a proof that it’s symmetric. From here onwards I’ll use square brackets to indicate erased arguments, omitting annotations entirely, as well as $\cdot \to \cdot$ as a shorthand for nondependent function types.
$\begin{align*} \mathsf{\textcolor{#bf616a}{Eq}} &: [A : \mathsf{\textcolor{#069}{Type}}] \to [A] \to [A] \to \mathsf{\textcolor{#069}{Prop}} \\ &\coloneqq \lambda [A : \mathsf{\textcolor{#069}{Type}}]. \lambda [x : A]. \mathsf{\textcolor{#069}{Ind}}(\textit{Eq} : [A] \to \mathsf{\textcolor{#069}{Prop}})\langle 0: \textit{Eq} \; [x] \rangle \\ \mathsf{\textcolor{#bf616a}{Refl}} &: [A : \mathsf{\textcolor{#069}{Type}}] \to [x : A] \to \mathsf{\textcolor{#bf616a}{Eq}} \; [A] \; [x] \; [x] \\ &\coloneqq \lambda [A : \mathsf{\textcolor{#069}{Type}}]. \lambda [x : A]. \mathsf{\textcolor{#069}{Con}}(0, \mathsf{\textcolor{#bf616a}{Eq}} \; [A] \; [x]) \\ \mathsf{\textcolor{#bf616a}{sym}} &: [A : \mathsf{\textcolor{#069}{Type}}] \to [x : A] \to [y : A] \to (p : \mathsf{\textcolor{#bf616a}{Eq}} \; [A] \; [x] \; [y]) \to \mathsf{\textcolor{#bf616a}{Eq}} \; [A] \; [y] \; [x] \\ &\coloneqq \lambda [A : \mathsf{\textcolor{#069}{Type}}]. \lambda [x : A]. \lambda [y : A]. \lambda p : \mathsf{\textcolor{#bf616a}{Eq}} \; [A] \; [x] \; [y]. \\ &\phantom{\coloneqq} \mathsf{\textcolor{#069}{Case}} \; p \; \mathsf{\textcolor{#069}{return}} \; \lambda [z : A]. \lambda p : \mathsf{\textcolor{#bf616a}{Eq}} \; [A] \; [x] \; [z]. \mathsf{\textcolor{#bf616a}{Eq}} \; [A] \; [z] \; [x] \; \mathsf{\textcolor{#069}{of}} \; \langle 0: \mathsf{\textcolor{#bf616a}{Refl}} \; [A] \; [x] \rangle \end{align*}$This equality is over large terms to be able to talk about both equal propositions and equal proofs^{7}, so basically all arguments except for equality itself must be erased. However, it still doesn’t qualify as a large inductive, since none of the constructor arguments are large, i.g. have types in sort $\mathsf{\textcolor{#069}{Type}}$. Contrariwise, the following inductive is large, requiring its first constructor argument to be erased to justify unrestricted elimination.
$\mathsf{\textcolor{#bf616a}{U}} : \mathsf{\textcolor{#069}{Prop}} \coloneqq \mathsf{\textcolor{#069}{Ind}}(\textit{U} : \mathsf{\textcolor{#069}{Prop}})\langle [X : \mathsf{\textcolor{#069}{Prop}}] \to (X \to \textit{U}) \to \textit{U} \rangle$ECIC’s $\mathsf{\textcolor{#069}{Prop}}$ is neither like Rocq’s $\mathsf{\textcolor{#069}{Prop}}$ nor like its impredicative $\mathsf{\textcolor{#069}{Set}}$, which I’ll call $\mathsf{\textcolor{#069}{rProp}}$ and $\mathsf{\textcolor{#069}{rSet}}$ to avoid confusion. Large inductives in either $\mathsf{\textcolor{#069}{rProp}}$ and $\mathsf{\textcolor{#069}{rSet}}$ are not allowed to be strongly eliminated, because this would be inconsistent. Furthermore, as members of types of sort $\mathsf{\textcolor{#069}{rProp}}$ are intended to be erased during extraction, inductives in $\mathsf{\textcolor{#069}{rProp}}$ with multiple constructors can’t be strongly eliminated either, so that case expressions only have a single branch to which they erase. This makes eCIC’s $\mathsf{\textcolor{#069}{Prop}}$, confusingly, exactly Rocq’s impredicative $\mathsf{\textcolor{#069}{Set}}$.
Coquand’s paradox of trees develops an inconsistency out of the inductive $\mathsf{\textcolor{#bf616a}{U}}$ above by showing that simultaneously all $\mathsf{\textcolor{#bf616a}{U}}$ are well founded and there exists a $\mathsf{\textcolor{#bf616a}{U}}$ that is not well founded, in particular $\mathsf{\textcolor{#bf616a}{loop}}$ below.
$\begin{align*} \mathsf{\textcolor{#bf616a}{loop}} : \mathsf{\textcolor{#bf616a}{U}} \coloneqq \mathsf{\textcolor{#069}{Con}}(0, \mathsf{\textcolor{#bf616a}{U}}) \; [U] \; (\lambda x : U. x) \end{align*}$Before we continue, we need an injectivity lemma saying that if two $\mathsf{\textcolor{#bf616a}{U}}$s are equal, then their components are equal too. I omit the $\mathsf{\textcolor{#069}{return}}$ expression in the body because it’s already in the type. Importantly, when matching on the $\mathsf{\textcolor{#bf616a}{U}}$s, the large, erased first argument of the constructor is only used in erased positions.
$\begin{align*} \mathsf{\textcolor{#bf616a}{injU}} &: (u_1 : \mathsf{\textcolor{#bf616a}{U}}) \to (u_2 : \mathsf{\textcolor{#bf616a}{U}}) \to (p : \mathsf{\textcolor{#bf616a}{Eq}} \; [\mathsf{\textcolor{#bf616a}{U}}] \; [u_1] \; [u_2]) \to \\ &\phantom{:} \mathsf{\textcolor{#069}{Case}} \; u_1 \; \mathsf{\textcolor{#069}{return}} \; \lambda u : \mathsf{\textcolor{#bf616a}{U}}. \mathsf{\textcolor{#069}{Prop}} \; \mathsf{\textcolor{#069}{of}} \; \langle \lambda [X_1 : \mathsf{\textcolor{#069}{Prop}}]. \lambda f_1 : X_1 \to \mathsf{\textcolor{#bf616a}{U}}. \\ &\phantom{:} \mathsf{\textcolor{#069}{Case}} \; u_2 \; \mathsf{\textcolor{#069}{return}} \; \lambda u : \mathsf{\textcolor{#bf616a}{U}}. \mathsf{\textcolor{#069}{Prop}} \; \mathsf{\textcolor{#069}{of}} \; \langle \lambda [X_2 : \mathsf{\textcolor{#069}{Prop}}]. \lambda f_2 : X_2 \to \mathsf{\textcolor{#bf616a}{U}}. \\ &\phantom{:} \exists q : \mathsf{\textcolor{#bf616a}{Eq}} \; [\mathsf{\textcolor{#069}{Prop}}] \; [X_1] \; [X_2]. \\ &\phantom{: \exists q :} \mathsf{\textcolor{#bf616a}{Eq}} \; [X_2 \to \mathsf{\textcolor{#bf616a}{U}}] \; \\ &\phantom{: \exists q : \mathsf{\textcolor{#bf616a}{Eq}}} \; [\mathsf{\textcolor{#069}{Case}} \; q \; \mathsf{\textcolor{#069}{return}} \; \lambda [Z : \mathsf{\textcolor{#069}{Prop}}]. \lambda q : \mathsf{\textcolor{#bf616a}{Eq}} \; [\mathsf{\textcolor{#069}{Prop}}] \; [X_1] \; [Z]. Z \to U \; \mathsf{\textcolor{#069}{of}} \; \langle f_1 \rangle] \; \\ &\phantom{: \exists q : \mathsf{\textcolor{#bf616a}{Eq}}} \; [f_2] \rangle \rangle \\ &\coloneqq \lambda u_1 : \mathsf{\textcolor{#bf616a}{U}}. \lambda u_2 : \mathsf{\textcolor{#bf616a}{U}}. \lambda p : \mathsf{\textcolor{#bf616a}{Eq}} \; [\mathsf{\textcolor{#bf616a}{U}}] \; [u_1] \; [u_2]. \\ &\phantom{\coloneqq} \mathsf{\textcolor{#069}{Case}} \; p \; \mathsf{\textcolor{#069}{of}} \; \langle \mathsf{\textcolor{#069}{Case}} \; u_1 \; \mathsf{\textcolor{#069}{of}} \; \langle \lambda [X : \mathsf{\textcolor{#069}{Prop}}]. \lambda f : X \to \mathsf{\textcolor{#bf616a}{U}}. (\mathsf{\textcolor{#bf616a}{Refl}} \; [\mathsf{\textcolor{#069}{Prop}}] \; [X] , \mathsf{\textcolor{#bf616a}{Refl}} \; [X \to \mathsf{\textcolor{#bf616a}{U}}] \; [f]) \rangle \rangle \end{align*}$The equalities make its statement a bit convoluted,
but the proof is ultimately a pair of reflexivities.
It would perhaps be clearer to see the equivalent Rocq proof,
which uses rew
notations to make it cleaner.
Here, I need to turn off universe checking when defining the inductive
to prevent Rocq from disallowing its strong elimination.
Notably, this is the only place where turning off universe checking is required,
and the type of $\mathsf{\textcolor{#bf616a}{injU}}$ is the only place where strong elimination occurs.
Import EqNotations.
Unset Universe Checking.
Inductive U : Prop := u (X : Prop) (f : X -> U) : U.
Set Universe Checking.
Definition loop : U := u U (fun x => x).
Definition injU [u1 u2 : U] (p : u1 = u2) :
match u1, u2 with
| u X1 f1, u X2 f2 =>
exists (q : X1 = X2),
rew [fun Z => Z -> U] q in f1 = f2
end :=
rew dependent p in
match u1 with
| u _ _ => ex_intro _ eq_refl eq_refl
end.
The wellfoundedness predicate is another inductive stating that a $\mathsf{\textcolor{#bf616a}{U}}$ is well founded if all of its children are. Wellfoundedness for all $\mathsf{\textcolor{#bf616a}{U}}$ is easily proven by induction.
$\begin{align*} \mathsf{\textcolor{#bf616a}{WF}} &: \mathsf{\textcolor{#bf616a}{U}} \to \mathsf{\textcolor{#069}{Prop}} \\ &\coloneqq \mathsf{\textcolor{#069}{Ind}}(\textit{WF} : \mathsf{\textcolor{#bf616a}{U}} \to \mathsf{\textcolor{#069}{Prop}}) \langle 0: [X : \mathsf{\textcolor{#069}{Prop}}] \to (f : X \to U) \to ((x : X) \to \textit{WF} \; (f \; x)) \to \textit{WF} \; (\mathsf{\textcolor{#069}{Con}}(0, \mathsf{\textcolor{#bf616a}{U}}) \; [X] \; f) \rangle \\ \mathsf{\textcolor{#bf616a}{wfU}} &: (u : \mathsf{\textcolor{#bf616a}{U}}) \to \mathsf{\textcolor{#bf616a}{WF}} \; u \\ &\coloneqq \mathsf{\textcolor{#069}{Fix}}_0 \; \textit{wfU} : (u : \mathsf{\textcolor{#bf616a}{U}}) \to \mathsf{\textcolor{#bf616a}{WF}} \; u \; \mathsf{\textcolor{#069}{in}} \; \lambda u : \mathsf{\textcolor{#bf616a}{U}}. \\ &\phantom{\coloneqq} \mathsf{\textcolor{#069}{Case}} \; u \; \mathsf{\textcolor{#069}{return}} \; \lambda u : \mathsf{\textcolor{#bf616a}{U}}. \mathsf{\textcolor{#bf616a}{WF}} \; u \; \mathsf{\textcolor{#069}{of}} \; \langle 0: \lambda [X : \mathsf{\textcolor{#069}{Prop}}]. \lambda f : X \to \mathsf{\textcolor{#bf616a}{U}}. \mathsf{\textcolor{#069}{Con}}(0, \mathsf{\textcolor{#bf616a}{WF}}) \; [X] \; f \; (\lambda x : X. \textit{wfU} \; (f \; x)) \rangle \end{align*}$Again, here’s the equivalent definitions in Rocq. There’s no need to turn off universe checking this time since we won’t be strongly eliminating $\mathsf{\textcolor{#bf616a}{WF}}$.
Inductive WF : U -> Prop :=
| wf : forall X f, (forall x, WF (f x)) -> WF (u X f).
Fixpoint wfU (u : U) : WF u :=
match u with
| u X f => wf X f (fun x => wfU (f x))
end.
Showing nonwellfoundedness of $\mathsf{\textcolor{#bf616a}{loop}}$ is more complicated, not because it’s an inherently difficult proof, but because it requires manually unifying indices. In fact, the whole proof in Agda is quite simple.
{-# NO_UNIVERSE_CHECK #-}
data U : Set where
u : (X : Set) → (X → U) → U
data WF : U → Set₁ where
wf : ∀ (X : Set) (f : X → U) → (∀ x → WF (f x)) → WF (u X f)
loop = u U (λ x → x)
nwf : WF loop → ⊥
nwf (wf X f h) = nwf (h loop)
Destructing $\mathsf{\textcolor{#bf616a}{WF}} \; \mathsf{\textcolor{#bf616a}{loop}}$ as $\mathsf{\textcolor{#bf616a}{WF}} \; [X] \; f \; h$, we know that $X$ is $\mathsf{\textcolor{#bf616a}{U}}$ and $f$ is $\lambda x : \mathsf{\textcolor{#bf616a}{U}}. x$. In Rocq, we have the help of tactics and dependent induction, as well as $\mathsf{\textcolor{#bf616a}{injU}}$ proven earlier to explicitly unify indices.
Require Import Coq.Program.Equality.
Lemma nwf (u : U) (p : u = loop) (wfl : WF u) : False.
Proof.
dependent induction wfl.
apply injU in p as [q r].
simpl in q. subst.
simpl in r. subst.
eapply H0. reflexivity.
Qed.
But writing the same proof in plain ECIC is a challenge,
especially as the proof term generated for nwf
is disgusting.
I’ve simplified it to the following to the best of my ability,
still using copious amounts of rew
.
Fixpoint nwf (wfl : WF loop) : False :=
match wfl in WF u' return loop = u' -> False with
| wf _ f h => fun p => let (q , r) := injU p in
(rew dependent [fun _ q => forall f,
rew [fun Z => Z -> U] q in (fun x => x) = f
-> (forall x, WF (f x))
-> False] q
in fun _ r =>
rew [fun f => (forall x, WF (f x)) -> False] r
in fun h => nwf (h loop)) f r h
end eq_refl.
I won’t bother trying to typeset that in ECIC, but I hope it’s convincing enough as an argument that the corresponding definition would still type check and that erasure isn’t violated, i.e. that the $\mathsf{\textcolor{#069}{Prop}}$ argument from the $\mathsf{\textcolor{#bf616a}{WF}}$ isn’t used in the body of the proof. This proof doesn’t even use strong elimination, either: the return type of every case expression lives in $\mathsf{\textcolor{#069}{Prop}}$.
The proof sketch above tells us how ECIC isn’t consistent, but we still need to understand why it isn’t consistent. ECIC’s original argument was that
forcing impredicative fields to be erasable also avoids this source of inconsistency usually avoided with the $\Gamma \vdash e \; \mathsf{small}$ constraint.
Said source refers to the ability for large impredicative inductives with strong elimination to hide a larger type within a smaller construction that’s then used later. The idea is that if the impredicative field is erased, then surely it can’t be meaningfully used later as usual to construct an inconsistency. Here, I’ve shown that it can be meaningfully used even if it can’t be used relevantly, because all we need is to be able to refer to them in propositions and proofs. It doesn’t really make sense anyway that the computational relevance of a term should have any influence on propositions, which arguably exist independently of whether they can be computed.
Then why is strong elimination inconsistent with impredicativity, if computational relevance isn’t the reason? I believe that the real connection is between impredicativity and proof irrelevance, from which computational irrelevance arises. After all, $\mathsf{\textcolor{#069}{Prop}}$ is often modelled proof-irrelevantly as a two-element set $\{ \top , \bot \}$, collapsing all of its types to truthhood or falsehood and disregarding those types’ inhabitants. Other times, $\mathsf{\textcolor{#069}{Prop}}$ is defined as the universe of mere propositions, or the universe of types such that their inhabitants (if any) are all propositionally equal.
Under this view, impredicativity is permitted, although not necessary, because referring to larger information ought to be safe so long as there’s no way to use that information to violate proof irrelevance. Strong elimination commits this violation because, as seen in $\mathsf{\textcolor{#bf616a}{injU}}$, it allows us to talk about the identity (or non-identity) of larger terms, even if the way we talk about it is proof-irrelevantly. Concretely, the contrapositive of $\mathsf{\textcolor{#bf616a}{injU}}$ lets us distinguish two $\mathsf{\textcolor{#bf616a}{U}}$s as soon as we have two provably unequal types, such as $\top$ and $\bot$, from which we can provably distinguish $\mathsf{\textcolor{#069}{Con}}(0, \mathsf{\textcolor{#bf616a}{U}}) \; [\top] \; (\lambda \_. \mathsf{\textcolor{#bf616a}{loop}})$ and $\mathsf{\textcolor{#069}{Con}}(0, \mathsf{\textcolor{#bf616a}{U}}) \; [\bot] \; (\lambda b. \mathsf{\textcolor{#069}{Case}} \; b \; \mathsf{\textcolor{#069}{of}} \; \langle \rangle)$.
Interestingly, even without strong elimination of large inductives, proof irrelevance can still be violated by strong elimination of a small inductive with two constructors, since that would enable proving the two constructors unequal. The only way an inconsistency arises is if proof irrelevance is explicitly internalized. This is why the axiom of excluded middle is inconsistent in the presence of strong elimination in this setting: Berardi’s paradox^{8} says that EM implies proof irrelevance. I think there’s something profound in this paradox that I haven’t yet grasped, because connects two different views of a proposition: as something that is definitively either true or false, or as something that carries no other information than whether it is true or false.
Not to be confused with their eCIC, the erasable CIC. ↩
Stefan Monnier; Nathaniel Bos. Is Impredicativity Implicitly Implicit? TYPES 2019. doi:10.4230/LIPIcs.TYPES.2019.9. ↩
Nathan Mishra-Linger; Tim Sheard. Erasure and polymorphism in pure type systems. FOSSACS 2008. doi:10.5555/1792803.1792828. ↩
Thierry Coquand. The paradox of trees in type theory. BIT 32 (1992). doi:10.1007/BF01995104. ↩
Also called run-time erasure, run-time irrelevance, computational irrelevance, or external erasure. ↩
Also called the no-SELIT (strong elimination of large inductive types) rule. ↩
ECIC doesn’t formally have cumulativity, but we can use our imaginations. ↩
Barbanera, Franco; Berardi, Stefano. JFP 1996. Proof-irrelevance out of excluded middle and choice in the calculus of constructions. doi:10.1017/S0956796800001829. ↩
While looking up the various colour films my local photo shops carry, I found that a lot of them are actually respools and repackagings of other film (most Kodak), so I’ve tried to compile that information here to keep track of them all. I’m personally not looking for “experimental” colour films or films that aren’t “true colour”, so I’ve excluded a number of those here, which are mostly:
Kodak’s Vision3 movie film, which comes gigantic spools, is a popular choice for respooling into canisters. The C41 respools have the remjet backing removed so they produce some amount of halation, while the ECN2 respools have not but most photo labs (at least the ones near me) don’t handle it.
Original film | C41 respools | ECN2 respools |
---|---|---|
Vision3 500T | CineStill 800T, RETO Amber T800, Reflx Lab 800 tungsten, Reformed Night 800, Candido 800, FilmNeverDie UMI 800 (?) | Flic Film Cine Colour 500T, Reflx Lab 500T, Silbersaltz35 500T, FilmNeverDie SUIBO 500T |
Vision3 200T | RETO Amber T200, Reflx Lab 200 tungsten, Candido 200 | Flic Film Cine Colour 200T, Reflx Lab 200T, Silbersaltz35 200T, Hitchcock 200T 5213 |
Vision3 250D | CineStill 400D, RETO Amber D400, Reflx Lab 400 daylight, Reformed Day 400, Candido 400, FilmNeverDie SORA 200 (?) | Flic Film Cine Colour 250D, Reflx Lab 250D, Silbersaltz35 250D, FilmNeverDie KUMO 250D |
Vision3 50D | CineStill 50D, RETO Amber D100, Reflx Lab 100 daylight | Flic Film Cine Colour 50D, Reflx Lab 50D, Silbersaltz35 50D, Hitchcock 50D 5203 |
The following are more Kodak films, but not all; those that don’t seem to respooled or repackaged include the Porta series, Ektar 100, and ColorPlus 200^{1}.
Original film | Process | Respools/repackagings |
---|---|---|
Aerocolor IV 2460 | C41 | Flic Film Elektra 100, Reflx Lab Pro 100, CatLABS X Film 100, SantaColor 100, Popho Luminar 100, Film Washi X, Karmir 160 |
Gold 800 | C41 | Lomography Color 800, Flic Flim Aurora 800 (?) |
UltraMax 400 | C41 | Fujifilm 400, Lomography Color 400 |
Gold 200 | C41 | Fujifilm 200, Lomography Color 200 |
ProImage 100 | C41 | Lomography Color 100 (?) |
Ektachrome 100D | E6 | Flic Film Chrome 100, Reflx Lab 100R, Film Photography Project Chrome, FilmNeverDie CHAMELEON 100 (?) |
Finally, here are the films that I’ve found, all C41 process, that appear to be genuinely independent of Kodak:
And some further mysteries…
Kodak doesn’t even list this one on their website ↩
Part 1: U+237C ⍼ RIGHT ANGLE WITH DOWNWARDS ZIGZAG ARROW
Part 2: update: U+237C ⍼ angzarr;
Many thanks to Alicia Chilcott and Sophie Hawkey-Edwards at St Bride for their help.
While my hunt for the origins of ⍼ is what led me here, this isn’t really a part 3 because I haven’t found any new information. However, I did get the chance to visit St. Bride Library in London, where they held several Monotype catalogues, including the one containing ⍼. Here, I’ll catalogue the catalogues I took a look at. But first, some typesetting terminology to get things sorted.
At St Bride Library, I looked at a total of six documents:
The 1935 and 1976 lists are booklets of, I think, legal-paper size, with the latter seemingly an update of the former, being about twice as thick. They contain lists of fonts and lists of special sorts, such as mathematical sorts and special borders. Although fairly comprehensive for ordinary mathematical use, the mathematical sorts sections are pretty sparse in comparison to the mathematics-specific catalogues.
Now this is an enormous tome. It appears to attempt to catalogue every single special sort made up to 1932, including the S series of sorts ranging from matrix numbers S1 to S3998, with plenty of blank entries left presumably to be updated as more were made. A number of entries appear to be hand-corrected or glued on later.
What was surprising to me was that sorts made specifically for company logos are catalogued and included as special sorts!
I really like this 1957 booklet because it’s a very well-curated selection, with plenty of explanatory text and descriptions of what each sort means when available. The sorts are sorted not by matrix number, but rather by their intended semantic meanings. It apparently is a reference book which complements volume 40, issue number 4 of the Monotype Recorder, Setting Mathematics, with a glossary of mathematical terms and nomenclature of signs, also written by Arthur Phillips, from the winter of 1956. The “Names or descriptions of Mathematical Signs” on page 26 can also be found in the Sorts List.
This booklet comes with an addendum as a slip of paper, containing a single poem by Arthur Phillips titled Mathematical Sorts in Continuous Creation which, with the power of Unicode, I transcribe below, together with its reading.
A ′ ° of ↻ sense, ∈ doubt, ÷ ∞ Will help the theorist out. So this → Which ⇛ creation And ↔, I'm told, A transfinite ∼. A ⧧ Is ≃ The ∜ of a ⊕ (It means the same to you); So ⪪, so >, ≺ a 𝑝∣𝑞 The = Of Bondi, 𝐴𝑢 and Hoyle. |
A prime degree of clockwise sense, An element of doubt, Divided by infinity Will help the theorist out. So this tends to the limit Which converges to creation And mutually implies, I'm told, A transfinite negation. A logical diversity Is asymptotic to The fourth root of a direct sum (It means the same to you); So "smaller than", so "greater than" Contains a joint denial, The logical identity Of Bondi, Gold and Hoyle. |
And finally, here we have once again the 1972 list of mathematical characters, including series L231 containing ⍼, which is bound together with the original list rather than collected alongside as looseleaf. What I found the most interesting was the preamble given before the list, in particular the paragraphs describing the difference between series 569 and L231.
It should be understood that Series 569 contains only those characters which are (a) mathematical and (b) recognised internationally. We are periodically asked to supply characters which do not meet these conditions. In order to meet such requests we have therefore provided the supplementary fount L231.
L213-10 pt has the same Group numbers, and conforms with Series 569 in all respect of matrix manufacture, casting and usage. However, we do suggest that customers, on finding that a particular character they require is made in L231, should, nevertheless, still endeavour to persuade their customer or author to use normal notation from Series 569. We shall be only too pleased to provide any relevant data regarding why the character has been made in the L231 category, should this be requested.
The implication is that, in 1972 at least, Monotype did retain records for the meanings of specials sorts including those in L231! I haven’t found any evidence of updated tomes of Monotype Special Sorts, so I imagine that they would be internal documents rather than publications. Unfortunately, such documents would probably be in the now-defunct Type Archive, whose materials are in storage in the Science Museum Group’s National Collections Centre.
]]>That paper draft along with the supplementary material will have all the details, but I’ve decided that what I want to focus on in this post is all the other variations on the system we’ve tried that are either inconsistent or less expressive. This means I won’t cover a lot of motivation or examples (still a work in progress), or mention any metatheory unless where relevant; those can be found in the paper. Admittedly, these are mostly notes for myself, and I go at a pace that sort of assumes enough familiarity with the system to be able to verify well-typedness mentally, but this might be interesting to someone else too.
The purpose of StraTT is to introduce a different way of syntactically dealing with type universes, and the premise is to take a predicative type theory with a universe hierarchy, and instead of stratifying universes by levels, you instead stratify typing judgements themselves, a strategy inspired by Stratified System F.^{2} This means level annotations appear in the shape of the judgement $\Gamma \vdash a \mathbin{:}^{j} A$, where $a$ is a term, $A$ is a type, $j$ is a level, and $\Gamma$ is a list of declarations $x \mathbin{:}^{j} A$.
Here are the rules for functions.
$\frac{ \Gamma \vdash A \mathbin{:}^{j} \star \quad \Gamma, x \mathbin{:}^{j} A \vdash B \mathbin{:}^{k} \star \quad j < k }{ \Gamma \vdash \Pi x \mathbin{:}^{j} A \mathpunct{.} B \mathbin{:}^{k} \star } \quad \frac{ \Gamma, x \mathbin{:}^{j} A \vdash b \mathbin{:}^{k} B \quad j < k }{ \Gamma \vdash \lambda x \mathpunct{.} b \mathbin{:}^{k} \Pi x \mathbin{:}^{j} A \mathpunct{.} B } \quad \frac{ \Gamma \vdash b \mathbin{:}^{k} \Pi x \mathbin{:}^{j} A \mathpunct{.} B \quad \Gamma \vdash a \mathbin{:}^{j} A }{ \Gamma \vdash b \; a \mathbin{:}^{k} B[x \mapsto a] }$Function introduction and elimination are as expected, just with level annotations sprinkled in; the main difference is in the dependent function type. Just as the stratification of universes into a hierarchy serves to predicativize function types, so too does stratification of judgements, here done explicitly through the constraint $j < k$. This means that if you have a function of type $\Pi x \mathbin{:}^{j} \star \mathpunct{.} B$ at level $k$, you can’t pass that same type to the function as the type argument, since its level is too big.
Because universes no longer have levels, we do in fact have a type-in-type rule $\Gamma \vdash \star \mathbin{:}^{j} \star$ for any $j$. But this is okay!^{3} The judgement stratification prevents the kind of self-referential tricks that type-theoretic paradoxes typically take advantage of. The simplest such paradox is Hurkens’ paradox, which is still quite complicated, but fundamentally involves the following type.
$\mathsf{\textcolor{#bf616a}{U}} \mathbin{:} \star^{1} \coloneqq \Pi X \mathbin{:}^{0} \star \mathpunct{.} ((X \to \star) \to \star) \to X) \to ((X \to \star) \to \star)$For the paradox to work, the type argument of a function of type $\mathsf{\textcolor{#bf616a}{U}}$ needs to be instantiated with $\mathsf{\textcolor{#bf616a}{U}}$ itself, but stratification prevents us from doing that, since $1 \nleq 0$.
Cumulativity is both normal to want and possible to achieve. There are two possible variations to achieve it: one adds cumulativity to the variable rule and leaves conversion alone.
$\frac{ x \mathbin{:}^{j} A \in \Gamma \quad j \le k }{ \Gamma \vdash x \mathbin{:}^{k} A } \quad \frac{ \Gamma \vdash a \mathbin{:}^{k} A \quad A \equiv B }{ \Gamma \vdash a \mathbin{:}^{k} B }$Alternatively, the variable rule can be left alone, and cumulativity integrated into the conversion rule.
$\frac{ x \mathbin{:}^{j} A \in \Gamma }{ \Gamma \vdash x \mathbin{:}^{j} A } \quad \frac{ \Gamma \vdash a \mathbin{:}^{j} A \quad A \equiv B \quad j \leq k }{ \Gamma \vdash a \mathbin{:}^{k} B }$Either set is admissible in terms of the other. I’m not going to tell you which one I’ve picked.
Level annotations are tedious and bothersome. Can we omit them from function types? The answer is no. Doing so allows us to derive exactly the inconsistency we set out to avoid. Suppose our function type domains aren’t annotated with levels, and let $u \mathbin{:}^{1} \mathsf{\textcolor{#bf616a}{U}}$. Then by cumulativity, we can raise its level, then apply it to $\mathsf{\textcolor{#bf616a}{U}}$.
$\frac{ u \mathbin{:}^{1} \mathsf{\textcolor{#bf616a}{U}} \vdash u \mathbin{:}^{2} \Pi X \mathbin{:} \star \mathpunct{.} ((X \to \star) \to \star) \to X) \to ((X \to \star) \to \star) \quad \vdash \mathsf{\textcolor{#bf616a}{U}} \mathbin{:}^{1} \star }{ u \mathbin{:}^{1} \mathsf{\textcolor{#bf616a}{U}} \vdash u \; \mathsf{\textcolor{#bf616a}{U}} \mathbin{:}^{2} (((\mathsf{\textcolor{#bf616a}{U}} \to \star) \to \star) \to \mathsf{\textcolor{#bf616a}{U}}) \to ((\mathsf{\textcolor{#bf616a}{U}} \to \star) \to \star) }$The strict level constraint is still there, since the level of $\mathsf{\textcolor{#bf616a}{U}}$ remains strictly smaller than that of $u$. But without the annotation, the allowed level of the domain can rise as high as possible, yet still within the constraint, via cumulativity.
The formalism of StraTT also supports a global context $\Delta$ which consists of a list of global definitions $x \mathbin{:}^{j} A \coloneqq a$, where $x$ is a constant of type $A$ and body $a$ at level $j$. Separately from cumulativity, we have displacement, which is based on Conor McBride’s notion of “crude-but-effective stratification”:^{4} global definitions are defined with fixed, constant levels, and then uniformly incremented as needed. This provides a minimalist degree of code reusability across levels without actually having to introduce any sort of level polymorphism. Formally, we have the following rules for displaced constants and their reduction behaviour that integrates both displacement and cumulativity.
$\frac{ x \mathbin{:}^{j} A \coloneqq a \in \Delta \quad i + j \leq k }{ \Delta; \Gamma \vdash x^{i} \mathbin{:}^{k} A^{+i} } \quad \frac{ x \mathbin{:}^{j} A \coloneqq a \in \Delta }{ \Delta \vdash x^{i} \rightsquigarrow a^{+i} }$The metafunction ${\cdot}^{+i}$ recursively adds $i$ to all levels and displacements in a term; below are the two cases where the increment actually has an effect.
$\begin{align*} (\Pi x \mathbin{:}^{j} A. B)^{+i} &≝ \Pi x \mathbin{:}^{i + j} A^{+i}. B^{+i} \\ (x^{j})^{+i} &≝ x^{i + j} \end{align*}$As an example of where displacement is useful, we can define a function that takes an argument $u$ of the displaced type $\mathsf{\textcolor{#bf616a}{U}}^{1}$, which now takes a type argument at level $1$, so $u \; \mathsf{\textcolor{#bf616a}{U}}$ is then well typed.
You’ll notice that I’ve been writing nondependent functions with arrows but without any level annotations. This is because in StraTT, there’s a separate syntactic construct for nondependent functions which behaves just like functions but no longer has the strict stratification restriction.
$\frac{ \Gamma \vdash A \mathbin{:}^{k} \star \quad \Gamma \vdash B \mathbin{:}^{k} \star }{ \Gamma \vdash A \to B \mathbin{:}^{k} \star } \quad \frac{ \Gamma, x \mathbin{:}^{k} A \vdash b \mathbin{:}^{k} B }{ \Gamma \vdash \lambda x \mathpunct{.} b \mathbin{:}^{k} A \to B } \quad \frac{ \Gamma \vdash b \mathbin{:}^{k} A \to B \quad \Gamma \vdash a \mathbin{:}^{k} A }{ \Gamma \vdash b \; a \mathbin{:}^{k} B }$Removing the restriction is okay for nondependent function types, because there’s no risk of dependently instantiating a type variable in the type with the type itself.
These are called floating functions because rather than being fixed, the level of the domain of the function type “floats” with the level of overall function type (imagine a buoy bobbing along as the water rises and falls). Specifically, cumulativity lets us derive the following judgement.
$\frac{ f \mathbin{:}^{j} A \to B \vdash f \mathbin{:}^{j} A \to B \quad j \leq k }{ f \mathbin{:}^{j} A \to B \vdash f \mathbin{:}^{k} A \to B }$Unusually, if we start with a function $f$ at level $j$ that takes some $A$ at level $j$, we can use $f$ at level $k$ as if it now takes some $A$ at a higher level $k$. The floating function $f$ is covariant in the domain with respect to the levels! Typically functions are contravariant or invariant in their domains with respect to the ambient ordering on types, universes, levels, etc.^{5}
Designing StraTT means not just designing for consistency but for expressivity too: fundamentally revamping how we think about universes is hardly useful if we can’t also express the same things we could in the first place.
First, here’s an example of some Agda code involving self-applications of identity functions in ways such that universe levels are forced to go up to in a nontrivial manner (i.e. not just applying bigger and bigger universes to something). Although I use universe polymorphism to define $\mathsf{\textcolor{#bf616a}{ID}}$, it isn’t strictly necessary, since all of its uses are inlineable without issue.
{-# OPTIONS --cumulativity #-}
open import Level
ID : ∀ ℓ → Set (suc ℓ)
ID ℓ = (X : Set ℓ) → X → X
𝟘 = zero
𝟙 = suc zero
𝟚 = suc 𝟙
𝟛 = suc 𝟚
-- id id
idid1 : ID 𝟙 → ID 𝟘
idid1 id = id (ID 𝟘) (λ x → id x)
-- id (λid. id id) id
idid2 : ID 𝟚 → ID 𝟘
idid2 id = id (ID 𝟙 → ID 𝟘) idid1 (λ x → id x)
-- id (λid. id (λid. id id) id) id
idid3 : ID 𝟛 → ID 𝟘
idid3 id = id (ID 𝟚 → ID 𝟘) idid2 (λ x → id x)
In $\mathsf{\textcolor{#bf616a}{idid1}}$, to apply $\mathit{id}$ to itself, its type argument needs to be instantiated with the type of $\mathit{id}$, meaning that the level of the $\mathit{id}$ on the left needs to be one larger, and the $\mathit{id}$ on the right needs to be eta-expanded to fit the resulting type with the smaller level. Repeatedly applying self-applications increases the level by one each time. I don’t think anything else I can say will be useful; this is one of those things where you have to stare at the code until it type checks in your brain (or just dump it into the type checker).
In the name of expressivity, we would like these same definitions to be typeable in StraTT. Suppose we didn’t have floating functions, so every function type needs to be dependent. This is the first definition that we hope to type check.
$$\begin{array}{rl}{\displaystyle {\mathsf{idid1}}}& {\displaystyle {:}^{3}\mathrm{\Pi}id{:}^{2}(\mathrm{\Pi}X{:}^{1}\star \mathrm{.}\mathrm{\Pi}x{:}^{1}X\mathrm{.}X)\mathrm{.}(\mathrm{\Pi}X{:}^{0}\star \mathrm{.}\mathrm{\Pi}x{:}^{0}X\mathrm{.}X)}\\ & {\displaystyle \mathrm{\u2254}\lambda id\mathrm{.}id\text{\hspace{0.25em}\hspace{0.05em}}(\mathrm{\Pi}X{:}^{0}\star \mathrm{.}\mathrm{\Pi}x{:}^{0}X\mathrm{.}X)\text{\hspace{0.25em}\hspace{0.05em}}(\lambda X\text{\hspace{0.25em}\hspace{0.05em}\hspace{0.05em}}x\mathrm{.}\overline{)id\text{\hspace{0.25em}\hspace{0.05em}}X\text{\hspace{0.25em}\hspace{0.05em}}x})}\end{array}$$The problem is that while $\mathit{id}$ expects its second argument to be at level $1$, the actual second argument contains $\mathit{id}$ itself, which is at level $2$, so such a self-application could never fit! And by stratification, the level of the argument has to be strictly smaller than the level of the overall function, so there’s no annotation we could fiddle about with to make $\mathit{id}$ fit in itself.
What floating functions grant us is the ability to do away with stratification when we don’t need it. The level of a nondependent argument to a function can be as large as the level of the function itself, which is exactly what we need to type check this first definition that now uses floating functions.
$\begin{align*} \mathsf{\textcolor{#bf616a}{idid1}} &\mathbin{:}^{2} (\Pi X \mathbin{:}^{1} \star \mathpunct{.} X \to X) \to (\Pi X \mathbin{:}^{0} \star \mathpunct{.} X \to X) \\ &\coloneqq \lambda \mathit{id} \mathpunct{.} \mathit{id} \; (\Pi X \mathbin{:}^{0} \mathpunct{.} \star X \to X) \; (\lambda X \mathpunct{.} \mathit{id} \; X) \end{align*}$The corresponding definitions for $\mathsf{\textcolor{#bf616a}{idid2}}$ and $\mathsf{\textcolor{#bf616a}{idid3}}$ will successfully type check too; exercise for the reader.
I’ve made it sound like all nondependent functions can be made to float, but unfortunately this isn’t always true. If the function argument itself is used in a nondependent function domain, then that argument is forced to be fixed, too. Consider for example the following predicate that states that the given type is a mere proposition, i.e. that all of its inhabitants are indistinguishable.
$\begin{align*} \mathsf{\textcolor{#bf616a}{isProp}} &\mathbin{:}^1 \Pi X \mathbin{:}^0 \star \mathpunct{.} \star \\ &\coloneqq \lambda X \mathpunct{.} \Pi x \mathbin{:}^0 X \mathpunct{.} \Pi y \mathbin{:}^0 X \mathpunct{.} \Pi P \mathbin{:}^0 X \to \star \mathpunct{.} P \; x \to P \; y \end{align*}$If $\mathsf{\textcolor{#bf616a}{isProp}}$ were instead assigned the floating function type $\star \to \star$ at level $1$, the type argument $X$ being at level $1$ would force the level of $P$ to also be $1$, which would force the overall definition to be at level $2$, which would then make $X$ be at level $2$, and so on.
Not only are not all nondependent functions necessarily floating, not all functions that could be floating necessarily need to be nondependent, either. I’m unsure of how to verify this, but I don’t see why function types like the identity function type $(X \mathbin{:} \star) \to X \to X$ can’t float; I can’t imagine that being exploited to derive an inconsistency. The fixed/floating dichotomy appears to be independent of the dependent/nondependent dichotomy, and matching them up only an approximation.
Before I move on to the next design decision, I want to state and briefly discuss an important lemma used in our metatheory. We’ve proven type safety (that is, progress and preservation lemmas) for StraTT, and the proof relies on a restriction lemma.
Definition: Given a context $\Gamma$ and a level $j$, the restriction $\lceil\Gamma\rceil_{j}$ discards all declarations in $\Gamma$ of level strictly greater than $j$.
Lemma [Restriction]: If $\vdash \Gamma$ and $\Gamma \vdash a \mathbin{:}^{j} A$, then $\vdash \lceil\Gamma\rceil_{j}$ and $\lceil\Gamma\rceil_{j} \vdash a \mathbin{:}^{j} A$.
For the lemma to hold, no derivation of $\Gamma \vdash a \mathbin{:}^{j} A$ can have premises whose level is strictly greater than $j$; otherwise, restriction will discard those necessary pieces. This lemma is crucial in proving specifically the floating function case of preservation; if we didn’t have floating functions, the lemma isn’t necessary.
As it stands, StraTT enforces that the level of a term’s type is exactly the level of the term. In other words, the following regularity lemma is provable.
Lemma [Regularity]: If $\Gamma \vdash a \mathbin{:}^{j} A$, then $\Gamma \vdash A \mathbin{:}^{j} \star$.
But what if we relaxed this requirement?
Lemma [Regularity (relaxed)]: If $\Gamma \vdash a \mathbin{:}^{j} A$, then there is some $k \geq j$ such that $\Gamma \vdash A \mathbin{:}^{k} \star$.
In such a new relaxed StraTT (RaTT), the restriction lemma is immediately violated, since $\Gamma ≝ A \mathbin{:}^{1} \star, x \mathbin{:}^{0} A$ is a well-formed context, but $\lceil\Gamma\rceil_{0} = x \mathbin{:}^{0} A$ isn’t even well scoped. So a system with relaxed levels that actually makes use of the relaxation can no longer accommodate floating functions because type safety won’t hold.
However, the dependent functions of RaTT are more expressive: the $\mathit{id}$ self-application can be typed using dependent functions alone!^{6} First, let’s pin down the new rules for our functions.
$\frac{ \Gamma \vdash A \mathbin{:}^{k} \star \quad \Gamma, x \mathbin{:}^{j} A \vdash B \mathbin{:}^{k} \star \quad j < k }{ \Gamma \vdash \Pi x \mathbin{:}^{j} A \mathpunct{.} B \mathbin{:}^{k} \star } \quad \frac{ \Gamma, x \mathbin{:}^{j} A \vdash b \mathbin{:}^{k} B }{ \Gamma \vdash \lambda x \mathpunct{.} b \mathbin{:}^{k} \Pi x \mathbin{:}^{j} A \mathpunct{.} B } \quad \frac{ \Gamma \vdash b \mathbin{:}^{k} \Pi x \mathbin{:}^{j} A \mathpunct{.} B \quad \Gamma \vdash a \mathbin{:}^{j} A }{ \Gamma \vdash b \; a \mathbin{:}^{k} B[x \mapsto a] }$Two things have changed:
Since the premise $j < k$ still exists in the type formation rule, we can be assured that a term of type $\mathsf{\textcolor{#bf616a}{U}}$ still can’t be applied to $\mathsf{\textcolor{#bf616a}{U}}$ itself and that we still probably have consistency.^{7} Meanwhile, the absence of $j < k$ in the function introduction rule allows us to inhabit, for instance, the identity function type $\Pi X \mathbin{:}^{1} \star \mathpunct{.} \Pi x \mathbin{:}^{0} X \mathpunct{.} X$, where the level annotations no longer match.
These rules let us assign levels to the $\mathit{id}$ self-application as follows.
$\begin{align*} \mathsf{\textcolor{#bf616a}{idid1}} &\mathbin{:}^{0} \Pi \mathit{id} \mathbin{:}^{0} (\Pi X \mathbin{:}^{1} \star \mathpunct{.} \Pi x \mathbin{:}^{0} X \mathpunct{.} X) \mathpunct{.} (\Pi X \mathbin{:}^{0} \star \mathpunct{.} \Pi x \mathbin{:}^{0} X \mathpunct{.} X) \\ &\coloneqq \lambda \mathit{id} \mathpunct{.} \mathit{id} \; (\Pi X \mathbin{:}^{0} \star \mathpunct{.} \Pi x \mathbin{:}^{0} X \mathpunct{.} X) \; (\lambda X \; x \mathpunct{.} \mathit{id} \; X \; x) \end{align*}$Although the type of $\mathit{id}$ must live at level $2$, the $\mathit{id}$ term can be assigned a lower level $0$. This permits the self-application to go through, since the second argument of $\mathit{id}$ demands a term at level $0$. Unusually, despite the second $\mathit{id}$ being applied to a type at level $1$, the overall level of the second argument is still $0$ because quantification over arguments at higher levels is permitted.
Despite the apparently added expressivity of these new dependent functions, unfortunately they’re still not as expressive as actually having floating functions. This was something I discovered while trying to type check Hurkens’ paradox and found that fewer definitions were typeable. I’ve attempted to isolate the issue a bit by revealing a similar problem when working with CPS types. Let’s first look at some definitions in the original StraTT with floating functions.
$\begin{array}{r l l} \mathsf{\textcolor{#bf616a}{CPS}} &\mathbin{:}^{1} \star \to \star &\coloneqq \lambda A \mathpunct{.} \Pi X \mathbin{:}^{0} \star. (A \to X) \to X \\ \mathsf{\textcolor{#bf616a}{return}} &\mathbin{:}^{1} \Pi A \mathbin{:}^{0} \star \mathpunct{.} A \to \mathsf{\textcolor{#bf616a}{CPS}} \; A &\coloneqq \lambda A \; a \; X \; f. f \; a \\ \mathsf{\textcolor{#bf616a}{run}} &\mathbin{:}^{1} \Pi A \mathbin{:}^{0} \star \mathpunct{.} \mathsf{\textcolor{#bf616a}{CPS}} \; A \to A &\coloneqq \lambda A \; f \mathpunct{.} f \; A \; (\lambda a \mathpunct{.} a) \\ \mathsf{\textcolor{#bf616a}{idCPS}} &\mathbin{:}^{2} \Pi A \mathbin{:}^{0} \star \mathpunct{.} \mathsf{\textcolor{#bf616a}{CPS}}^{1} \; A \to \mathsf{\textcolor{#bf616a}{CPS}} \; A &\coloneqq \lambda A \; f. f \; (\mathsf{\textcolor{#bf616a}{CPS}} \; A) \; (\mathsf{\textcolor{#bf616a}{return}} \; A) \\ \mathsf{\textcolor{#bf616a}{runReturn}} &\mathbin{:}^{2} \Pi A \mathbin{:}^{0} \star \mathpunct{.} \mathsf{\textcolor{#bf616a}{CPS}}^{1} \; A \to \star &\coloneqq \lambda A \; f \mathpunct{.} \mathsf{\textcolor{#bf616a}{run}} \; A \; (\mathsf{\textcolor{#bf616a}{idCPS}} \; A \; f) = \mathsf{\textcolor{#bf616a}{run}}^{1} \; A \; f \end{array}$$\mathsf{\textcolor{#bf616a}{CPS}}$ translates a type to its answer-polymorphic CPS form, and $\mathsf{\textcolor{#bf616a}{return}}$ translates a term into CPS. $\mathsf{\textcolor{#bf616a}{run}}$ does the opposite and runs the computation with the identity continuation to yield a term of the original type. $\mathsf{\textcolor{#bf616a}{runReturn}}$ is a proposition we might want to prove about a given type and a computation of that type in CPS: passing $\mathsf{\textcolor{#bf616a}{return}}$ as the continuation for our computation (as in $\mathsf{\textcolor{#bf616a}{idCPS}}$) to get another computation should, when run, yield the exact same result as running the computation directly.
By careful inspection,^{8} everything type checks. Displacements needed are in $\mathsf{\textcolor{#bf616a}{idCPS}}$ and $\mathsf{\textcolor{#bf616a}{runReturn}}$ on the type of the computation $f$. This displacement raises the level of the answer type argument of $f$ so that the type can be instantiated with $\mathsf{\textcolor{#bf616a}{CPS}} \; A$. Consequently, we also need to displace the $\mathsf{\textcolor{#bf616a}{run}}^{1}$ that runs the displaced computation on the right-hand side of $\mathsf{\textcolor{#bf616a}{runReturn}}$. Meanwhile, left-hand $\mathsf{\textcolor{#bf616a}{run}}$ shouldn’t be displaced because it runs the undisplaced computation returned by $\mathsf{\textcolor{#bf616a}{idCPS}}$.
Now let’s try to do the same in the new RaTT without floating functions. I’ll only write down the definition of $\mathsf{\textcolor{#bf616a}{CPS}}$ and $\mathsf{\textcolor{#bf616a}{idCPS}}$, and inline $\mathsf{\textcolor{#bf616a}{return}}$ in the latter.
$$\begin{array}{rll}{\textstyle {\mathsf{CPS}}}& {\textstyle {:}^{1}\mathrm{\Pi}A{:}^{0}\star \mathrm{.}\star}& {\textstyle \mathrm{\u2254}\lambda A\mathrm{.}\mathrm{\Pi}X{:}^{0}\star \mathrm{.}\mathrm{\Pi}f{:}^{0}(\mathrm{\Pi}a{:}^{0}A\mathrm{.}X)\mathrm{.}X}\\ {\textstyle {\mathsf{idCPS}}}& {\textstyle {:}^{0}\mathrm{\Pi}A{:}^{0}\star \mathrm{.}\mathrm{\Pi}f{:}^{0}{{\mathsf{CPS}}}^{1}\text{\hspace{0.25em}\hspace{0.05em}}A\mathrm{.}{\mathsf{CPS}}\text{\hspace{0.25em}\hspace{0.05em}}A}& {\textstyle \mathrm{\u2254}\lambda A\text{\hspace{0.25em}\hspace{0.05em}}f\mathrm{.}f\text{\hspace{0.25em}\hspace{0.05em}}({\mathsf{CPS}}\text{\hspace{0.25em}\hspace{0.05em}}A)\text{\hspace{0.25em}\hspace{0.05em}}(\lambda a\text{\hspace{0.25em}\hspace{0.05em}}X\text{\hspace{0.25em}\hspace{0.05em}}g\mathrm{.}\overline{)g\text{\hspace{0.25em}\hspace{0.05em}}a})}\end{array}$$This is a little more difficult to decipher, so let’s look at what the types of the different pieces of $\mathsf{\textcolor{#bf616a}{idCPS}}$ are.
To fix this, we might try to increase the level annotation on $a$ in the definition of $\mathsf{\textcolor{#bf616a}{CPS}}$ so that $g$ can accommodate $a$. But doing so would also increase the level annotation on $a$ in the displaced type of $f$ to $2$, meaning that $a$ is still too large to fit into $g$. The uniformity of the displacement mechanism means that if everything is annotated with a level, then everything moves up at the same time. Previously, floating functions allowed us to essentially ignore increasing levels due to displacement, but also independently increase them via cumulativity.
Of course, we could simply define a second $\mathsf{\textcolor{#bf616a}{CPS}}$ definition with a different level, i.e. $\Pi X \mathbin{:}^{j} \star \mathpunct{.} \Pi f \mathbin{:}^{0} (\Pi a \mathbin{:}^{0} A \mathpunct{.} X) \mathpunct{.} X$ for the level $j$ that we need, but then to avoid code duplication, we’re back at needing some form of level polymorphism over $j$.
So far, I’ve treated level polymorphism as if it were something unpleasant to deal with and difficult to handle, and this is because level polymorphism is unpleasant and difficult. On the useability side, I’ve found that level polymorphism in Agda muddies the intent of the code I want to write and produces incomprehensible type errors while I’m writing it, and I hear that universe polymorphism in Coq is a similar beast. Of course, StraTT is still far away from being a useable tool, but in the absence of more complex level polymorphism, we can plan and design for more friendly elaboration-side features such as level inference.
On the technical side, it’s unclear how you might assign a type and level to something like $\forall k \mathpunct{.} \Pi x \mathbin{:}^{k} A \mathpunct{.} B$, since $x$ essentially quantifies over derivations at all levels. We would also need bounded quantification from both ends for level instantiations to be valid. For instance, $\Pi x \mathbin{:}^{k} (\Pi y \mathbin{:}^{2} A \mathpunct{.} B \; y) \mathpunct{.} C \; x$ requires that the level $k$ of the type of the domain be strictly greater than $2$, so the quantification is $\forall k > 2$; and flipping the levels, $\Pi x \mathbin{:}^{2} (\Pi y \mathbin{:}^{k} A \mathpunct{.} B \; y) \mathpunct{.} C \; x$ would require $\forall k < 2$. While assigning a level to a quantification bounded above is easy (the level of the second type above can be $3$), assigning a level to a quantification bounded below is as unclear as assigning one to an unbounded quantification.
At this point, we could either start using ordinals in general for our levels and always require upper-bounded quantification whose upper bound could be a limit ordinal, or we can restrict level quantification to prenex polymorphism at definitions, which is roughly how Coq’s universe polymorphism works, only implicitly, and to me the more reasonable option:
We believe there is limited need for […] higher-ranked universe polymorphism for a cumulative universe hierarchy.
― Favonia et al. (2023)^{9}
With the second option, we have to be careful not to repeat the same mistakes as Coq’s universe polymorphism. There, by default, every single (implicit) universe level in a definition is quantified over, and every use of a definition generates fresh level variables for each quantified level, so it’s very easy (albeit somewhat artificial) to end up with exponentially-many level variables to handle relative to the number of definitions. On the other hand, explicitly quantifying over level variables is tedious if you must instantiate them yourself, and it’s tricky to predict which ones you really do want to quantify over.
Level polymorphism is clearly more expressive, since in a definition of type $\forall i, j \mathpunct{.} \Pi x \mathbin{:}^{i} A \mathpunct{.} \Pi y \mathbin{:}^{j} B \mathpunct{.} C \; x \; y$ for instance, you can instantiate $i$ and $j$ independently, whereas displacement forces you to always displace both by the same amount. But so far, it’s unclear to me in what scenarios this feature would be absolutely necessary and not manageable with just floating functions and futzing about with the level annotations a little.
The variant of StraTT that the paper covers is the one with stratified dependent functions, nondependent floating functions, and displacement. We’ve proven type safety, mechanized a model of StraTT without floating functions (but not the interpretation from the syntax to the semantics), and implemented StraTT extended with datatypes and level/displacement inference.
The paper doesn’t yet cover any discussion of RaTT with relaxed level constraints. I’m still deliberating whether it would be worthwhile to update my mechanized model first before writing about it, just to show that it is a variant that deserves serious consideration, even if I’ll likely discard it at the end. The paper doesn’t cover any discussion on level polymorphism either, and if I don’t have any more concrete results, I probably won’t go on to include it.
It doesn’t have to be for this current paper draft, but I’d like to have a bit more on level inference in terms of proving soundness and completeness. Soundness should be evident (we toss all of the constraints into an SMT solver), but completeness would probably require some ordering on different level annotations of the same term such that well-typedness of a “smaller” annotation set implies well-typedness of a “larger” annotation set, formalizing the inference algorithm, then showing that the algorithm always produces the “smallest” annotation set.
Daniel Leivant. Stratified polymorphism. LICS 1989. doi:10.1109/LICS.1989.39157. ↩
For the semantically-inclined, it is possible to construct a model of these universes, as I’ve done in this Agda model, which also sketches out how the rest of consistency might be proven. ↩
Conor McBride. Crude but Effective Stratification. 2011. https://mazzo.li/epilogue/index.html%3Fp=857.html. ↩
This domain covariance prevents us from extending the above model with floating function types, because this behaviour semantically… well, kind of doesn’t make sense. So consistency with floating functions is an open problem. ↩
Thanks to Yiyun Liu for pointing this out about the system. ↩
The Agda model could probably be tweaked to accommodate this new system; I just haven’t done it. ↩
Or by using the implementation to type check for me, which is what I did. ↩
Favonia, Carlo Angiuli, Reed Mullanix. An Order-Theoretic Analysis of Universe Polymorphism. POPL 2023. doi:10.1145/3571250. ↩
Work | Summary |
---|---|
Nakano [0] | STLC + recursive types + guarded types |
Birkdeal, Møgelberg, Schwinghammer, Stovring [1] | dependent types + recursive types + guarded types |
Atkey and McBride [2] | STLC + recursive types + guarded types + clocks |
Birkedal and Møgelberg [3] | dependent types + guarded types |
Møgelberg [4] | dependent types + guarded types + clocks |
GDTT [6] | dependent types + guarded types + clocks + delayed substitution |
CloTT [7] | dependent types + guarded types + ticks + clocks |
GCTT [8] | cubical type theory + guarded types + delayed substitution |
TCTT [9] | cubical type theory + guarded types + ticks |
CCTT [10] | cubical type theory + guarded types + ticks + clocks |
Guarded types were first introduced by Nakano [0] in an STLC with recursive types and subtyping such that the following hold (using the modern ⊳ notation):
A ≼ ⊳A
A → B ≼ ⊳A → ⊳B ≼ ⊳(A → B)
Modern type systems present the guarded modality as an applicative functor instead (laws omitted below), together with a guarded fixpoint operator:
next : A → ⊳A
ap : ⊳(A → B) → ⊳A → ⊳B
dfix : (⊳A → A) → ⊳A
dfix f ≃ next (f (dfix f))
fix : (⊳A → A) → A
fix f ≝ f (dfix f) -- ≃ f (next (fix f))
Given some endofunctor F
on types, μ(F ∘ ⊳)
is a fixpoint of F ∘ ⊳
;
the anamorphism is defined through fix
.
We then have the tools for programming with guarded recursive types μ(F ∘ ⊳)
.
Birkdeal, Møgelberg, Schwinghammer, and Stovring [1]
present a model of guarded dependent type theory,
then providing the tools for proving with guarded recursive types.
To coprogram with coinductive types,
Atkey and McBride [2] index the modality and the above constructs by a clock κ
,
and show that νF ≝ ∀κ. μ(F ∘ ⊳ᴷ)
is a cofixpoint of F
.
Concretely, for example, μX. A × ⊳X
is the type of guarded streams of A
,
while ∀κ. μX. A × ⊳ᴷX
is the type of coinductive streams of A
.
This system additionally requires that F
commute with clock quantification
(which can be shown metatheoretically for strictly positive functors F
),
as well as a construct that permits immediately “running” a clock that was just quantified.
force : (∀κ. ⊳ᴷA) → (∀κ. A)
This system disallows instantiating a clock quantification using a clock that’s free in the type, which
disallows us from using the same clock variable twice, preventing multiple “time-streams” from becoming conflated
but Bizjak and Møgelberg [5] remove this restriction by giving a model in which two clocks can be “synchronized”.
Birkedal and Møgelberg [3]
show that in a dependently-typed system,
guarded recursive types can be defined as guarded recursive functions on types,
and Møgelberg [4]
shows that coinductive types can be too using clock quantification.
More precisely, since fix
can act on types, the cofixpoint type itself can be defined in terms of it too:
▸ᴷ : ⊳ᴷ𝒰 → 𝒰
▸ᴷ (nextᴷ A) ≃ ⊳ᴷA
νF ≝ ∀κ. fixᴷ (F ∘ ▸ᴷ)
νF ≃ F νF
These dependent type theories provide the following distributive law of modalities over dependent functions:
d : ⊳((x : A) → B) → (x : A) → ⊳B
d (nextᴷ f) ≃ nextᴷ ∘ f
This isn’t the corresponding notion of ap
for dependent functions,
but given a function f : ⊳((x : A) → B)
and an argument x : ⊳A
,
what would the type of the application of f
to x
be?
Bizjak, Grathwohl, Clouston, Møgelberg, and Birkedal [6]
resolve this issue with a notion of delayed substitution in their Guarded Dependent Type Theory (GDTT),
which waits for x
to reduce to next a
to substitute into B
.
Alternatively, and apparently more popularly,
Bahr, Grathwohl, and Møgelberg [7]
introduce a notion of ticks, which are inhabitants of clocks,
in their Clocked Type Theory (CloTT).
Notably, the ticked modality acts like quantification over ticks,
and CloTT provides reduction semantics to guarded types and guarded fixpoints.
The postulated constructs of guarded type theory can be implemented using ticks,
and there is a special ⬦
that can be applied to ticks whose clock isn’t used elsewhere in the context.
next : ∀κ. A → ⊳(α : κ). A
nextᴷ a _ ≝ a
ap : ∀κ. ⊳(α : κ).(A → B) → ⊳(α : κ).A → ⊳(α : κ).B
apᴷ f a α ≝ (f α) (a α)
dfix : ∀κ. (⊳(α : κ).A → A) → ⊳(α : κ).A
dfixᴷ f ⬦ ⇝ f (dfixᴷ f)
fix : (⊳(α : κ).A → A) → A
fixᴷ f ≝ f (dfixᴷ f)
▸ᴷ : ⊳(α : κ).𝒰 → 𝒰
▸ᴷ A ≝ ⊳(α : κ).(A α)
force : (∀κ. ⊳(α : κ).A) → (∀κ. A)
force f κ ≝ f κ ⬦
There is still one significant deficiency in the guarded type theories so far: reasoning about bisimulation using the usual identity type remains difficult. Recently, there has been a trend of augmenting guarded type theories with cubical path types (or rather, augmenting cubical type theory with guarded types): Guarded Cubical Type Theory (GCTT) [8] is a cubical variant of GDTT, Ticked Cubical Type Theory (TCTT) [9] is a cubical variant of CloTT without clock quantification, and Clocked Cubical Type Theory (CCTT) [10] is a cubical variant of CloTT with clock quantification. In all of these, the cubical path type corresponds to bisimilarity of guarded recursive types and guarded coinductive types. Alternatively, McBride [11] shows that observational equality can also implement bisimilarity.
There have been several theses and dissertations on the topic of guarded types that deserve mentioning, some of which cover some of the papers already mentioned above. The following are just the ones I know of.
Vezzosi’s licentiate thesis [12] collects together two separate works. The first is a mechanized proof of strong normalization of a simply-typed variant of Birkedal and Møgelberg’s system [3]. The second extends a clocked, guarded dependent type theory by defining a well-order on clocks (which they call Times) and additionally adds existential quantification to enable encoding inductive types. In fact, the system resembles sized types more than it does guarded types, save for the fact that inductives and coinductives are still encoded as existentially and universally clock-quantified guarded fixpoints. Because in sized types, fixpoints only reduce when applied to inductive constructor head forms, it’s unclear to me how reduction in their system behaves. Finally, just as in my own thesis, they note that to encode arbitrarily-branching inductive types, a limit operator on Time is required, which continue to believe would render the order undecidable.
Grathwohl’s PhD dissertation [13] mainly deals with GDTT and GCTT. It also includes the guarded λ-calculus from an earlier paper, which adds to a guarded STLC a second modal box operator that acts like Atkey and McBride’s clock quantification [2] but as if there is only ever one clock. The dissertation ends with an alternative to GDTT that appears to be an early variant of CloTT.
Bizjak’s PhD dissertation [14] also includes GDTT as well as the same guarded λ-calculus. It provides two different models of guarded types, one of which is the model for clock synchronization [5], and gives applications to modelling System F with recursive types and nondeterminism.
Paviotti’s doctoral thesis [15] is an application of GDTT in synthetic guarded domain theory to give denotation semantics to PCF (which includes the Y combinator) and FPC (which includes recursive types).
[0] Nakano, Hiroshi. A Modality for Recursion. (LICS 2000). ᴅᴏɪ:10.1109/lics.2000.855774.
[1] Birkedal, Lars; Møgelberg, Rasmus Ejlers; Schwinghammer, Jans; Stovring, Kristian. First Steps in Synthetic Guarded Domain Theory: .Step-Indexing in the Topos of Trees. (LICS 2011). ᴅᴏɪ:10.1109/LICS.2011.16.
[2] Atkey, Robert; McBride, Conor. Productive Coprogramming with Guarded Recursion. (ICFP 2013). ᴅᴏɪ:10.1145/2500365.2500597.
[3] Birkedal, Lars; Møgelberg, Rasmus Ejlers. Intensional Type Theory with Guarded Recursive Types qua Fixed Points on Universes. (LICS 2013). ᴅᴏɪ:10.1109/LICS.2013.27.
[4] Møgelberg, Rasmus Ejlers. A type theory for productive coprogramming via guarded recursion. (LICS 2014). ᴅᴏɪ:10.1145/2603088.2603132.
[5] Bizjak, Aleš; Møgelberg, Rasmus Ejlers. A Model of Guarded Recursion With Clock Synchronisation. (MFPS 2015). ᴅᴏɪ:10.1016/j.entcs.2015.12.007.
[6] Bizjak, Aleš; Grathwohl, Hans Bugge; Clouston, Ranald; Møgelberg, Rasmus Ejlers; Birkedal, Lars. Guarded Dependent Type Theory with Coinductive Types. (FoSSaCS 2016). ᴅᴏɪ:10.1007/978-3-662-49630-5_2.
[7] Bahr, Patrick; Grathwohl, Hans Bugge; Møgelberg, Rasmus Ejlers. The Clocks Are Ticking: No More Delays! (LICS 2017). ᴅᴏɪ:10.1109/LICS.2017.8005097.
[8] Birkedal, Lars; Bizjak, Aleš; Clouston, Ranald; Grathwohl, Hans Bugge; Spitters, Bas; Vezzosi, Andrea. Guarded Cubical Type Theory. (Journal of Automated Reasoning, 2019). ᴅᴏɪ:10.1007/s10817-018-9471-7.
[9] Møgelberg, Rasmus Ejlers; Veltri, Niccolò. Bisimulation as Path Type for Guarded Recursive Types. (POPL 2019). ᴅᴏɪ:10.1145/3290317.
[10] Kristensen, Magnus Baunsgaard; Møgelberg, Rasmus Ejlers; Vezzosi, Andrea. Greatest HITs: Higher inductive types in coinductive definitions via induction under clocks. (LICS 2022). ᴅᴏɪ:10.1145/3531130.3533359.
[11] McBride, Conor. Let’s see how things unfold: reconciling the infinite with the intensional. (CALCO 2009). ᴅᴏɪ:10.1007/978-3-642-03741-2_9.
[12] Vezzosi, Andrea. Guarded Recursive Types in Type Theory. (Licentiate thesis, 2015). https://saizan.github.io/vezzosi-lic.pdf.
[13] Grathwohl, Hans Bugge. Guarded Recursive Type Theory. (PhD dissertation, 2016). https://hansbugge.dk/pdfs/phdthesis.pdf.
[14] Bizjak, Aleš. On Semantics and Applications of Guarded Recursion. (PhD dissertation, 2016). https://abizjak.github.io/documents/thesis/semantics-applications-gr.pdf.
[15] Paviotti, Marco. Denotational semantics in Synthetic Guarded Domain Theory. (Doctoral thesis, 2016). https://mpaviotti.github.io/assets/papers/paviotti-phdthesis.pdf.
Recently I needed to convert a large TIFF scan of a duochrome page into something reasonable, i.e. a web-supported image format that was still lossless since it seemed a shame to ruin such a nice high-definition scan with lossy compression. In terms of lossless formats, all browsers^{1} support PNG, WEBP, and AVIF, while I really hope JXL support is imminent.
I therefore wanted to see which file format would perform the best in terms of file size
by converting my ~183 MiB TIFF to each of them using ImageMagick.
For PNG, WEBP, and JXL, there’s an effort setting:
lower effort means faster compression but larger size,
while higher effort means slower compression but smaller size.
I used the highest three settings for these, yielding sizes from ~50 MiB to ~20 MiB.
(As a treat, I’ve also converted to JPG, WEBP, AVIF, and JXL at -quality 0
, i.e. lossy with the worst settings.)
In summary:
Format | Effort | Size (MiB) | Time (s) |
---|---|---|---|
TIFF | N/A | 183.049 | |
PNG | 7 | 53.8905 | |
― | 8 | 52.4796 | |
― | 9 | 52.0456 | |
JXL | 7 | 36.2194 | |
― | 8 | 35.7587 | |
― | 9 | 33.9196 | 1553.39 |
WEBP | 5 | 38.0452 | |
― | 6 | 38.3053 | |
― | 7 | 38.3053 | |
AVIF (magick ) |
2 | 21.2929 | |
― | 3 | 21.3375 | |
― | 4 | 21.3673 | |
― | 5 | 21.3684 | |
AVIF (avifenc ) |
5 | 53.5292 | 173.36 |
― | 4 | 53.5242 | 179.75 |
― | 3 | 53.3450 | 585.62 |
Format | Effort | magick command |
Alternate command |
---|---|---|---|
PNG | 0 – 9 | magick convert -define png:compression-level=%n in.tif out-%n.png |
optipng -o%n -out out-%n.png in.tif ^{2} |
JXL | 3 – 9 | magick convert -quality 100 -define jxl:effort=%n in.tif out-%n.jxl |
cjxl -q 100 -e %n --brotli_effort=11 in.tif out-%n.jxl |
WEBP | 0 – 6 | magick convert -quality 100 -define webp:lossless=true -define webp:method=%n in.tif out-%n.webp |
cwebp -lossless -q 100 -m %n -progress -o in.tif out-%n.webp |
AVIF | 0 – 10 | magick convert -quality 100 -define heic:speed=%n in.tif out.avif |
avifenc -l -s %n in.png out-%n.avif ^{3} |
Format (lossy) | Size (KiB) |
---|---|
JPG | 5848.97 |
JXL | 1805.75 |
WEBP | 695.534 |
AVIF | 349.122 |
I didn’t think the effort settings would affect compression time that much, but it turns out that for JXL, the highest-effort setting takes a really, really long time. It didn’t occur to me to take timing measurements until I got to JXL’s penultimate effort setting, so I only timed it for its slowest effort setting, which also happened to be the very last conversion I was running. It took a whopping 25m 53.39s!
You’ll notice I have different AVIF file sizes for magick
and avifenc
.
I originally only tried magick
, but I was supicious of its extremely small file size,
so I compared the output AVIF at speed 2 against the output PNG at compression level 9.
$ magick compare -verbose -metric mae out-2.avif out-9.png out-avif2-png9.png
...
Channel distortion: MAE
red: 91.0942 (0.00139001)
green: 40.7945 (0.000622484)
blue: 153.756 (0.00234616)
all: 95.2148 (0.00145289)
For comparison, comparing WEBP to PNG gave distortions of 0 all around,
so something suspicious is going on.
AVIF’s colour space is YUV,
so perhaps conversion from RGB is necessarily lossless.
It was at this point I decided to try avifenc
instead,
comparing a conversion from PNG against the PNG itself,
since this tool only takes JPG or PNG as input.
$ avifenc -l -s 5 out-9.png out-avifenc-5.avif
...
Encoded successfully.
* Color AV1 total size: 56146081 bytes
* Alpha AV1 total size: 0 bytes
Wrote AVIF: out-avifenc-5.avif
$ magick compare -verbose -metric mae out-avifenc-5.avif out-9.png out-avif5-png9.png
...
Channel distortion: MAE
red: 0 (0)
green: 0 (0)
blue: 0 (0)
all: 0 (0)
So ImageMagick’s AVIF encoding was lossy, even with at quality 100! What’s more, it appears that with a true lossless encoding, the round-trip from RGB to YUV back to RGB is perfect, hence the 0 distortion. It’s also surprising that AVIF’s file size is so close to the original PNG’s. In any case, could then create true “lossless” AVIFs, and compare the various speed settings to the generated file size. This time, I’ve had the foresight to also measure elapsed time. I stopped at speed 3, since it was taking 9m 45.62s to encode.
Except Edge doesn’t support AVIF, but who cares about Edge? ↩
optipng
goes up to -o7
, but from the docs settings beyond -o5
seem unlikely to help.
See also: A guide to PNG optimization. ↩
For avifenc
, -s
is the speed setting, so 0
is the slowest and most effortful,
while 10
is the fastest and least effortful. ↩
On the Market–Frankford trains of SEPTA in Philadelphia, the designation signs indicating the line and the next stop are very simple 18-panel 3×5-segment displays.
But they aren’t merely square segments! Each square may be subdivided into subsegments of triangles or quarter circles to give the the letters some ✨character✨, and so that some letters can actually fit within the grid. Taking a closer look at the segments, we can see the subsegment outlines.
(The N and the K are two of my favourite letters in this set — I just think the design is clever.) Combining the outlines of a sample of letters, a single panel is subdivided as follows.
This suffices to display all of the letters, digits, and punctuation needed by the MFL,
which are all letters excluding J, Q, V, W, Z; all digits excluding 7^{1}; and the punctuation :
, -
, '
.
The letters are used mostly^{2} in the northeast Philly stations,
which are, in order towards the northwesternmost terminus:
You’ll notice a number of these are quite a bit longer than 18 characters in length. I was curious to see how they abbreviated those station names, so I took the train all the way to the terminus and took photos of each displayed sign.
York-Dauphin fits beautifully and fills all 18 panels;
poor Erie-Torresdale has a third of its name abbreviated away;
Spring Garden very nearly fits but does away with one letter (as does ALL STOPS - FRANKFRD
).
From these signs, I’ve recreated all of the letters, digits, and punctuation used^{3}, and also speculated on what J, Q, V, W, Z, 7 might look like. I’ve taken the J, V, W, Z from the San Francisco Muni Metro’s LRV4 designation signs, while I haven’t found a sample for Q or 7.
Personally, though, I would have made some of the digits a little more distinctive…
The stations of numbered streets are 69, 63, 60, 56, 52, 46, 40, 34, 30, 15, 13, 11, 8, 5, 2, and so feature all digits except 7. Because of how the existing stations are spaced, and because 27th doesn’t cross Market at all, it’s unlikely they’ll ever add a station with a 7. ↩
There’s one more station with a nonnumbered name: Millbourne, which is the second westernmost station, all the way at the other end. ↩
I
omitted because it’s identical to 1; yes, I know O
is also identical to 0, ↩