The problems here come from our experiments in cut-elimination (chapter 4 in [3)] on a higher-order formalization of the infinite pigeon hole principle. The main theorem proved is the statement that, given a function f:ℕ→{0,1}, for any natural number N, there exists a strictly monotonic function H:ℕ→ℕ such that f(H(I)) = f(H(0)) for 0 < I ≤ N. The lemma eliminated is a first-order instance of the infinite pigeon hole principle: given the function f:ℕ→{0,1}, at least one element S ∈ {0,1} has an unbounded number of occurrences, i.e. ∃S ∀N . f(N) = S ⊃ ∃K . N < K ∧ f(K) = S. By cut-elimination via resolution (CERESω [1]), we extract an expansion proof [2] of the theorem. The expansion proof contains the witness terms for the function H that are required to prove the theorem. In our research we investigate the shape of these terms in dependence of the formalization.

Most problems are essentially first-order, but need lambda lifting. In the general case, a type encoding would also be necessary - in this problem set, function constants outside of skolem terms are always fully applied and variables have always type ι.

The sections usually contain two problems, where the second one is a slight generalization of the other.


Clauses are written as sequents F1, ... ,Fn ⊢ G1, ... ,Gm where Fi are negative and Gj are positive literals. Constants start with a lower case letter, variables with an upper case letter. Types are first-order unless otherwise noted. Skolem functions start with 's', lambda lifted functions start with 'q'.


There are three independent problem fields:

These problems characterize the lemmas used in the formalization and are usually simpler than the actual theorem formalized. The refutation of the characteristic sequent set is used for our cut-elimination method CERESω.

The Clauses

  1. ⊢ 0 + N0 < (N1 + 1) + N0
  2. ⊢ N0 + (N1 + 1) = (N0 + N1) + 1
  3. ⊢ 0 + N0 = N0
  4. ⊢ N1 + N0 = N0 + N1
  5. ⊢ f((N0 + N1) + 1) = 0, f((N0 + N1) + 1) = 1
  6. f(A1) = 0, s5(q1) < s5(q1) + 1, s6(q1, s5(q1)) < A0, f(A0) = 0 ⊢
  7. f(A2) = 1, s21(q2) < s21(q2) + 1, s22(q2, s21(q2)) < A3, f(A3) = 1 ⊢
  8. f(A1) = 0, s6(q1, s5(q1)) < A0, f(A0) = 0 ⊢ N2 < N2 + 1
  9. f(A2) = 1, s22(q2, s21(q2)) < A3, f(A3) = 1 ⊢ N5 < N5 + 1
  10. f(A1) = S, s5(q(S)) < s5(q(S)) + 1, s6(q(S), s5(q(S))) < A0, f(A0) = S ⊢
  11. f(A1) = S, s6(q(S), s5(q(S))) < A0, f(A0) = S ⊢ N2 < N2 + 1

The type of q1 and q2 is ι>o, the type of q is ι>ι>o; the types the skolem functions s6, s22 are (ι>o)>ι>i and those of the skolem functions s5, s21 are (ι>o)>ι.

The term lifted by q is λS λX ∃H (∀I ∀J (I < X + 1 ∧ (J < X + 1 ∧ I < J) ⊃ H(I) < H(J)) ∧ ∀I (I < X + 1 ⊃ f(H(I)) = S)). The lifted term q1 is q(0) and q2 is q(1).

Clauses 1-4 are consequences of basic arithmetic axioms, clause 5 states the range of f as the set {0,1}. Clauses 6 and 7 are instances of clause 10, whereas clauses 8 and 9 are instances of clause 11.

The problem sets

The nTape2 clause set consists of clauses 1-9 and the nTape3 clause set consists of clauses 1-5,10 and 11. Since the clauses 6-9 are instantiations of clauses 10 and 11, the nTape3 clause set subsumes the nTape2 clause set. With lambda-lifting, this is not directly visible since the generalization of q1 and q2 to q is hidden by the constants we introduced.

nTape2 without lifting nTape3 without lifting
nTape2 with lifting nTape3 with lifting

Leo II can solve the lifted sets in less than a second, but fails with a timeout on the original ones.

The skolem functions in the nTape2 and nTape3 sets come from the skolemization of the induction axiom and hide the recursion necessary for showing an unbounded number of repetitions of a value of f. We replaced the induction by explicit applications of the induction steps and obtained clause sets for generating witness functions to enumerate at least 2, 3 or 4 repetitions.

nTape5 instance 2
nTape5 instance 3
nTape5 instance 4

Leo II fails on all instances; we presume that the recursion pattern H(s(H)) can not synthesize a finite function without support from the prover. But even without this pattern, synthesizing a finite function is problematic: no higher-order prover supports the TPTP $ite construct and an axiomatization has a similar effect as a cut-strong formula: taking the axioms X ⊃ ite(X,T,F) = T and ¬X ⊃ ite(X,T,F) = F represents the case distinction on an unrestricted formula X.

Still, given nothing else but this axiomatization and the claim that there exists a function H such that H(0)=1 and H(1)=0, even a highly ineffective search for a unifier should find the term ite(X=0, 1, 0). In general, we would like to find a nested ite term for any (reasonable) finite function. Therefore we submitted the following problem sets to the TPTP THF collection:

NUN023^1.p NUN023^2.p
NUN024^1.p NUN024^2.p
NUN025^1.p NUN025^2.p NUN025^3.p

The problem NUN23 claims the existence of a function H(0)=1 and H(1)=0. The problem NUN24 additionally requires H(2)=0 and problem NUN25 requires H(2)=1. The variation ^1 does not give any help, the variations ^2 and ^3 provide the witness term as an additional axiom. All higher-order provers manage to solve the problems including the witnesses. At the first execution of these experiments, only agsyHOL managed NUM23^1.p and no prover managed the NUM24^1.p and NUM25^1.p. According to TSTP, the current version 1.7 of LEO II also solves NUM23 and the current version of agsyHOL manages NUM24. NUM25, the first problem requiring a nested if-then-else is still unsolved by all provers. A copy of the files (before classification through Geoff Sutcliffe) can also be obtained here.

Expansion proofs can be seen as a higher-order version of Herbrand disjunctions. Their structure mirrors the term structure of the input formula but saves the witnesses needed for weak quantifiers (those which become existential when the formula is brought into prenex normal form). The deep formula of an expansion proof replaces weak quantifiers by disjunctions of instances with the witnesses necessary to prove it. This also means that the deep formula is essentially propositional, because all quantifier instantiations are already provided. In our particular case, we are in the fragment of uninterpreted functions with first-order equality. Even though the deep formulas of our extracted expansion trees are provably valid, they can be checked.

nTape2 Deep Formula (not lifted) nTape3 Deep Formula (not lifted)
nTape2 Deep Formula (lifted) nTape3 Deep Formula (lifted)

All classical higher-order provers (Leo II, Satallax, agsyHOL) fail on these problems. When given a timeout of 60 seconds, Isabelle 2016 manages to check all files via tptp_isabelle by exporting to Z3.

  1. Stefan Hetzl, Alexander Leitsch, and Daniel Weller. CERES in higher-order logic. Annals of Pure and Applied Logic, 162(12):1001–1034, 2011
  2. Dale A. Miller. A compact representation of proofs. Studia Logica, 46(4):347–370, 1987
  3. Martin Riener. Applications of Higher-Order Cut-Elimination. unpublished [download draft]