Tristan's paper on the Collatz problem and regular expressions will appear at the 14th International Conference on Reachability Problems, 2020. Here's the idea:

The Collatz problem is simple to describe: Pick a number. If your number is even then divide it by two, if it is odd then multiply by 3 and add 1. And repeat! For example, if we start with 10, which is even, we then get 5. Which is odd, so we get 16. Then 8 then 4 then 2 then 1 then 4 then 2 then 1 ... we are in a loop! The Collatz conjecture says that no matter which natural number \( 1,2,3,\ldots \) you start with you will eventually end at 1. It has been shown to hold for all natural numbers up to \(10^{20}\) (roughly) by computer simulation, but a proof that it holds for any natural number remains elusive, and the problem seems notoriously hard to solve.

It's easy to see that all powers of 2 lead to 1. Lets say we start with \(64 = 2^6 \), then we see that at each step we get an even number, and dividing by 2 amounts to subtracting 1 from the exponent: $$64 \;\rightarrow\; 32 \;\rightarrow\; 16 \;\rightarrow\; 8 \;\rightarrow\; 4 \;\rightarrow\; 2 \;\rightarrow\; 1 $$ $$2^6 \,\rightarrow\, 2^5 \,\rightarrow\, 2^4 \,\rightarrow \, 2^3 \rightarrow\, 2^2 \,\rightarrow\, 2^1 \,\rightarrow\, 2^0$$

So where do Tristan's regular expressions come in? Lets look at the two parts of the Collatz map:

  • if \(x\) is even we compute \(x/2 = T_{0}(x)\),
  • else if \(x\) is odd we compute \((3x+1) / 2 = T_{1}(x)\)

(note that if \(x\) is odd, then \( 3x+1 \) is always even, hence it's totally OK for \( T_{1} \) to do two steps at a time; this formulation will give the neatest result).

So with zero uses of the \(T_1\) map, we already know that the infinite set of numbers that are powers of 2 all go to 1. Written in binary, the powers of 2 are \(1,10,100,1000,1000,\ldots \), a pattern described by the simple regular expression: \( 1(0)^\ast\).

What happens if we allow only exactly one use of the \(T_1 \) map, and any number of the \( T_0\) map? Hmmm, we get those odd numbers such that when we apply \(3x+1 \), they give a power of 2. We also get any even number that when divided by 2 gives an odd number such that when we apply \(3x+1 \) get a power of 2. Or any even number, than when divided by 2 gives an even number that when divided by 2, gives an even... that gives an odd number that gives a power of two. That (rather wordy) description can also be captured in a regular expression! In fact, for each \(k \in \{ 1,2,3,\ldots \} \) there is a regular expression that describes the set of binary numbers that go to \(1\) using \(k \) applications of the \(T_1\) map, and any number of the \(T_0\) map. Shallit and Wilson [EATCS 1992] showed this, and although their method is straightforward to understand it gives ginormous regular expressions, of size doubly-exponential in \(k \).

Tristan shows that regular expressions that are (merely!) of size exponential in \(k \) do the job just fine. In summary, his main result is that for each \( x,k\in \{1,2,3,\ldots \} \) there is a regular expression, of size exponential in \(k\), that characterises the binary representations of all \( y\in \{ 1,2,3,\ldots \} \) that lead to \(x\) via \(k\) applications of \(T_1\) and any number of \(T_0\). Caveat emptor: I say "merely", but the regular expression for \(k=4,x=1\) (given in Appendix D) spans 3.5 pages of text!

The paper generalises (to any \(x \geq 1\), not just \(x = 1\)) and improves upon previous work in the field, and helps us to zoom into the inner workings of Collatz. In particular, the paper suggests that by thinking in base 2, or even other bases, we might notice something structure within the seemingly mysterious Collatz process? It turns out the answer to that (rhetorical-sounding!) question is yes!