6 tiles are all you need for a PhD! Congrats to Tristan!
I'm incredibly happy and proud to say congratulations to (soon to be) Dr Tristan Stérin! His PhD thesis is titled Six Tiles: From Collatz Sequences to Algorithmic DNA Origami, which he pictorially summarised as:
The start of a journey
We've had an amazing journey together, that began when I met Tristan at ENS Lyon in January 2017. I gave a week of lectures about theory and experiment of algorithmic self-assembly (including some live experimental demos in the lecture theatre!), intrinsic university and noncooperative self-assembly. During one of the first lectures I threw up an open problem: is there a 64-counter in the 6-bit Iterated Boolean circuit model? i.e. a structure that iterates through 64 distinct 6-bit strings before returning to the first, and repeating. Our experimental paper has a 63-counter, but my co-authors and I couldn't find a 64-counter. Were we not clever enough?
A 63-counter, assembled from left to right. Figure from Stérin and Woods, AFM image from Woods, Doty et al.
Within 4 nanoseconds of Tristan hearing the question he put pen to paper, spending the next hours, days and weeks solving the problem, essentially ignoring the rest of my lectures (tsk tsk!). He soon moved to Inria, Paris, where I was based at the time. His negative answer won the best poster award at the DNA23 conference: alas, there are no 64-counters. The best one could do was a 63-counter, such as the one found by Erik Winfree using computer search (see Figure 4e). Nor are there any \(2^n\) counters for \(n\)-bit IBCs, but there are always \(2^n-1\) counters!
Chapter 1: Collatz and the 6 tiles
In 2018 we moved to our new home, the Hamilton Institute, at Maynooth University, Ireland, where Tristan started his PhD. He decided to focus on something easy, ahem: The Collatz problem! Pick a number \(x\), if \(x\) is even divide by 2, if \(x\) is odd compute \(3x+1\). Repeat. Do you always eventually get to 1? Try an example, I bet my house that you will. Here's a nice picture of some numbers going to 1 (credit xkcd):
Why would Tristan start his PhD on such a notoriously hard open problem? One motivating question was whether there was computation hidden deep inside these Collatz iterations. John Conway had shown that more general functions of this form, with lots of affine maps, simulate Turing machines. Could Collatz be hiding some, perhaps simpler, computations? If so, maybe finding computational hardness buried in Collatz could either help explain the difficulty of predicting its iterations or perhaps even give insight to the structure of those iterations. That was one idea.
Tristan's thesis (Chapter 1) considers Collatz iterations as operating in the 2D plane: the digit expansion of a natural number is written down horizontally in base 2, or vertically in base 3, and each time we apply Collatz we write down a new number below it (base 2) or to its left (base 3). One neat result is that if you write down a number \(x\) in base 3 (vertically), let Collatz iterate on it for a little while, et volià, you get the same number \(x\) but transformed into base 2 written horizontally. In 2D diagrams of iterations, Collatz is converting from base 3 to 2:
Base conversion hidden in Collatz iterations.
Base conversion is not the hardest problem in the world, but, try it and you'll see it's not completely trivial either. We computer scientists have somewhat precise ways to talk about how "hard" a problem is, primarily through complexity classes with unfortunately unintelligible names: base conversion lies in the complexity class NC1 but is strictly harder than problems in AC0 [1]. So that tells us, yes, amoung the many crazy things that Colltaz might do, one of those is base conversion, which is something sensible and we know pretty accurately just how hard it is. Hence we know that predicting Colltaz many steps into the future is at least as computationally hard as doing base conversion. Of course for all we know, Collatz could be much, much harder. I wonder if base conversion is an isolated result, or if we can perhaps leverage that result to find other cute and predictable algorithms buried in Collatz iterations? Have we found a chink in the armour of Collatz? Maybe, maybe not, but we have a formalism in hand, and there are clear research directions to follow here.
The base conversion result is actually a special case of a much more general set of definitions and results that Tristan explored. Collatz-produced digit expansions are 'tiled' in the 2D plane using a small set of just 6 tiles (found by Matthew Cook, another person obsessed with Collatz). There are some wonderful ideas here, for example by walking along the tilings one computes numerical functions, and if you fix a start point and an end point, no matter which path you follow you compute the same function. There is beautiful structure in Collatz tilings. Jump into Tristan's recent post to learn about it.
Left: the 6 Collatz tiles. Each tile-edge defines a function. Centre: A small tiling by the 6 tiles. Functions are composed by walking along the edges. No matter which path taken from A to B, that path always computes the same function. Right: Base conversion, using the 6 tiles (base 3 vertically on the right side as 23 = [[212]]_3, base 2 horizontally on the bottom as 23 = [[10111]]_2). Curiously, the brown path has the same number (23) written in base 6 (23 = [[035]]_6), where we use the tile names instead of edges. Other paths from A to B can be described using a mixture of bases.
Chapter 2: Busy beaver problem: BB(15) is harder than Erdős' conjecture on powers of two
It turns out the 6 tiles have a deep connection to an open problem by Erdös: Do all powers of 2, greater than \(2^8\), have a digit 2 when written in base 3? There's bases 2 and 3 again. The tiles can be used to generate a picture of all the powers of 2, in base 3:
The first 75 powers of two assembled in base 3 by the size-6 Wang tile set introduced by Cook, Stérin and Woods [2021] (see blog post). Reading left-to-right, each column of glues (colours) corresponds to a power of two: beige glues represent ternary digit 0, green glues 1 and red glues 2. For instance, the rightmost column encodes (from top to bottom) 1102100210202020112020120000201120010210212220223 = 275. The complexity of the pattern illustrates the complexity of answering Erdős' conjecture which amounts to asking if each glue-column (except for the few first) has at least one red glue.
We leveraged the tiles to find small Turing machines that start on an empty tape, iterate through the powers of 2 in base 3, checking for counterexamples to Erdos' conjecture. If they ever find such a counterexample (i.e. $n>8$ such that $2^n$ has a 2 in base 3), it halts. One of the machines has 15 state and 2 symbols. The number of steps until that machine halts gives a lower bound on BB(15), the number of steps for any halting, input-less, 15-state 2-symbol Turing machine to halt. This in turn gives the smallest Busy Beaver value (BB(15)) that relates to a natural open problem in mathematics. Our post has more.
Chapter 3: Computing while walking through a maze
Another way to think about the kinds of computations hiding in the 6 tiles is to genearlise the model a little and see what happens. If we allow for the plane to have some prebuilt 'seed' structures it turns out the 6 tiles simulate Boolean circuits. Not only that, but the same result is achievable with only 4 tiles. We call this model the Maze-Walking Tile Assembly Model as disconnected seeds resemble mazes that tiles may flow through, here's a quick video (believe us, tiles are computing as they flow!):
The process of tile-assembly in the Maze-Walking TAM with the 4-tile NAND-NXOR tile set starting from a disconnected seed structure. Tiles attach asynchronously to the structure until it is completed. Tristan's blog post has more details, and a link to our paper.
Chapter 4: Thermodynamically favoured computing with DNA tiles
After all that what was left to do? Well, how about building the 6 tiles, or any tiles, out of DNA and watch them self-assemble for real? In collaboration with Abeer (a postdoc in our group), Tristan and started what I think will be a new adventure. We've barely left the Shire, but Abeer and Tristan have already implemented a simple DNA program that does multiplication by 3, in base 2, and another that divides by 2, in base 3 ... two operations intimately related to Colaltz. Of course! This topic will need a post of its own and I'm looking forward to writing a lot more about that.
The end?
Tristan submitted his PhD thesis in 2022, and had his Viva-Voce (PhD examination) on June 28th, 2023, the examiners were Cris Moore and Dave Doty, and I watched on with pride, marveling at how much he's taught me over the past 6 years.
What's next for Tristan? He co-founded prgm.dev, a company that carries out scientific research, lab automation, and more. Stay tuned.
Footnotes
For brevity, I've skipped other cool stuff Tristan did: [2], [3], [4].
[1] This means we can solve it quickly, in polynomial time on a sequential computer (i.e. in P), and in logarithmic time on a highly parallel computer (i.e. in NC1), but that it is strictly harder than (say) the problem of addition (i.e. addition is in the function version of AC0). See this blog post for more details on the base conversion result (paper published in RP2020). .
[2] Tristan's publications can be found on our group publications page, or his page. Besides the papers on topics mentioned above [BB(15) and Erdos problem (arxiv), Maze walking tile assembly (DNA27), Collatz process embeds a base conversion algorithm (RP 2020)] there is his RP2020 paper that used regular expressions to describe the 'ancestors' of numbers under Collatz iterations, and a DNA26 paper on scadnano (software for DNA design).
[3] Tristan's really nice blog posts on the mathematics underlying free energy, on Shannon entropy, and other topics.
[4] bbchallenge: a collective research effort to solve BB(5): the Busy Beaver problem for 5-state 2-symbol Turing Machines.