In my last post I introduced the Computational Theory of Laws (CTL) as a successor to the Best System Analysis (BSA) of Mill, Ramsey and Lewis. In this post I want to briefly reprise that theory in a less mathematically technical way and put it to philosophical work.
The Best System Analysis
The Best System Analysis says that the Laws of Nature are the general statements that figure in the best deductive axiomatization of all the truths about the world.
The theory assumes that there is some set of sentences that express the complete truth about the world. (The relevant standard of "completeness" being one of the questions we shall discuss below.) Any set of sentences is guaranteed to be deducible from some set of "axioms" -- if only that set of sentences itself. But Best System theorists expect that insofar as the world displays regularity -- and this regularity is reflected in its description -- its complete description can be deduced from smaller sets of axioms which capture those regularities by generalizing over them. Many axiom sets might do this, but some will be, in some sense, simpler than others, and candidate axiom sets may differ in strength -- that is, in the number of truths that can be deduced from them. The demands of simplicity and strength may compete. The "Best System", Lewis said, is the axiomatization that achieves "the best balance of simplicity and strength". 1
The central problem for the BSA is the want of a clear account of "simplicity" and "balance". If simplicity is in the eye of the beholder, then what systems are best and what generalizations are laws will turn out to be a subjective matter. Alternatively, if simplicity is just a matter of the shortness of sentences, then what axioms are simplest will depend upon what language we happen to speak. In either case what makes the best system "best" does not seem to be an objective feature of the world. Then, too, however we measure simplicity, how are we to "balance" it against strength? If, as BSA theorists sometimes suggest, this balancing is a matter of aesthetics, then this too seems to subvert the objectivity of laws.
The Best Program Theory
CTL can be thought of as a version of the Best Systems approach which replaces Euclid’s deductive conception of “system” with Turing’s. The Computational Theory says that the laws of nature are those truths which hold constant given the optimal source coding of all the truths about the world. Inevitably, I expect, the Computational Theory will come to be called "The Best Program Theory". So be it. 2
The theory begins with the observation that the set of sentences that completely describes the world must be generable as the output of some computer program or other -- if only the program "print the following sentences...". However, if the world is orderly or systematic, and this orderliness is reflected in its description, then there must be programs that generate those sentences and are simpler than that set of sentences itself. The Best Program is the simplest program that captures all that orderliness.
The optimal encoding describes the world algorithmically. Its programmatic part expresses a function which captures the systematic, orderly features of our world. Given different inputs, that program might describe different possible worlds. Each of these will be a nomologically possible world with respect to our world; it will be a world that shares the same system as our own. Any proposition true at every nomologically possible world is a law of nature.
The central merit of the Computational Theory is that, thanks to the advent of Algorithmic Information Theory, it is possible to say precisely what "systematic" and "simple" mean here.
By "orderliness" or “systematicity” I mean "computability" in the sense of Turing and Church. The notion of "order" so understood is broader than that of “regularity”. Regularity makes for order, but there are many more ways of being orderly than just being regular. Infinitely many: each member of the class of computable functions expresses a different way of being orderly.
By “simplicity” I mean Algorithmic Complexity.
The Objectivity of Complexity
The algorithmic or Kolmogorov complexity of any string of symbols is measured by the length of the shortest computer program that can generate that string. That length will obviously depend on what language the program is written in, but the foundational discovery of Algorithmic Complexity Theory is that, whatever programming language we choose, the complexity of a string is always a function of an invariant underlying magnitude which doesn’t depend on that choice of language. 3
The significance of this invariance is profound. Turing had shown how a physical object -- a computer running a program -- could embody a mathematical abstraction: a function. Church and Turing taught that every computable function could be embodied by some program running on a universal computer. The founders of complexity theory realized that this association allows us, in turn, to associate a scalar magnitude -- a program’s physical length -- with every computable function. Relative to that association we can speak of a function’s complexity simply as the binary length of the program that embodies it. In turn, we can speak of the complexity of the output of any algorithm as the sum of the length of a program and the length of the input for which the program produces that output.
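In one standard notation (with U a fixed universal machine, |p| the length in bits of a program p, and |w| the length of an input w), the idea amounts to something like:

K_U(x) = min { |p| + |w| : U(p, w) = x }

that is, the complexity of a string x is the length of the most compact program-plus-input pair that gets U to produce x.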
Of course, many different programs can embody the same function, so it might seem an arbitrary matter which one we choose when measuring the algorithm’s size. And so it is. The significance of the Invariance Theorem is that it shows that this arbitrariness makes no difference. Whatever program or programming language we choose, our measures will always converge. They are just different perspectives from which we can view the same, invariant abstract quantity -- Algorithmic Complexity.
Here it is useful to compare complexity with velocity.
It only makes sense to speak of any object as having a velocity relative to some frame of reference. The same object can be assigned different velocities relative to different frames of reference. There are infinitely many frames and none of these seems privileged, so our choice of which to adopt is arbitrary.
Nevertheless, we think that velocity is an objective feature of objects in the world. Why so? The answer comprises at least four parts.
First: there is a formally specifiable set of acceptable frames of reference--inertial frames of reference-- relative to which objects always obey the same kinematic laws.
Second: the choice of inertial frame of reference fixes our description of the velocity of every object. I can decide if I want to describe a pair of objects as at rest or in motion by picking different frames of reference, but if they are not in motion relative to each other I can’t give them different velocities with any choice of inertial frame.
Third: the description of the world relative to one inertial frame is always systematically transformable into a consistent description in any other inertial frame. Thus there is no contradiction in saying that, with respect to one inertial frame, an object is at rest but with respect to another it is in motion. We can do a Galilean transformation between the two co-ordinate systems and see that these are in fact equivalent claims.
Fourth: The relevant transformations require us to think of velocity as a real property of objects and inertial frames. I may say that an object is at rest and you may say it is in motion and we may both be right if your frame of reference has a velocity relative to mine. In that case it must be that the sum of the velocity you ascribe to the object and the velocity I ascribe to your frame of reference equals the velocity I ascribe to the object given my frame of reference (and vice versa from your point of view). Whether the velocity is located in the frame or the object, the underlying magnitude reflected by their vector sum cannot be made to go away.
All of which is to the point that velocity-- while it is a relational and not an intrinsic property-- is not a subjective property of objects.
And so it is with Algorithmic Complexity. The numeric measure of the complexity of any string of data -- the length in bits of the shortest combination of program and input that generates it -- is always relative to a specific programming language. But:
First: there is a formally specifiable set of computers and computer languages according to which the complexity of strings always obeys the same formal principles: universal Turing machines and universal languages (i.e. languages in which it is possible to write an emulation of a universal Turing machine).
Second: While a given string may be assigned different complexities by different languages, the relative complexity of any two strings in any two languages can vary only within a certain limit (stated formally just after this list). That limit is a constant value which depends on the two languages and not on the strings themselves.
Third: Because any universal language can be programmatically translated (compiled) into any other, the complexity of any string as measured in any language can be both calculated by and compared with its complexity in every other language.
Fourth: Because computer languages are themselves just a kind of program, we can measure the complexity of languages just as we do the programs written in them. And while the complexity of any given output can vary when measured by different languages, it can do so only in inverse proportion to the complexity of the languages themselves. Thus, imagine a particular output which requires a very complicated program in a given language. It will always be possible to devise a new language in which that same output can be generated by a very short program. That means the output is complex with respect to the first language and simple with respect to the second. And yet the missing complexity has not disappeared. It must show up, in at least the missing quantity, when we measure the complexity of the second language by translating it into the first. Whether the complexity is apportioned to the output or to the language which generates it, the underlying aggregate complexity cannot be made to go away.
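The bound mentioned in the second point is the Invariance Theorem. In one standard formulation (with U and V any two universal languages, and c(U, V) a constant that depends on the pair of languages but not on the string being measured):

|K_U(x) − K_V(x)| ≤ c(U, V)   for every string x

The constant c(U, V) is, in effect, the length of an interpreter for the one language written in the other -- which is just the fourth point again: the complexity that disappears from the measure of an output reappears in the measure of the language.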
All of which is to say that, like velocity, Algorithmic complexity, though a relational property, is nevertheless an objective one.
The relativity of complexity combined with CTL implies an inherent relativity in the Laws of Nature. Accepting this nomological relativity has, I think, deep consequences for the way we think about Laws and much else. I will come back (briefly) to this point later on but will not elaborate it here. Here we are after philosophical low hanging fruit…
Not Necessarily So
So far I have been stressing the continuities between the Computational Theory and the Best Systems approach, but CTL makes for significant departures from the way philosophers have traditionally thought about laws and the logic of nomological necessity. Most significantly: CTL entails that the accessibility relation for nomological necessity is neither symmetric nor transitive.
In less technical terms: it means that a world may be nomologically possible with respect to our world and yet not share our world’s laws.
This follows from the uncontroversial fact that different functions -- different in their domains and ranges -- may nevertheless have the same values; may even have the same value for some of the same arguments. For programs, this means that different programs -- different in what function they express -- may nevertheless give the same outputs, even the same outputs for some of the same inputs.
So consider the simple program Alpha(i):
def Alpha(i):  # 4
    # Print the input out, over and over, forever.
    while True:
        print(i)
Given any integer as input, Alpha(i) will print it out, over and over, ad infinitum. Thus, Alpha(1) will output:
1 1 1 1 1 1 1…….
Compare Alpha(i) with Beta(i):
def Beta(i):  # 5
    # Print the input, square it, and repeat forever.
    while True:
        print(i)
        i = i * i
Given ‘1’ as its input, Beta(i), like Alpha(i), will also print an unending string of ‘1’s. And yet the two programs give very different results when given different inputs. Thus, Alpha(2) will boringly produce:
2 2 2 2 2 ….
Whereas Beta(2) will yield an infinite succession of squares:
2 4 16 256 65536 …
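If you want to see the agreement and the divergence for yourself, here is a minimal, bounded sketch of the two programs (the helper first_outputs and the cap of five steps are just devices for illustration; the worlds themselves run forever):

def first_outputs(step, i, n=5):
    # Collect the first n outputs of a Turing-world program whose
    # update rule is `step`, starting from input i.
    outputs = []
    for _ in range(n):
        outputs.append(i)
        i = step(i)
    return outputs

print(first_outputs(lambda i: i, 1))      # Alpha(1): [1, 1, 1, 1, 1]
print(first_outputs(lambda i: i * i, 1))  # Beta(1):  [1, 1, 1, 1, 1]
print(first_outputs(lambda i: i, 2))      # Alpha(2): [2, 2, 2, 2, 2]
print(first_outputs(lambda i: i * i, 2))  # Beta(2):  [2, 4, 16, 256, 65536]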
Trivial though they are, there are possible worlds for which these simple algorithms are pretty much the whole story: Turing worlds.
Turing worlds are whole universes which comprise nothing more than a black box with an input and an output tape. A world for which Alpha(i) was the optimal source coding might look like Wα.
[Figure: Wα]
Wα is described by both Alpha(1) and Beta(1), but Beta(1) isn’t the optimal encoding for Wα because Beta(i) is more computationally complex than Alpha(i).
On the other hand Beta(i) might be the optimal encoding for a world that looks like Wβ.
[Figure: Wβ]
Both Wα and Wβ exhibit regularities in the philosopher’s sense. In Wα every output is the same as the last. In Wβ every output is the square of the last. Presumably these generalizations would get treated as laws of their respective worlds according to BSA. CTL would agree: any world described by Alpha(i) must conform to Lα:
(Lα) Every output is the same as the previous output.
Every world described by Beta(i) must obey Lβ.
(Lβ) Every output is the square of the previous output.
But, even though Lβ is true at Wα and even though Wα is perfectly described by Beta(1), Lβ is not a law of Wα because it is not true in all the worlds described by the function that optimally describes Wα, viz. Alpha(i).
The fact that Beta(1) could output Wα means that Wα is a nomologically possible world with respect to the laws of Wβ. Wα is nomologically possible with respect to Wβ even though Wα does not have the same laws as Wβ.
Moreover, while Wα is nomologically possible relative to Wβ, the converse does not hold. Wβ is not nomologically possible relative to the laws of Wα. Given 2 as input, Alpha(i) yields:
[Figure: Wα’]
Wα’ is nomologically possible relative to Wα but not Wβ even though Wα is possible relative to Wβ.
These failures of symmetry and transitivity mean that the famous modal principles
(S4) □P → □□P
(S5) ◊P→ □◊P
do not hold for nomological necessity.
Our simple worlds also illustrate why not every true generalization expresses a nomological regularity. At Wα it is true that:
(G1) Every output is odd.
and at Wβ;
(G2) Every output is even.
But neither generalization holds true at every world which is nomologically possible with respect to these worlds.
These results for the logic of laws must, in turn, affect the way we think about counterfactuals. While there are controversies about the semantics of counterfactuals, everyone agrees that we decide whether A > C is true at a world w by finding a world similar to w at which A is true and seeing whether C must be true there, given the laws of w.
Accordingly, if we ask what would have happened at Wβ if there was a ‘1’, instead of a ‘2’, on the input tape, the answer is a world that looks like Wα. And yet, if we ask at Wα what would have happened if its initial conditions had been a ‘2’, the answer is a world that looks like Wα’, not Wβ. Counterfactuals true at our world need not be true at nomologically possible worlds.
Moreover, and very significantly, counterfactuals may lack any truth value at some worlds.
What would have happened in Wβ if the input had been the letter “A” instead of the numeral “2”? There is no answer, because the square function is not defined over letters. The order we find in an actual world may be best described by an algorithm which is not defined over every possible input, and when that is so the laws of that world may be silent on the question “What would happen if…”.
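In runnable terms (a toy sketch: beta_step is my name for the squaring rule at the heart of Beta(i)):

def beta_step(i):
    # The update rule of Beta(i): square the current symbol.
    return i * i

print(beta_step(2))       # 4 -- here the law has something to say
try:
    beta_step("A")        # squaring is not defined over letters...
except TypeError as err:
    print("undefined:", err)   # ...so the law is simply silent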
Humean Supervenience
These upshots for the logic of laws must in turn affect how we evaluate philosophical arguments about them. For several decades now most of the arguing has been about the doctrine David Lewis called "Humean Supervenience".
“Humean supervenience is named in honor of the great denier of necessary connections. It is the doctrine that all there is to the world is a vast mosaic of local matters of particular fact, just one little thing and then another… We have a geometry: a system of external relations of spatio-temporal distances between points… And at those points we have local qualities: perfectly natural intrinsic properties which need nothing bigger than a point at which to be instantiated. For short: we have an arrangement of qualities. And that is all. There is no difference without difference in the arrangement of qualities. All else supervenes on that.” 6
Not all Humeans agree with Lewis' account of "all there is" nor do they agree that mere supervenience is a strong enough relation. Ned Hall suggests:
“...the Humean reductionist position ought to be the following: The fundamental ontological structure of the world is given by the distribution of perfectly natural magnitudes in it... . All other facts, including facts about the laws, reduce to these facts. Applied to.. [a] Newtonian particle world, the thesis is that the facts about its laws reduce to the facts, over all time, about the positions, masses, and charges of the particles.” 7
These differences need not concern us here. CTL is as congenial to any variety of Humeanism as BSA. Both treat the facts about laws as emergent from the formal properties of a description of the world that is somehow complete. Whether or not "completeness" means describing "all the local matters of particular fact", or "the distribution of magnitudes",... or some other accounting of "the fundamental ontological structure of the world" is a separate matter.
What CTL brings to the table is a new way to respond to the central anti-Humean argument:
The Humean is committed to saying that no two possible worlds alike in all matters of Humean fact can be modally different. Worlds with the same distribution of "natural magnitudes", or "natural intrinsic properties" must have the same laws. To defeat this claim all the anti-Humean has to do is to describe two worlds which are alike in the relevant Humean respects but have different laws. This has seemed easy to do.
There are surely possible worlds at which it is a law that all particles move with a constant speed. Imagine then a world WC that obeys this law and which happens to contain only a single particle travelling at 1 meter per second.
There are also possible worlds which are governed by Newton’s laws. Newton’s laws do not require particles to travel at constant speed. Indeed, they entail that they must sometimes change speed when they collide with other particles. But Newton's laws also do not require that the universe contain any specific number of particles. In a Newtonian world which started off with a single particle travelling at a constant speed of 1 meter per second, that particle would travel at that speed forevermore. Call this world WN.
WC and WN are indistinguishable in all Humean respects and yet they have different laws and different counterfactuals are true at them. Supervenience fails!
The problem with this argument, from the point of view of CTL, is that there is no WN. That is, there is no world that looks exactly like WC but is governed by Newton’s laws. The laws of a world depend on its simplest description. You don't need anything as complicated as Newton's laws to describe a world like WC, for the same reasons you don't need a supercomputer to animate a game of Pong™.
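A toy comparison may make the point vivid. Neither of the sketches below is a serious physics engine -- the Newtonian step is drastically simplified and all the names are invented for illustration -- but the asymmetry in program length is the whole point: for a one-particle history the two programs produce exactly the same output, and the first is much the shorter description.

def constant_speed_step(particles, dt):
    # The entire dynamical story of a world like WC:
    # everything keeps its current velocity.
    for p in particles:
        p["x"] += p["v"] * dt

def newtonian_step(particles, dt, G=1.0):
    # A crude Newtonian story: pairwise gravitational forces,
    # then update velocities, then positions.
    for p in particles:
        p["f"] = 0.0
        for q in particles:
            if q is not p:
                r = q["x"] - p["x"]
                p["f"] += G * p["m"] * q["m"] / (r * r) * (1.0 if r > 0 else -1.0)
    for p in particles:
        p["v"] += p["f"] / p["m"] * dt
        p["x"] += p["v"] * dt

# With a single particle the inner force loop never runs, so the two
# programs generate the very same history:
w1 = [{"x": 0.0, "v": 1.0, "m": 1.0}]
w2 = [{"x": 0.0, "v": 1.0, "m": 1.0}]
for _ in range(3):
    constant_speed_step(w1, 1.0)
    newtonian_step(w2, 1.0)
print(w1[0]["x"], w2[0]["x"])   # 3.0 3.0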
But what about counterfactuals? What would happen if the particle at WC collided with a more massive particle? Would it change speed or not?
The answer is that there isn't a determinate answer to that question, for the same reason there is no determinate answer to the question "What would happen if two Pong “balls” collided?" Many things might happen. But none of these is ruled in or out by the algorithm that defines the game, or the world.
Notice that the point I am making here is very different from the argument sometimes made by Humeans that the world of this example is just "too simple" to have laws, or maybe that they are "too simple" for our "intuitions" to get a grip. To the contrary, the simplicity of this world is precisely why the counter-example fails.
And the point cannot be evaded just by making the story more complicated. For example, John Carroll tells this elaborate anti-Humean tale.
“Consider a possible world, U1. In it, there are exactly five X-particles, five Y-fields, and not much else. Since the beginning of time, the X-particles have been traveling in a line at a constant velocity towards a staggered string of Y-fields. Particle b, the first X-particle to enter a Y-field, does so, say, at high noon. Then, each hour on the hour, for the next four hours, another X-particle enters another Y-field. When their time comes, the particles all pass through their respective fields quite quickly, without any change in direction, never to enter a Y-field again. While in their Y-fields, all the X-particles have spin up. Unlike any of the other particles, particle b has an interesting mirror right along, though not in, its path to the Y-field. This mirror is on a well-oiled swivel and so can easily be twisted between two positions, positions c and d. It is in fact in position c and so does not interfere with b's flight. If twisted to position d, however, the mirror would deflect b out and away from all the fields. Clearly, the generalization, L1, that all X-particles subject to a Y-field have spin up could be a law about such a world." 8
Next we are to imagine two more worlds: a world U2, in which particle b doesn't spin up when it enters a Y-field, and a world U3, where the mirror is turned to deflect particle b and keep it from ever entering a Y-field.
The crux of the story is to ask what would have happened at U3 if the mirror had not deflected particle b. Carroll thinks there are two different possibilities, depending upon whether or not we imagine U3 to be governed by L1. If it is, then we will say that at U3, b would have spun up if the mirror hadn't been turned. But if U3 isn't governed by L1, then b might have behaved as it does in U2. Since either seems possible, it seems we must say that U3 really depicts two different worlds that are alike in all respects except what laws and counterfactuals are true at them; worlds modally different but alike in all matters of particular fact. Hence, supervenience fails. 9
Complex imaginings like these lead some to despair about the fruitfulness of thought experiments or the "limits of intuition". But for the computationally minded metaphysician thought experiments are not "armchair philosophy" but simulations run in wetware. When the results are inconclusive we just need to write clearer code.
Suppose, then, that instead of just imagining Carroll's worlds we consider what would be required to simulate them.
Taking all the other details of Carroll’s story for granted, the crucial difference among these worlds is what happens when a particle collides with a field. In U1 the relevant algorithm is surely something that gives every particle an up-spin when it enters any Y-field.
def Collides1(i, j):  # 10
    # Any particle that enters a Y-field acquires spin up.
    if j.type == "Y":
        i.spin = "UP"
What could be simpler? And because any world with collisions governed by this algorithm will conform to L1 we should agree with Carroll that L1 is a law of U1.
U2 clearly doesn't obey L1. Moreover U2 is more computationally complex than U1. In U1 all particles behave the same way. But to fully describe U2 we must distinguish one particle from the rest. We need something like this:
def Collides2(i, j):  # 11
    # Every particle except b acquires spin up in a Y-field;
    # b is the one exceptional particle of U2.
    if j.type == "Y":
        if i is not b:
            i.spin = "UP"
Collides2 is a more complex program than Collides1. This reflects the fact that, as described, U2 is a more complicated world than U1. More complicated because the story of what happens in U2's particle-field collisions is more complicated.
What then of U3? In U3, b never collides with a Y-field, but every particle that does gets an up spin. This means that both Collides1 and Collides2 describe what happens in all the collisions that happen in U3. Given U3’s initial conditions, programs that differed only in containing one of these functions in place of the other would give the same output. But Collides1 is still the simpler program, and thus it, not Collides2, figures in the optimal description of U3. We should conclude, then, that U3 has the same laws as U1, including L1. U3 is a nomologically possible world with respect to the laws of U2 (whatever they might be), but those laws are not the laws of U3.
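Here is a self-contained toy check of that last claim (the particle names, field object, and collision history are all invented for illustration, and the two rules are restated so the snippet runs on its own): over U3's actual collisions -- particle b never enters a Y-field -- the two rules agree output for output.

from types import SimpleNamespace

b = SimpleNamespace(name="b", spin=None)                       # the deflected particle
others = [SimpleNamespace(name=n, spin=None) for n in "cdef"]  # the other X-particles
y_field = SimpleNamespace(type="Y")

def Collides1(i, j):
    if j.type == "Y":
        i.spin = "UP"

def Collides2(i, j):
    if j.type == "Y" and i is not b:
        i.spin = "UP"

for rule in (Collides1, Collides2):
    for p in others:
        p.spin = None                               # reset the world
    for p in others:                                # U3's history: every X-particle
        rule(p, y_field)                            # except b enters a Y-field
    print(rule.__name__, [p.spin for p in others])  # same output either way: all "UP"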
Carroll claims that all he needs to make his argument succeed is the modal principle he labels (SC):
(SC) If ◊P and □(P ⊃ Q), then P > Q
In other words: if P is physically possible, and it is a law that if P is true then Q is true, then: if P were the case, Q would be the case. Carroll characterizes this as “our conceptual bridge from lawhood to the subjunctive conditional.” [p.182].
From this plausible principle, which I do not dispute, he supposes he can infer:
(SC*) If ◊P and □Q, then P > □Q
In other words: if P is physically possible and Q is a law, then: if P were true, Q would still be a law.
(SC*) would indeed make Carroll’s and other anti-supervenience arguments valid. It would entitle us to say, for example, that if it is nomologically possible according to Newton’s Laws that there be only one particle then there must be a world like WN which has only one particle but at which all of Newton’s laws still hold.
Carroll’s problem is how to get from (SC) to (SC*). He says:
To derive (SC*) from (SC) suppose that P is physically possible, and also suppose that Q is a law. Since Q is a law, it is a law in every possible world with the same laws as the actual world, and so it is physically necessary that Q is a law. 12
But the central step here is a non sequitur. Of course it is true that if Q is a law in the actual world then it is a law “in every world that has the same laws as the actual world”. But that tautology doesn’t give him what he needs to show, which is that if Q is a law in the actual world then a world where a nomologically possible P is true is one of those worlds “with the same laws as the actual world”.
What Carroll needs to establish to get SC* is the claim that “Since Q is a law … it is physically necessary that Q is a law”. That is:
(S4) □P → □□P
Perhaps sensing that he hasn’t proven S4, Carroll immediately throws in another argument:
“Also every proposition physically necessitates a physically necessary one. So, P physically necessitates that Q is a law. Since P physically necessitates that Q is a law and part of our initial supposition is that P is physically possible, it follows from (SC) that if P were the case, then Q would be a law.” 13
The assumption here is that:
□Q → □(P ⊃ □Q)
The idea being that the truth of the consequent, □Q, at every nomologically possible world will make the material conditional P ⊃ □Q true at every nomologically possible world. But that already assumes that □Q must be true at every nomologically possible world and that, once again, assumes (S4).
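Laid out step by step (this is my reconstruction, with the questionable step flagged), the derivation would have to run:

1. □Q (premise: Q is a law)
2. □□Q (the step that needs (S4): □Q → □□Q)
3. □(P ⊃ □Q) (from 2: if □Q holds at every nomologically possible world, so does the material conditional P ⊃ □Q)
4. P > □Q (from 3 and ◊P, by (SC))

Without step 2 there is no way to get from step 1 to step 3, and step 2 is just (S4) applied to Q.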
CTL tells us that (S4) is not true. The laws of the actual world need not be laws at nomologically possible worlds. A world with only one particle is nomologically possible according to Newton’s Laws, but Newton’s principles are not laws at worlds with only one particle.
Nomological Relativity
I noted above that, because measures of algorithmic simplicity are -- in effect -- relativized to the programming language we use to state them, it is possible, in principle, for equally sound measures to assign different orderings of complexity to the same sets of data. I expect then to hear the objection:
Your arguments assume a particular computer language. Sure, in the kind of computer languages we ordinarily use, encoding Newton’s laws takes a longer program than just saying “Velocity is constant”. But couldn’t there, in principle, be other computer languages where this difference was reversed? Relative to a language like that mightn’t it turn out that Newton’s laws were the simpler descriptions of single particle worlds?
I expect these complaints about the relativity of algorithmic simplicity will come most stridently from philosophers who, in other contexts, are quite happy to appeal to their “intuitions” or to wave their hands when they talk of Lewis’ “simplicity” and “balance”.
While I do think Nomological Relativity is significant, it is not relevant to the issue at hand.
Different measures of complexity will express different ways of parsing the order and disorder present in the data. Ultimately these will reflect different orderings of the set of computable functions. But while different orderings are possible, it is also possible to guarantee that they are well ordered. This is to say that in the programming languages in which algorithmic complexity is most precisely measured, there is always a shortest program. There are no ties. And ties -- as between WN and WC -- are what the anti-supervenience arguments require.
My thanks to Kadri Vihvelin, Tom Sterkenburg and especially Steve Petersen for helping me sort out my views on these matters. They may not, of course, share these views.