Commits as Experiments

On Earning Our Place in the Pantheon of the Sciences

Created: 2026-01-27·Updated: 2026-01-27·Tags: ideas practices

Disclaimer. This is only an idea that I’ve had, not something I’ve built yet. I am interested in building this in the future, however, likely on top of the abyss.

On Superstition and Guesswork

“Nullius in verba.”
(Take nobody’s word for it.)
Motto of the Royal Society.

It is standard practice to compose a Git commit message to communicate intent. The idea is that what changes are being made can be trivially deduced by reading the diff – explaining that in the commit message would be a mere regurgitation of the obvious. The commit message should, therefore, explain what could not otherwise be explained by the mechanistic artifacts, namely intent. We may see what changes you have made, yes, but why have you done so? In other words: what goal is it that you seek to achieve with this?

Of course, this is not enforced by any sort of mechanism; it is simply good practice. This much is plain for anyone to see. What is not as obvious, however, is the lack of rigour in this practice, and consequently, the lack of provenance.

A mere git commit -m "(perf) a bunch of optimisations" does not communicate what the impact is on the resulting code as a whole. This is not only about what exact components are being targeted by the change, what metric it is seeking to change, and what the actual impact is, but also what the impact is on the rest of the codebase. Sure, this may have improved, say, the performance of one component, but does it come at the detriment of anything else? That is, it does not provide the complete and reliable picture of the trade-off being made with the given change.

This brings us to the next point, which is that even in the best of times, intent is often expressed in an inexact, hand-wavy manner. What exactly does it mean for the throughput of the lexing to be improved? Under what load was this tested, and in what conditions? Likewise, what does it mean for some feature to be implemented? What are the acceptance criteria for completion?

More importantly, how can we trust you?

To Know That You Know Nothing

A Philosopher Lecturing on the Orrery by Joseph Wright of Derby

“The aim of science is not to open the door to infinite wisdom, but to set a limit to infinite error.”
Bertolt Brecht, Life of Galileo.

Garraway’s Coffee House in Exchange Alley, London

In addition to brewing what I would imagine to have been some delightful cups of coffee (in observance of tradition, this was written as I drank my morning cup of coffee), the coffeehouses of 17th- and 18th-century England brewed in them some of the first sparks of the Age of Enlightenment. It was the cornucopia which brought us many advancements which we largely take for granted today (though, given what the vicissitudes of the world have decreed for us in recent times, perhaps not for long) – individual liberty, separation of church and state, and natural rights, to name a few.

At the centre of this revolution was the notion of epistemic humility: the virtue of enumerating candidly what one knows, what one does not know, and what one can never know. This is the bedrock upon which the marvellous spires of science are built. It is the philosophical foundation of the scientific method, positing that theories are valuable insofar as they are falsifiable. Experiments are, therefore, rigorous attempts to falsify one’s hypothesis, in a paradoxical endeavour to solidify its credibility. Further progress led to the creation of the peer review process. Enshrined with it, then, is the necessity of the reproducibility of one’s claimed methodology and results. This is the sole differentiator of science from superstition.

As one should be able to deduce, computer science is, indeed, a science. (Surprised Pikachu face.) In fact, we are situated in a vastly more fortunate position than virtually all the other sciences – that fortune is accessibility. We may construct experiments without the need for a mass spectrometer, microscope, or centrifuge. Our laboratory lies at our fingertips, and over it, we reign as divine. It is a profound shame, then, that I have not witnessed scientific rigour exercised with a ubiquity comparable to that of the other sciences.

Most major, notable projects are able to maintain the discipline to practice this rigour. However, such projects remain a rarity in the grand scheme of things; it is only those projects created and maintained by the most seasoned, skilled developers which are able to achieve such a feat. Unfortunately, it is indeed a feat – one which remains unattained by the layperson.

Automation is the Poor Man’s Discipline

An Experiment on a Bird in the Air Pump by Joseph Wright of Derby

“Civilization advances by extending the number of important operations which we can perform without thinking about them.”
Alfred North Whitehead, An Introduction to Mathematics.

One cannot expect the fresh graduate (this is speaking from experience – I am said fresh graduate, yet to grow the proverbial, opulent facial hair) to compare to the wise old greybeard, as they simply have not earned a comparable quantity and quality of experience and learning. Even if one were to be sufficiently studious to know, in theory, the value of such practices, it is difficult to ingrain it into one’s habit without having experienced firsthand the dire consequences of the lack thereof. Hence, espousing discipline is simply a subpar solution to our problem.

Instead, let us consider the economic spectrum in which the notion of discipline lies. Discipline exists as a tradeoff for friction: the more friction is inherent to some endeavour, the more discipline is required to perform it, and vice versa. Hence, by reducing the friction involved in building our software with scientific rigour, we reduce the discipline needed to do so. We hope that this is sufficient to promote its ubiquity.

Automation, then, is the purveyor of the reduction of friction, as it performs tasks in our stead, reducing what we would otherwise have to do ourselves. If we are able to automate the collection of data, the maintenance of hygiene in our experimentation environment, the bookkeeping necessary for provenance and reproducibility, and other such menial chores, it would render rigour significantly easier to achieve. Furthermore, with sensitive tasks such as the enumeration of the experimentation environment, it removes the need for trust in humans; for example, we may pin and verify the hash of every binary executed in the experiment, or use content-addressable storage for our artifacts.
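To make the last point concrete, here is a minimal sketch in Python of what pinning binary hashes and storing artifacts by content might look like. The tool itself has not been built, so every name here (ContentStore, verify_binary, sha256_of_file) is hypothetical, and SHA-256 is merely an assumed choice of hash function.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of_file(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, read in fixed-size chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


class ContentStore:
    """Hypothetical content-addressable store: an artifact's key is its hash."""

    def __init__(self, root: Path) -> None:
        self.root = root
        self.root.mkdir(parents=True, exist_ok=True)

    def put(self, data: bytes) -> str:
        """Store the bytes under their own SHA-256 digest and return that key."""
        key = hashlib.sha256(data).hexdigest()
        (self.root / key).write_bytes(data)
        return key

    def get(self, key: str) -> bytes:
        """Load an artifact, re-hashing it so tampering or corruption is caught."""
        data = (self.root / key).read_bytes()
        if hashlib.sha256(data).hexdigest() != key:
            raise ValueError(f"artifact {key} failed its integrity check")
        return data


def verify_binary(path: Path, expected_hash: str) -> None:
    """Raise if the binary is not the one the experiment declared."""
    actual = sha256_of_file(path)
    if actual != expected_hash:
        raise RuntimeError(f"{path} has hash {actual}, expected {expected_hash}")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        store = ContentStore(Path(tmp) / "artifacts")
        key = store.put(b"benchmark output")
        # The same bytes always map to the same key, so results are addressable
        # by what they are rather than by what someone chose to name them.
        assert store.get(key) == b"benchmark output"
```

The point of the design is that nobody needs to be trusted to label an artifact honestly: the label is derived from the content, and any mismatch is detected on read.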

Note that the aim is not to enforce absolute rigour and hygiene, as otherwise, we would simply construct an experimentation harness over virtual machines such that the entire environment is perfectly verifiable and replicable. For one, some use cases simply do not support such setups, and what we consider to be the gold standard practice in one field may be detrimental to the quality of the results in another. We must not fool ourselves, yet the challenge is that we are the easiest person to fool. (“The first principle is that you must not fool yourself – and you are the easiest person to fool.” Richard Feynman.)

No, the goal is instead to prove the other person’s claim and audit its provenance, rather than relying on trust in their good nature. The tool that I envision is, therefore, one which cryptographically verifies the methodology and results of one’s experiment with respect to a change in source code and its intended effect. With it, one would be able to

define a hypothesis in terms of some change in code, as well as an intended effect, such as some (measurably specified) improvement in performance, expansion in capability, and/or resolution of a gap in soundness or completeness (such as bugs or vulnerabilities; soundness means that anything that is false is not provable, and completeness means that anything that is true is provable), and
declare in some computer-readable format (say, TOML or JSON) a suite of experimentation methodology to falsify said hypothesis,

such that the tool can automate the execution of the experiment and the collection of the resulting data. The responsible changes in code can be enumerated in the form of some commit or merge request. We can therefore see, in a standardised, reproducible, and verifiable way, each hypothesis and its result. Then, even failed hypotheses can be valuable to store – we may learn better from our mistakes as they are recorded, allowing them to serve as prior knowledge and explanations for any further failures.

(An interesting conjecture is that declaring it in a proof assistant such as Lean, Isabelle, or LiquidHaskell transforms our experimentation framework into one which formally establishes properties about the runtime behaviour of our software. A hypothesis is then declared as an axiom in the target formal language, alongside a falsification suite and acceptance criteria with respect to said suite. A hypothesis that passes the criteria is then added to the generated library of axioms, allowing one to prove theorems on the runtime behaviour of the program. While it is true that empirical results are finite while formal axioms are universal, the declaration of the falsification suite and acceptance criteria provides the necessary level of transparency: we have declared that these theorems are built on top of some finite empirical observation based on the given acceptance criteria – they are therefore only as good as such.)
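As a rough illustration of the workflow described above, the following Python sketch declares a hypothesis in JSON, runs a measurement a fixed number of times, checks it against an acceptance criterion, and fingerprints the resulting record. Since the tool does not exist yet, the schema (commit, claim, metric, baseline, min_relative_improvement, trials), the use of the median, and the function names are all assumptions made purely for illustration; the commit value is deliberately left as a placeholder.

```python
import hashlib
import json
import statistics
from dataclasses import asdict, dataclass
from typing import Callable

# Hypothetical declaration format; every field name here is an assumption.
HYPOTHESIS_JSON = """
{
  "commit": "<commit-id>",
  "claim": "lexer throughput improves by at least 10%",
  "metric": "tokens_per_second",
  "baseline": 1000.0,
  "min_relative_improvement": 0.10,
  "trials": 5
}
"""


@dataclass(frozen=True)
class Hypothesis:
    commit: str
    claim: str
    metric: str
    baseline: float
    min_relative_improvement: float
    trials: int


def load_hypothesis(text: str) -> Hypothesis:
    """Parse a JSON declaration into a Hypothesis."""
    return Hypothesis(**json.loads(text))


def run_experiment(hypothesis: Hypothesis, measure: Callable[[], float]) -> dict:
    """Run the measurement `trials` times and try to falsify the claim."""
    samples = [measure() for _ in range(hypothesis.trials)]
    observed = statistics.median(samples)
    threshold = hypothesis.baseline * (1.0 + hypothesis.min_relative_improvement)
    record = {
        "hypothesis": asdict(hypothesis),
        "samples": samples,
        "observed_median": observed,
        "threshold": threshold,
        # The claim is falsified if the observed metric misses the threshold.
        "falsified": observed < threshold,
    }
    # Hash a canonical serialisation so that the record can be referenced and
    # checked later; any change to the data changes the digest.
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    record["record_sha256"] = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return record


if __name__ == "__main__":
    hypothesis = load_hypothesis(HYPOTHESIS_JSON)
    # A stand-in measurement; a real harness would run the benchmark itself.
    result = run_experiment(hypothesis, lambda: 1200.0)
    print(json.dumps(result, indent=2))
```

Note that a bare hash only detects modification of a record; for a third party to trust who produced it, the record would additionally need to be signed or anchored somewhere tamper-evident, which this sketch does not attempt. A falsified record is kept in exactly the same form as a successful one, so failed hypotheses remain available as prior knowledge.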

Coda

The Sleep of Reason Produces Monsters (El sueño de la razón produce monstruos) by Francisco Goya

“For me, it is far better to grasp the Universe as it really is than to persist in delusion, however satisfying and reassuring.”
Carl Sagan, The Demon-Haunted World.

My recent completion of my bachelor’s degree imparted upon me a nihilistic view of the field. I have been told that computer science is a science, yet, outside of a select subdomain of the field, I do not see the prerequisite discipline and rigour. There is not much in the way of either the humility to recognise that we are fallible beings or the sense of responsibility and burden to recognise the gravity of our work. We are building the nervous system on which human society runs, yet instead of the precision of a neurosurgeon, I see the clumsiness of a doll-wielding toddler.

I have been promised science, yet have received instead superstition.

Perhaps it is the lasting cultural effect of the Silicon Valley era, as those who move fast and break things are those who rise to the top. Yet one need only earnestly observe the toilet bowl after one’s bowel movement to witness a rise to the top. Such conduct may have proven sufficient when software was built merely to power platforms to rate the attractiveness of Harvard students with dubious disclosure practices or consent (see the Wikipedia entry on the early days of Facebook), but it is unacceptable when every conceivable dimension of the average person’s identity, and even life, is stored inside of this Pandora’s box, unknowingly mistaken for a toy and handed to the nearest infant. (This additionally serves as an argument for privacy and against digital surveillance: why would you trust a careless entity with no incentive to look out for you with quite literally every facet of your life?)

The world has observed time, and time, and time, and time, and time again the dire consequences of such a lax (arguably criminally so) posture. I, for one, wish to see this discipline treated with discipline. I wish to see us earn our place among the venerated pantheon of the sciences, instead of being one in name only.
