A previous post, Why Johnny Can't Estimate, pointed to some resources for estimating, to the principles of business and technical management that require estimates in order to make decisions, and to background on the sources of uncertainty that create risk, which in turn requires estimating to increase the probability of project success.
One of the #Noestimates advocates has now discovered a phrase:
Estimates are non-ergodic
This statement is a Category Error: the error that occurs when we ascribe to a thing a property it can't possibly have.
The thread goes on to say ...
The time average of task completion delay (for the Whole Project) will always be higher (data suggests ~60%) than the ensemble average (single task averages) because (at least) ensemble average delays downplays #Blackswans.
First, these words come from Macroeconomics: the economics of society, business markets, the financial systems of countries, and global systems. Likewise, Black Swans in Macroeconomics are events that no one saw coming. The 2008 mortgage crisis is one example: many did see it coming and made lots of money, but the government didn't.
In software development, those managing the project have some understanding of the market forces (from their marketing departments), the technology (from their engineering department), and how to manage in the presence of Aleatory and Epistemic risk (the managers running a successful firm). Black Swans are Ontological Uncertainties (we never saw it coming). If you work at a firm that doesn't understand, in some form, these three systems (market, technology, and managerial skills), better start looking for a new job now.
The term “Black Swan event” has been part of the risk management lexicon since its coinage in 2007 by Nassim Taleb in his eponymous book titled The Black Swan: The Impact of the Highly Improbable. Taleb uses the metaphor of the black swan to describe extreme outlier events that come as a surprise to the observer, and in hindsight, the observer rationalizes that they should have predicted it. (taken from [3])
Software development execution and modeling is Microeconomics, not Macro. The author of the quote read a Taleb book on Black Swans and assumed those Black Swans are found in software development. If they are, it's only because those working at the firm are clueless as to why they're there, what they're supposed to be doing, and how to do it. They then deserve to be impacted by a Black Swan and go out of business.
Skipping to the End
Since there seems to be confusion in some circles of software development, let's sort out the difference between Macro and Micro in the context of writing software for money, other people's money, in the presence of uncertainty - reducible (Epistemic) and irreducible (Aleatory) uncertainty - both of which impact the probability of success of the software development efforts.
- Macroeconomics is about the big picture of industry, country, or global economic factors. Macroeconomics includes a nation's Gross Domestic Product (GDP), unemployment rates, growth rate, and how all these concepts interact with each other. An important outcome of Macroeconomics is establishing the appropriate interest rates in an economy, where the government sets a base rate and banks work from there.
- Microeconomics is about individual or company behaviors. It includes how businesses price their products by understanding the needs of consumers. These prices are determined by many factors, but one important factor is the cost to develop the product and the range of prices possible in exchange for the perceived value of the product.
One of the core fallacies of #NoEstimates is the claim "we focus on Value." All business microeconomic decision processes focus on Value, but with an equal focus on the Cost to produce that Value. If the cost to produce that Value is larger than the Price received for it, the business will soon be out of business.
When a statement is uninformed, or even a fallacy, it's often the motivation to write about the correct concept. Let's cut to the chase, then come back for the details.
- Software projects are non-ergodic.
- Ergodic processes are stochastic processes with a statistical property that can be deduced from a single, sufficiently long, random sample of the underlying process.
- Non-Ergodic processes don't possess this property.
- Software development is a non-ergodic process.
- Estimates are neither Ergodic nor Non-Ergodic themselves. Estimates are estimates of an ergodic or non-ergodic PROCESS.
In the case of software development, the non-ergodic process produces various parameters. It's the estimates of the parameters generated by that process that we're after. These parameters include the time series of attributes like cost, schedule, the number of Features produced, and the defect rate for those Features; in short, any parameter of the project.
It seems this is not just a misuse of a term or a typo. Perhaps the term was carried over from the Black Swan book without understanding what was being read. Thirty minutes on Google with "ergodic," "non-ergodic," and "estimating processes" as search phrases would have turned up everything shown below.
A Summary of Ergodic and Non-Ergodic Before Proceeding
OK, let's dive into the details. In econometrics and signal processing, a stochastic process is said to be ergodic if its statistical properties can be deduced from a single, sufficiently long, random sample of the process. The reasoning is that any collection of random samples from a process must represent the average statistical properties of the entire process. In other words, regardless of what the individual samples are, the collection of samples must represent the whole process. Conversely, a non-ergodic process is one that changes erratically at an inconsistent rate. [1]
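The difference between "one long sample" and "many samples" can be made concrete with a small simulation. This is a minimal sketch, assuming two made-up processes that both have an ensemble mean of 10: one ergodic (independent draws around 10) and one stationary but not ergodic (each realization draws its own level, 5 or 15, once and keeps it). The function names and parameters are illustrative, not taken from any library.

```python
import random
import statistics

# Two made-up processes, both with an ensemble mean of 10 at every time step.

def ergodic_path(n, rng):
    """Stationary and ergodic: independent draws around a fixed mean of 10."""
    return [10 + rng.gauss(0, 2) for _ in range(n)]

def non_ergodic_path(n, rng):
    """Stationary but NOT ergodic: each realization draws its own level
    (5 or 15) once and keeps it for the whole path."""
    level = rng.choice([5, 15])
    return [level + rng.gauss(0, 2) for _ in range(n)]

def time_vs_ensemble(make_path, seed=1, length=5000, realizations=5000):
    """Return (time average of ONE long path,
               ensemble average at one time step across MANY realizations)."""
    rng = random.Random(seed)
    time_avg = statistics.fmean(make_path(length, rng))
    ensemble_avg = statistics.fmean([make_path(1, rng)[0] for _ in range(realizations)])
    return time_avg, ensemble_avg
```

For `ergodic_path` both numbers land close to 10. For `non_ergodic_path` the ensemble average still lands near 10, but the time average of the single long path sits near 5 or 15: one long sample does not reveal the ensemble's statistics.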
- An Ergodic system can exist in all its possible states through a universally applicable random process. That is, the system is purely random and each system state is equally probable.
- An Ergodic system is one in which all its theoretical properties - mean, variance - can be derived from a sufficiently large sample.
- Given a large enough sample, the sample mean approaches the population mean.
- For any system to be ergodic, it has to be closed, with no outside influences. A model of a gas in a closed container can sample the states of the system over some period of time to determine the average over all the molecules in the container, and so make a statement about the temperature of the container without having to state the temperature of each individual molecule.
- Modeling ergodic systems has more problems:
  - It's not that models of ergodic systems can't be formulated mathematically.
  - It's more fundamental: ergodicity is an asymptotic concept.
  - The time averages have to be taken over the entire state space, then time has to tend to infinity, and then those samples compared to ensemble averages.
  - Even if the state space were well-defined and finite, testing for ergodicity is problematic, since the phrase "let time tend to infinity" in the definition of ergodicity cannot be tested. And if you can't test it, it is not a scientific hypothesis, thank you, Karl Popper.
Software Development Systems do not have this behaviour
- A Non-Ergodic system cannot exist in all its possible states, because the underlying processes that drive the system into each state have constraints on their behavior that limit which states they can drive the system into.
- As well, non-ergodic systems are ones in which new states can come into being with new information, new inventions, new politics, new behaviors, misbehaviors of people, externalities and similar forces on the system.
- The system changes while we are sampling it. As an aside, this raises a serious question: how do you distinguish between a sampling error and an actual change in the system caused by a continually changing world? (a subject for another post).
- A Darwinian evolution process is non-ergodic.
For software development projects, non-ergodic behaviors exist.
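The question raised above, whether a shift in the data is sampling error or a real change in the system, can at least be framed with a crude two-sample comparison of recent observations against the earlier history. This is a minimal sketch, assuming hypothetical task durations whose underlying mean shifts part-way through and a simple z-score threshold; it is not a general change-point method, and all names and numbers are illustrative.

```python
import random
import statistics

def shifted_series(n_before, n_after, before=5.0, after=8.0, noise=1.0, seed=7):
    """Hypothetical task durations (days) whose underlying mean shifts from
    `before` to `after` part-way through, e.g. after a change in the work."""
    rng = random.Random(seed)
    return ([rng.gauss(before, noise) for _ in range(n_before)] +
            [rng.gauss(after, noise) for _ in range(n_after)])

def looks_like_change(series, window, z=3.0):
    """Crude two-sample check: is the mean of the last `window` points further
    from the mean of the earlier history than sampling error would explain?"""
    history, recent = series[:-window], series[-window:]
    var = statistics.variance(history)               # noise level estimated from history
    se = (var / len(history) + var / window) ** 0.5  # std. error of the difference of means
    return abs(statistics.fmean(recent) - statistics.fmean(history)) > z * se
```

A shift of three days against one day of noise is flagged easily; a series with no shift usually is not. The hard cases are small shifts buried in noise, where no threshold separates the two explanations cleanly.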
A system that exhibits non-ergodicity is characterized as one in which a subsequent stage of the system (project) depends only on the current stage, and over time, the system continuously evolves to forget its initial state and can be described by a Markov Process. [4] So the core question is can the management of such a project
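The Markov description cited from [4] can be sketched with a small transition matrix. This is a minimal sketch, assuming three hypothetical weekly project states and made-up transition probabilities: the next state depends only on the current one, and after enough steps the distribution no longer depends on where the project started.

```python
# Hypothetical weekly project states and transition probabilities (made up for
# illustration). Row i gives P(next state = j | current state = i).
STATES = ["on_plan", "slipping", "blocked"]
P = [
    [0.70, 0.25, 0.05],  # from on_plan
    [0.40, 0.40, 0.20],  # from slipping
    [0.30, 0.40, 0.30],  # from blocked
]

def step(dist, P):
    """One Markov step: the next distribution depends only on the current one."""
    n = len(dist)
    return [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]

def distribution_after(start, weeks, P=P):
    """State probabilities after `weeks` steps, starting with certainty in `start`."""
    dist = [1.0 if s == start else 0.0 for s in STATES]
    for _ in range(weeks):
        dist = step(dist, P)
    return dist
```

Comparing `distribution_after("on_plan", 50)` with `distribution_after("blocked", 50)` shows the two starting points converging to the same distribution, which is the "forgets its initial state" behavior described above.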
OK, back to the Ergodic problem. What does Ergodic mean in Software Development?
In probability theory, an ergodic system is one that has the same behavior averaged over time as it does averaged over the space of all the system's states, its phase space. In physics, the term implies that a system satisfies the ergodic hypothesis of thermodynamics. The same idea underlies the kinetic theory of gases, from which Boyle's Law can be derived: energy is assumed to be distributed among accessible configurations by a random process. This randomness in energy is the basis of the exponential distribution of the occupied energy states of the gas under observation, hence the Boltzmann Distribution.
A random process is said to be ergodic if the time averages of the process tend to the appropriate ensemble averages. This definition implies that with probability 1, any ensemble average of {X(t)} can be determined from a single sample function of {X(t)}. For a process to be ergodic, it has to necessarily be stationary. By the way, not all stationary processes are ergodic.
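For the mean, this definition is commonly written out as follows, in the continuous-time form used in standard texts such as [2]. Here the time average over the interval [-T, T] is written η_T, the ensemble mean η, and the autocovariance of the stationary process C(τ); these symbols are introduced for illustration. The process is mean-ergodic when η_T tends to η as T grows, and in the mean-square sense that holds exactly when the variance of η_T goes to zero.

```latex
\begin{aligned}
\eta_T &= \frac{1}{2T}\int_{-T}^{T} X(t)\,dt, \qquad \eta = E\{X(t)\} \\
\lim_{T\to\infty} \eta_T &= \eta \\
\sigma_{\eta_T}^2 &= \frac{1}{2T}\int_{-2T}^{2T} C(\tau)\Big(1 - \frac{|\tau|}{2T}\Big)\,d\tau \;\to\; 0 \quad \text{as } T \to \infty
\end{aligned}
```

For a sampled time series X_1, ..., X_N, such as weekly project measurements, the time average is simply the sample mean (1/N) times the sum of the X_n.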
In our software development world, this time series can be made up of attributes of the work: durations, delays, defects, work effort, functions produced, anything that is tangible evidence of the work as a function of time, hence a time series.
As well, an ergodic process is one where its statistical properties, like variance, can be deduced from a sufficiently long sample. For example, the sample mean converges to the true mean of the signal (the sampled values of some process), if you average long enough.
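The convergence of the sample mean and variance can be checked numerically. This is a minimal sketch, assuming a stationary first-order autoregressive (AR(1)) process as a stand-in for a mean-ergodic time series; the mean, the coefficient `phi`, and the noise level are illustrative choices, not values from any project.

```python
import random
import statistics

def ar1_series(n, mean=10.0, phi=0.8, noise=1.0, seed=3):
    """Stationary AR(1) series: X[t] - mean = phi * (X[t-1] - mean) + e[t].
    With |phi| < 1 this is a standard example of a mean-ergodic process."""
    if not abs(phi) < 1:
        raise ValueError("|phi| must be < 1 for stationarity")
    rng = random.Random(seed)
    # Draw the first value from the stationary distribution so the series is
    # stationary from the start (stationary variance = noise^2 / (1 - phi^2)).
    x = mean + rng.gauss(0, noise / (1 - phi ** 2) ** 0.5)
    series = []
    for _ in range(n):
        series.append(x)
        x = mean + phi * (x - mean) + rng.gauss(0, noise)
    return series

def time_average_error(series, true_mean, lengths):
    """|time average of the first n points - true mean| for each n in `lengths`."""
    return {n: abs(statistics.fmean(series[:n]) - true_mean) for n in lengths}
```

For example, `time_average_error(ar1_series(200_000), 10.0, [100, 10_000, 200_000])` typically shows the error shrinking as the sample grows, but for any finite length it is never guaranteed to be zero, which is the asymptotic point made earlier. The positive autocorrelation (`phi = 0.8`) also slows convergence compared with independent samples, so a "sufficiently long" sample can be long indeed.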
So it seems obvious that developing software, and recording its time-series variables, would be Non-Ergodic, since that data would show properties NOT compliant with the definition of an ergodic process.
Ergodic systems tend in probability to a limiting form that is independent of the initial conditions. The breakdown of the ergodic condition forms dependent processes. This appears to be what happens in software development processes. Dependencies, either present at the beginning or appearing along the way, influence the behaviors of the other processes in the system of developing the software.
Non-Ergodic Processes and Writing Software for Money
But writing software is driven by processes that are non-ergodic; that is, these processes do not visit all possible states of the development process. This means the same statistical behaviors are not present across all the processes of software development across all time. We find non-ergodic systems (processes) in our everyday life. The economy is one.
But let's remind ourselves: a random process is ergodic if the time average of a sequence of observations is the same as the average over the entire phase space of the system (the ensemble average) when the sample is long enough. This means that sampling gives information about the system.
A software development system is non-ergodic. Processes interact with each other, requirements change so those interactions change, and individuals differ in productivity, so the processes they use to produce Value change as a function of time.
The Fundamental Flaw of the Original Quote
Let's revisit the quote,
Estimates are non-ergodic
Repurposing James Carville's famous line from the Clinton campaign, "It's the economy, stupid," we get: It's the process, stupid.
It is not the Estimates that are non-ergodic.
An estimate is just a number, hopefully with a confidence range, since no point estimate can be correct.
It's the process that is ergodic or non-ergodic.
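Since an estimate is a number plus a range derived from observations of the process, a small helper shows the shape of that idea. This is a minimal sketch, assuming the samples are past observations of the same kind of work (for example, task durations) and that a 10th to 90th percentile band is an acceptable range; both the percentile choice and the function name are illustrative.

```python
import statistics

def estimate_with_range(samples, low_pct=10, high_pct=90):
    """Turn observed samples (e.g. past task durations) into an estimate:
    a central value plus a percentile range, rather than a bare point."""
    if not (1 <= low_pct < high_pct <= 99):
        raise ValueError("need 1 <= low_pct < high_pct <= 99")
    # 99 cut points; cuts[k - 1] is the k-th percentile of the samples.
    cuts = statistics.quantiles(samples, n=100, method="inclusive")
    return {
        "median": statistics.median(samples),
        "low": cuts[low_pct - 1],
        "high": cuts[high_pct - 1],
    }
```

Calling `estimate_with_range([3, 5, 4, 6, 8, 5, 7, 4, 5, 12])` returns a median with a low and high bound around it. The estimate itself is neither ergodic nor non-ergodic; it is a summary of samples drawn from whatever process produced them.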
So now that we know a bit about ergodic and non-ergodic processes, let's see where the critical misunderstanding is in the author's quote.
For processes like the economy (where he gets his ideas from a macroeconomics book by Taleb, who is himself quite controversial; do your homework and Google "Taleb criticism"), there is an assumption that the system under observation, the system needed to generate those random samples for the time series, is OPEN. That is, the possible system states are themselves a set of random processes. This would only be true in software development if there were NO connections between any of the process steps, the processes themselves, and the resulting outcomes of the processes. That is, every function developed, every effort made in the development of that function, and every line of code produced would be completely separate from all other lines of code, efforts, steps, and resulting features (these are the possible system states referenced when we speak about ergodic and non-ergodic processes).
In software development, the states of the system are constrained and coupled in a network of activities. This is one reason why software development is non-ergodic. Fine, but the original quote, "estimates are non-ergodic," is still in error, since it's the system that has the behaviour, not the estimate.
The second fundamental flaw is that, in the kind of non-ergodic process the quote wants us to believe in, the expectation value of a variable can be zero at all times, while its time average is a random variable with divergent variance. This is a characteristic behavior of such non-ergodic processes.
But software development processes don't work that way. Past performance drives future performance within the parameters of the processes. Software development systems are CLOSED, not OPEN as required for the underlying system to be non-ergodic in this way.
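A symmetric random walk is the textbook example of that behavior: its expected position is zero at every step, yet the spread of positions, and the variance of each walk's own time average, keep growing. This is a minimal sketch, assuming a simple +1/-1 step model chosen purely to illustrate the mathematics; it is not a model of a software project, and the function names are illustrative.

```python
import random
import statistics

def random_walk(steps, rng):
    """Symmetric +1/-1 random walk starting at 0; returns the path."""
    x, path = 0, []
    for _ in range(steps):
        x += 1 if rng.random() < 0.5 else -1
        path.append(x)
    return path

def walk_statistics(steps=400, walkers=2000, seed=11):
    """Ensemble view vs. time-average view of many independent walks."""
    rng = random.Random(seed)
    paths = [random_walk(steps, rng) for _ in range(walkers)]
    finals = [p[-1] for p in paths]
    time_averages = [statistics.fmean(p) for p in paths]
    return {
        # The expected position is 0 at every time step ...
        "mean_final": statistics.fmean(finals),
        # ... but its spread grows with time (variance = steps in theory) ...
        "var_final": statistics.pvariance(finals),
        # ... and each walk's own time average is a random variable whose
        # variance also grows with `steps` (about steps / 3 for long walks).
        "var_time_average": statistics.pvariance(time_averages),
    }
```

Running `walk_statistics(steps=100)` and `walk_statistics(steps=400)` shows the ensemble mean staying near zero while both variances roughly quadruple.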
This is the notion of Velocity, Earned Value Management, Statistical Forecasting from Story Points (if that's your cup of tea), reference class forecasting, trending, or any forecasting process where the system - while variant - has a trend that looks close to what has happened in the past.
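The forecasting methods listed above share one idea: use the recorded past of the system to bound its near future. This is a minimal Monte Carlo sketch of statistical forecasting from sprint velocity, assuming that future sprints look statistically like the recorded ones; the function names, velocities, and backlog sizes are hypothetical.

```python
import math
import random

def forecast_sprints(past_velocities, remaining_points, trials=10_000, seed=5):
    """Monte Carlo forecast: repeatedly resample past sprint velocities (with
    replacement) until the remaining work is done; return sorted sprint counts.
    Assumes future sprints look statistically like the recorded ones."""
    if not past_velocities or min(past_velocities) <= 0:
        raise ValueError("need at least one positive past velocity")
    rng = random.Random(seed)
    outcomes = []
    for _ in range(trials):
        done, sprints = 0, 0
        while done < remaining_points:
            done += rng.choice(past_velocities)
            sprints += 1
        outcomes.append(sprints)
    outcomes.sort()
    return outcomes

def percentile(sorted_values, pct):
    """Nearest-rank percentile (0 < pct <= 100) of an already sorted list."""
    rank = max(1, math.ceil(pct / 100 * len(sorted_values)))
    return sorted_values[rank - 1]
```

For instance, `percentile(forecast_sprints(history, backlog), 80)` gives a sprint count that at least 80% of the simulated outcomes met or beat, with `history` and `backlog` standing in for your own recorded velocities and remaining story points. Resampling with replacement is the simplest choice; it ignores any trend in the history, which a trend-based forecast would capture.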
But there is still an open two-part question:
Are software development projects non-ergodic?
Can we estimate the outcomes of a non-ergodic process?
We find non-ergodic processes in places like the economy and logistics networks [5]. And there are discussions of how to estimate the behaviors of non-ergodic processes [6]. But it's not clear exactly how to connect a non-ergodic process with the development of software. Markov processes seem to be one way. The research starts with the question: do developers learn from their experiences? [7]
The Core Fallacy of Applying Statistical Mechanics to Software Development
Remember that the original post uses the term Ergodic, and conversely Non-Ergodic. Econometrics uses these words as well.
If writing software were a Non-ergodic process, the resulting time series would not represent crucial aspects of the process and could not be understood through observation, either for lack of repetition (the observations would involve only unique transient states, or states that lack stability) or because the transition probabilities between states (a cost, schedule, or technical performance measurement) would be so variable that there would never be enough observations available to determine them.
Biological evolution, social processes, and Maxwellian gases all involve structural changes that are inherently non-ergodic.
But in fact, and in observation, software development processes don't jump around randomly. If they did, the development process would in fact be chaotic.
If software development were non-ergodic, what would be observed would be labeled as Chaos. This is the actual definition of Chaos: a time series of values that have no stability, that can exist in all states of the system, with no preference for any particular state. The behavior of the software development system could then only be determined by the long-term average over the ensemble of states. But that does NOT mean we can't determine the short-term behaviors, so this is NOT an escape hatch for the #NoEstimates advocates tossing around terms like non-ergodic. And short term is all we need to get to the next release.
Now invert the discussion and use the definition of an ergodic system.
The ergodic property is assigned to a random process (software development is a random process driven by the underlying uncertainties) whose statistical properties can be determined from a single sample of the process, and whose ensemble average (the collected whole) equals the corresponding time average with probability 1 (that is, they match).
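Whether there are "enough observations to determine" the transition probabilities is an empirical question, and the counting involved is simple. This is a minimal sketch, assuming project status is recorded as a sequence of hypothetical state labels; it computes the standard maximum-likelihood estimate (transition counts normalized per starting state) and reports how many transitions out of each state were actually observed.

```python
from collections import Counter, defaultdict

def estimate_transitions(observed):
    """Maximum-likelihood estimate of Markov transition probabilities from one
    observed sequence of states: count each (from, to) pair, normalize per row."""
    counts = defaultdict(Counter)
    for a, b in zip(observed, observed[1:]):
        counts[a][b] += 1
    return {
        a: {b: c / sum(row.values()) for b, c in row.items()}
        for a, row in counts.items()
    }

def observations_per_state(observed):
    """How many transitions out of each state were seen; rows backed by only a
    handful of observations give unreliable probability estimates."""
    return Counter(observed[:-1])
```

With a sequence such as `["on_plan", "on_plan", "slipping", "on_plan", "slipping", "slipping", "blocked", "on_plan"]`, the `blocked` row rests on a single observation, which is exactly the kind of thin evidence the argument above is about.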
Does that sound like software development projects you work on? Not ones I've encountered in the past 40 years. The original poster has made the mistake, yet again, of reading a single book about the black swan aspects of Macroeconomics and assuming those are applicable to the development of software.
Software systems are non-ergodic for sure (they live in a restricted number of states due to constraints and dependencies between those states). But the estimates of that non-ergodic system are not the issue; it's the system that is non-ergodic.
The estimates are themselves just samples of the state of the non-ergodic system.
How Did I Come to This Knowledge?
I was educated as a physicist, unsuccessfully practiced as one for a short time, and switched to signal processing, which rests on the same underlying theory as the topic discussed here, searching for signals deep in the noise of Doppler radar systems looking for intercontinental ballistic missiles coming our way. In undergraduate and graduate courses, physics students must take a Statistical Mechanics class, starting with the laws of Thermodynamics. That's where we first encounter these terms.
References
[1] Originally due to L. Boltzmann. See part 2 of Vorlesungen über Gastheorie. Leipzig: J. A. Barth. 1898. OCLC 01712811. ('Ergoden' on p.89 in the 1923 reprint.) It was used to prove equipartition of energy in the kinetic theory of gases.
[2] Papoulis, Athanasios (1991). Probability, random variables, and stochastic processes. New York: McGraw-Hill. pp. 427–442. ISBN 0-07-048477-5.
[3] Ergodicity, Econophysics and the History of Economic Theory, Geoffrey Poitras
[3] Black Swans in Risk, Myth, Reality and Bad Metaphors
[4] "The Root Causes of Failure in Complex IT Projects: Complexity Itself," Kaitlynn M. Whitney and Charles B. Daniels, Conference on Complex Adaptive Systems, Procedia Computer Science, 20, pp. 325-330, 2013.
[5] "Detecting Non-Ergodic Simulation Models of Logistics Networks," Falko Bause and Jan Krige, Value Tools '07, October 23-25, 2007.
[6] "Asymptotic behavior of time averages for non-ergodic Gaussian processes," Jakub Slezak, Annals of Physics, 383, pp. 285-311, 2017.
[7] "A Hidden Markov Model of Developer Learning Dynamics in Open Source Software Projects," Param Vir Singh, Nara Youn, and Yong Tan, Information Systems Research 22(4), March 2010.