"Causal Analytics"
is the study of cause-and-effect relationships found in data through
inductive reasoning, the process of deriving general principles from
particular facts or instances. Causal Analytics seeks to answer
three important questions:
- Does A cause B?
- When?
- By how much?

The Importance
Causal Analytics is the essential reason most people use mathematics
or perform data mining, because with it we can:
- Understand which variables have the greatest bearing on performance
  or events of interest
- Predict what is going to happen (if we see a cause occur, we can
  anticipate the effect and its timing)
- Change what is going to happen (if we can influence a cause, we can
  achieve an effect)
- Diagnose why something happened (if we see the effect, we can look to
  the causes as to why)
- See shifting causes in non-stationary processes and determine when we
  are in and out of control and why
Thus, Causal Analytics is important to understanding, modeling,
optimization, and diagnosis. Through Causal Analytics, we can predict
performance more accurately and confidently, foresee problems, set
realistic expectations about what we can do, understand how controllable
and consistent our processes are, see which factors are the root causes
of problems and, most importantly, take appropriate actions to achieve
higher performance.
Time is a Key Factor
Cause-and-effect relationships are manifested in time. We theorize that
there is always a lapse in time, however large or small, between a change
in a cause and the appearance of its effect. If A drives B and A changes,
then B will change after some amount of time elapses, whether that is
picoseconds or eons. Time also tells us that if B occurs before A, then A
cannot be a cause of B. Accordingly, the analysis of time gives us
important clues to cause and effect. To see these effects, it is helpful
(but not essential) to sample data at intervals no longer than the causal
time delay. If you know A drives B and in your data A appears to cause B
instantly, then the causal propagation is likely occurring between
sampling periods. This is common for "batch" data, where various measures
of each batch of product are collected and most effects occur within the
making of the batch, so most causal effects appear with a zero delay
(seemingly instant). The data is still valuable; it's just that we don't
know the actual timings of the relationships in those cases.
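As a rough illustration of reading causal timing from data, the Python
sketch below scans a range of lags and reports the one at which A and B
are most strongly correlated. The function name best_lag, the synthetic
series, the 3-sample delay and the batch size of 50 are all assumptions
made for this example, not part of any particular Causal Analytics tool.
Lagged correlation on its own suggests timing; it does not prove
causation.

```python
import numpy as np


def best_lag(a, b, max_lag):
    """Return (lag, correlation) for the lag with the strongest correlation.

    A positive lag means changes in `a` precede changes in `b`.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    best, best_corr = 0, 0.0
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            # Pair a[t] with b[t + lag]
            x, y = a[:len(a) - lag], b[lag:]
        else:
            # Pair a[t] with b[t - |lag|]
            x, y = a[-lag:], b[:len(b) + lag]
        corr = np.corrcoef(x, y)[0, 1]
        if abs(corr) > abs(best_corr):
            best, best_corr = lag, corr
    return best, best_corr


# Synthetic data (assumption): B follows A by 3 samples, plus a little noise.
rng = np.random.default_rng(0)
a = rng.normal(size=5000)
b = np.roll(a, 3) + 0.1 * rng.normal(size=5000)

# Sampled faster than the causal delay: the delay is visible.
lag, corr = best_lag(a, b, max_lag=10)

# "Batch" view: average every 50 samples, so the delay falls inside a batch.
a_batches = a.reshape(-1, 50).mean(axis=1)
b_batches = b.reshape(-1, 50).mean(axis=1)
batch_lag, batch_corr = best_lag(a_batches, b_batches, max_lag=3)

print("per-sample lag:", lag)
print("per-batch lag:", batch_lag)
```

With this synthetic data the per-sample scan recovers the 3-sample delay,
while the batch averages peak at lag 0, which mirrors the seemingly
instant effects described above for batch data.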
Causal timing can be more complex because of cause-and-effect feedback
loops. A feedback loop is where A drives B, which in turn drives A again.
For example, when you turn up the thermostat in your house (A), the
furnace responds and your house warms (B), but then you feel too warm, so
you turn the thermostat down a bit (A again). Thus A causes B and B causes
A. However, the timing information latent in the data will indicate that
the relationship is A => B => A ...
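The thermostat loop can be sketched as a toy simulation in Python. This
is only an illustration under stated assumptions: the function name
simulate_thermostat, the temperatures, the 0.5-degree step and the
one-step reaction delay on each link are all hypothetical values chosen
to make the ordering easy to see.

```python
def simulate_thermostat(steps=14):
    """Toy feedback loop: setpoint (A) drives temperature (B), which drives A.

    Each link reacts to the value seen at the previous time step, so every
    cause precedes its effect by one step. Returns a log of
    (time step, variable, new value) for every change.
    """
    setpoint, temp = 20.0, 20.0
    log = []
    for t in range(steps):
        prev_setpoint, prev_temp = setpoint, temp

        # B: the house moves 0.5 degrees toward the previous setpoint.
        if prev_temp < prev_setpoint:
            temp = prev_temp + 0.5
        elif prev_temp > prev_setpoint:
            temp = prev_temp - 0.5
        if temp != prev_temp:
            log.append((t, "B", temp))

        # A: the occupant turns the thermostat up once (at t == 2), then
        # turns it down after noticing the house was too warm.
        if t == 2:
            setpoint = 24.0
        elif prev_temp >= 22.5 and prev_setpoint > 21.0:
            setpoint = 21.0
        if setpoint != prev_setpoint:
            log.append((t, "A", setpoint))
    return log


log = simulate_thermostat()
a_times = [t for t, var, _ in log if var == "A"]
b_times = [t for t, var, _ in log if var == "B"]

print("A changed at steps:", a_times)
print("B first changed at step:", b_times[0])
```

In the resulting log, the first change in A comes before the first change
in B, which in turn comes before the second change in A. That ordering in
time is what separates A => B from B => A even though each variable
influences the other.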
There are a variety of
"causal relationships". Next we discuss the four most
common.