Modeling Principles

Our guiding research principles have helped us define a distinctive research flavor within the currently popular areas of Data Science and Machine Learning. As chemical engineers, we believe our domain knowledge should not be wasted in the rush to develop advanced regression algorithms. After a lot of thought, we have come to believe that Machine Learning (ML), or Statistical Learning (SL), is little more than a set of very sophisticated regression and classification algorithms. We have also been substantially influenced by the distinguished computer scientist Leslie Valiant, author of the book “Probably Approximately Correct” (Basic Books, New York, 2013). In the first chapter of the book, he distinguishes between theoryful and theoryless problems. Theoryful problems are those for which there is, in principle, an underlying physicochemical theory, possibly expressed in quantitative mathematical form, that can explain the problem under study. The movement of celestial bodies and the time evolution of a chemical reaction in a pharmaceutical process are examples of theoryful problems. He contrasts these with the set of problems that lack such a scientific underpinning and calls this class theoryless. He states:

“…  In contrast, the vast majority of human behaviors look theoryless. Nevertheless, these behaviors are often highly effective. These abundant theoryless behaviors still lack a scientific account and it is these that this book addresses …”

An example of a theoryless problem is the selection of retirement investments by someone who is not trained in investment banking. Another is the problem faced by a TV manufacturer who wishes to estimate the likelihood that the owner of a particular automobile brand will buy one of the company’s TV models.

Most research problems in the area of Process Systems Engineering are theoryful. However, we might not presently possess all the knowledge necessary to postulate a knowledge-driven model that encapsulates the underlying theory for such processes. Thus, we turn our attention to the techniques and tools, or more generally the algorithms, that have been developed for theoryless problems. We then develop data-driven models to represent the input-output behavior of our process, not necessarily paying attention to whether it is a distillation column, a chemical reactor, or a heat exchanger. This is convenient but sinful, as we waste our partial knowledge of the process; for example, that it is a reactor and not a distillation column. Thus, a blind application of Data Science and Machine Learning algorithms should not be our ultimate aim. We should instead aim to modify such theoryless algorithms so that they can utilize our partial knowledge of the inner workings of our theoryful processes.
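To make the contrast concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not a method taken from this text. It assumes a synthetic batch reactor with first-order kinetics, C(t) = C0 exp(-k t), and compares a "blind" cubic polynomial fit with a fit that uses the partial knowledge that the process is a first-order reaction. The function names, the rate constant, the noise level, and the polynomial degree are all hypothetical choices made for this example.

```python
import numpy as np


def simulate_first_order(c0=2.0, k=0.8, n=20, t_max=5.0, noise=0.02, seed=0):
    """Synthetic concentration data for an assumed first-order reaction."""
    rng = np.random.default_rng(seed)
    t = np.linspace(0.0, t_max, n)
    # Multiplicative noise keeps every concentration strictly positive.
    c = c0 * np.exp(-k * t) * np.exp(noise * rng.standard_normal(n))
    return t, c


def fit_black_box(t, c, degree=3):
    """'Theoryless' model: a polynomial in time, blind to the chemistry."""
    coeffs = np.polyfit(t, c, degree)
    return lambda t_new: np.polyval(coeffs, t_new)


def fit_first_order(t, c):
    """Knowledge-informed model: ln C = ln C0 - k t, fitted by least squares."""
    slope, intercept = np.polyfit(t, np.log(c), 1)
    k_hat, c0_hat = -slope, np.exp(intercept)
    return k_hat, c0_hat, lambda t_new: c0_hat * np.exp(-k_hat * t_new)


if __name__ == "__main__":
    t, c = simulate_first_order()
    black_box = fit_black_box(t, c)
    k_hat, c0_hat, structured = fit_first_order(t, c)

    # Extrapolate to twice the training horizon and compare with the truth.
    t_test = 10.0
    c_true = 2.0 * np.exp(-0.8 * t_test)
    print(f"estimated k = {k_hat:.3f}, estimated C0 = {c0_hat:.3f}")
    print(f"true C(10)       = {c_true:.5f}")
    print(f"black-box C(10)  = {float(black_box(t_test)):.5f}")
    print(f"structured C(10) = {float(structured(t_test)):.5f}")
```

Both models can describe the training window, but only the structured one is built to extrapolate sensibly, because its functional form already encodes what we know about the process. In a real application the known structure would come from the process itself (mass balances, kinetics, thermodynamics), and the data-driven part would be limited to what remains unknown.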