Signals. Interactions. Free Will

In one of our previous expositions, we examined the bias-variance dilemma and explored baselining as a method for model selection. Today, we turn our focus to another aspect of this dilemma: features.
In the pre-deep-learning era, machine learning practitioners devoted much of their time to designing features, also known as signals in control theory or independent variables in statistics. The effectiveness of any statistical model, at a fixed model complexity, depends largely on the quality of its input signals. For instance, a model predicting atmospheric pressure on a chaotic new planet might consider variables like location, temperature, wind speed, date, time, and the number of moons. Including high-quality features improves the model's expressiveness by reducing bias, while piling on noisy features increases variance and leads to overfitting. Standard practices for managing noisy features include feature selection, dimensionality reduction, and feature cleaning/normalization/quantization.
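To make that trade-off concrete, here is a minimal sketch using scikit-learn on synthetic data (not the planetary example above, and the choice of 15 selected features is arbitrary): normalization plus a univariate relevance screen keeps the strong signals while discarding most of the noise before the model ever sees it.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic data: 10 informative features buried among 50 total columns.
X, y = make_classification(n_samples=2000, n_features=50,
                           n_informative=10, n_redundant=5,
                           random_state=0)

pipeline = Pipeline([
    ("scale", StandardScaler()),               # normalize each feature
    ("select", SelectKBest(f_classif, k=15)),  # keep the 15 strongest signals
    ("model", LogisticRegression(max_iter=1000)),
])

print(cross_val_score(pipeline, X, y, cv=5).mean())
```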
A Personal Kaggle Experience
My journey in a Kaggle challenge a few years back provided a practical arena for these concepts. The challenge presented a dataset with around 50 straightforward variables and a binary outcome. It was a typical scenario: numbers, algorithms, predictions. Since the variables were plain numeric values, decision trees were the model of choice among participants. After setting up an initial working pipeline and exhausting the standard methods and tricks, I realized that many features were subtly interdependent, a common-sense observation I had initially overlooked. For instance, when predicting hydrostatic pressure in a lake from 'mass' and 'volume', introducing 'density' as a derived variable (mass/volume) could significantly improve the model.
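As an illustration only (the 'mass' and 'volume' columns below echo the lake analogy, not the actual competition data), such a derived signal is a one-liner in pandas, yet it saves a tree model from approximating the ratio with many axis-aligned splits:

```python
import pandas as pd

# Hypothetical frame mirroring the lake example; not the competition data.
df = pd.DataFrame({
    "mass":   [12.0, 7.5, 30.2, 18.4],   # kg
    "volume": [10.0, 5.0, 25.0, 16.0],   # litres
})

# The derived feature the model would otherwise have to learn implicitly.
df["density"] = df["mass"] / df["volume"]
print(df)
```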
This realization led me to the concept of interaction features, where two variables, A and B, can be combined in countless ways, such as sin(A/B) or log((A-B)²). Engaging with other teams online, I discovered inventive approaches to generating feature interactions, including small expression compilers built around custom grammars that encoded each team's understanding of the dataset.
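A crude version of that idea is sketched below in NumPy. The set of transforms is my own toy "grammar", not one any particular team used: it enumerates pairwise interactions and screens each candidate by its correlation with the target.

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))                       # 5 base features
y = np.sin(X[:, 0] / (X[:, 1] + 3)) + 0.1 * rng.normal(size=500)

# A tiny grammar of pairwise interactions; real systems generate
# far richer expressions from a formal grammar.
transforms = {
    "A*B":          lambda a, b: a * b,
    "A/B":          lambda a, b: a / (b + 1e-6),
    "sin(A/B)":     lambda a, b: np.sin(a / (b + 1e-6)),
    "log((A-B)^2)": lambda a, b: np.log((a - b) ** 2 + 1e-6),
}

candidates = []
for i, j in combinations(range(X.shape[1]), 2):
    for name, f in transforms.items():
        feat = f(X[:, i], X[:, j])
        score = abs(np.corrcoef(feat, y)[0, 1])     # crude relevance screen
        candidates.append((score, f"{name} on (x{i}, x{j})"))

# Print the five most promising candidate interactions.
for score, desc in sorted(candidates, reverse=True)[:5]:
    print(f"{score:.3f}  {desc}")
```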
Insights from Deep Neural Networks
This idea also hints at why deep neural networks excel in so many contexts. Each layer of a network forms new, more complex function compositions, like f(g(k(x))), which are refined during training into richer feature representations. However, the internal workings of these learned compositions remain largely opaque.
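To make the composition picture literal, here is a toy sketch in plain NumPy (the weights are random and nothing is trained): each layer is one function, and the forward pass is simply their composition.

```python
import numpy as np

rng = np.random.default_rng(0)

def layer(in_dim, out_dim):
    """Return a single layer f(x) = relu(Wx + b) with random weights."""
    W = rng.normal(scale=0.5, size=(out_dim, in_dim))
    b = rng.normal(scale=0.1, size=out_dim)
    return lambda x: np.maximum(W @ x + b, 0.0)   # ReLU nonlinearity

# Three layers k, g, f -- the network is the composition f(g(k(x))).
k = layer(4, 8)
g = layer(8, 8)
f = layer(8, 2)

x = rng.normal(size=4)
print(f(g(k(x))))   # richer features built from the raw input x
```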
* * *
Approached with a casual disregard for minutiae, the bias-variance dilemma in machine learning parallels the existential debate of destiny versus free will, and so does the art of modeling: we cherish life for its unforeseen connections, the surprises that bring delight and, occasionally, a touch of madness. In this light, what truly elevates a story? Is it merely the end result, like finding your soulmate, or the whimsical and unexpected journey that led you there? Consider the adventures of Indiana Jones: is his narrative defined by the destinations he reaches, or by the series of unforeseen, fortuitous events that unfold along the way?