Discrete Choice Models
Introduction
In this post, we provide a brief introduction to the theory of Discrete Choice Models. These models are used in Econometrics to describe the behavior of agents that are facing the problem of choosing one alternative out of a finite set of mutually exclusive alternatives. We examine the core idea of modeling choice behavior as a utility maximization problem and we show how - under certain conditions - this theory gives rise to the familiar Multinomial Logistic Regression model that is commonly used to address classification tasks in Machine Learning applications.
Choice and Random Utility
In many practical applications, we observe agents facing a decision problem where they need to choose one alternative out of many available alternatives. These agents may be persons, firms, or other kinds of decision makers.
For instance, consider a recently founded technology startup which needs to select a cloud computing service (e.g., Microsoft Azure, AWS, Google GCP) to start building its software product or service. In this case, the agent is the startup company and the competing alternatives are the various cloud computing services that are available on the market.
It is most often assumed that - among all available alternatives - agents will choose the one that maximizes their utility. In other words, agents will make the choice that produces the largest net benefit for them.
Most frequently, an agent’s decision process is influenced by a number of factors, including:
the agent’s own characteristics
the characteristics of the different available alternatives
other factors that affect the environment in which the agent is making the choice.
Consequently, an agent’s utility function is typically a function of all the factors above.
We can assume that all of the factors above are observable by the agent and that the agent’s utility function is known to the agent. However, in most circumstances, we - the modelers of the agent’s behavior - can only observe some of the above-mentioned factors from the available data. Furthermore, we don’t typically know the specification of the agent’s utility function and we must therefore make assumptions on its characteristics.
To account for factors that we can’t observe, it makes sense to model the agent’s utility function as the sum of a deterministic component and a stochastic component. The deterministic component of the utility function - which we will denote by $V$ - is a function of the factors that we can observe (we will use $x$ to represent these observable factors). The stochastic component - which we will denote by $\epsilon$ - is instead a random variable representing the factors that influence the agent’s utility but that we cannot observe. The utility derived by agent $n$ from choosing alternative $j$ (e.g., Microsoft Azure) out of an exhaustive set of mutually exclusive alternatives (e.g., Microsoft Azure, AWS, and Google GCP) can then be written as
$U_{nj} = V(x_{nj}) + \epsilon_{nj}$.
The equation above is at the core of Random Utility Models. Different Random Utility Models can be obtained by specifying additional characteristics of $V$ and $\epsilon$. These models are frequently used in Econometrics to study the characteristics of the process according to which agents make choices.
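As a rough illustration of the equation above, the following sketch simulates the choice of a single agent many times. The values of $V$ are made up, and the standard normal noise is only a placeholder assumption, since nothing has been said yet about the distribution of $\epsilon$.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

alternatives = ["Microsoft Azure", "AWS", "Google GCP"]

# Hypothetical deterministic utilities V(x_nj) for one agent (made-up values).
V = np.array([1.0, 1.5, 0.5])

# Placeholder assumption: the unobserved component eps_nj is standard normal.
n_draws = 10_000
eps = rng.normal(size=(n_draws, V.size))

# U_nj = V(x_nj) + eps_nj, one row per simulated decision.
U = V + eps

# The agent picks the alternative with the largest total utility.
choices = U.argmax(axis=1)

# Share of simulated decisions in which each alternative is chosen.
shares = np.bincount(choices, minlength=V.size) / n_draws
for name, share in zip(alternatives, shares):
    print(f"{name}: {share:.3f}")
```

Even though the deterministic utilities are fixed, the chosen alternative varies from draw to draw. This is why the modeler can only make probabilistic statements about the agent’s choice.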
When agents can select one alternative out of a discrete set of mutually exclusive alternatives, we can derive various Discrete Choice Models from the core Random Utility Model equation by making additional assumptions on the deterministic and on the stochastic components of the agent’s utility function.
Rediscovering an Old Friend
Let’s use the Random Utility framework discussed above to derive a very simple Discrete Choice Model. If agent $n$ makes decisions in a utility-maximizing fashion, then alternative $i$ will be chosen out of the $J$ available alternatives if and only if it provides the agent with a larger utility than every other alternative. Writing $V_{nj}$ as shorthand for $V(x_{nj})$, the probability that agent $n$ will choose alternative $i$ is
$P_{ni} = P(U_{ni} > U_{nj} \, \forall j \neq i) = P(\epsilon_{nj} - \epsilon_{ni} < V_{ni} - V_{nj} \, \forall j \neq i)$.
If we now assume that the $\epsilon_{nj}$ terms are independent and identically distributed (i.i.d.) according to the standard Gumbel distribution (also known as the type I extreme value distribution), one can show that the probability above reduces to
$P_{ni} = \frac{e^{V_{ni}}}{\sum_j e^{V_{nj}}} = \frac{1}{1 + \sum_{j \neq i} e^{V_{nj} - V_{ni}}}$.
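The closed-form result can be checked numerically. The sketch below is a Monte Carlo check rather than a proof. It draws i.i.d. standard Gumbel noise, picks the utility-maximizing alternative in each draw, and compares the resulting choice frequencies with the formula above. The utilities in `V` and the function names are hypothetical.

```python
import numpy as np

def logit_probabilities(V):
    """Closed-form probabilities e^{V_i} / sum_j e^{V_j}."""
    V = np.asarray(V, dtype=float)
    # Subtracting the maximum leaves the ratios unchanged and avoids overflow.
    expV = np.exp(V - V.max())
    return expV / expV.sum()

def simulated_probabilities(V, n_draws=200_000, seed=0):
    """Choice frequencies under utility maximization with Gumbel noise."""
    rng = np.random.default_rng(seed)
    V = np.asarray(V, dtype=float)
    # i.i.d. standard Gumbel draws (location 0, scale 1).
    eps = rng.gumbel(loc=0.0, scale=1.0, size=(n_draws, V.size))
    choices = np.argmax(V + eps, axis=1)
    return np.bincount(choices, minlength=V.size) / n_draws

# Hypothetical deterministic utilities for three alternatives.
V = [1.0, 1.5, 0.5]
closed_form = logit_probabilities(V)
simulated = simulated_probabilities(V)

print("closed form:", np.round(closed_form, 3))
print("simulated:  ", np.round(simulated, 3))
```

With a large number of draws, the two sets of probabilities should agree up to simulation noise.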
If we make the additional assumption that the observable part of the agent’s utility function is a linear function of a set of estimable parameters (which we will denote by $\beta$) and the set of observable factors, the equation above reduces to
$P_{ni} = \frac{e^{\beta_i x_{ni}}}{\sum_j e^{\beta_j x_{nj}}} = \frac{1}{1 + \sum_{j \neq i} e^{\beta_j x_{nj} - \beta_i x_{ni}}}$
where - as usual - the coefficients $\beta_j$ of one reference alternative are set to 0 for identifiability.
In other words, the Discrete Choice Model that we obtained is an old friend of ours: the Multinomial Logistic Regression model.
From an Econometrics standpoint, the Multinomial Logistic Regression model is therefore a Discrete Choice Model arising from a particular specification of the stochastic component of the core Random Utility Model equation.
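The sketch below illustrates the linear specification and the reason for the identifiability constraint. It assumes the usual Multinomial Logistic Regression setup, in which the observable factors describe the agent only (a vector $x_n$) and each alternative has its own coefficient vector $\beta_j$. The data, the coefficients, and the function name are hypothetical.

```python
import numpy as np

def mnl_probabilities(X, B):
    """Multinomial logit probabilities with linear utilities.

    X: (n_agents, n_features) observable factors, one row per agent.
    B: (n_alternatives, n_features) coefficients, one row per alternative.
    Returns an (n_agents, n_alternatives) array of choice probabilities.
    """
    X = np.asarray(X, dtype=float)
    B = np.asarray(B, dtype=float)
    V = X @ B.T                            # V[n, j] = beta_j . x_n
    V = V - V.max(axis=1, keepdims=True)   # numerical stability only
    expV = np.exp(V)
    return expV / expV.sum(axis=1, keepdims=True)

# Hypothetical data: 2 agents, an intercept plus one covariate.
X = np.array([[1.0, 0.2],
              [1.0, 1.5]])

# Hypothetical coefficients for 3 alternatives.
B = np.array([[0.0, 0.0],    # reference alternative, fixed at 0
              [0.5, -1.0],
              [-0.3, 0.8]])

P = mnl_probabilities(X, B)

# Adding the same vector to every beta_j shifts all utilities of an agent
# by the same amount, so the probabilities do not change.
shift = np.array([2.0, -1.0])
P_shifted = mnl_probabilities(X, B + shift)

print(np.round(P, 3))
print(np.allclose(P, P_shifted))
```

Because shifting all coefficient vectors by the same amount leaves the probabilities unchanged, the data cannot distinguish between the two sets of coefficients. Fixing one set at 0 removes this ambiguity.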
Limitations
Multinomial Logistic Regression has a few important limitations that can impact our ability to model an agent’s choice behavior in practical applications. In this section, we focus on the most obvious of these limitations.
Consider two alternatives, $i$ and $j$. According to the Multinomial Logistic Regression model, the ratio of the probability that agent $n$ will choose alternative $i$ to the probability that the same agent will choose alternative $j$ is equal to
$\frac{P_{ni}}{P_{nj}} = e^{V_{ni} - V_{nj}}$.
The two probabilities share the same denominator, which cancels in the ratio. As a result, the odds of choosing alternative $i$ over alternative $j$ depend only on alternatives $i$ and $j$ and are independent of all other alternatives that are available to the agent. This property of the Multinomial Logistic Regression model is commonly referred to as the Independence from Irrelevant Alternatives (IIA) property.
In several real-world situations, the IIA property is not a desirable characteristic of the model. To illustrate this, consider the following problem.
Assume that a traveler can choose to go to work by car or by taking a blue bus. Also assume for simplicity that the utility that the traveler derives from traveling by car is the same as the utility that they derive from traveling by blue bus. Then, the probability that the traveler will choose the car is the same as the probability that they will choose the blue bus: $P_{\text{car}} = P_{\text{blue bus}} = 1 / 2$.
Imagine now that a third option becomes available: a red bus is introduced and the traveler considers the red bus completely equivalent to the blue bus. Because the two buses are equivalent for the traveler, the probability that the traveler will choose either is the same. In other words, the ratio of these probabilities is $P_{\text{blue bus}} / P_{\text{red bus}} = 1$. According to the IIA property, the ratio of the probability of choosing the car to the probability of choosing the blue bus is unchanged by the introduction of the red bus: $P_{\text{car}} / P_{\text{blue bus}}$ must therefore still be 1. Since the three probabilities must sum to 1, the model probabilities after the introduction of the red bus are: $P_{\text{car}} = 1 / 3$, $P_{\text{blue bus}} = 1 / 3$, $P_{\text{red bus}} = 1 / 3$.
This is not what we expect in real life, however. In real life, we expect the probability of the traveler choosing to travel by car to still be $P_{\text{car}} = 1 / 2$ and the remaining $1 / 2$ of the probability to be equally split between the blue and the red bus: $P_{\text{blue bus}} = 1 / 4$ and $P_{\text{red bus}} = 1 / 4$.
In the blue-bus-red-bus example, we expect the introduction of a third alternative (the red bus) to change the odds between the alternatives that were originally available: the odds of choosing the car over the blue bus should go from 1 to 2. The IIA property rules this out, which limits our ability to model the agent’s choice behavior with Multinomial Logistic Regression in such situations.
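The blue-bus-red-bus numbers can be reproduced with a few lines of code. The sketch below assumes that all alternatives have the same deterministic utility, set to 0 here for convenience (any common value gives the same result). The function name is hypothetical.

```python
import numpy as np

def logit_probabilities(V):
    """Logit choice probabilities e^{V_i} / sum_j e^{V_j}."""
    V = np.asarray(V, dtype=float)
    expV = np.exp(V - V.max())  # shift by the maximum for numerical stability
    return expV / expV.sum()

# Before the red bus: car and blue bus, with equal utilities.
before = logit_probabilities([0.0, 0.0])

# After the red bus: car, blue bus, and an identical red bus.
after = logit_probabilities([0.0, 0.0, 0.0])

# Odds of choosing the car over the blue bus, before and after.
odds_before = before[0] / before[1]
odds_after = after[0] / after[1]

print("before:", before)
print("after: ", after)
print("odds car / blue bus:", odds_before, odds_after)
```

The model keeps the odds of the car over the blue bus fixed, so the car loses probability to the red bus even though the red bus only duplicates an existing option.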
Want to Learn More?
To learn more about modeling choice behavior with Discrete Choice Models, Kenneth Train’s book Discrete Choice Methods with Simulation is a great place to start. The book is freely available online in PDF format.
The book covers additional Discrete Choice Models as well as various extensions of the Multinomial Logistic Regression model that relax the IIA property and other limitations. Additionally, the book discusses how to estimate these models using maximum likelihood and simulation.
The book also provides multiple practical examples of application of Discrete Choice Models and highlights how quantities such as consumer surplus and elasticities can be extracted, analyzed, and interpreted from these models.
Conclusions
In this post, we briefly introduced Discrete Choice Models. We discussed how these models naturally arise from an attempt to model choice behavior from a causal perspective under the assumption that agents act according to the principle of utility maximization.
We showed that the well-known Multinomial Logistic Regression model is itself a Discrete Choice Model and we highlighted one of its most important limitations: the IIA property.
Understanding the conceptual links that exist between Discrete Choice Models - which are popular in the field of Econometrics - and well-known models used in Machine Learning for classification tasks - such as the Multinomial Logistic Regression model - can be useful to make better modeling decisions in practical applications.