Definition
Linear discriminant analysis (LDA) is a statistical technique that finds the linear combination of features that best separates two or more groups. It is derived from Ronald A. Fisher’s work in the 1930s.
He developed the linear discriminant to solve two-class classification problems, and the method was later generalized to more than two classes. It has become widespread over time and is now used in many fields, such as finance and biology.
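While LDA offers many advantages, it rests on specific assumptions: the features in each group should be approximately normally distributed, and the groups should have similar covariance (spread). When these assumptions are badly violated, LDA tends to perform poorly.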
How Linear Discriminant Analysis Works
- Calculate within-class and between-class scatter: The within-class scatter measures how far data points lie from their own class mean; it should be low, which means each class is tightly grouped. Conversely, the between-class scatter measures how far apart the class means are; it should be high, which means the classes are well separated.
- Identify linear discriminants: LDA finds feature combinations that maximize separation between classes and minimize the variance within classes.
- Project data: The original data is projected onto these new axes, giving a lower-dimensional representation in which the classes are as separated as possible. For K classes, there are at most K - 1 such axes.
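The steps above can be sketched for the simplest case: two classes with two features each. This is a minimal illustration rather than a production implementation. It uses the standard two-class result that the discriminant direction is proportional to the inverse of the within-class scatter matrix multiplied by the difference of the class means. The toy data and all function names (`mean_vector`, `scatter_matrix`, `fisher_direction`, `project`) are assumptions made for this example.

```python
# Minimal two-class Fisher discriminant for 2-feature data, standard library only.
# The data points and helper names below are illustrative assumptions.

def mean_vector(points):
    """Return the mean of a list of 2-D points."""
    n = len(points)
    return [sum(p[0] for p in points) / n, sum(p[1] for p in points) / n]

def scatter_matrix(points, mean):
    """Sum of outer products of each point's deviation from the class mean (2x2)."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for p in points:
        d = [p[0] - mean[0], p[1] - mean[1]]
        for i in range(2):
            for j in range(2):
                s[i][j] += d[i] * d[j]
    return s

def fisher_direction(class_a, class_b):
    """Unit vector w proportional to S_W^{-1} (mean_b - mean_a)."""
    m_a, m_b = mean_vector(class_a), mean_vector(class_b)
    s_a, s_b = scatter_matrix(class_a, m_a), scatter_matrix(class_b, m_b)

    # Step 1: the within-class scatter is the sum of the per-class scatter matrices.
    s_w = [[s_a[i][j] + s_b[i][j] for j in range(2)] for i in range(2)]

    # Invert the 2x2 matrix explicitly.
    det = s_w[0][0] * s_w[1][1] - s_w[0][1] * s_w[1][0]
    if det == 0:
        raise ValueError("within-class scatter matrix is singular")
    inv = [[s_w[1][1] / det, -s_w[0][1] / det],
           [-s_w[1][0] / det, s_w[0][0] / det]]

    # Step 2: the discriminant direction, using the difference of class means.
    diff = [m_b[0] - m_a[0], m_b[1] - m_a[1]]
    w = [inv[0][0] * diff[0] + inv[0][1] * diff[1],
         inv[1][0] * diff[0] + inv[1][1] * diff[1]]
    norm = (w[0] ** 2 + w[1] ** 2) ** 0.5
    return [w[0] / norm, w[1] / norm]

def project(points, w):
    """Step 3: project each point onto the discriminant axis (a single number)."""
    return [p[0] * w[0] + p[1] * w[1] for p in points]

# Toy data: two small, visibly separated groups (made up for illustration).
class_a = [(1.0, 2.0), (2.0, 3.0), (3.0, 3.0), (2.0, 1.0)]
class_b = [(6.0, 5.0), (7.0, 8.0), (8.0, 7.0), (7.0, 6.0)]

w = fisher_direction(class_a, class_b)
proj_a = project(class_a, w)
proj_b = project(class_b, w)

print("discriminant direction:", w)
print("class A projections:", proj_a)
print("class B projections:", proj_b)
```

On this toy data, every projected value of the first class falls below every projected value of the second, so a single threshold on the new axis separates them. With more than two classes or more features, the same idea is usually solved as an eigenvalue problem with a linear algebra library rather than by inverting the matrix by hand.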
Practical Uses of LDA
- Biology/Medicine: LDA helps classify samples by their gene expression profiles under different conditions. In medical diagnostics, it is used to differentiate benign from malignant tumors based on specific features.
- Face Recognition: Some face recognition systems use LDA (for example, the Fisherfaces approach) to extract discriminative facial features.
- Finance: LDA is used to detect fraudulent patterns and predict customer defaults based on their financial data.
- Marketing: Marketers use LDA to assign customers to predefined segments based on their demographics, purchasing behavior, and other data. It can also predict churn, that is, how likely a customer is to stop using a specific product.
- Environmental Science: LDA can classify ecological zones and predict species presence based on habitat data.