
Quantitative methods are fundamental to financial analysis, risk management, and decision-making, and they are commonly tested in competitive exams. This blog explores essential concepts in probability distributions and statistical inference. It covers how to analyze data, how to estimate population characteristics from samples, and how to identify common biases that can distort analytical conclusions.
Probability distributions describe how outcomes of a random variable are distributed. They help in understanding uncertainty, predicting results, and analyzing financial data patterns. These distributions are broadly classified into discrete and continuous types.
Discrete Random Variable
A discrete random variable is a variable whose possible outcomes can be counted. These outcomes are distinct and separate.
Examples:
Coin Toss: Tossing a coin yields two countable outcomes: Heads or Tails.
Dice Roll: Rolling a die yields six countable outcomes: 1, 2, 3, 4, 5, 6. Each outcome has a probability of 1/6.
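As a small illustration, the die example can be written out as a probability table. The sketch below uses Python's fractions module to keep the probabilities exact; the variable names are illustrative and do not come from the text.

```python
from fractions import Fraction

# Fair six-sided die: each face is a distinct, countable outcome.
faces = [1, 2, 3, 4, 5, 6]
pmf = {face: Fraction(1, 6) for face in faces}

# The probabilities of all outcomes of a discrete variable sum to 1.
total_probability = sum(pmf.values())

# Expected value: each outcome weighted by its probability.
expected_value = sum(face * prob for face, prob in pmf.items())

print(total_probability)  # 1
print(expected_value)     # 7/2
```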
Continuous Random Variable
A continuous random variable is a variable whose possible outcomes cannot be listed one by one, because it can take any value within a given range.
Examples:
Rainfall: The amount of rainfall can be any value within a range (e.g., 12.4 mm or 12.41 mm), so the possible outcomes cannot be counted.
Height, temperature, and time are also common examples of continuous random variables.
The binomial distribution describes the number of times an event occurs across a fixed number of independent trials, where each trial has only two possible outcomes, meaning it is binary in nature.
Characteristics:
An event can only result in one of two outcomes.
The sum of the probabilities of the two outcomes is always 1.
Examples:
Coin Toss: If the probability of getting a Tail is 0.5, then the probability of not getting a Tail (i.e., getting a Head) is 1 - 0.5 = 0.5.
Stock Price Movement: A stock price can either increase or decrease. If the probability of a stock price increasing is 60% (p), then the probability of it decreasing is 40% (1-p). (Memory Tip: If P(event occurs) = p, then P(event does not occur) = 1 - p.)
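To make the two-outcome idea concrete, the sketch below computes binomial probabilities for a hypothetical scenario: 5 trading days, each with a 60% chance of an up move. It assumes the days are independent and that p stays constant, which the standard binomial formula requires; the 5-day horizon and the function name are illustrative assumptions, not part of the text.

```python
from math import comb

def binomial_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Hypothetical scenario: 5 trading days, P(up) = 0.6 on each day.
n_days, p_up = 5, 0.6
distribution = [binomial_pmf(k, n_days, p_up) for k in range(n_days + 1)]

prob_exactly_3_up = distribution[3]
print(round(prob_exactly_3_up, 4))   # 0.3456
print(round(sum(distribution), 10))  # 1.0
```

The probabilities across all possible counts (0 to 5 up days) sum to 1, mirroring the p and 1 - p rule for a single trial.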
The binomial model is widely used for scenarios with binary outcomes, particularly in financial modelling:
Economy: An economy can either perform well or poorly (e.g., a 70% chance of doing well and a 30% chance of doing poorly).
Market and Stock Impact: This model helps compute probabilities for sequential, conditional binary events.
If the market goes up:
P(Stock Up | Market Up): The probability of the stock going up, given that the market went up. (e.g., 40%)
P(Stock Down | Market Up): The probability of the stock going down, given that the market went up. (e.g., 60%)
If the market goes down:
P(Stock Up | Market Down): The probability of the stock going up, given that the market went down. (e.g., 20%)
P(Stock Down | Market Down): The probability of the stock going down, given that the market went down. (e.g., 80%, since 100% - 20% = 80%)
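The conditional probabilities above can be combined into an overall probability for the stock. The text does not give the probability of the market going up, so the sketch below assumes a hypothetical value of 70%; the conditional figures (40% and 20%) are the ones from the examples.

```python
# Assumption: P(Market Up) is not stated in the text; 0.7 is hypothetical.
p_market_up = 0.7
p_market_down = 1 - p_market_up

# Conditional probabilities taken from the examples.
p_stock_up_given_market_up = 0.4
p_stock_up_given_market_down = 0.2

# Joint probability of each path through the two-step tree.
paths = {
    ("market up", "stock up"): p_market_up * p_stock_up_given_market_up,
    ("market up", "stock down"): p_market_up * (1 - p_stock_up_given_market_up),
    ("market down", "stock up"): p_market_down * p_stock_up_given_market_down,
    ("market down", "stock down"): p_market_down * (1 - p_stock_up_given_market_down),
}

# Total probability of the stock going up: add the two "stock up" paths.
p_stock_up = paths[("market up", "stock up")] + paths[("market down", "stock up")]
print(round(p_stock_up, 2))  # 0.34
```

The four path probabilities sum to 1, which is a quick check that the tree has been set up consistently.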
Sampling is the process of selecting a subset (sample) of individuals from a larger group (population) to estimate characteristics of the whole population.
Rationale for Sampling:
Cost-effectiveness: Collecting data from an entire population is often too expensive.
Time-saving: Analyzing a large population dataset is very time-consuming.
By taking a sample, it is assumed that the sample's average (or other statistics) can accurately represent the population's average.
Key Terminologies:
Point Estimate: A single value (such as the sample average) computed from a sample and used to estimate a population parameter.
True Population Mean (μ): The actual average of the entire population. Represented by the Greek letter mu (μ).
Sample Mean (x-bar): The average calculated from the sample. Represented by x̄.
The goal of estimation is to select a sample such that the point estimate (x̄) is a good approximation of, or ideally equal to, the true population mean (μ).
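A short simulation can illustrate the relationship between the point estimate and the true population mean. Everything in the sketch below is hypothetical: the population is 100,000 simulated incomes drawn from a normal distribution, and the sample is 1,000 values picked at random.

```python
import random
from statistics import fmean

rng = random.Random(42)  # fixed seed so the run is repeatable

# Hypothetical population: 100,000 simulated incomes.
population = [rng.gauss(50_000, 12_000) for _ in range(100_000)]
mu = fmean(population)             # true population mean

# Simple random sample of 1,000 individuals, without replacement.
sample = rng.sample(population, 1_000)
x_bar = fmean(sample)              # point estimate of mu

estimation_error = x_bar - mu
print(f"mu = {mu:.0f}, x_bar = {x_bar:.0f}, error = {estimation_error:.0f}")
```

The sample covers only 1% of the population, yet x̄ typically lands close to μ. The remaining gap is sampling error, which is why the properties of the estimator matter.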
When using a sample to estimate population parameters, the estimator (e.g., sample mean) should ideally possess certain properties:
Unbiasedness
Definition: An estimator is unbiased if its expected value is equal to the true population parameter.
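Condition: The expected value of the sample average, E(x̄), equals the population average (μ). An individual sample's x̄ will usually differ from μ; unbiasedness means the estimates are correct on average across repeated samples.
Symbolically: E(x̄) = μ implies an unbiased estimator.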
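Unbiasedness can be checked exactly on a tiny example by listing every possible sample. The sketch below uses a hypothetical population of five values and enumerates all samples of size 2 drawn without replacement; the numbers are illustrative only.

```python
from itertools import combinations
from statistics import mean

population = [2, 4, 6, 8, 10]   # hypothetical population values
mu = mean(population)           # true population mean: 6

# Every possible sample of size 2, and the mean of each one.
sample_means = [mean(s) for s in combinations(population, 2)]

# Averaging over all possible samples gives the expected value of x-bar.
expected_x_bar = mean(sample_means)

print(min(sample_means), max(sample_means))  # 3 9
print(expected_x_bar == mu)                  # True
```

Individual sample means range from 3 to 9, so most single samples miss μ, yet their average across all samples equals μ exactly.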
Efficiency
Definition: An unbiased estimator is considered efficient if it has the smallest variance among all possible unbiased estimators.
Concept: A lower variance means the estimates from different samples are less spread out around the true value, so any single estimate is more reliable. By analogy, a stock with stable price movement (low variance) is more predictable than one with high volatility.
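One way to see efficiency is to compare two estimators of the same parameter. The simulation below is a sketch under stated assumptions: data are drawn from a hypothetical normal population, where both the sample mean and the sample median are centered on the true mean. It compares only these two estimators and does not show that the mean has the smallest variance among all unbiased estimators.

```python
import random
from statistics import fmean, median, pvariance

rng = random.Random(0)
TRUE_MEAN, SIGMA = 100.0, 15.0        # hypothetical population parameters
SAMPLE_SIZE, N_SAMPLES = 25, 2_000

mean_estimates, median_estimates = [], []
for _ in range(N_SAMPLES):
    sample = [rng.gauss(TRUE_MEAN, SIGMA) for _ in range(SAMPLE_SIZE)]
    mean_estimates.append(fmean(sample))
    median_estimates.append(median(sample))

# Both estimators are centered on the true mean...
print(round(fmean(mean_estimates), 1), round(fmean(median_estimates), 1))

# ...but their spread across repeated samples differs.
var_of_mean = pvariance(mean_estimates)
var_of_median = pvariance(median_estimates)
print(f"variance of mean: {var_of_mean:.2f}, variance of median: {var_of_median:.2f}")
```

For normally distributed data the sample mean shows the smaller variance of the two, so it is the more efficient choice in this setting.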
Consistency
Definition: An estimator is consistent if, as the sample size (n) increases, the estimates get closer to the true population parameter.
Concept: Larger samples generally lead to more accurate estimates. For example, a sample of 3 million from a 30 million population will generally yield a more accurate estimate than a sample of 300,000.
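Consistency can be illustrated by repeating the estimation at several sample sizes and measuring how much the estimates vary. The population parameters, sample sizes, and number of repetitions in the sketch below are hypothetical choices made for the illustration.

```python
import random
from statistics import fmean, pstdev

rng = random.Random(1)
TRUE_MEAN, SIGMA = 50.0, 10.0   # hypothetical population parameters
N_REPEATS = 200                 # number of samples drawn per sample size

spread_by_n = {}
for n in (10, 100, 1_000):
    estimates = [
        fmean([rng.gauss(TRUE_MEAN, SIGMA) for _ in range(n)])
        for _ in range(N_REPEATS)
    ]
    # Standard deviation of the sample means: how far x-bar strays from mu.
    spread_by_n[n] = pstdev(estimates)

for n, spread in spread_by_n.items():
    print(f"n = {n:>5}: spread of sample means = {spread:.3f}")
```

The spread of the sample means shrinks as n grows, which is the behavior a consistent estimator should show.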
Bias occurs when the sampling or estimation process does not accurately represent the population or when available information is not properly used, leading to flawed conclusions.
Data Snooping Bias
Definition: Occurs when a dataset is repeatedly tested with various models or techniques until one appears to work. This can produce models that seem effective on historical data but fail on new data, because the apparent success was due to chance.
Example: Applying numerous trading techniques to the same financial data until one shows a profitable result. This technique might not genuinely work but only appears to do so on the "snooped" data.
Mitigation: Divide the dataset into training and validation (or testing) sets.
Training Set: Used to develop and select models/techniques (e.g., 2020-2024 data).
Validation Set: Used to test the selected model's performance on unseen data that played no part in developing it (e.g., 2024-2025 data). Good performance on this set is evidence that the result is genuine rather than due to chance.
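The effect of snooping, and the value of a held-out validation set, can be sketched with simulated data. In the example below the daily returns are pure noise and the 200 "strategies" are random buy/sell signals, so no strategy has a genuine edge; all names and numbers are hypothetical.

```python
import random
from statistics import fmean

rng = random.Random(7)
N_TRAIN, N_VALID, N_STRATEGIES = 250, 250, 200

# Pure-noise daily returns, split chronologically into two sets.
returns = [rng.gauss(0.0, 0.01) for _ in range(N_TRAIN + N_VALID)]
train, valid = returns[:N_TRAIN], returns[N_TRAIN:]

# Each strategy is a random sequence of long (+1) or short (-1) positions.
strategies = [
    [rng.choice((-1, 1)) for _ in returns] for _ in range(N_STRATEGIES)
]

def average_return(signals, rets):
    """Mean daily return from following the signals on the given returns."""
    return fmean(s * r for s, r in zip(signals, rets))

# Snooping step: keep whichever strategy looks best on the training set.
train_scores = [average_return(s[:N_TRAIN], train) for s in strategies]
best = max(range(N_STRATEGIES), key=lambda i: train_scores[i])

in_sample = train_scores[best]
out_of_sample = average_return(strategies[best][N_TRAIN:], valid)
print(f"in-sample: {in_sample:.5f}, out-of-sample: {out_of_sample:.5f}")
```

The best strategy always looks profitable on the training data simply because it was picked from many tries. Its validation result is the honest check, and with noise-only data there is no reason to expect it to hold up.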
Sample Selection Bias
Definition: Occurs when the sample is not representative of the population due to non-random or inappropriate selection, leading to skewed results. This includes issues with the sample size or inclusion criteria.
Example 1 (Time Period): Evaluating a mutual fund's performance using an excessively long period (e.g., 2010-2025) might include too many varying market conditions. Conversely, an excessively short period (e.g., 2024-2025) might not capture enough information. An appropriate sample period is crucial for unbiased assessment.
Example 2 (Survivorship Bias): Occurs when only existing (surviving) entities are considered, while those that failed or ceased to exist are excluded. This creates an artificially positive impression of performance. (Memory Tip: Remember by the name: "Survivorship bias" means you only show the "survivors.") For example, a fund manager who shows performance data only for their 5 successful funds, and omits the data for a fund that failed, creates an inflated perception of overall performance.
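The fund manager example can be turned into simple arithmetic. The return figures below are hypothetical and chosen only to show the direction of the distortion.

```python
from statistics import fmean

# Hypothetical annual returns (%) for the manager's funds.
surviving_funds = [12.0, 9.0, 15.0, 11.0, 13.0]  # the 5 funds that are shown
failed_funds = [-30.0]                           # the fund that is omitted

reported_average = fmean(surviving_funds)
true_average = fmean(surviving_funds + failed_funds)

print(reported_average)  # 12.0
print(true_average)      # 5.0
```

Leaving out the single failed fund more than doubles the apparent average return in this illustration.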
Look-Ahead Bias
Definition: Occurs when an analysis uses information that would not have been available to investors at the time a decision was made.
Example: P/E Ratio Calculation
Earnings Per Share (EPS): A company's profit per share (e.g., ₹2 on March 31, 2025).
Market Price: The share price at a specific date (e.g., ₹100 on March 31, 2025).
P/E Ratio (Price-to-Earnings Ratio): Market Price / EPS (e.g., ₹100 / ₹2 = 50).
Bias: While the market price is available daily, companies typically disclose their EPS with a significant delay (e.g., March 31 EPS is reported on September 30). If an analysis assumes that an investor used the March 31 P/E ratio for an investment decision made on that date, it is implicitly using EPS information that was not yet publicly available. This use of future information is look-ahead bias.
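A common way to avoid this problem is to use only figures that had already been published on the decision date. The sketch below is one possible approach, not a prescribed method: the prior-year EPS of ₹1.60, the assumption that each figure is reported on September 30 of the same year, and the helper name latest_known_eps are all hypothetical.

```python
from datetime import date

# (fiscal period end, public report date, EPS in rupees)
eps_reports = [
    (date(2024, 3, 31), date(2024, 9, 30), 1.60),  # hypothetical prior-year figure
    (date(2025, 3, 31), date(2025, 9, 30), 2.00),  # figure from the example
]

def latest_known_eps(reports, as_of):
    """Return the most recent EPS already published on `as_of`, or None."""
    known = [r for r in reports if r[1] <= as_of]
    if not known:
        return None
    return max(known, key=lambda r: r[0])[2]

decision_date = date(2025, 3, 31)
price = 100.0

# Biased: divides by an EPS that is not published until September 30, 2025.
biased_pe = price / 2.00

# Point-in-time: divides by the latest EPS investors could actually see.
point_in_time_pe = price / latest_known_eps(eps_reports, decision_date)

print(biased_pe)                   # 50.0
print(round(point_in_time_pe, 1))  # 62.5
```

Filtering on the report date rather than the period end date is the key design choice: it ties each data point to the moment it became public.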