How Does Regression Analysis Work?
Regression analysis attempts to mathematically correlate an outcome with one or more variables. These variables can be defined as follows:
● Dependent variable - The observed outcome
● Independent variable - The observed input
In a practical setting, regression analysis begins with a set of measurements. These measurements can be based on one or more independent variables.
The question that regression analysis attempts to answer is whether or not a relationship exists. Relationships are quantified using a statistical technique called the method of least squares. The calculation builds on the following steps:
● Find the mean (i.e., average) of the data set.
● Calculate the deviation for each data point. The deviation is the distance between the mean and the measurement.
● Take the square of each deviation. This eliminates the direction of the distance (i.e., positive and negative values) and leaves only the magnitude of the distance.
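The three steps above can be sketched in a few lines of Python. The measurements are made-up values chosen only for illustration, and the variable names (`deviations`, `sum_of_squares`) are assumptions of this sketch rather than terms from the article.

```python
# Hypothetical measurements (illustrative values only).
measurements = [4.0, 7.0, 9.0, 12.0]

# Step 1: find the mean of the data set.
mean = sum(measurements) / len(measurements)

# Step 2: signed distance between each measurement and the mean.
deviations = [m - mean for m in measurements]

# Step 3: squaring drops the sign and keeps only the magnitude.
squared_deviations = [d ** 2 for d in deviations]

# The total of the squares is the quantity least squares works with.
sum_of_squares = sum(squared_deviations)

print(mean, deviations, squared_deviations, sum_of_squares)
```

Note that the signed distances cancel out when added together, which is why they are squared before being summed.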
When graphed on an X-Y chart, the observations will appear as a scatter plot. The analyst can then use software to find the line that minimizes the sum of the squared distances between the observations and the line, which yields the regression equation. This equation defines the best-fit line, or regression line, drawn through the data set. It reveals to the analyst the nature of the trend: increasing, decreasing, no change, etc.
Types of Regression
There are three main types of regression analysis.
Simple Linear Regression
Simple linear regression is the most basic form of regression analysis. It attempts to correlate one input with one outcome.
For example, a mortgage company might attempt to connect applicant income with loan size. In this case, the applicant’s income would be the independent variable and the loan size would be the dependent variable.
Simple linear regression can be expressed by the following formula:
Y = a + bX + u
Where
- Y = The dependent variable
- X = The independent variable
- a = The y-intercept
- b = The slope associated with the input
- u = The regression residual or error term. The smaller the errors, the more closely the regression line fits the data.
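As a minimal sketch of how a and b can be estimated, the Python below applies the standard closed-form least-squares formulas for a single input. The function name `fit_simple_linear` and the income and loan figures are hypothetical, chosen only to mirror the mortgage example.

```python
def fit_simple_linear(xs, ys):
    """Return (a, b) for the least-squares line Y = a + bX."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: how X and Y vary together, divided by how much X varies.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    # Intercept: makes the line pass through the point (mean_x, mean_y).
    a = mean_y - b * mean_x
    return a, b


# Hypothetical applicants: income and loan size, in thousands of dollars.
incomes = [40, 60, 80, 100]
loans = [150, 210, 290, 350]

a, b = fit_simple_linear(incomes, loans)

# Residuals u: the gap between each observed loan and the fitted line.
residuals = [y - (a + b * x) for x, y in zip(incomes, loans)]

print(a, b, residuals)
```

With these made-up figures the fit gives a slope of about 3.4 and an intercept of about 12, and the residuals sum to roughly zero, as they do for any least-squares line that includes an intercept.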
It’s important to note that regression analysis can also show that the data is uncorrelated. In other words, the analysis can demonstrate that the inputs have little to do with the output.
Going back to our mortgage company example, suppose the independent variable had been whether the applicant wore a blue shirt. It would be highly unlikely that the color of the applicant’s shirt would have any bearing on the amount of money they wish to borrow. Therefore, there’s a good chance that the error term would be relatively high.
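One common way to check whether an input is related to the outcome is the Pearson correlation coefficient, which ranges from -1 to 1 and sits near 0 when there is no linear relationship. The sketch below is an illustration under assumptions: the `correlation` helper and all of the data are hypothetical, and the blue-shirt values were deliberately picked so that the correlation comes out to zero.

```python
from math import sqrt


def correlation(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical applicants (loan size and income in thousands of dollars).
loans = [150, 210, 290, 350]
incomes = [40, 60, 80, 100]
blue_shirt = [1, 0, 0, 1]  # 1 = wore a blue shirt, 0 = did not

r_income = correlation(incomes, loans)
r_shirt = correlation(blue_shirt, loans)

print(r_income, r_shirt)
```

Here income tracks loan size closely (a coefficient above 0.99), while shirt color shows no linear relationship at all. Real data would rarely give exactly zero, so in practice an analyst looks for values close to zero instead.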
Multiple Linear Regression
Multiple linear regression builds upon the simple model by adding more variables. Returning to our mortgage company example, suppose there were other important characteristics noted about the applicants: net worth, education level, age, etc.
An analyst could combine these inputs to build a more complex multiple linear regression equation or best-fit line. While the calculation would be more complex, there is the possibility that it could lead to a more accurate prediction than the simple model might yield.
Again, it could also be the case that some variables do not have any correlation to the outcome. Therefore, the analyst needs to be careful about which measurements they consider to build their equation.
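The sketch below shows one way to fit a model with two inputs, Y = a + b1*X1 + b2*X2, by solving the least-squares normal equations in plain Python. Everything in it is an assumption made for illustration: the helper names, the choice of income and net worth as inputs, and the data. The loan sizes were generated from a known formula (10 + 3 * income + 0.5 * net worth) so that the recovered coefficients can be checked.

```python
def solve_linear_system(matrix, vector):
    """Solve matrix * x = vector by Gaussian elimination with partial pivoting."""
    n = len(vector)
    # Augmented matrix [A | b], copied so the inputs are not modified.
    aug = [row[:] + [vector[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        # Swap in the row with the largest entry in this column.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        # A zero pivot here would mean the inputs are perfectly collinear.
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    # Back substitution, from the last unknown to the first.
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        known = sum(aug[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (aug[row][n] - known) / aug[row][row]
    return x


def fit_multiple_linear(rows, ys):
    """Least-squares coefficients [a, b1, b2, ...] for Y = a + b1*X1 + b2*X2 + ..."""
    # Prepend a constant 1 to each row so the intercept is estimated too.
    design = [[1.0] + [float(v) for v in row] for row in rows]
    p = len(design[0])
    # Normal equations: (X^T X) * coefficients = X^T y.
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(design, ys)) for i in range(p)]
    return solve_linear_system(xtx, xty)


# Hypothetical applicants: (income, net worth), in thousands of dollars.
applicants = [(40, 20), (60, 100), (80, 60), (100, 200), (120, 150)]
# Loan sizes constructed as 10 + 3 * income + 0.5 * net_worth.
loans = [140.0, 240.0, 280.0, 410.0, 445.0]

coefficients = fit_multiple_linear(applicants, loans)
print(coefficients)
```

Because the data was built from an exact formula, the fit returns coefficients very close to 10, 3, and 0.5. With real measurements the residuals would not be zero, and statistical software would normally handle this calculation.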
Non-Linear Regression
Non-linear regression is the most complex of the regression analysis techniques. It also attempts to fit a trend line to the data set but allows the line to be curved so that it fits the data more closely. Again, this may lead to a more accurate predictive model.
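As one illustration of fitting a curve, the sketch below fits an exponential model, Y = a * exp(b * X), using the Gauss-Newton method, a standard iterative approach to non-linear least squares. The article does not name a particular model or algorithm, so both choices are assumptions, as are the function name `fit_exponential` and the data (values of 2 * exp(0.5 * X) rounded to two decimals).

```python
from math import exp, log


def fit_exponential(xs, ys, iterations=20):
    """Fit Y = a * exp(b * X) by Gauss-Newton least squares (requires all Y > 0)."""
    n = len(xs)
    # Starting guess: a straight-line fit to log(Y) against X.
    logs = [log(y) for y in ys]
    mean_x = sum(xs) / n
    mean_l = sum(logs) / n
    sxl = sum((x - mean_x) * (l - mean_l) for x, l in zip(xs, logs))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxl / sxx
    a = exp(mean_l - b * mean_x)
    for _ in range(iterations):
        # Residuals and partial derivatives of the model at the current guess.
        r = [y - a * exp(b * x) for x, y in zip(xs, ys)]
        da = [exp(b * x) for x in xs]          # d(model)/da
        db = [a * x * exp(b * x) for x in xs]  # d(model)/db
        # 2x2 normal equations (J^T J) * step = J^T r, solved by Cramer's rule.
        saa = sum(u * u for u in da)
        sab = sum(u * v for u, v in zip(da, db))
        sbb = sum(v * v for v in db)
        ra = sum(u * e for u, e in zip(da, r))
        rb = sum(v * e for v, e in zip(db, r))
        det = saa * sbb - sab * sab
        a += (ra * sbb - sab * rb) / det
        b += (saa * rb - sab * ra) / det
    return a, b


# Hypothetical observations that follow a curved (exponential) trend.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2.00, 3.30, 5.44, 8.96, 14.78]

a, b = fit_exponential(xs, ys)
sum_squared_error = sum((y - a * exp(b * x)) ** 2 for x, y in zip(xs, ys))
print(a, b, sum_squared_error)
```

The fitted values land close to a = 2 and b = 0.5. Unlike the linear case, there is no closed-form answer here: the estimate is refined step by step, and a poor starting guess can keep the method from converging.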
Mean Reversion vs Regression
Oftentimes, investors will use the terms regression and mean reversion interchangeably. Even though the two techniques have similar qualities, they are in fact different.
Mean reversion is a type of price indicator that attempts to take the history of an asset’s price movements and fit them to a moving average or mean. Because prices are volatile and will fluctuate up and down around this mean, we can also draw upper and lower boundaries referred to as channels. Channels are generally two standard deviations from the mean.
The theory behind mean reversion is that despite an asset’s volatility, the price will eventually revert back to the mean. In other words, if it moves outside the channels, either too high or too low, then this can indicate good selling or buying opportunities for the investor.
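A minimal sketch of the channel idea follows, assuming the mean is a simple moving average over a fixed window and the channels sit two standard deviations away from it. The function names, the window length, and the prices are all hypothetical.

```python
from statistics import mean, pstdev


def channels(prices, window, width=2.0):
    """Return (moving average, upper channel, lower channel) for the last `window` prices."""
    recent = prices[-window:]
    average = mean(recent)
    spread = pstdev(recent)  # standard deviation of the prices in the window
    return average, average + width * spread, average - width * spread


def position(price, upper, lower):
    """Describe where a price sits relative to the channels."""
    if price > upper:
        return "above channel"
    if price < lower:
        return "below channel"
    return "inside channel"


# Hypothetical price history that swings around a mean of 11.
history = [10.0, 12.0, 10.0, 12.0, 10.0, 12.0, 10.0, 12.0]
average, upper, lower = channels(history, window=8)

print(average, upper, lower)
print(position(14.0, upper, lower))  # candidate selling opportunity
print(position(8.0, upper, lower))   # candidate buying opportunity
print(position(11.5, upper, lower))
```

A price above the upper channel is the "too high" case described above, and a price below the lower channel is the "too low" case. This sketch only flags those cases; it does not tell an investor whether the price will actually revert.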
Limitations of Regression Analysis
As with all mathematical modeling, regression analysis isn’t perfect. Even though we can create more complex equations, there will always be some inherent errors and therefore the possibility that the model could predict the wrong outcome.
Another expression you’ll often hear with regression analysis is that “correlation is not causation”. To say this another way, one cannot definitively conclude that a variable causes an outcome just because the data set seems to suggest a trend. Sometimes this may only be a coincidence. Instead, analysts should use regression analysis to test inputs that they believe may be reasonable indicators.
The Bottom Line
Regression analysis is a helpful way for analysts to measure the relationship between variables and then create a model that can be used to predict future outcomes. This is done using the method of least squares to create a best-fit line that can be linear or even non-linear.
Regression analysis should not be confused with mean reversion, which is a price indicator built on a moving average and standard deviation channels. This technique can be used to help investors find good entry and exit points to trade assets.