Linear regression is one of the main types of machine learning algorithms. It is based entirely on the concept of supervised learning and is widely used for predictive analysis and regression tasks. So, what do we mean by regression? Regression is simply a technique that depicts the relationship between two variables. In this particular blog, we will learn about Linear Regression and briefly understand its two classifications.
So, without much delay, let’s get started.
What is Linear Regression?
Francis Galton is credited with the discovery of the Linear Regression Model. He initially analyzed the heights of fathers and sons and came up with the concept of the best-fit line, or the regression line, to find the mean height.
As stated above, linear regression can be defined as a predictive modeling technique that is used whenever we want to display a linear relationship between two variables, i.e., the dependent and the independent variable.
In technical terms, we can define linear regression as a modelling approach that finds the relationship between one or more independent variables (predictors), denoted as X, and a dependent variable (target), denoted as Y, by fitting them to a line known as the regression line.
It can be represented as Y = b0 + b1*x, where b0 is the intercept and b1 is the slope; the equation estimates how much y will change when x changes by a certain amount.
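As a minimal sketch of how these estimates are computed, the ordinary-least-squares values of b0 and b1 can be derived directly from the data (pure Python; the function name `fit_line` is our own, not from any library):

```python
def fit_line(xs, ys):
    """Estimate intercept b0 and slope b1 by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: covariance of x and y divided by the variance of x.
    b1 = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) \
        / sum((x - mean_x) ** 2 for x in xs)
    # Intercept: the fitted line must pass through the point of means.
    b0 = mean_y - b1 * mean_x
    return b0, b1

# Points that lie exactly on y = 1 + 2x recover b0 = 1.0, b1 = 2.0.
b0, b1 = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
```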
Case Study: An example for better understanding:
Suppose there is a telecom network named Airfone. Its delivery manager wants to find out whether there is a relationship between the monthly charges and the tenure of a customer. Accordingly, he collects the relevant data and fits a linear regression model, with monthly charges as the dependent variable and tenure as the independent variable.
After implementing the algorithm, he can conclude that there is a relationship between monthly charges and tenure: as a customer's tenure increases, the monthly charges also increase. Drawing the best-fit line then helps the manager uncover further insights from the data, so he can easily predict the value of y for every new value of x.
So, what are the advantages of Linear Regression that make data analysts prefer it?
- Linear Regression is simple to implement, and its output coefficients are easy to interpret.
- Once we know that the relationship between the independent and dependent variables is linear, this algorithm is the best choice because of its lower complexity compared to other algorithms.
- Moreover, Linear Regression is susceptible to over-fitting, but this can be avoided using techniques such as L1 and L2 regularization, dimensionality reduction, and cross-validation.
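The L2 regularization mentioned above can be sketched in closed form: ridge regression adds a penalty term alpha that shrinks the coefficients toward zero. The function name and the toy data below are illustrative assumptions, not from the article:

```python
import numpy as np

def ridge_fit(X, y, alpha=1.0):
    """Closed-form L2-regularized least squares:
    w = (X^T X + alpha * I)^-1 X^T y.
    (Simplification: this version penalizes the intercept column too.)"""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_features), X.T @ y)

# First column of ones acts as the intercept term.
X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([3.0, 5.0, 7.0])          # lies exactly on y = 1 + 2x

w_ols = ridge_fit(X, y, alpha=0.0)     # alpha=0 reduces to plain OLS
w_ridge = ridge_fit(X, y, alpha=1.0)   # alpha>0 shrinks the weights
```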
Applications of Linear Regression
Linear Regression is a great tool for analyzing the relationships among variables, but it is not recommended for every practical application because it tends to oversimplify real-world problems by assuming a linear relationship among the variables.
This regression is commonly used in financial portfolio prediction, salary forecasting, real estate prediction, and in estimating traffic ETAs.
Some of its applications are:
- Linear Regression is extensively used in business, for example in sales forecasting based on trends. When a company observes a steady increase in sales every month, linear regression helps it forecast sales in the upcoming months.
- It is quite beneficial in predicting the price, performance and risk parameters based on the sales of a product.
- Linear Regression also helps assess risk in the insurance and financial domains. For example, such an analysis may help an insurance company find that older customers tend to file more insurance claims. Results like these play an important role in business decisions made to account for risk.
- It is helpful in determining how price and promotions affect the sales of a product, and thus in measuring marketing effectiveness.
- It is widely used in astronomical data analysis.
Different approaches to solve linear regression models
There are many methods that can be applied to solve a linear regression model, each with its own trade-offs. Some of the common methods include:
- Gradient Descent
- Least Square Method / Normal Equation Method
- Adam Optimization Method
- Singular Value Decomposition (SVD)
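The first two methods above can be sketched side by side: the normal equation solves for the weights in one step, while gradient descent approaches the same solution iteratively. The function names, learning rate, and step count are illustrative assumptions:

```python
import numpy as np

def normal_equation(X, y):
    """Solve the least-squares problem directly: w = (X^T X)^-1 X^T y."""
    return np.linalg.solve(X.T @ X, X.T @ y)

def gradient_descent(X, y, lr=0.1, steps=2000):
    """Minimize mean squared error with fixed-step gradient descent."""
    w = np.zeros(X.shape[1])
    n = len(y)
    for _ in range(steps):
        grad = (2.0 / n) * X.T @ (X @ w - y)  # gradient of MSE w.r.t. w
        w -= lr * grad
    return w

# First column of ones is the intercept; data lies exactly on y = 1 + 2x.
X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([3.0, 5.0, 7.0])

w_direct = normal_equation(X, y)      # exact solution [1, 2]
w_iter = gradient_descent(X, y)       # converges to the same solution
```

Both approaches minimize the same squared-error objective; the normal equation is exact but costly for many features, while gradient descent scales better to large datasets.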
Classification of Linear Regression
Linear Regression is broadly classified into two types, which are described briefly below:
1. Simple Linear Regression: The type of Linear Regression, expressed in the form of a straight line, in which we try to find the relationship between a single independent variable (input) and a corresponding dependent variable (output) is known as Simple Linear Regression. The equation of the line can be written as:

Y = β0 + β1X + e

where:
- Y represents the output or what we know as the dependent variable.
- β0 and β1 are two unknown constants that represent the intercept and the coefficient (slope) respectively.
- e (Epsilon) is known as the error term.
Some real-life applications of Simple Linear Regression include:
- Prediction of crop yields based on the amount of rainfall throughout the year: here, yield is the dependent variable and the amount of rainfall is the independent variable.
- Marks scored by students based on the number of hours they study: here, marks scored are the dependent variable and the number of hours studied is the independent variable.
- Predicting the salary of an employee based on years of experience: here, experience is the independent variable and salary is the dependent variable.
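The salary example above can be sketched with hypothetical numbers (all figures below are invented for illustration):

```python
import numpy as np

# Hypothetical data: years of experience (x) vs. salary in thousands (y).
# The points lie exactly on salary = 35 + 5 * years.
years = np.array([1.0, 3.0, 5.0, 7.0, 9.0])
salary = np.array([40.0, 50.0, 60.0, 70.0, 80.0])

# np.polyfit with deg=1 fits a straight line, returning [slope, intercept].
b1, b0 = np.polyfit(years, salary, deg=1)

predicted = b0 + b1 * 6  # salary estimate for 6 years of experience -> 65.0
```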
2. Multiple Linear Regression: In this type of regression, we find the relationship between two or more independent variables and their corresponding dependent variable, the output. The independent variables can be continuous or categorical. The equation that describes how the predicted value of y is related to the p independent variables, known as the Multiple Linear Regression equation, is:

Y = β0 + β1X1 + β2X2 + ... + βpXp + e
Some real-world applications of MLR include:
- Multiple linear regression analysis can be used to predict various trends and future values, and to obtain point estimates.
- It can also be used to forecast the effects of changes, as it helps us understand how much the dependent variable will change when we change the independent variables.
- Moreover, it can be used to identify the strength of the effect that the independent variables have on the dependent variable.
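A minimal MLR sketch with two predictors follows; the rent figures, square footage, and ages below are hypothetical numbers invented for illustration:

```python
import numpy as np

# Hypothetical data: rent predicted from square feet and age of the house.
# Columns: [intercept term, square feet, age in years].
# The rents lie exactly on rent = 200 + 1.0*sqft - 10*age.
X = np.array([
    [1.0,  500.0, 10.0],
    [1.0,  800.0,  5.0],
    [1.0, 1000.0, 20.0],
    [1.0, 1200.0,  2.0],
])
rent = np.array([600.0, 950.0, 1000.0, 1380.0])

# Least-squares fit of the multiple regression coefficients [b0, b1, b2].
coef, *_ = np.linalg.lstsq(X, rent, rcond=None)

# Rent estimate for a 900 sq ft, 8-year-old house: 200 + 900 - 80 = 1020.
estimate = coef @ [1.0, 900.0, 8.0]
```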
From the two sections above on Simple and Multiple Linear Regression, we can summarize that Simple Linear Regression has only one x variable and one y variable, while Multiple Linear Regression has one y variable but two or more x variables. To put it more precisely: if we want to predict the rent of a house based on square feet only, Simple Linear Regression is the tool to use; but if we want to predict the rent based on both square feet and the age of the house, we use MLR.