Welcome to the 8th part of our machine learning regression tutorial within our Machine Learning with Python tutorial series. Where we left off, we had just determined that we needed to translate some non-trivial math into Python code in order to calculate a best-fit line for a given dataset.

Before we embark on that, why are we bothering with all of this? Linear regression is basically the first brick in the machine learning building. It is used in almost every single major machine…

Isn't the formula for the best-fit slope (mean(xy) - mean(x)*mean(y)) / (mean(x^2) - mean(x)^2)? Of course the result would be the same, but I'm curious why you swapped it? Reference: https://www.wired.com/2011/01/linear-regression-by-hand/
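The two orderings give the same slope because both the numerator and the denominator are negated together. A quick check, using a small toy dataset (assumed here; the series uses one much like it):

```python
import numpy as np

xs = np.array([1, 2, 3, 4, 5, 6], dtype=np.float64)
ys = np.array([5, 4, 6, 5, 6, 7], dtype=np.float64)

# Form from the linked article: (mean(xy) - mean(x)*mean(y)) / (mean(x^2) - mean(x)^2)
m1 = (np.mean(xs * ys) - np.mean(xs) * np.mean(ys)) / (np.mean(xs**2) - np.mean(xs)**2)

# Swapped form from the video: both numerator and denominator negated
m2 = (np.mean(xs) * np.mean(ys) - np.mean(xs * ys)) / (np.mean(xs)**2 - np.mean(xs**2))

print(m1, m2)  # identical slopes
```

Multiplying top and bottom by -1 leaves the fraction unchanged, so which ordering you pick is purely a matter of taste.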

You can't confuse me with PEMDAS, I'm still gonna use BODMAS.

hahah

Can anybody explain why I got different values of m with/without specifying the dtypes? If I don't specify the dtypes, I get m = 0.8333333333333334.
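I can't reproduce 0.8333… exactly without seeing the code, but the dtype itself shouldn't change the slope: np.mean upcasts integer arrays to float64, so int and float inputs give the same result on the same data. A sketch with an assumed toy dataset; if your xs and ys were plain Python lists rather than arrays, xs*ys would not be elementwise at all, which is a more likely source of a wrong value:

```python
import numpy as np

xs_f = np.array([1, 2, 3, 4, 5, 6], dtype=np.float64)
ys_f = np.array([5, 4, 6, 5, 6, 7], dtype=np.float64)
xs_i = xs_f.astype(np.int64)
ys_i = ys_f.astype(np.int64)

def slope(xs, ys):
    return (np.mean(xs) * np.mean(ys) - np.mean(xs * ys)) / (np.mean(xs)**2 - np.mean(xs**2))

# np.mean returns float64 either way, so both calls agree
print(slope(xs_f, ys_f), slope(xs_i, ys_i))
```

So if the value changes when you drop the dtype, check whether you also dropped the np.array conversion somewhere.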

In the 3rd tutorial, can you explain the Unix operation in pandas?
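If "the Unix operation" means converting Unix epoch timestamps into dates with pandas (as the series does when building a date axis for the forecast), a minimal sketch with made-up timestamp values:

```python
import pandas as pd

# Hypothetical column of Unix epoch seconds (seconds since 1970-01-01 UTC)
df = pd.DataFrame({"last_unix": [1262304000, 1262390400]})

# unit="s" tells pandas the numbers are epoch *seconds*, not nanoseconds
df["date"] = pd.to_datetime(df["last_unix"], unit="s")
print(df["date"].iloc[0])  # 2010-01-01 00:00:00
```

Going the other way, `Timestamp.timestamp()` returns the epoch seconds back.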

Children, when you see sentdex struggle at 9:11, always remember… writing too complicated a math expression is a no-no.

Use the following breaks:

def best_fit_slope(xs, ys):
    numerator = (np.mean(xs) * np.mean(ys)) - np.mean(xs * ys)
    denominator = np.mean(xs)**2 - np.mean(xs**2)
    return numerator / denominator

Love you sentdex! keep up the good work!

not Gradient Descent

Hey sentdex. So I tried deriving the equation for a slope of best fit, but I got the wrong answer because instead of squaring all the residuals, I would simply take their absolute value. Why is it that we square the residuals? Wouldn't that make the line of best fit more susceptible to outliers? Why would we want that?
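Squaring does make the fit more sensitive to outliers; that part of your intuition is right. The usual reasons for squaring anyway: the squared loss is differentiable everywhere, which is what yields the closed-form slope formula (and it is the maximum-likelihood fit under Gaussian noise), while the absolute-value loss is piecewise linear with kinks and has no simple closed form. A sketch comparing the two loss curves over candidate slopes, on an assumed toy dataset:

```python
import numpy as np

xs = np.array([1, 2, 3, 4, 5, 6], dtype=np.float64)
ys = np.array([5, 4, 6, 5, 6, 7], dtype=np.float64)

# Closed-form OLS slope (minimizes the *squared* residuals)
m = (np.mean(xs) * np.mean(ys) - np.mean(xs * ys)) / (np.mean(xs)**2 - np.mean(xs**2))

def sq_loss(s):
    b = np.mean(ys) - s * np.mean(xs)      # best intercept for this slope
    return np.sum((ys - (s * xs + b))**2)

def abs_loss(s):
    b = np.mean(ys) - s * np.mean(xs)
    return np.sum(np.abs(ys - (s * xs + b)))

# Squared loss is a smooth parabola in s with one minimum (at the OLS slope);
# absolute loss is piecewise linear, so no derivative-based closed form exists.
slopes = np.linspace(0, 1, 1001)
sq = np.array([sq_loss(s) for s in slopes])
ab = np.array([abs_loss(s) for s in slopes])
best_sq = slopes[np.argmin(sq)]
print(m, best_sq)  # grid minimum lands on the closed-form slope
```

Fitting with absolute residuals (least absolute deviations) is a real technique, it is just solved iteratively rather than with a one-line formula, and it is indeed *more* robust to outliers, not less.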

Thanks so much for your tutorials, I've been following the series and they're awesome!

Why didn't you teach best fit for multiple dimensions?

Not fair! You spent two episodes explaining beginner stuff; please go back and explain more of the previous hard coding.

Hello, my question is: given your definition of best_fit_slope(xs, ys) at 10:12, how many times does the value of mean(xs) get computed? Three times? Wouldn't it be worth introducing a temporary variable?

To avoid confusion, it is better to do:

def best_fit_line(x, y):
    m = mean(x) * mean(y)
    m = m - mean(x * y)
    m = m / (mean(x)**2 - mean(x**2))
    return m

mean(xs)*mean(xs) will cause mean(xs) to be evaluated twice. Use mean(xs)**2, as it's more efficient.

Would you please point out where the mistake is in the stock prediction that you said was fixed? (The dates are not in the future, but it predicts on, or takes the prices of, the existing dates.)

Kind regards.

I know how to get the coefficients in simple linear regression, but I don't know how to get the coefficients or weights without using sklearn or other libraries, and I can't find resources either! How can I write my own functions to do this?
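For multiple features, the standard no-library approach is the normal equations, w = (XᵀX)⁻¹Xᵀy, which you can solve with NumPy alone. A minimal sketch (the function name and toy data are mine, not from the tutorial):

```python
import numpy as np

def fit_ols(X, y):
    """Ordinary least squares via the normal equations: solve (X^T X) w = X^T y.
    A column of ones is prepended so w[0] is the intercept."""
    X = np.column_stack([np.ones(len(X)), X])
    return np.linalg.solve(X.T @ X, X.T @ y)

# One-feature case reduces to the slope/intercept formulas from the video
xs = np.array([1, 2, 3, 4, 5, 6], dtype=np.float64)
ys = np.array([5, 4, 6, 5, 6, 7], dtype=np.float64)
b, m = fit_ols(xs, ys)
print(m, b)
```

X can have any number of columns; the same call returns one weight per feature plus the intercept. `np.linalg.solve` is preferred over explicitly inverting XᵀX for numerical stability.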

Hey guys, I am pretty new to coding, but I wrote this out myself before he did to see how I would do. I did it like this:

def best_fit_slope(xs, ys):
    x_bar = np.mean(xs)
    y_bar = np.mean(ys)
    xy_bar = np.mean(xs * ys)
    m = (x_bar * y_bar - xy_bar) / (x_bar**2 - np.mean(xs**2))
    return m

So I split up the means into their own variables, then plugged them into the equation so it is easier to read. Is there a reason this is bad? Like, is this slower than having it all in one messy line?

It's very difficult for a person who knows Python but doesn't know how to use a few of these functions. And it's kind of too fast.

Could you please send me the links where I can download the code and perform the practical tasks?

How do you choose the variable with the most significant contribution among all the variables for our best fit? Like in R, where we have p-values and significance codes for each variable.
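Python's closest analogue to R's significance codes is the statsmodels package (`sm.OLS(y, sm.add_constant(X)).fit().summary()` prints a p-value per coefficient). Without extra libraries, you can compute the slope's t-statistic by hand for simple regression; a sketch, with an assumed toy dataset:

```python
import numpy as np

def slope_t_stat(xs, ys):
    """t-statistic for the slope in simple linear regression:
    t = m / SE(m), where SE(m) = sqrt(SSE / (n - 2)) / sqrt(sum((x - x_bar)^2)).
    Larger |t| means stronger evidence the slope is nonzero."""
    n = len(xs)
    m = (np.mean(xs) * np.mean(ys) - np.mean(xs * ys)) / (np.mean(xs)**2 - np.mean(xs**2))
    b = np.mean(ys) - m * np.mean(xs)
    resid = ys - (m * xs + b)
    se_m = np.sqrt(np.sum(resid**2) / (n - 2)) / np.sqrt(np.sum((xs - np.mean(xs))**2))
    return m / se_m

xs = np.array([1, 2, 3, 4, 5, 6], dtype=np.float64)
ys = np.array([5, 4, 6, 5, 6, 7], dtype=np.float64)
print(slope_t_stat(xs, ys))
```

Comparing the t-statistic against a Student-t distribution with n-2 degrees of freedom gives the p-value R reports next to its significance codes.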

I would really like to know where that formula comes from.

Could you point me to where I could find an explanation, please?
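A sketch of the standard derivation (ordinary calculus, not specific to the video): the best-fit line minimizes the sum of squared errors, and setting both partial derivatives to zero yields the formula used here.

```latex
E(m, b) = \sum_{i}\bigl(y_i - (m x_i + b)\bigr)^2

\frac{\partial E}{\partial b} = -2\sum_i \bigl(y_i - m x_i - b\bigr) = 0
\;\Rightarrow\; b = \bar{y} - m\,\bar{x}

\frac{\partial E}{\partial m} = -2\sum_i x_i\bigl(y_i - m x_i - b\bigr) = 0
\;\Rightarrow\; \overline{xy} - m\,\overline{x^2} - b\,\bar{x} = 0

\text{Substituting } b = \bar{y} - m\,\bar{x}: \quad
m = \frac{\bar{x}\,\bar{y} - \overline{xy}}{\bar{x}^{\,2} - \overline{x^2}}
```

Searching for "least squares derivation" turns up full write-ups; any calculus-based statistics text covers this under ordinary least squares.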

I like your tutorials but kinda feel you wasted a lot of time on this one explaining how to manage parentheses and stuff, while you rushed through much more complicated things in the previous ones.