*In Chapter 3, we talked about Linear Regression: how we can assume the data fit a linear model and make predictions using that model. The linear model falls short when there are many features. It is good practice to collect as much data as possible, but the drawback is that some of it will not be relevant to the target. Irrelevant data slows down training and can even hurt the accuracy of the model. So what should we do? This chapter talks about methods to beef up a linear model so that the model…*

*Now that we have learned some basic models, it is time to determine how well a model performs on new data it has not seen before (model assessment) and how to choose the right model (model selection). The tool statistics uses to assess and select models is resampling.*

Resampling (wait for it…) is the process of fitting a model multiple times using different subsets of the training data. Each fit gives us additional information about the model. Using the new information obtained, we can tweak the model to…
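To make the idea concrete, here is a minimal sketch of one resampling method, k-fold cross-validation, applied to a straight-line fit. It assumes NumPy and synthetic data; the function name `kfold_mse` and the choice of k = 5 are illustrative, not taken from the post.

```python
import numpy as np

def kfold_mse(x, y, k=5, seed=0):
    """Estimate the test MSE of a straight-line fit using k-fold cross-validation."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(x))        # shuffle the row indices once
    folds = np.array_split(idx, k)       # k roughly equal subsets
    errors = []
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        # Fit y = b0 + b1 * x on the training folds only (polyfit returns highest degree first)
        b1, b0 = np.polyfit(x[train], y[train], deg=1)
        pred = b0 + b1 * x[test]
        errors.append(np.mean((y[test] - pred) ** 2))  # error on the held-out fold
    return float(np.mean(errors))

rng = np.random.default_rng(1)
x = rng.uniform(0, 10, 100)
y = 2.0 + 3.0 * x + rng.normal(0, 1.0, 100)  # synthetic data with noise sd = 1
print(kfold_mse(x, y, k=5))
```

Each of the k fits sees a different subset of the training data, and averaging the held-out errors gives an estimate of how the model does on unseen data. For this synthetic data, that estimate should land near the noise variance of 1.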

*In the last blog, we talked about linear regression: given some data, predict a numerical response. Chapter 4 and this blog go over the scenario where the response variable is not a numerical value but a class. This type of machine learning is called classification.*

The example that ISLR uses is: given people’s loan data, predict whether they will *default* or *not default*. Visually, the data look like the orange lines in Figure 1. If we fit a linear function to this type of data, it does a poor job of fitting the data (graph on the left)…
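One way to see why a straight line fits a two-class response poorly is to code *default* as 1 and *not default* as 0, fit a least-squares line, and look at its predictions at the extremes. The sketch below uses synthetic data, not ISLR’s Default data set. For comparison it also fits logistic regression, the standard alternative, using Newton-Raphson iterations that we wrote by hand for this illustration. The variable names and the "true" probability curve are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
balance = rng.uniform(0, 2500, 500)                       # synthetic balances
p_true = 1 / (1 + np.exp(-(balance - 1800) / 150))        # assumed true default probability
default = (rng.uniform(size=500) < p_true).astype(float)  # 1 = default, 0 = not default

# Least-squares straight line on the 0/1 response
b1, b0 = np.polyfit(balance, default, deg=1)
grid = np.array([0.0, 2500.0])
linear_pred = b0 + b1 * grid
print("linear:", linear_pred)      # the prediction at balance 0 comes out negative

# Logistic regression fitted with Newton-Raphson (IRLS)
X = np.column_stack([np.ones_like(balance), balance / 1000])  # balance in thousands
beta = np.zeros(2)
for _ in range(25):
    p = 1 / (1 + np.exp(-X @ beta))
    W = p * (1 - p)                          # weights from the Bernoulli variance
    grad = X.T @ (default - p)               # gradient of the log-likelihood
    hess = X.T @ (X * W[:, None])            # negative Hessian of the log-likelihood
    beta += np.linalg.solve(hess, grad)
Xg = np.column_stack([np.ones_like(grid), grid / 1000])
logistic_pred = 1 / (1 + np.exp(-Xg @ beta))
print("logistic:", logistic_pred)  # always strictly between 0 and 1
```

The straight line can return "probabilities" below 0 (and above 1 for other data), while the logistic curve is bounded in (0, 1) by construction.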

*The BMP180 is a great way to start learning how to integrate sensors with DuckLinks. The BMP180 is a very small sensor that measures temperature and pressure. Getting it running takes three steps: **soldering the BMP180** to a board, **uploading the firmware**, and **collecting the data**.*

*On Sunday, March 14, we left Brooklyn heading south to Allentown, PA. Our goal was to test Version 2 (V2) of the ClusterDuck Protocol (CDP), released in January 2021, on a larger scale. Previously we were limited to testing V2 in our living rooms (thanks, COVID), but now we had the opportunity to test it in a suburban area. The metric we focused on in this deployment was packet loss: how many packets from the ClusterDuck Network were not received? Below is our analysis of the data collected using V2.*
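Packet loss as described here is the share of sent packets that never arrived. The following is a minimal sketch. It assumes every packet carries a unique ID, so the sender’s log and the receiver’s log can be compared. The function `packet_loss` and the ID scheme are hypothetical and do not reflect the CDP’s actual log format.

```python
def packet_loss(sent_ids, received_ids):
    """Return (lost_count, loss_percent) from lists of sent and received packet IDs."""
    sent = set(sent_ids)
    # Count each packet once: ignore duplicate receptions and IDs that were never sent
    received = set(received_ids) & sent
    lost = len(sent - received)
    if not sent:
        return 0, 0.0
    return lost, 100.0 * lost / len(sent)

# Ten packets sent; packets 4 and 8 never arrive, and packet 10 is received twice
print(packet_loss(range(1, 11), [1, 2, 3, 5, 6, 7, 9, 10, 10]))  # (2, 20.0)
```

Using sets means a duplicate reception is not counted as an extra delivery, so it cannot hide a lost packet.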

*In this blog, we will walk through one of the applied R exercises from Chapter 3 (Linear Regression) of ISLR. It complements the blogs we wrote for this chapter (**Part I**, **Part II**).*

**This question involves the use of multiple linear regression on the Auto data set.**

**(a) Produce a scatterplot matrix which includes all of the variables in the data set.**

**(b) Compute the matrix of correlations between the variables using the function cor(). You will need to exclude the name variable, which is qualitative.**

**(c) Use the lm() function to perform a multiple linear regression with mpg as…**

*In the previous blog, we talked about **Simple Linear Regression (SLR)**, predicting the response using only one predictor. In the real world, however, we usually have multiple variables, not just one. In these common situations, we apply Multiple Linear Regression (MLR).*

MLR makes the same assumption as SLR: the data can be represented in a linear form. The only difference is that MLR has more predictors to consider.
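In code, the step from SLR to MLR amounts to adding more columns to the design matrix. The sketch below uses NumPy and synthetic data with three predictors. The coefficients are chosen arbitrarily for illustration. Least squares recovers the coefficients approximately.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 200
X = rng.normal(size=(n, 3))                      # three predictors
true_beta = np.array([1.5, -2.0, 0.5])
y = 4.0 + X @ true_beta + rng.normal(0, 0.1, n)  # intercept 4.0 plus small noise

design = np.column_stack([np.ones(n), X])        # prepend a column of 1s for the intercept
coef, *_ = np.linalg.lstsq(design, y, rcond=None)
print(np.round(coef, 2))  # approximately [4.0, 1.5, -2.0, 0.5]
```

With a single column in `X`, the same code fits a simple linear regression, which shows that SLR is just the one-predictor case of MLR.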

*Chapter 3 talks about Linear Regression. It is a big chapter because it introduces a lot of terms. To cover it and absorb as much information as possible, we divided it into two blogs: Simple Linear Regression and **Multiple Linear Regression**.*

Simple Linear Regression, as the name says, is a *simple* linear regression. That means we use only one variable, X, to predict the response variable Y. The function we estimate to predict Y is (drumroll…) a linear model. …
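With one predictor, the least-squares intercept and slope have a closed form. The slope is the sum of cross-deviations divided by the sum of squared deviations of X, and the fitted line passes through the point (mean of X, mean of Y). The following plain-Python sketch shows this. The function name is ours, for illustration only.

```python
def simple_linear_regression(x, y):
    """Least-squares intercept b0 and slope b1 for y ≈ b0 + b1 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Slope: sum of (x - x_bar)(y - y_bar) divided by sum of (x - x_bar)^2
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar  # the fitted line passes through (x_bar, y_bar)
    return b0, b1

# Points lying exactly on y = 1 + 2x
print(simple_linear_regression([1, 2, 3, 4], [3, 5, 7, 9]))  # (1.0, 2.0)
```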

*We are reading Introduction to Statistical Learning (ISLR) in a book-club format here at Biased Outliers. The purpose of this reading is to build our foundation in statistics so that we can move forward as a community and understand more complex machine learning topics. Every week we meet online on Sunday evenings to discuss what we read that week and to tackle both the conceptual and applied exercises. We also reinforce what we learn by blogging about each chapter.*

Using data, we want to understand how one variable affects another. For example, if there are clouds in…

I just finished reading **Educated** by Tara Westover, and it definitely left an impression on me. Tara grew up in a family that did not value education because they believed it was the way the government brainwashed people. “College is extra school for people too dumb to learn the first time around,” is what Tara’s father said when she asked him what college was. Instead, he believed it was better to scrap metal for a living and not rely on any government establishments (including hospitals, schools, medicine, etc.).

Reading this book made me reflect on how I was raised. When…

Avid learner.