data-310-page


Project maintained by ecwydra Hosted on GitHub Pages — Theme by mattgraham

Lab 3

Question 1

An “ordinary least squares” (or OLS) model seeks to minimize the differences between your true and estimated dependent variable.

Question 2

Do you agree or disagree with the following statement:

In a linear regression model, all feature must correlate with the noise in order to obtain a good fit.

Reasoning: I originally answered that I agreed with this statement because if the features don’t correlate with noise, or outliers, then the model would get thrown off by outlier data and the fit wouldn’t be as accurate.

Question 3

  1. 6.6523
  2. 7.0123
  3. 9.6312
  4. 8.3244 – This is the answer!
data = pd.read_csv('L3Data.csv')
data.head(3)

# data frame to matrix
feature_df = data[['days online', 'views', 'contributions', 'answers']]
target_df = data[['Grade']]

feature_matrix = feature_df.values
target_matrix = target_df.values

# separate into train and test
X_train, X_test, y_train, y_test = train_test_split(feature_matrix, target_matrix, test_size=0.25, random_state=1234)

# build multiple linear regression
scale = StandardScaler()
Xs_train = scale.fit_transform(X_train)
Xs_test  = scale.transform(X_test)
model = LinearRegression()
model.fit(Xs_train, y_train)
yhat = model.predict(Xs_test)

print(f'the mse for the test data is {np.sqrt(MSE(y_test, yhat))}')

Question 4

In practice we determine the weights for linear regression with the “X_test” data.

Question 5

Polynomial regression is best suited for functional relationships that are non-linear in weights.

Question 6

Linear regression, multiple linear regression, and polynomial regression can be all fit using LinearRegression() from the sklearn.linear_model module in Python.

Question 7

  1. 22
  2. 23 – This is the answer
  3. 25
  4. 24
# data frame
df = pd.read_csv('L3Data.csv')
df.head(3)

# features and targets saved
X = df[['days online', 'views', 'contributions', 'answers']]
y = df['Grade']

# separate into train and test
X_train, X_test, y_train, y_test = train_test_split(feature_matrix, target_matrix, test_size=0.25, random_state=1234)

X_train.shape

Question 8

The gradient descent method does not need any hyper parameters.

Question 9

To create and display a figure using matplotlib.pyplot that has visual elements (scatterplot, labeling of the axes, display of grid), in what order would the below code need to be executed?

  1. import matplotlib.pyplot as plt

  2. fig, ax = plt.subplots()

  3. ax.scatter(X_test, y_test, color=”black”, label=”Truth”) ax.scatter(X_test, lin_reg.predict(X_test), color=”green”, label=”Linear”) ax.set_xlabel(“Discussion Contributions”) ax.set_ylabel(“Grade”)

  4. ax.grid(b=True,which=’major’, color =’grey’, linestyle=’-‘, alpha=0.8) ax.grid(b=True,which=’minor’, color =’grey’, linestyle=’–’, alpha=0.2) ax.minorticks_on()

Question 10

Which of the following is nonlinear?

The one that has weights the aren’t just beta, there are exponents and stuff attached to the weights.