Regularization is defined as…
Regularization with the squared L2 norm of the coefficients (Ridge) may improve the results compared to OLS when the number of features is higher than the number of observations.
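A minimal sketch of why the L2 penalty helps in that setting, using made-up random data and an arbitrary penalty strength `alpha = 1.0` (both are assumptions, not part of the quiz). With more features than observations, X^T X is rank-deficient, so OLS has no unique solution, while adding `alpha` times the identity makes the system solvable and shrinks the coefficients.

```python
import numpy as np

rng = np.random.default_rng(0)
n_obs, n_feat = 20, 50                  # more features than observations
X = rng.normal(size=(n_obs, n_feat))    # made-up data for illustration
y = rng.normal(size=n_obs)

gram = X.T @ X                          # 50 x 50, but its rank is at most 20
rank = np.linalg.matrix_rank(gram)

alpha = 1.0                             # hypothetical penalty strength
# Ridge solution without intercept: (X^T X + alpha * I)^(-1) X^T y
beta_ridge = np.linalg.solve(gram + alpha * np.eye(n_feat), X.T @ y)

# Minimum-norm least-squares solution, one of infinitely many exact OLS fits.
beta_min_norm = np.linalg.pinv(X) @ y

print("rank of X^T X:", rank, "out of", n_feat)
print("ridge coefficient norm:", np.linalg.norm(beta_ridge))
print("min-norm OLS coefficient norm:", np.linalg.norm(beta_min_norm))
```

The ridge coefficients come out with a smaller norm than even the smallest-norm OLS fit, which is the shrinkage the penalty is meant to provide.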
The L1 norm always yields shorter distances compared to the Euclidean norm. (False: for any vector, the L1 norm is at least as large as the Euclidean norm.)
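A quick numerical check of how the two norms compare, on two made-up points. For every vector the L1 norm is at least as large as the Euclidean norm, so the L1 distance is never the shorter one.

```python
import numpy as np

a = np.array([1.0, 2.0])                # made-up points
b = np.array([4.0, 6.0])

l1 = np.sum(np.abs(a - b))              # Manhattan distance: 3 + 4
l2 = np.sqrt(np.sum((a - b) ** 2))      # Euclidean distance: sqrt(9 + 16)

print(l1, l2)  # 7.0 5.0
```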
Typically, the regularization is achieved by:
A regularization method that facilitates variable selection (estimating some coefficients as zero) is:
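A small sketch of how a penalty can estimate coefficients as exactly zero. It assumes an orthonormal design, where an L1 penalty reduces to soft-thresholding the unpenalized estimates and an L2 penalty reduces to uniform rescaling. The coefficient values and the penalty level `lam` are made up for illustration.

```python
import numpy as np

def soft_threshold(z, lam):
    """Shrink each entry toward zero by lam; entries within lam of zero become 0."""
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

ols_coefs = np.array([3.0, -0.2, 0.5, -2.0])   # made-up unpenalized estimates
lam = 0.6                                      # hypothetical penalty level

l1_coefs = soft_threshold(ols_coefs, lam)      # L1-style shrinkage: small entries hit zero
l2_coefs = ols_coefs / (1.0 + lam)             # L2-style shrinkage: smaller, never exactly zero

print("L1:", l1_coefs)
print("L2:", l2_coefs)
```

Under these assumptions the L1 version drops the two small coefficients entirely, while the L2 version keeps all four.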
Write your own Python code to import the Boston housing data set and to scale the data (not the target) by z-scores. If we use all the features in a linear regression to predict the target variable, the root mean squared error (RMSE) is: 4.6791.
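import numpy as np
import pandas as pd
from sklearn import metrics
from sklearn.datasets import load_boston  # removed in scikit-learn 1.2; needs an older version
from sklearn.linear_model import LinearRegression

def standardize(x):
    """Standardize the original data set (z-score each column)."""
    return (x - x.mean(axis=0)) / x.std(axis=0)

data = load_boston()
X = standardize(data.data)  # scale the features, not the target
X_names = data.feature_names
y = data.target

# Alternative setup through a DataFrame:
# data = load_boston()
# df = pd.DataFrame(data.data, columns=data.feature_names)
# y = data.target
# X = standardize(df.values)

lin_reg = LinearRegression().fit(X, y)
y_pred = lin_reg.predict(X)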
# Refit using only the observations whose target is below 50.
idxLo = y < 50
y_lo = y[idxLo]
X_lo = X[idxLo]
lin_reg_lo = LinearRegression().fit(X_lo, y_lo)
lin_reg_lo.score(X_lo, y_lo)  # R^2 on the reduced data
prices = y
predicted = y_pred

# Mean squared error computed by hand.
summation = 0
n = len(predicted)
for i in range(n):
    difference = prices[i] - predicted[i]
    squared_difference = difference**2
    summation = summation + squared_difference
MSE = summation / n

# The same quantity through scikit-learn; np.sqrt(MSE) gives the same value.
rmse = np.sqrt(metrics.mean_squared_error(prices, predicted))
print("The rmse is: ", rmse)
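Because `load_boston` is no longer available in recent scikit-learn releases, the two building blocks of this question can be checked on their own. The sketch below uses made-up feature columns (an assumption) to confirm that z-scoring gives each column mean 0 and standard deviation 1, and it computes the RMSE without a loop.

```python
import numpy as np

def standardize(x):
    """Z-score each column: subtract its mean, divide by its standard deviation."""
    return (x - x.mean(axis=0)) / x.std(axis=0)

def rmse(y_true, y_pred):
    """Root mean squared error, vectorized."""
    return np.sqrt(np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2))

rng = np.random.default_rng(0)
# Made-up features on very different scales.
X_raw = rng.normal(loc=[10.0, -3.0, 250.0], scale=[2.0, 0.5, 40.0], size=(100, 3))
Z = standardize(X_raw)

print("column means:", Z.mean(axis=0).round(12))
print("column stds: ", Z.std(axis=0).round(12))
print("rmse:", rmse([3.0, 5.0, 7.0], [2.0, 5.0, 9.0]))  # equals sqrt(5/3)
```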
On the Boston housing data set, if we consider the Lasso model with ‘alpha=0.03’, then the 10-fold cross-validated prediction error is: 4.8370.
from sklearn.linear_model import Lasso
from sklearn.model_selection import KFold

kf = KFold(n_splits=10, random_state=1234, shuffle=True)
data = load_boston()
df = pd.DataFrame(data.data, columns=data.feature_names)
y = data.target
PE = []  # RMSE on the held-out fold, one entry per fold
lm = Lasso(alpha=0.03)
for train_index, test_index in kf.split(df):
    X_train = df.values[train_index]
    y_train = y[train_index]
    X_test = df.values[test_index]
    y_test = y[test_index]
    model = lm.fit(X_train, y_train)
    y_pred = lm.predict(X_test)
    PE.append(np.sqrt(metrics.mean_squared_error(y_test, y_pred)))
    # print('RMSE from each fold:', PE[-1])
print('10-fold cross-validated RMSE:', np.mean(PE))  # averaging over folds is assumed
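The manual fold loop above can also be written with `cross_val_score`. This is a sketch on synthetic stand-in data from `make_regression` (an assumption, so the numbers will not match the quiz answer); the fold settings and the Lasso `alpha` are the ones used in these notes.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso
from sklearn.model_selection import KFold, cross_val_score

# Synthetic stand-in data (assumption): 200 rows, 13 features.
X, y = make_regression(n_samples=200, n_features=13, noise=5.0, random_state=0)

kf = KFold(n_splits=10, shuffle=True, random_state=1234)
neg_rmse = cross_val_score(
    Lasso(alpha=0.03), X, y, cv=kf, scoring="neg_root_mean_squared_error"
)
fold_rmse = -neg_rmse  # scikit-learn reports errors as negative scores

print("RMSE per fold:", fold_rmse.round(3))
print("mean RMSE:", fold_rmse.mean())
```

Swapping `Lasso(alpha=0.03)` for another estimator reuses the same folds, which keeps comparisons between models on equal footing.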
On the Boston housing data set, if we consider the Elastic Net model with ‘alpha=0.05’ and ‘l1_ratio=0.9’, then the 10-fold cross-validated prediction error is: 4.8965.
from sklearn.linear_model import ElasticNet

kf = KFold(n_splits=10, random_state=1234, shuffle=True)
data = load_boston()
df = pd.DataFrame(data.data, columns=data.feature_names)
y = data.target
PE = []  # RMSE on the held-out fold, one entry per fold
lm = ElasticNet(alpha=0.05, l1_ratio=0.9)
for train_index, test_index in kf.split(df):
    X_train = df.values[train_index]
    y_train = y[train_index]
    X_test = df.values[test_index]
    y_test = y[test_index]
    model = lm.fit(X_train, y_train)
    y_pred = lm.predict(X_test)
    PE.append(np.sqrt(metrics.mean_squared_error(y_test, y_pred)))
print('10-fold cross-validated RMSE:', np.mean(PE))  # averaging over folds is assumed
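For reference, the objective that scikit-learn documents for `ElasticNet` shows how `alpha` and `l1_ratio` split the penalty between the L1 and L2 terms. Here n is the number of training samples and w the coefficient vector.

```latex
\min_{w}\; \frac{1}{2n}\,\lVert y - Xw \rVert_2^2
  \;+\; \alpha\,\rho\,\lVert w \rVert_1
  \;+\; \frac{\alpha\,(1-\rho)}{2}\,\lVert w \rVert_2^2,
\qquad \rho = \texttt{l1\_ratio}

% With alpha = 0.05 and l1_ratio = 0.9:
%   L1 weight: 0.05 * 0.9       = 0.045
%   L2 weight: 0.05 * 0.1 / 2   = 0.0025
```

Setting `l1_ratio=1` removes the L2 term and leaves the Lasso objective.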
If we create all quadratic polynomial (degree=2) features based on the z-scores of the original features and then apply OLS, the root mean squared error is: 2.4486.
I initially got the wrong answer because I applied OLS incorrectly.
from sklearn.preprocessing import PolynomialFeatures

data = load_boston()
df = pd.DataFrame(data.data, columns=data.feature_names)
y = data.target
x = standardize(df.values)
kf = KFold(n_splits=10, shuffle=True, random_state=1234)
model = LinearRegression()
polynomial_features = PolynomialFeatures(degree=2)
PE = []
for idxtrain, idxtest in kf.split(x):
    x_train = x[idxtrain, :]
    x_test = x[idxtest, :]
    y_train = y[idxtrain]
    y_test = y[idxtest]
    x_poly_train = polynomial_features.fit_transform(x_train)
    x_poly_test = polynomial_features.transform(x_test)  # reuse the fitted transformer
    model.fit(x_poly_train, y_train)
    yhat_train = model.predict(x_poly_train)
    yhat_test = model.predict(x_poly_test)
    PE.append(np.sqrt(metrics.mean_squared_error(y_test, yhat_test)))
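To see how many columns the degree-2 expansion creates, here is a hand-rolled sketch of the expansion on made-up data with 13 columns. The helper `quadratic_features` is hypothetical and only meant to mirror the column count of `PolynomialFeatures(degree=2)` with its default bias column; it is not the library's implementation.

```python
import numpy as np
from itertools import combinations_with_replacement
from math import comb

def quadratic_features(x):
    """Bias column, original columns, then every product x_i * x_j with i <= j."""
    n_rows, n_cols = x.shape
    columns = [np.ones(n_rows)]
    columns += [x[:, j] for j in range(n_cols)]
    columns += [x[:, i] * x[:, j]
                for i, j in combinations_with_replacement(range(n_cols), 2)]
    return np.column_stack(columns)

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 13))        # made-up data with 13 columns
x_poly = quadratic_features(x)

print(x_poly.shape)                 # (5, 105)
print(comb(13 + 2, 2))              # 105 = 1 bias + 13 linear + 91 quadratic terms
```

With 105 columns, OLS has far more freedom than with the 13 original features, so the training RMSE and a cross-validated RMSE can differ considerably.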
If we create all quadratic polynomial (degree=2) features based on the z-scores of the original features, apply Ridge regression with alpha=0.1, and create a Quantile-Quantile plot for the residuals, then the result shows that the obtained residuals pretty much follow a normal distribution.
I initially said the answer was false, but my error was in saying that the residuals didn’t completely follow a normal distribution. In general they do; my assessment was too strict.
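One way to make the "pretty much normal" judgment less subjective is to compute the Q-Q points directly and check how close they sit to a straight line. The sketch below uses simulated residuals (an assumption, not the Ridge residuals from the quiz) and the hypothetical helper `qq_points`; a correlation near 1 between theoretical and observed quantiles suggests approximate normality.

```python
import numpy as np
from statistics import NormalDist

def qq_points(residuals):
    """Return (theoretical normal quantiles, sorted standardized residuals)."""
    r = np.sort(np.asarray(residuals, dtype=float))
    r = (r - r.mean()) / r.std()
    n = len(r)
    probs = (np.arange(1, n + 1) - 0.5) / n              # plotting positions in (0, 1)
    theo = np.array([NormalDist().inv_cdf(p) for p in probs])
    return theo, r

rng = np.random.default_rng(0)
normal_resid = rng.normal(size=500)        # simulated, roughly normal residuals
skewed_resid = rng.exponential(size=500)   # simulated, clearly skewed residuals

for name, resid in [("normal", normal_resid), ("skewed", skewed_resid)]:
    theo, obs = qq_points(resid)
    r = np.corrcoef(theo, obs)[0, 1]       # near 1 when the points hug a straight line
    print(name, round(r, 4))
```

Plotting `theo` against `obs` gives the usual Q-Q picture; mild wobble in the tails is common even for truly normal samples, which is consistent with reading the quiz statement leniently.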