Training R²: 0.019  Testing R²: -0.002
Number of splits: 20
Interpretation: I started with 10 splits and then experimented with different split counts. Around 20 splits, the R² values came closest to 1, and the scores from 20 splits are listed above. Given the training score and the external R² value (the testing score), I assess this model as poorly fit; a well-fit model would have an R² much closer to 1. A sketch of the split-count search appears below.
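A minimal sketch of that split-count search, assuming scikit-learn and assuming the Boston housing data is fetched from OpenML (load_boston has been removed from recent scikit-learn releases); the split counts tried and the random seed are illustrative, not the assignment's exact setup:

```python
from sklearn.datasets import fetch_openml
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_validate

# Load the Boston housing data from OpenML (an assumed source, since
# load_boston was removed from recent scikit-learn releases).
X, y = fetch_openml(name="boston", version=1, as_frame=False, return_X_y=True)

# Compare mean training vs. testing R^2 across several split counts.
for n_splits in (5, 10, 20, 50):
    cv = KFold(n_splits=n_splits, shuffle=True, random_state=0)
    res = cross_validate(LinearRegression(), X, y, cv=cv,
                         scoring="r2", return_train_score=True)
    print(f"{n_splits:2d} splits: train R^2 = {res['train_score'].mean():.3f}, "
          f"test R^2 = {res['test_score'].mean():.3f}")
```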
Training R²: 0.019  Testing R²: 0.003
Number of splits: 10
Interpretation: I approached this problem the same way I approached question 1. I initially set the number of splits to 5, and after experimenting I found that 10 splits worked best. It makes sense that the best number of splits would not change between an OLS model on raw data and an OLS model on standardized data: standardization rescales the feature values but leaves the number of data points unchanged. This model is also poorly fit given the values listed above. The number I weight most heavily when judging fit is the external validity (the testing score), which is far from 1. A sketch of the standardized version appears below.
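A sketch of the standardized version under the same data assumptions; putting the scaler and the regression in one pipeline keeps the scaling inside each cross-validation fold:

```python
from sklearn.datasets import fetch_openml
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_validate
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = fetch_openml(name="boston", version=1, as_frame=False, return_X_y=True)

# Standardize inside the pipeline so each CV fold scales on its own
# training portion and never sees the held-out fold.
model = make_pipeline(StandardScaler(), LinearRegression())
for n_splits in (5, 10, 20):
    cv = KFold(n_splits=n_splits, shuffle=True, random_state=0)
    res = cross_validate(model, X, y, cv=cv, scoring="r2",
                         return_train_score=True)
    print(f"{n_splits:2d} splits: train R^2 = {res['train_score'].mean():.3f}, "
          f"test R^2 = {res['test_score'].mean():.3f}")
```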
Training R²: 0.019  Testing R²: 0.019
Number of splits: 10
Interpretation: For this question I standardized the data and reached a similar conclusion to the two previous problems. The training and testing scores are close, which is a good sign, but the testing R² is still very low. This leads me to conclude that the model is still not well fit. A sketch of this setup follows.
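Assuming this question used ridge regression (the later model comparison suggests so) with scikit-learn's Ridge, a sketch might look like the following; alpha=1.0 is only a placeholder for the regularization value actually chosen:

```python
from sklearn.datasets import fetch_openml
from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold, cross_validate
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = fetch_openml(name="boston", version=1, as_frame=False, return_X_y=True)

# Ridge regression on standardized features; alpha is a placeholder
# for whatever regularization strength the question settled on.
model = make_pipeline(StandardScaler(), Ridge(alpha=1.0))
cv = KFold(n_splits=10, shuffle=True, random_state=0)
res = cross_validate(model, X, y, cv=cv, scoring="r2", return_train_score=True)
print(f"train R^2 = {res['train_score'].mean():.3f}, "
      f"test R^2 = {res['test_score'].mean():.3f}")
```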
Interpretation: After training and testing the models on the actual prices in the Boston housing data set, I found that the models still performed poorly. My two main criticisms are the same: the training and testing scores are not consistently close together, and the testing R² is still too low to consider the models well fit. The scores are listed below, followed by a sketch of the comparison.
Nonstandardized OLS
Training R²: 0.004  Testing R²: -0.062
Standardized OLS
Training R²: 0.004  Testing R²: -0.017
Ridge Regression
Training R²: 0.004  Testing R²: -0.015
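A sketch of how the three models could be scored on a single train/test split of the actual prices, under the same data assumptions as above; the test fraction and random seed are illustrative:

```python
from sklearn.datasets import fetch_openml
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = fetch_openml(name="boston", version=1, as_frame=False, return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

# score() on a fitted regressor returns R^2, so this prints the same
# kind of training/testing numbers listed above.
models = {
    "Nonstandardized OLS": LinearRegression(),
    "Standardized OLS": make_pipeline(StandardScaler(), LinearRegression()),
    "Ridge Regression": make_pipeline(StandardScaler(), Ridge(alpha=1.0)),
}
for name, model in models.items():
    model.fit(X_train, y_train)
    print(f"{name}: train R^2 = {model.score(X_train, y_train):.3f}, "
          f"test R^2 = {model.score(X_test, y_test):.3f}")
```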
Nonstandardized OLS: R² = 0.246
Standardized OLS: R² = 0.230
Ridge Regression: R² = 0.273
Interpretation: The model with the highest R² score is the ridge regression, followed by the nonstandardized OLS and then the standardized OLS. On one hand, this could mean that the ridge regression is the best-fit model of the three and therefore the one to use, but I think the more realistic conclusion is that comparing models on a single R² value is not the best approach. One problem with comparing R² across these models is that in one of them the variables were transformed, which can shift the R² and make comparisons to the other values inaccurate. Another thing to consider is the entire plot of testing scores versus training scores: a single R² value is only a snapshot of the model's behavior. It would be more accurate to plot the R² scores across a range of settings, thereby determining which model truly fits best given all the points, not just one. A sketch of such a plot follows.
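One way to produce that plot, sketched here by sweeping the ridge regularization strength and plotting mean training R² against mean validation R²; the parameter grid and fold count are assumptions:

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import fetch_openml
from sklearn.linear_model import Ridge
from sklearn.model_selection import validation_curve
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = fetch_openml(name="boston", version=1, as_frame=False, return_X_y=True)

# Sweep the ridge penalty and record training vs. validation R^2 for
# each setting; the gap between the two curves shows over/underfitting.
alphas = np.logspace(-3, 3, 25)
model = make_pipeline(StandardScaler(), Ridge())
train_scores, test_scores = validation_curve(
    model, X, y, param_name="ridge__alpha", param_range=alphas,
    cv=10, scoring="r2")

plt.semilogx(alphas, train_scores.mean(axis=1), label="training R^2")
plt.semilogx(alphas, test_scores.mean(axis=1), label="validation R^2")
plt.xlabel("ridge alpha")
plt.ylabel("mean R^2")
plt.legend()
plt.show()
```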
My assessment is that the model with the best results, in this case the ridge regression, is neither purely overfit nor purely underfit; it is likely both. The reason the R² value alone doesn't show this is that a single R² is only a snapshot of how the model behaves. The best way to see how each model behaves is to plot the training scores alongside the validation (R²) scores. When the training and validation scores are similar and reasonably high, the model fits well; when both training and testing error are high, the model is underfit; and when training error is low but testing error is high, the model is overfit. The data from the previous questions suggest that the model is underfit in some areas and overfit in others. My suggestions to Zillow would be the following: