Model Accuracy

The crop yield model produced the best results with XGBoost. It leveraged autocorrelation, since yields are strongly correlated with prior years, and a rolling window to generate additional observations. It achieved an R² of 0.71 and an RMSE of 8.9.
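The two ideas behind the crop yield model can be sketched in plain Python. This is one possible reading, not the model's actual pipeline. It assumes yearly yields for a single region stored in a list, oldest first. The function names, the number of lags, and the window size are all hypothetical, because the exact features and window length are not stated here. Lagged yields capture the correlation with prior years. Sliding over the series turns each year of history into its own (features, target) observation, and walk-forward splits keep evaluation on later years only.

```python
from typing import List, Tuple


def make_lag_features(series: List[float], n_lags: int) -> Tuple[List[List[float]], List[float]]:
    """Build (features, target) rows from a yearly series (hypothetical helper).

    Row t uses the previous n_lags years [y(t-1), ..., y(t-n_lags)] as
    features and y(t) as the target, so every year after the first
    n_lags contributes one training observation.
    """
    features, targets = [], []
    for t in range(n_lags, len(series)):
        features.append([series[t - k] for k in range(1, n_lags + 1)])
        targets.append(series[t])
    return features, targets


def rolling_splits(n_rows: int, window: int) -> List[Tuple[List[int], int]]:
    """Walk-forward splits: train on `window` consecutive rows, test on the next row."""
    return [(list(range(end - window, end)), end) for end in range(window, n_rows)]


# Hypothetical yields for one region, oldest year first.
yields = [100.0, 102.0, 98.0, 105.0, 110.0, 108.0]
X, y = make_lag_features(yields, n_lags=2)
splits = rolling_splits(len(X), window=2)
# Each split would fit a regressor (for example xgboost.XGBRegressor) on the
# training rows and predict the held-out row; the model itself is omitted here.
print(X)       # [[102.0, 100.0], [98.0, 102.0], [105.0, 98.0], [110.0, 105.0]]
print(splits)  # [([0, 1], 2), ([1, 2], 3)]
```

Walk-forward splits are used in this sketch instead of random shuffling because shuffling would let the model train on years that come after the year it is predicting.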

Also of note: the non-parametric nature of the remote sensing data led other modeling techniques, such as Support Vector Machines (SVM) and regression methods, to perform worse.

The price model produced the best results with Huber regression, leveraging multiple economic indexes and prior-year correlations to achieve an R² of 0.89 and an RMSE of 0.015.
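For reference, both reported scores can be computed as follows. This is a minimal sketch using the standard definitions (R² = 1 - SS_res / SS_tot), not necessarily the evaluation code used for these models. RMSE is expressed in the units of the target variable, while R² is unitless. The RMSE of 8.9 for yield and 0.015 for price are therefore not directly comparable across the two models, but the two R² values are.

```python
import math
from typing import Sequence


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean squared error, in the same units as the target."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot (unitless)."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


print(rmse([1, 2, 3, 4], [1, 2, 3, 5]))       # 0.5
print(r_squared([1, 2, 3, 4], [1, 2, 3, 5]))  # 0.8
```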

We chose the Huber loss function because the data collected by the USDA is survey-based and at times contains outliers. These outliers also likely explain why other modeling techniques did not perform as well.
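To illustrate why the Huber loss helps with occasional survey outliers, here is a minimal sketch of Huber regression with a single predictor, fitted by iteratively reweighted least squares (IRLS). It is a simplification. The actual model used multiple economic indexes. The threshold `delta` (a hypothetical default of 1.0) is fixed in the units of the target rather than estimated, so it would need rescaling for a series whose errors are on the order of 0.015. Library implementations such as scikit-learn's `HuberRegressor` estimate the residual scale alongside the coefficients and expose the threshold as `epsilon`. Residuals smaller than `delta` are penalized quadratically, as in least squares. Larger ones are penalized only linearly, so a single extreme survey value cannot dominate the fit.

```python
from typing import List, Sequence, Tuple


def huber_loss(r: float, delta: float = 1.0) -> float:
    """Quadratic for |r| <= delta, linear beyond it."""
    a = abs(r)
    return 0.5 * r * r if a <= delta else delta * (a - 0.5 * delta)


def huber_fit(x: Sequence[float], y: Sequence[float],
              delta: float = 1.0, n_iter: int = 100) -> Tuple[float, float]:
    """Fit y ~ a + b*x under Huber loss via IRLS; returns (intercept, slope).

    With n_iter=1 this reduces to ordinary least squares, because all
    weights start at 1. Assumes x is not constant.
    """
    w: List[float] = [1.0] * len(x)
    a = b = 0.0
    for _ in range(n_iter):
        # Weighted least squares with the current weights.
        sw = sum(w)
        xm = sum(wi * xi for wi, xi in zip(w, x)) / sw
        ym = sum(wi * yi for wi, yi in zip(w, y)) / sw
        sxx = sum(wi * (xi - xm) ** 2 for wi, xi in zip(w, x))
        sxy = sum(wi * (xi - xm) * (yi - ym) for wi, xi, yi in zip(w, x, y))
        b = sxy / sxx
        a = ym - b * xm
        # Huber weights: 1 inside the threshold, delta/|r| outside it,
        # so large residuals are down-weighted on the next pass.
        residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]
        w = [1.0 if abs(r) <= delta else delta / abs(r) for r in residuals]
    return a, b


# Hypothetical data on the line y = 2 + 3x, with one outlier added at x = 9.
xs = [float(i) for i in range(10)]
ys = [2.0 + 3.0 * xi for xi in xs]
ys[9] += 100.0

print(huber_fit(xs, ys, n_iter=1))  # ordinary least squares: slope pulled far above 3
print(huber_fit(xs, ys))            # Huber: slope close to 3
```

On this toy data the least-squares slope is pulled to roughly 8.5 by the single outlier, while the Huber fit stays near the true slope of 3.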