Tackle this only if you have time. This is a harder question, and I am intentionally giving it a lower point count (only 2 points). All the ingredients needed to answer it were presented during lecture, but you need to connect the dots: the full argument was not presented explicitly. Based on the lectures and assigned readings/videos, explain why k-fold cross-validation is a more robust method for choosing the optimal number of trees (an example of a hyperparameter) in a random forest than performing a single random partition of the data into a training set and a test set and looking at the OOB (out-of-bag) error plot based on that split. Answer in no more than 5 sentences. Hint: Think about, on the one hand, how k-fold cross-validation works and, on the other hand, how we compute the OOB error and plot it against the number of trees in the forest.
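As a reminder of the two mechanisms named in the hint, here is a minimal sketch using only the Python standard library. It does not fit a random forest; it only shows which observations end up where. The helper names `k_fold_indices` and `bootstrap_oob` are hypothetical and were chosen for this illustration, as are the sample size of 20 and the choice of 5 folds.

```python
import random


def k_fold_indices(n, k, seed=0):
    """Shuffle the indices 0..n-1 and split them into k disjoint folds."""
    rng = random.Random(seed)
    idx = list(range(n))
    rng.shuffle(idx)
    # Taking every k-th index gives folds whose sizes differ by at most one.
    return [idx[i::k] for i in range(k)]


def bootstrap_oob(train_idx, seed=0):
    """Draw one bootstrap sample (with replacement) from train_idx.

    Returns (in_bag, out_of_bag): the drawn indices, and the training
    indices that were never drawn for this particular tree.
    """
    rng = random.Random(seed)
    in_bag = [rng.choice(train_idx) for _ in train_idx]
    drawn = set(in_bag)
    out_of_bag = [i for i in train_idx if i not in drawn]
    return in_bag, out_of_bag


n, k = 20, 5  # illustrative values, not taken from the course material
folds = k_fold_indices(n, k)

for j, validation in enumerate(folds):
    # Every fold serves as the validation set exactly once;
    # the remaining k-1 folds form the training set for that round.
    training = [i for fold in folds if fold is not validation for i in fold]
    in_bag, out_of_bag = bootstrap_oob(training, seed=j)
    print(f"round {j}: validate on {sorted(validation)}, "
          f"one tree's OOB observations: {sorted(out_of_bag)}")
```

Each round prints the held-out fold next to the out-of-bag observations of a single bootstrap sample drawn from that round's training set, so the two kinds of "unseen" data can be compared side by side.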