Jun Hyuk Kim's Blog

Coding Journal/AI

[Terms] Bagging

junhyuk1229 2023. 5. 21. 11:25

Bagging

Bagging (bootstrap aggregating) is an ensemble method used to reduce variance.

When we only have a small amount of data to estimate a distribution from, the estimate may not match the true distribution.

Example: suppose we want to know how the whole population of a city would answer a survey question.

We can:

1. Ask every person in the population.

2. Ask only some of the people and estimate the answer distribution from their responses.

Method 1 is easy when the population is small, but if the population is a whole city or country, collecting every answer becomes impractical.

Method 2 takes less time and is more practical, but we might not get the full picture of the actual distribution.

Let's say we use method 2. One option is to take the distribution of the sample as it is and guess the shape of the population distribution from it. The other option is to repeatedly draw random samples, with replacement, from the sample itself. This works because we trust the data we were given to be representative, so random resamples of it have a high chance of having the same shape as the whole population.

The bootstrapping method is most useful when we have a small number of samples or when the distribution is irregular.
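The resampling idea above can be sketched in a few lines of Python. This is only a minimal illustration using the standard library: the survey answers, the helper name `bootstrap_means`, and the choice of 1000 resamples are all made up for the example.

```python
import random
import statistics

def bootstrap_means(sample, n_resamples=1000, seed=0):
    """Return the mean of each bootstrap resample of `sample`."""
    rng = random.Random(seed)
    n = len(sample)
    means = []
    for _ in range(n_resamples):
        # Draw n values WITH replacement, so one answer can appear several times.
        resample = rng.choices(sample, k=n)
        means.append(statistics.mean(resample))
    return means

# Hypothetical survey answers on a 1-5 scale from 12 people (made-up data).
survey = [3, 4, 2, 5, 4, 4, 3, 1, 5, 4, 3, 4]

means = sorted(bootstrap_means(survey))
# Rough 95% interval for the population mean (percentile method):
# cut off the lowest 2.5% and the highest 2.5% of the 1000 resampled means.
low, high = means[25], means[974]

print(f"sample mean = {statistics.mean(survey):.2f}")
print(f"bootstrap 95% interval = ({low:.2f}, {high:.2f})")
```

The spread of the resampled means gives a feel for how much the estimate could change if we had surveyed a different group of 12 people.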

Method

  • Create several new datasets by randomly sampling from the original dataset
    • Sampling is done with replacement, so the same data point can be chosen multiple times.
    • All resampled datasets must have the same size
  • Each resampled dataset is used to train a separate model
  • The final result is made by combining the predictions from all models
    • The predictions can be combined using the mean (for regression) or the mode, i.e. a majority vote (for classification)
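The steps above can be sketched as follows. This is a minimal illustration, not a reference implementation: the base model is a 1-nearest-neighbor predictor chosen only because it is short to write, and the function names and toy datasets are assumptions for the example.

```python
import random
import statistics

def bootstrap_sample(data, rng):
    """Draw len(data) rows with replacement (same size as the original)."""
    return rng.choices(data, k=len(data))

def train_nearest_neighbor(dataset):
    """'Train' a 1-nearest-neighbor model: it simply remembers its dataset."""
    def predict(x):
        nearest = min(dataset, key=lambda row: abs(row[0] - x))
        return nearest[1]
    return predict

def bagging_fit(data, n_models=25, seed=0):
    """Train one model per bootstrap dataset."""
    rng = random.Random(seed)
    return [train_nearest_neighbor(bootstrap_sample(data, rng))
            for _ in range(n_models)]

def bagging_predict_mean(models, x):
    """Combine numeric predictions with the mean (regression)."""
    return statistics.mean([model(x) for model in models])

def bagging_predict_mode(models, x):
    """Combine label predictions with the mode (majority vote)."""
    return statistics.mode([model(x) for model in models])

# Made-up toy data as (feature, target) rows.
regression_data = [(0.0, 0.1), (1.0, 2.2), (2.0, 3.9),
                   (3.0, 6.1), (4.0, 8.0), (5.0, 9.8)]
label_data = [(0.0, "no"), (1.0, "no"), (2.0, "no"),
              (3.0, "yes"), (4.0, "yes"), (5.0, "yes")]

regression_models = bagging_fit(regression_data)
label_models = bagging_fit(label_data)

print(bagging_predict_mean(regression_models, 2.4))
print(bagging_predict_mode(label_models, 4.6))
```

In practice a library implementation would normally be used instead; the point here is only to show the three steps (resample, train, combine) in one place.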

Results

Because every resampled dataset comes from the same original data, and every data point has the same chance of being chosen, the bias isn't affected much. Instead, the variance is decreased by averaging over many models. Lowering the variance ultimately helps prevent overfitting.
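A small simulation can make the variance claim more concrete. The sketch below assumes a made-up true function `y = 2x` with Gaussian noise and again uses a 1-nearest-neighbor predictor as the base model. It repeatedly draws a fresh training set, predicts at one point with a single model and with a bagged ensemble, and compares how much the two predictions vary across training sets.

```python
import random
import statistics

def true_function(x):
    # Assumed ground truth, used only for this simulation.
    return 2.0 * x

def make_training_set(rng, n=30, noise=1.0):
    """Sample n noisy (x, y) points with x uniform in [0, 10]."""
    xs = [rng.uniform(0.0, 10.0) for _ in range(n)]
    return [(x, true_function(x) + rng.gauss(0.0, noise)) for x in xs]

def nearest_neighbor_predict(dataset, x):
    """Predict with the target of the closest training point."""
    return min(dataset, key=lambda row: abs(row[0] - x))[1]

def bagged_predict(dataset, x, rng, n_models=25):
    """Average the predictions of models trained on bootstrap resamples."""
    preds = []
    for _ in range(n_models):
        resample = rng.choices(dataset, k=len(dataset))
        preds.append(nearest_neighbor_predict(resample, x))
    return statistics.mean(preds)

rng = random.Random(0)
x0 = 5.0  # point at which we compare the predictions
single_preds, bagged_preds = [], []
for _ in range(300):
    train = make_training_set(rng)
    single_preds.append(nearest_neighbor_predict(train, x0))
    bagged_preds.append(bagged_predict(train, x0, rng))

# Variance: how much the prediction changes from one training set to another.
single_var = statistics.variance(single_preds)
bagged_var = statistics.variance(bagged_preds)
# Bias: how far the average prediction is from the true value.
single_bias = statistics.mean(single_preds) - true_function(x0)
bagged_bias = statistics.mean(bagged_preds) - true_function(x0)

print(f"single model: bias = {single_bias:+.3f}, variance = {single_var:.3f}")
print(f"bagged model: bias = {bagged_bias:+.3f}, variance = {bagged_var:.3f}")
```

In this setup the bagged predictions should vary noticeably less between training sets than the single-model predictions, while both averages stay close to the true value. The exact numbers depend on the seed and on the assumed data.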

Useful links

https://www.ibm.com/topics/bagging

https://stats.stackexchange.com/questions/26088/explaining-to-laypeople-why-bootstrapping-works
