🧠 Part 1: Decision Tree – Classification
📍 How does a node split happen?
In classification trees, node splitting is based on purity: the goal is to reduce an impurity measure (such as Gini or entropy) as much as possible at each split.
🔢 Gini Impurity Formula:

$$\text{Gini} = 1 - \sum_{i=1}^{C} p_i^2$$

where $p_i$ is the proportion of class $i$ in the node.
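As a quick illustration, here is a minimal Python sketch of this formula (the function name and data layout are my own, not from any particular library):

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a node: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

print(gini(["A", "A"]))       # 0.0    -> a pure node
print(gini(["A", "A", "B"]))  # ~0.444 -> a mixed node
```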
📊 Sample Data (Classification)
| ID | Feature (X) | Class (Y) |
|----|-------------|-----------|
| 1  | 2           | A         |
| 2  | 3           | A         |
| 3  | 10          | B         |
| 4  | 19          | B         |
| 5  | 20          | B         |
We will try to split based on X.
🔍 Try a Split: X ≤ 5
- Left node: X ≤ 5 → IDs 1, 2 → [A, A]
- Right node: X > 5 → IDs 3, 4, 5 → [B, B, B]

✅ Best Split: Both child nodes are pure ($\text{Gini} = 1 - 1^2 = 0$ for each), so the total weighted impurity of this split is 0. It's the best possible split.
🔁 Try another Split: X ≤ 15
- Left node: X ≤ 15 → IDs 1, 2, 3 → [A, A, B]
- Right node: X > 15 → IDs 4, 5 → [B, B]

⚠️ $\text{Gini}_{\text{left}} = 1 - \left((2/3)^2 + (1/3)^2\right) \approx 0.444$ and $\text{Gini}_{\text{right}} = 0$, so the weighted impurity is $(3/5)(0.444) + (2/5)(0) \approx 0.267$. Higher than 0 → worse than the first split.
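To make the comparison concrete, here is a small self-contained Python sketch (helper names are illustrative) that scores each candidate threshold by the size-weighted Gini of the two child nodes:

```python
from collections import Counter

X = [2, 3, 10, 19, 20]
Y = ["A", "A", "B", "B", "B"]

def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def weighted_gini(x, y, threshold):
    """Size-weighted Gini of the children produced by the split x <= threshold."""
    left = [label for xi, label in zip(x, y) if xi <= threshold]
    right = [label for xi, label in zip(x, y) if xi > threshold]
    n = len(y)
    return len(left) / n * gini(left) + len(right) / n * gini(right)

print(weighted_gini(X, Y, 5))   # 0.0    -> both children pure: the best split
print(weighted_gini(X, Y, 15))  # ~0.267 -> mixed left child: worse
```

A full tree learner would simply evaluate every candidate threshold this way and keep the one with the lowest weighted impurity.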
🌳 Part 2: Decision Tree – Regression
📍 How does a node split happen?
In regression trees, splitting is done to minimize the variance (or MSE, Mean Squared Error) of the target variable.
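One common way to score a candidate split, and the form used in the worked example below, is the size-weighted variance of the two children (implementations differ in details such as population vs. sample variance):

$$\text{Score}(split) = \frac{n_L}{n}\,\text{Var}(Y_{\text{left}}) + \frac{n_R}{n}\,\text{Var}(Y_{\text{right}})$$

where $n_L$ and $n_R$ are the number of samples falling into the left and right child.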
📊 Sample Data (Regression)
| ID | Feature (X) | Target (Y) |
|----|-------------|------------|
| 1  | 2           | 5          |
| 2  | 3           | 6          |
| 3  | 10          | 13         |
| 4  | 19          | 25         |
| 5  | 20          | 26         |
Try a Split: X ≤ 5
- Left Node (X ≤ 5): IDs 1, 2 → [5, 6]
- Right Node (X > 5): IDs 3, 4, 5 → [13, 25, 26]

Var(left) = 0.25 and Var(right) ≈ 34.89, so the weighted variance is (2/5)(0.25) + (3/5)(34.89) ≈ 21.03.
Try a Better Split: X ≤ 10
- Left Node (X ≤ 10): IDs 1, 2, 3 → [5, 6, 13]
- Right Node (X > 10): IDs 4, 5 → [25, 26]

✅ Var(left) ≈ 12.67 and Var(right) = 0.25, so the weighted variance is (3/5)(12.67) + (2/5)(0.25) ≈ 7.7. Much lower than ≈ 21.03 → the better split.
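The same comparison can be scripted; this Python sketch (function names are mine) scores each threshold by the size-weighted population variance of the children:

```python
X = [2, 3, 10, 19, 20]
Y = [5, 6, 13, 25, 26]

def variance(values):
    """Population variance: mean squared deviation from the node mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

def weighted_variance(x, y, threshold):
    """Size-weighted variance of the children produced by x <= threshold."""
    left = [yi for xi, yi in zip(x, y) if xi <= threshold]
    right = [yi for xi, yi in zip(x, y) if xi > threshold]
    n = len(y)
    return len(left) / n * variance(left) + len(right) / n * variance(right)

print(weighted_variance(X, Y, 5))   # ~21.03
print(weighted_variance(X, Y, 10))  # 7.7 -> lower, so the better split
```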
🧾 Summary:
| Tree Type | Split Criterion | Objective |
|---|---|---|
| Classification | Gini impurity (or entropy) | Maximize class separation (purity) |
| Regression | MSE (or MAE, variance) | Minimize variance (error) |
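As a sanity check, scikit-learn's tree models recover the same splits on these toy datasets (this assumes scikit-learn is installed; note that it places each threshold at the midpoint between neighboring feature values):

```python
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

X = [[2], [3], [10], [19], [20]]  # one feature, one row per sample

clf = DecisionTreeClassifier().fit(X, ["A", "A", "B", "B", "B"])
reg = DecisionTreeRegressor().fit(X, [5, 6, 13, 25, 26])

# Root-node thresholds chosen by each learner:
print(clf.tree_.threshold[0])  # 6.5  -> midpoint of 3 and 10 (the X <= 5 split)
print(reg.tree_.threshold[0])  # 14.5 -> midpoint of 10 and 19 (the X <= 10 split)
```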