How Node Splitting Works in Decision Trees for Classification and Regression

🧠 Part 1: Decision Tree – Classification

๐Ÿ” How does a node split happen?

In classification trees, node splitting is driven by purity: the goal is to reduce impurity (measured by Gini impurity or entropy) as much as possible at each split.

🔢 Gini Impurity Formula:

Gini = 1 - \sum_{i=1}^{C} p_i^2

where p_i is the proportion of samples of class i in the node and C is the number of classes.


📊 Sample Data (Classification)

ID   Feature (X)   Class (Y)
1    2             A
2    3             A
3    10            B
4    19            B
5    20            B

We will try splitting on X.


๐Ÿ” Try a Split: X <= 5

  • Left node: X ≤ 5 → IDs 1, 2 → [A, A]
  • Right node: X > 5 → IDs 3, 4, 5 → [B, B, B]

Gini(left) = 1 - (2/2)^2 = 0 and Gini(right) = 1 - (3/3)^2 = 0, so the weighted Gini is (2/5)·0 + (3/5)·0 = 0.

✅ Best Split: This gives 0 total impurity, so it’s the best possible split.


๐Ÿ” Try another Split: X <= 15

  • Left node: X ≤ 15 → IDs 1, 2, 3 → [A, A, B]
  • Right node: X > 15 → IDs 4, 5 → [B, B]

Gini(left) = 1 - (2/3)^2 - (1/3)^2 ≈ 0.444 and Gini(right) = 0, so the weighted Gini is (3/5)·0.444 + (2/5)·0 ≈ 0.267.

โš ๏ธ Higher than 0 โ†’ worse than first split.


📈 Part 2: Decision Tree – Regression

๐Ÿ” How does a node split happen?

In regression trees, splitting is done to minimize the variance (or MSE, mean squared error) of the target variable within each child node.


📊 Sample Data (Regression)

ID   Feature (X)   Target (Y)
1    2             5
2    3             6
3    10            13
4    19            25
5    20            26

🔍 Try a Split: X ≤ 5

  • Left Node (X ≤ 5): IDs 1, 2 → [5, 6]
  • Right Node (X > 5): IDs 3, 4, 5 → [13, 25, 26]

Var(left) = Var([5, 6]) = 0.25 and Var(right) = Var([13, 25, 26]) ≈ 34.89, so the weighted variance is (2/5)·0.25 + (3/5)·34.89 ≈ 21.03.

🔍 Try a Better Split: X ≤ 10

  • Left Node (X ≤ 10): IDs 1, 2, 3 → [5, 6, 13]
  • Right Node (X > 10): IDs 4, 5 → [25, 26]

Var(left) = Var([5, 6, 13]) ≈ 12.67 and Var(right) = Var([25, 26]) = 0.25, so the weighted variance is (3/5)·12.67 + (2/5)·0.25 = 7.7.

✅ 7.7 is much lower than 21.03 → the better split.


🧾 Summary:

Tree Type        Split Criteria               Objective
Classification   Gini impurity (or entropy)   Maximize class separation (purity)
Regression       MSE (or MAE, variance)       Minimize variance (error)