# Estimation of Node Impurity: Gini Measure and Lean Six Sigma

Depending on the business environment, or on the discipline being studied (neurology, networking, telecommunications, distributed systems), the meaning of node differs, but generally it represents a point where two or more lines intersect or branch out. Nodes belong to trees. A tree can be defined locally as a collection of nodes, starting at a root node, where each node is a data structure consisting of a value together with a list of nodes called "children", with the constraint that no node is duplicated. A tree can also be defined abstractly as a whole (globally), as an ordered tree with a value assigned to each node. Both perspectives are useful: while a tree can be analyzed mathematically as a whole, when it is actually represented as a data structure it is usually stored and worked with node by node (rather than as a list of nodes and an adjacency list of edges between nodes, as one might represent a digraph, for instance). For example, looking at a tree as a whole, one can talk about "the parent node" of a given node, but in general, as a data structure, a given node contains only the list of its children and no reference to its parent (if any). http://en.wikipedia.org/wiki/Tree_%28data_structure%29
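The node-by-node representation described above can be sketched in Python. This is a minimal illustration under our own assumptions: the `Node` class and the helpers `find_parent` and `count_nodes` are hypothetical names, not a library API, and the supplier names are made up. Each node stores only its value and its children, so finding a node's parent requires searching from the root, that is, looking at the tree as a whole.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    """Hypothetical tree node: a value plus a list of children (no parent reference)."""
    value: str
    children: list[Node] = field(default_factory=list)


def find_parent(root: Node, target: Node) -> Optional[Node]:
    """Recover the parent of `target` by searching the whole tree from `root`.

    A node only knows its children, so the parent is found from the top down.
    """
    stack = [root]
    while stack:
        node = stack.pop()
        for child in node.children:
            if child is target:
                return node
            stack.append(child)
    return None  # `target` is the root, or is not in this tree


def count_nodes(node: Node) -> int:
    """Count the nodes in the subtree rooted at `node`."""
    return 1 + sum(count_nodes(child) for child in node.children)


if __name__ == "__main__":
    # Made-up example: company A as the root, suppliers as its children.
    supplier_1 = Node("Supplier 1")
    supplier_2 = Node("Supplier 2", [Node("Sub-supplier 2a")])
    company_a = Node("Company A", [supplier_1, supplier_2])

    print(count_nodes(company_a))                   # → 4
    print(find_parent(company_a, supplier_1).value)  # → Company A
    print(find_parent(company_a, company_a))         # → None
```

Storing a parent pointer in each node would make `find_parent` constant-time, at the cost of keeping two references consistent on every change; the sketch follows the children-only convention the paragraph describes.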

Let's consider company A as a tree whose nodes are its many suppliers, each supplying different products or services. The estimation of node impurity known as the Gini index measures the extent of purity of a region containing data points from possibly different classes. The main idea is that no "child" should do repetitive work, so we determine which nodes are impure. The Gini measure will help company A decide how many impure nodes to keep, that is, how many suppliers provide similar services or products, or do repetitive work that could be reduced.
The Gini index of a node is

$$G = 1 - \sum_i p(i)^2,$$

where the sum is over the classes $i$ at the node, and $p(i)$ is the observed fraction of observations with class $i$ that reach the node. A node with just one class (a pure node) has Gini index 0; otherwise the Gini index is positive. The deviance, $D = -\sum_i p(i)\log_2 p(i)$ with $p(i)$ defined the same way, behaves alike: a pure node has deviance 0; otherwise, the deviance is positive. Applied in practice, a pure node would have no repetitive work or duplicated service, and each node would be entirely different from the others.
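As a quick check, both measures can be computed directly from the class labels at one node: the Gini index 1 − Σ p(i)² and the deviance −Σ p(i) log₂ p(i), where p(i) is the observed fraction of observations with class i. This is a minimal sketch; the function names `class_fractions`, `gini_index` and `deviance` are our own, not part of any library, and the labels are made-up service categories.

```python
from __future__ import annotations

import math
from collections import Counter
from typing import Hashable, Iterable


def class_fractions(labels: Iterable[Hashable]) -> list[float]:
    """Observed fraction p(i) of the observations at a node that have class i."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("a node must contain at least one observation")
    return [n / total for n in counts.values()]


def gini_index(labels: Iterable[Hashable]) -> float:
    """Gini's diversity index: 1 - sum_i p(i)^2."""
    return 1.0 - sum(p * p for p in class_fractions(labels))


def deviance(labels: Iterable[Hashable]) -> float:
    """Deviance: -sum_i p(i) * log2 p(i); only classes present are counted, so p(i) > 0."""
    return sum(-p * math.log2(p) for p in class_fractions(labels))


# Made-up labels: the service category of each observation at a node.
pure = ["logistics"] * 4
mixed = ["logistics", "logistics", "packaging", "packaging"]

print(gini_index(pure), deviance(pure))    # → 0.0 0.0
print(gini_index(mixed), deviance(mixed))  # → 0.5 1.0
```

Both measures are 0 only when a single class is present and grow as the classes become more evenly mixed; for two equally frequent classes the Gini index reaches 0.5 and the deviance 1 bit.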
The question is how feasible such a method is among suppliers in a Lean Six Sigma environment. One option is to have an HR function that maintains all the roles within the tree and its children. Other functions, such as Engineering and Quality, could be organized the same way, but each would become separate and independent from the others. Another possibility is that within each "child" there is no repetitive work.
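One way to make this concrete, offered purely as a hypothetical mapping that the text does not spell out: treat each function (HR, Engineering, Quality) as a child node of company A, and label each supplier task under it by the kind of work it really is. A node is then pure (Gini index 1 − Σ p(i)² equal to 0) when all of its work belongs to that one function, and a positive value flags work that overlaps with another function. The sketch also reports the size-weighted average Gini of the children, the quantity decision-tree learners try to minimize when choosing a split. The task data and the helper `weighted_gini` are invented for illustration.

```python
from __future__ import annotations

from collections import Counter


def gini_index(labels: list[str]) -> float:
    """Gini's diversity index 1 - sum_i p(i)^2 for the labels at one node."""
    if not labels:
        raise ValueError("a node must contain at least one observation")
    total = len(labels)
    return 1.0 - sum((n / total) ** 2 for n in Counter(labels).values())


def weighted_gini(children: dict[str, list[str]]) -> float:
    """Size-weighted average of the children's Gini indices (hypothetical helper)."""
    total = sum(len(labels) for labels in children.values())
    return sum(len(labels) / total * gini_index(labels)
               for labels in children.values())


# Hypothetical data: each child node of company A is a function, and each
# label names the kind of work that one supplier task actually belongs to.
functions = {
    "HR":          ["HR", "HR", "HR", "HR"],
    "Engineering": ["Engineering", "Engineering", "Quality", "Quality"],
    "Quality":     ["Quality", "Quality", "Quality", "Engineering"],
}

for name, tasks in functions.items():
    g = gini_index(tasks)
    status = "pure" if g == 0 else "impure: overlapping work"
    print(f"{name:12s} Gini = {g:.3f} ({status})")

print(f"weighted Gini = {weighted_gini(functions):.3f}")
```

In this made-up example HR is pure, while Engineering and Quality each carry work belonging to the other; moving those tasks to the function they belong to would drive every child, and the weighted average, to 0.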
http://www.mathworks.com/help/stats/compactclassificationensemble.predictorimportance.html