Decision Tree learning is used to approximate discrete-valued target functions, in which the learned function is represented by a decision tree.

To picture this, think of a decision tree as if-else rules, where each if-else condition leads to a certain answer at the end. You might have seen online games which ask several questions and, at the end, arrive at the very thing you were thinking of.

A classic, famous example where a decision tree is used is known as Play Tennis: if the outlook is sunny and the humidity is normal, then yes, you may play tennis.

Where should you use a decision tree? In any scenario where the learning data has the following traits:
- The learning data comes as attribute-value pairs, as in the example shown above: Wind as an attribute has two possible values, strong or weak.
- The target function has discrete output values. Here, the target function is "should you play tennis?", and the discrete outputs are Yes and No.
- The training data might have missing values or errors.

Although there are various decision tree learning algorithms, we will explore the Iterative Dichotomiser 3, commonly known as ID3. ID3 was invented by Ross Quinlan.

Before we dig deeper, we will discuss some key concepts:

Entropy

Entropy is a measure of randomness. In other words, it is a measure of unpredictability. We will take a moment here to give entropy in the case of a binary event (like a coin toss, where the output can be either of two events, heads or tails) a mathematical face:

Entropy = -(probability(a) * log2(probability(a))) - (probability(b) * log2(probability(b)))

where probability(a) is the probability of getting heads and probability(b) is the probability of getting tails.

Of course, this formula can be generalised for n discrete outcomes as follows:

Entropy = -p(1)*log2(p(1)) - p(2)*log2(p(2)) - p(3)*log2(p(3)) - ... - p(n)*log2(p(n))

Entropy is an important concept. You can find a more descriptive explanation here.

Please refer to the Play Tennis dataset that is pasted above. We have only two outcomes: either we played tennis or we didn't. In the given 14 days, we played tennis on 9 occasions and did not play on 5 occasions.

Probability = (Number of favourable events) / (Number of total events)

So the probability of playing tennis is 9/14, and the probability of not playing tennis is 5/14.

Entropy at source = -(probability(a) * log2(probability(a))) - (probability(b) * log2(probability(b)))

So the entropy of the whole system, before we ask our first question, is:

Entropy = -(Probability of playing tennis) * log2(Probability of playing tennis) - (Probability of not playing tennis) * log2(Probability of not playing tennis)
        = -(9/14) * log2(9/14) - (5/14) * log2(5/14)
        ≈ 0.940

We decided to base the first decision on Outlook. We could have based our first decision on Humidity or Wind, but we chose Outlook. Why? Because making our decision on the basis of Outlook reduces the randomness in the outcome (whether to play or not) more than it would have been reduced in the case of Humidity or Wind.

Now we have four features on which to make a decision: Outlook, Temperature, Humidity and Wind. Let's see what happens to entropy when we make our first decision on the basis of Outlook.
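To make these numbers concrete, here is a minimal Python sketch. It is not part of the original post, and the per-Outlook counts it uses (Sunny: 2 yes, 3 no; Overcast: 4 yes, 0 no; Rain: 3 yes, 2 no) are assumed from the standard 14-day Play Tennis dataset, so check them against the table pasted above. It computes the 0.940 source entropy and the reduction in entropy that a split on Outlook buys:

import math

def entropy(probabilities):
    # Entropy = -sum(p * log2(p)); a probability of 0 contributes nothing.
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

# A fair coin toss is the most unpredictable binary event: 1.0 bit.
print(entropy([0.5, 0.5]))  # 1.0

# Play Tennis: 9 "yes" days and 5 "no" days out of 14.
source = entropy([9 / 14, 5 / 14])
print(round(source, 3))  # 0.94, the 0.940 figure from the text

# Splitting on Outlook, with the assumed per-value counts as (yes, no):
subsets = {"Sunny": (2, 3), "Overcast": (4, 0), "Rain": (3, 2)}
total = 14

# Entropy after the split: each branch's entropy, weighted by the
# fraction of the 14 days that fall into that branch.
after = sum(
    (yes + no) / total * entropy([yes / (yes + no), no / (yes + no)])
    for yes, no in subsets.values()
)
print(round(source - after, 3))  # entropy reduction (information gain) for Outlook

With these assumed counts, the split on Outlook removes about 0.247 bits of entropy, which is the sense in which it reduces the randomness in the outcome more than Humidity or Wind would.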