Hello students, welcome to today's mathematics lesson. I am so happy to see you all here, ready to learn something new and interesting. Today, we are going to study Chapter 13, which is about Statistics. Now, I know some of you might think statistics is all about boring numbers and tables, but I promise you, by the end of this lesson, you will see how useful and practical this chapter really is. In fact, you use statistics every day without even realizing it - when you talk about the average marks in your class, or when you say "most students like pizza," or when you find out the middle score in a test. All of these are part of statistics. So, let's begin our journey into the world of statistics.
In Class IX, you have already studied the classification of data into ungrouped and grouped frequency distributions. You also learned to represent data pictorially using bar graphs, histograms, and frequency polygons. And most importantly, you studied three important numerical representatives of data, also called measures of central tendency: mean, median, and mode. These three measures help us understand the central or typical value in a set of data.
In this chapter, we are going to extend our study of mean, median, and mode from ungrouped data to grouped data. We will also learn about something called cumulative frequency and how to draw cumulative frequency curves, which are called ogives. These are powerful tools that help us analyze large sets of data in a meaningful way.
So, let's start with the mean of grouped data. Now, all of you remember what the mean or average is, right? It is the sum of all observations divided by the total number of observations. If we have observations x₁, x₂, ..., xₙ with respective frequencies f₁, f₂, ..., fₙ, then this means observation x₁ occurs f₁ times, x₂ occurs f₂ times, and so on. The sum of all observations would be f₁x₁ + f₂x₂ + ... + fₙxₙ, and the total number of observations would be f₁ + f₂ + ... + fₙ. So, the mean x̄ is given by the formula x̄ = (f₁x₁ + f₂x₂ + ... + fₙxₙ)/(f₁ + f₂ + ... + fₙ). We can write this more compactly using the Greek letter Σ (sigma), which means summation. So, x̄ = (∑fᵢxᵢ)/(∑fᵢ), where i goes from 1 to n.
Now, let's apply this formula to find the mean in an example. Consider this: the marks obtained by 30 students of Class X in a Mathematics paper out of 100 marks are given in the table below. We have marks and the number of students who got each mark. Let me write this out clearly for you.
The marks obtained are: 10, 20, 36, 40, 50, 56, 60, 70, 72, 80, 88, 92, and 95. The corresponding number of students who got each of these marks are: 1, 1, 3, 4, 3, 2, 4, 4, 1, 1, 2, 3, and 1 respectively.
To find the mean, we need to multiply each mark by the number of students who got that mark, add all these products, and then divide by the total number of students. Let me show you the calculation:
For marks 10 with 1 student: 10 × 1 = 10
For marks 20 with 1 student: 20 × 1 = 20
For marks 36 with 3 students: 36 × 3 = 108
For marks 40 with 4 students: 40 × 4 = 160
For marks 50 with 3 students: 50 × 3 = 150
For marks 56 with 2 students: 56 × 2 = 112
For marks 60 with 4 students: 60 × 4 = 240
For marks 70 with 4 students: 70 × 4 = 280
For marks 72 with 1 student: 72 × 1 = 72
For marks 80 with 1 student: 80 × 1 = 80
For marks 88 with 2 students: 88 × 2 = 176
For marks 92 with 3 students: 92 × 3 = 276
For marks 95 with 1 student: 95 × 1 = 95
Now, let's add all these products: 10 + 20 + 108 + 160 + 150 + 112 + 240 + 280 + 72 + 80 + 176 + 276 + 95 = 1779. And the total number of students is 1+1+3+4+3+2+4+4+1+1+2+3+1 = 30. So, the mean is 1779 divided by 30, which equals 59.3. So, on average, these students scored 59.3 marks. This is called the direct method of finding the mean.
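If you would like to check this arithmetic on a computer, here is a minimal Python sketch of the direct method. The helper name `direct_mean` is our own illustrative choice, not a standard function.

```python
# Marks and number of students from the example above
marks = [10, 20, 36, 40, 50, 56, 60, 70, 72, 80, 88, 92, 95]
students = [1, 1, 3, 4, 3, 2, 4, 4, 1, 1, 2, 3, 1]


def direct_mean(values, freqs):
    """Direct method: (sum of f_i * x_i) / (sum of f_i)."""
    sum_fx = sum(f * x for x, f in zip(values, freqs))
    return sum_fx / sum(freqs)


sum_fx = sum(f * x for x, f in zip(marks, students))
print(sum_fx, sum(students))                   # 1779 30
print(round(direct_mean(marks, students), 1))  # 59.3
```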
Now, here's an important point. In most real-life situations, data is usually so large that we need to condense it into grouped data to make it meaningful. So, we need to convert ungrouped data into grouped data and find a method to calculate its mean.
Let me show you how to do this. Let's take the same data from our example and convert it into grouped data by forming class intervals of width 15. We need to be careful while allocating frequencies to each class interval. The convention is that students falling on any upper class-limit would be considered in the next class. For example, 4 students who obtained 40 marks would be considered in the class interval 40-55 and not in 25-40.
So, let's form our grouped frequency distribution table. We have class intervals: 10-25, 25-40, 40-55, 55-70, 70-85, and 85-100. Now, let's count how many students fall in each interval.
For 10-25: students with marks 10 and 20, that's 1+1 = 2 students. For 25-40: students with marks 36, that's 3 students. For 40-55: students with marks 40 and 50, that's 4+3 = 7 students. For 55-70: students with marks 56 and 60, that's 2+4 = 6 students. For 70-85: students with marks 70, 72, and 80, that's 4+1+1 = 6 students. For 85-100: students with marks 88, 92, and 95, that's 2+3+1 = 6 students.
So, our grouped frequency distribution table looks like this:
Class interval: 10-25, Number of students: 2
Class interval: 25-40, Number of students: 3
Class interval: 40-55, Number of students: 7
Class interval: 55-70, Number of students: 6
Class interval: 70-85, Number of students: 6
Class interval: 85-100, Number of students: 6
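The grouping rule we used (a value on an upper class limit goes into the next class) is the same as treating each class as the half-open interval [lower, lower + h). The Python sketch below assumes that rule. The helper `group_frequencies` is hypothetical, and it also assumes that a mark equal to the top limit of the last class (100) would stay in the last class, a case our data never hits.

```python
marks = [10, 20, 36, 40, 50, 56, 60, 70, 72, 80, 88, 92, 95]
students = [1, 1, 3, 4, 3, 2, 4, 4, 1, 1, 2, 3, 1]
lower_limits = [10, 25, 40, 55, 70, 85]  # classes 10-25, 25-40, ..., 85-100
width = 15


def group_frequencies(values, freqs, lowers, h):
    """Count observations per class, using half-open classes [lower, lower + h).

    A value equal to an upper limit therefore goes into the next class.
    Assumption: a value equal to the top limit of the last class stays in
    the last class. Values outside every class are ignored.
    """
    counts = [0] * len(lowers)
    last = len(lowers) - 1
    for x, f in zip(values, freqs):
        for k, lo in enumerate(lowers):
            if lo <= x < lo + h or (k == last and x == lo + h):
                counts[k] += f
                break
    return counts


counts = group_frequencies(marks, students, lower_limits, width)
for lo, c in zip(lower_limits, counts):
    print(f"{lo}-{lo + width}: {c}")
```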
Now, for each class interval, we need a point that represents the whole class. We assume that the frequency of each class interval is centered around its mid-point. This mid-point is also called the class mark. We find the class mark by taking the average of the upper and lower limits of the class. So, class mark = (Upper limit + Lower limit)/2.
For the class 10-25, the class mark is (10 + 25)/2 = 17.5. Similarly, for 25-40, it's (25+40)/2 = 32.5. For 40-55, it's 47.5. For 55-70, it's 62.5. For 70-85, it's 77.5. And for 85-100, it's 92.5.
Now, we can treat these class marks as our xᵢ values and proceed to compute the mean just like we did before. Let's calculate:
For class 10-25: class mark 17.5, frequency 2, so fᵢxᵢ = 17.5 × 2 = 35.0
For class 25-40: class mark 32.5, frequency 3, so fᵢxᵢ = 32.5 × 3 = 97.5
For class 40-55: class mark 47.5, frequency 7, so fᵢxᵢ = 47.5 × 7 = 332.5
For class 55-70: class mark 62.5, frequency 6, so fᵢxᵢ = 62.5 × 6 = 375.0
For class 70-85: class mark 77.5, frequency 6, so fᵢxᵢ = 77.5 × 6 = 465.0
For class 85-100: class mark 92.5, frequency 6, so fᵢxᵢ = 92.5 × 6 = 555.0
Now, the sum of fᵢxᵢ is 35.0 + 97.5 + 332.5 + 375.0 + 465.0 + 555.0 = 1860.0. And the sum of frequencies is 2+3+7+6+6+6 = 30. So, the mean is 1860.0/30 = 62.
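Here is the same grouped calculation as a short Python sketch: it computes the class marks from the class limits and then applies the direct method to them.

```python
classes = [(10, 25), (25, 40), (40, 55), (55, 70), (70, 85), (85, 100)]
freqs = [2, 3, 7, 6, 6, 6]

# Class mark = (lower limit + upper limit) / 2
class_marks = [(lo + up) / 2 for lo, up in classes]

# Direct method on the class marks
sum_fx = sum(f * x for f, x in zip(freqs, class_marks))
mean = sum_fx / sum(freqs)

print(class_marks)   # [17.5, 32.5, 47.5, 62.5, 77.5, 92.5]
print(sum_fx, mean)  # 1860.0 62.0
```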
Wait, students, did you notice something interesting? The mean we got from the grouped data is 62, but the actual mean from the ungrouped data was 59.3. Why is there a difference? This is because when we use grouped data, we are making an assumption that all observations in a class are concentrated at the class mark. This is only an approximation, which is why we get an approximate mean of 62, while 59.3 is the exact mean. The difference arises from this mid-point assumption. In real-life situations, when we have large data, we often have to work with grouped data and accept this small approximation.
Now, here's where things get interesting. Sometimes, when the values of xᵢ and fᵢ are very large, finding the product fᵢxᵢ becomes very tedious and time-consuming. So, we need some methods to simplify these calculations. Let me introduce you to two such methods: the Assumed Mean Method and the Step-Deviation Method.
The idea behind the assumed mean method is simple. Instead of working with large numbers for xᵢ, we subtract a fixed number from each xᵢ to get smaller numbers. This fixed number is called the assumed mean, denoted by 'a'. We usually choose 'a' to be that xᵢ which lies in the center of all the xᵢ values. In our example, we can choose a = 47.5 or a = 62.5. Let's choose a = 47.5.
Now, we find the difference between each xᵢ and our assumed mean a. This difference is called the deviation, denoted by dᵢ. So, dᵢ = xᵢ - a = xᵢ - 47.5.
Let me calculate these deviations:
For class 10-25: xᵢ = 17.5, so dᵢ = 17.5 - 47.5 = -30
For class 25-40: xᵢ = 32.5, so dᵢ = 32.5 - 47.5 = -15
For class 40-55: xᵢ = 47.5, so dᵢ = 47.5 - 47.5 = 0
For class 55-70: xᵢ = 62.5, so dᵢ = 62.5 - 47.5 = 15
For class 70-85: xᵢ = 77.5, so dᵢ = 77.5 - 47.5 = 30
For class 85-100: xᵢ = 92.5, so dᵢ = 92.5 - 47.5 = 45
Now, we multiply each dᵢ by the corresponding frequency fᵢ and add them up:
For 10-25: fᵢ = 2, dᵢ = -30, so fᵢdᵢ = -60
For 25-40: fᵢ = 3, dᵢ = -15, so fᵢdᵢ = -45
For 40-55: fᵢ = 7, dᵢ = 0, so fᵢdᵢ = 0
For 55-70: fᵢ = 6, dᵢ = 15, so fᵢdᵢ = 90
For 70-85: fᵢ = 6, dᵢ = 30, so fᵢdᵢ = 180
For 85-100: fᵢ = 6, dᵢ = 45, so fᵢdᵢ = 270
The sum of fᵢdᵢ is -60 + (-45) + 0 + 90 + 180 + 270 = 435.
Now, let's understand the relationship between the mean of deviations d̄ and the actual mean x̄. Since we subtracted 'a' from each xᵢ to get dᵢ, we have dᵢ = xᵢ - a. So, the mean of deviations d̄ = Σfᵢdᵢ/Σfᵢ. But we can write this as:
d̄ = Σfᵢ(xᵢ - a)/Σfᵢ = (Σfᵢxᵢ/Σfᵢ) - (aΣfᵢ/Σfᵢ) = x̄ - a
So, x̄ = a + d̄, or more specifically, x̄ = a + (Σfᵢdᵢ/Σfᵢ).
Now, substituting our values: a = 47.5, Σfᵢdᵢ = 435, and Σfᵢ = 30, we get:
x̄ = 47.5 + 435/30 = 47.5 + 14.5 = 62.
So, the mean is 62, which is the same as we got using the direct method! This is called the Assumed Mean Method. The beauty of this method is that it doesn't matter which value we choose as 'a': we will always get the same answer. Let me tell you why. We just showed that d̄ = x̄ - a for any value of a, so a + d̄ is always x̄. For example, if we change 'a' from 47.5 to 62.5, every deviation drops by 15, so d̄ also drops by 15, and a + d̄ stays the same. You can check that with a = 62.5 we get Σfᵢdᵢ = -15, so x̄ = 62.5 + (-15)/30 = 62.5 - 0.5 = 62.
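To see that the choice of a really does not matter, a small Python sketch can try several values of a on the same table. The function name `assumed_mean` is just an illustrative helper, and the values 17.5 and 100 are extra choices added here to make the point, not ones used in the lesson.

```python
class_marks = [17.5, 32.5, 47.5, 62.5, 77.5, 92.5]
freqs = [2, 3, 7, 6, 6, 6]


def assumed_mean(xs, fs, a):
    """Assumed mean method: x_bar = a + (sum of f_i * d_i) / (sum of f_i), d_i = x_i - a."""
    deviations = [x - a for x in xs]
    sum_fd = sum(f * d for f, d in zip(fs, deviations))
    return a + sum_fd / sum(fs)


for a in (47.5, 62.5, 17.5, 100):
    print(a, assumed_mean(class_marks, freqs, a))  # 62.0 every time
```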
Now, let's look at our deviations dᵢ again. Do you notice something special? All the deviations are multiples of 15! This is because our class size is 15, so consecutive class marks are 15 apart. If we divide each deviation by 15, we get even smaller numbers to work with. This leads us to the Step-Deviation Method.
In the step-deviation method, we define uᵢ = (xᵢ - a)/h, where h is the class size. In our case, h = 15. So, let's calculate uᵢ:
For class 10-25: dᵢ = -30, so uᵢ = -30/15 = -2
For class 25-40: dᵢ = -15, so uᵢ = -15/15 = -1
For class 40-55: dᵢ = 0, so uᵢ = 0/15 = 0
For class 55-70: dᵢ = 15, so uᵢ = 15/15 = 1
For class 70-85: dᵢ = 30, so uᵢ = 30/15 = 2
For class 85-100: dᵢ = 45, so uᵢ = 45/15 = 3
Now, we multiply each uᵢ by the corresponding frequency fᵢ:
For 10-25: fᵢ = 2, uᵢ = -2, so fᵢuᵢ = -4
For 25-40: fᵢ = 3, uᵢ = -1, so fᵢuᵢ = -3
For 40-55: fᵢ = 7, uᵢ = 0, so fᵢuᵢ = 0
For 55-70: fᵢ = 6, uᵢ = 1, so fᵢuᵢ = 6
For 70-85: fᵢ = 6, uᵢ = 2, so fᵢuᵢ = 12
For 85-100: fᵢ = 6, uᵢ = 3, so fᵢuᵢ = 18
The sum of fᵢuᵢ is -4 + (-3) + 0 + 6 + 12 + 18 = 29.
Now, let's find the relationship between ū (the mean of uᵢ values) and x̄. We have uᵢ = (xᵢ - a)/h. So:
ū = Σfᵢuᵢ/Σfᵢ = (1/h) × (Σfᵢxᵢ/Σfᵢ - a) = (x̄ - a)/h
So, hū = x̄ - a, which means x̄ = a + hū, or more specifically, x̄ = a + h(Σfᵢuᵢ/Σfᵢ).
Now, substituting our values: a = 47.5, h = 15, Σfᵢuᵢ = 29, and Σfᵢ = 30, we get:
x̄ = 47.5 + 15 × (29/30) = 47.5 + 14.5 = 62.
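The step-deviation calculation can be sketched in Python the same way; a = 47.5 and h = 15 are the values chosen above, and the variable names are illustrative.

```python
class_marks = [17.5, 32.5, 47.5, 62.5, 77.5, 92.5]
freqs = [2, 3, 7, 6, 6, 6]
a, h = 47.5, 15  # assumed mean and class size, as chosen above

u = [(x - a) / h for x in class_marks]           # u_i = (x_i - a) / h
sum_fu = sum(f * ui for f, ui in zip(freqs, u))  # sum of f_i * u_i
mean = a + h * sum_fu / sum(freqs)               # x_bar = a + h * u_bar

print(u)             # [-2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
print(sum_fu, mean)  # 29.0 62.0
```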
Again, we get the same answer! This is the Step-Deviation Method. This method is especially convenient when all the deviations dᵢ have a common factor, which is usually the case when class sizes are equal.
So, students, to summarize what we've learned about finding the mean of grouped data:
We have three methods: the Direct Method, the Assumed Mean Method, and the Step-Deviation Method. All three methods give the same result. The choice of method depends on the numerical values we're working with. If xᵢ and fᵢ are small numbers, we can use the direct method. If they are large numbers, we can use either the assumed mean method or the step-deviation method to simplify our calculations. The step-deviation method is particularly useful when the class sizes are equal and the deviations have a common factor.
Now, let's apply these methods to another example to make sure you understand them well.
Example 2: The table below gives the percentage distribution of female teachers in the primary schools of rural areas of various states and union territories of India. We need to find the mean percentage of female teachers by all three methods.
The class intervals are: 15-25, 25-35, 35-45, 45-55, 55-65, 65-75, and 75-85. The number of states/UTs in each class are: 6, 11, 7, 4, 4, 2, and 1 respectively.
First, let's find the class marks. For 15-25, it's (15+25)/2 = 20. For 25-35, it's 30. For 35-45, it's 40. For 45-55, it's 50. For 55-65, it's 60. For 65-75, it's 70. For 75-85, it's 80.
Now, let's use the direct method. We calculate fᵢxᵢ for each class:
For 15-25: fᵢ = 6, xᵢ = 20, so fᵢxᵢ = 120
For 25-35: fᵢ = 11, xᵢ = 30, so fᵢxᵢ = 330
For 35-45: fᵢ = 7, xᵢ = 40, so fᵢxᵢ = 280
For 45-55: fᵢ = 4, xᵢ = 50, so fᵢxᵢ = 200
For 55-65: fᵢ = 4, xᵢ = 60, so fᵢxᵢ = 240
For 65-75: fᵢ = 2, xᵢ = 70, so fᵢxᵢ = 140
For 75-85: fᵢ = 1, xᵢ = 80, so fᵢxᵢ = 80
The sum of fᵢxᵢ is 120+330+280+200+240+140+80 = 1390. The sum of frequencies is 6+11+7+4+4+2+1 = 35. So, the mean = 1390/35 = 39.71.
Now, let's use the assumed mean method. Let's take a = 50 (which is the class mark of the middle class 45-55). Then, dᵢ = xᵢ - 50:
For 15-25: dᵢ = 20 - 50 = -30
For 25-35: dᵢ = 30 - 50 = -20
For 35-45: dᵢ = 40 - 50 = -10
For 45-55: dᵢ = 50 - 50 = 0
For 55-65: dᵢ = 60 - 50 = 10
For 65-75: dᵢ = 70 - 50 = 20
For 75-85: dᵢ = 80 - 50 = 30
Now, let's calculate fᵢdᵢ:
For 15-25: fᵢ = 6, dᵢ = -30, so fᵢdᵢ = -180
For 25-35: fᵢ = 11, dᵢ = -20, so fᵢdᵢ = -220
For 35-45: fᵢ = 7, dᵢ = -10, so fᵢdᵢ = -70
For 45-55: fᵢ = 4, dᵢ = 0, so fᵢdᵢ = 0
For 55-65: fᵢ = 4, dᵢ = 10, so fᵢdᵢ = 40
For 65-75: fᵢ = 2, dᵢ = 20, so fᵢdᵢ = 40
For 75-85: fᵢ = 1, dᵢ = 30, so fᵢdᵢ = 30
The sum of fᵢdᵢ is -180 + (-220) + (-70) + 0 + 40 + 40 + 30 = -360. So, using the formula x̄ = a + (Σfᵢdᵢ/Σfᵢ), we get x̄ = 50 + (-360)/35 = 50 - 10.2857 = 39.71.
Now, let's use the step-deviation method. Here, h = 10 (the class size). So, uᵢ = (xᵢ - a)/h = (xᵢ - 50)/10:
For 15-25: uᵢ = -30/10 = -3
For 25-35: uᵢ = -20/10 = -2
For 35-45: uᵢ = -10/10 = -1
For 45-55: uᵢ = 0/10 = 0
For 55-65: uᵢ = 10/10 = 1
For 65-75: uᵢ = 20/10 = 2
For 75-85: uᵢ = 30/10 = 3
Now, let's calculate fᵢuᵢ:
For 15-25: fᵢuᵢ = 6 × (-3) = -18
For 25-35: fᵢuᵢ = 11 × (-2) = -22
For 35-45: fᵢuᵢ = 7 × (-1) = -7
For 45-55: fᵢuᵢ = 4 × 0 = 0
For 55-65: fᵢuᵢ = 4 × 1 = 4
For 65-75: fᵢuᵢ = 2 × 2 = 4
For 75-85: fᵢuᵢ = 1 × 3 = 3
The sum of fᵢuᵢ is -18 + (-22) + (-7) + 0 + 4 + 4 + 3 = -36. So, using the formula x̄ = a + h(Σfᵢuᵢ/Σfᵢ), we get x̄ = 50 + 10 × (-36/35) = 50 - 10.2857 = 39.71.
So, all three methods give us the same answer: 39.71. This means that, taken across these 35 states and union territories, the mean percentage of female teachers in the primary schools of rural areas is about 39.71%. This kind of summary is useful information for education policy makers.
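One way to confirm that the three methods agree exactly, and not just to two decimal places, is to work with exact fractions. This sketch uses Python's standard `fractions.Fraction`; the variable names are our own.

```python
from fractions import Fraction

lower_limits = [15, 25, 35, 45, 55, 65, 75]
freqs = [6, 11, 7, 4, 4, 2, 1]
h = 10  # class size
a = 50  # assumed mean (class mark of 45-55)

# Class marks 20, 30, ..., 80, kept as exact fractions
xs = [Fraction(2 * lo + h, 2) for lo in lower_limits]
n = sum(freqs)

direct = sum(f * x for f, x in zip(freqs, xs)) / n
assumed = a + sum(f * (x - a) for f, x in zip(freqs, xs)) / n
step = a + h * sum(f * (x - a) / h for f, x in zip(freqs, xs)) / n

print(direct, assumed, step)    # 278/7 278/7 278/7
print(round(float(direct), 2))  # 39.71
```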
Now, let's look at one more example where the class sizes are unequal. This is Example 3.
The distribution shows the number of wickets taken by bowlers in one-day cricket matches. We need to find the mean number of wickets. The class intervals are: 20-60, 60-100, 100-150, 150-250, 250-350, and 350-450. The number of bowlers in each class are: 7, 5, 16, 12, 2, and 3 respectively.
Notice that the class sizes are not equal here. The first class has size 40, the second has size 40, the third has size 50, and the fourth, fifth, and sixth each have size 100. Also, the xᵢ values are large. So, let's use the step-deviation method. But we need to be careful. Can we still apply it when class sizes are unequal? Yes, because the relation x̄ = a + hū holds for any nonzero h: we divide every deviation by h and then multiply the mean back by h, so h cancels out. Here h is simply a convenient scaling factor, not the class size. Let's choose a = 200 (the class mark of 150-250, one of the two middle classes) and h = 20, which divides most of the deviations exactly.
First, let's find the class marks:
For 20-60: xᵢ = (20+60)/2 = 40
For 60-100: xᵢ = (60+100)/2 = 80
For 100-150: xᵢ = (100+150)/2 = 125
For 150-250: xᵢ = (150+250)/2 = 200
For 250-350: xᵢ = (250+350)/2 = 300
For 350-450: xᵢ = (350+450)/2 = 400
Now, let's calculate dᵢ = xᵢ - a = xᵢ - 200:
For 20-60: dᵢ = 40 - 200 = -160
For 60-100: dᵢ = 80 - 200 = -120
For 100-150: dᵢ = 125 - 200 = -75
For 150-250: dᵢ = 200 - 200 = 0
For 250-350: dᵢ = 300 - 200 = 100
For 350-450: dᵢ = 400 - 200 = 200
Now, let's calculate uᵢ = dᵢ/h = dᵢ/20:
For 20-60: uᵢ = -160/20 = -8
For 60-100: uᵢ = -120/20 = -6
For 100-150: uᵢ = -75/20 = -3.75
For 150-250: uᵢ = 0/20 = 0
For 250-350: uᵢ = 100/20 = 5
For 350-450: uᵢ = 200/20 = 10
Now, let's calculate fᵢuᵢ:
For 20-60: fᵢ = 7, uᵢ = -8, so fᵢuᵢ = -56
For 60-100: fᵢ = 5, uᵢ = -6, so fᵢuᵢ = -30
For 100-150: fᵢ = 16, uᵢ = -3.75, so fᵢuᵢ = -60
For 150-250: fᵢ = 12, uᵢ = 0, so fᵢuᵢ = 0
For 250-350: fᵢ = 2, uᵢ = 5, so fᵢuᵢ = 10
For 350-450: fᵢ = 3, uᵢ = 10, so fᵢuᵢ = 30
The sum of fᵢuᵢ is -56 + (-30) + (-60) + 0 + 10 + 30 = -106. The sum of frequencies is 7+5+16+12+2+3 = 45.
So, ū = Σfᵢuᵢ/Σfᵢ = -106/45 = -2.3556 approximately.
Now, using the formula x̄ = a + hū, we get: x̄ = 200 + 20 × (-106/45) = 200 - 47.11 = 152.89 approximately.
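The sketch below repeats this example with exact fractions and also tries h = 5, an alternative picked here only for comparison, to show that the value of h does not change the mean. The helper name `step_deviation_mean` is hypothetical.

```python
from fractions import Fraction

classes = [(20, 60), (60, 100), (100, 150), (150, 250), (250, 350), (350, 450)]
freqs = [7, 5, 16, 12, 2, 3]
xs = [Fraction(lo + up, 2) for lo, up in classes]  # class marks 40, 80, 125, ...
n = sum(freqs)


def step_deviation_mean(a, h):
    """x_bar = a + h * (sum of f_i * u_i) / (sum of f_i), with u_i = (x_i - a) / h."""
    u = [(x - a) / h for x in xs]
    return a + h * sum(f * ui for f, ui in zip(freqs, u)) / n


direct = sum(f * x for f, x in zip(freqs, xs)) / n
print(step_deviation_mean(200, 20))           # 1376/9
print(step_deviation_mean(200, 5) == direct)  # True
print(round(float(direct), 2))                # 152.89
```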
So, on average, each of these 45 bowlers took about 152.89 wickets in one-day cricket matches, that is, around 153 wickets. This is a good example of how we can use the mean to summarize the performance of a group.
Now, students, before we move on to the next topic, let me quickly recap what we've learned about the mean of grouped data:
We learned three methods to calculate the mean: the direct method, the assumed mean method, and the step-deviation method. All three methods give the same result. The direct method is simple but can be tedious with large numbers. The assumed mean method simplifies calculations by subtracting a constant from all values. The step-deviation method simplifies further by dividing the deviations by a common factor h, usually the class size, although any convenient nonzero h works, as we saw in the cricket example. The choice of method depends on the data we're working with.
Now, let's move on to the Mode of Grouped Data.
You all remember what the mode is from your Class IX studies, right? The mode is that value among the observations which occurs most often, that is, the value of the observation having the maximum frequency. For ungrouped data, we can simply look at the frequency table and find the value with the highest frequency.
Let me give you a quick example. The wickets taken by a bowler in 10 cricket matches are: 2, 6, 4, 5, 0, 2, 1, 3, 2, 3. Let's make a frequency table:
Number of wickets: 0, frequency: 1 (it appears once)
Number of wickets: 1, frequency: 1
Number of wickets: 2, frequency: 3 (it appears three times - the most!)
Number of wickets: 3, frequency: 2
Number of wickets: 4, frequency: 1
Number of wickets: 5, frequency: 1
Number of wickets: 6, frequency: 1
So, the mode is 2, because it occurs the most number of times (3 times).
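For ungrouped data, counting frequencies is exactly what Python's `collections.Counter` does. Here is a minimal sketch for the wickets data. Note that `most_common(1)` returns only one value, so data with two or more equally frequent values (more than one mode) would need an extra check.

```python
from collections import Counter

wickets = [2, 6, 4, 5, 0, 2, 1, 3, 2, 3]
freq = Counter(wickets)  # value -> how many times it occurs
mode_value, mode_count = freq.most_common(1)[0]

print(sorted(freq.items()))    # [(0, 1), (1, 1), (2, 3), (3, 2), (4, 1), (5, 1), (6, 1)]
print(mode_value, mode_count)  # 2 3
```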
Now, for grouped data, it's not so straightforward. We can only locate a class with the maximum frequency, called the modal class. The mode is a value inside this modal class, and we need a formula to find it. The formula is:
Mode = l + ((f₁ - f₀)/(2f₁ - f₀ - f₂)) × h
where l is the lower limit of the modal class, h is the size of the class interval (assuming all class sizes are equal), f₁ is the frequency of the modal class, f₀ is the frequency of the class preceding the modal class, and f₂ is the frequency of the class succeeding the modal class.
Let me explain this formula with an example. Consider this frequency table showing the number of family members in 20 households:
Class intervals: 1-3, 3-5, 5-7, 7-9, 9-11
Number of families: 7, 8, 2, 2, 1
Here, the maximum frequency is 8, which is in the class 3-5. So, the modal class is 3-5.
Now, let's identify all the values we need for the formula:
l = lower limit of modal class = 3
h = class size = 2 (since 5 - 3 = 2)
f₁ = frequency of modal class = 8
f₀ = frequency of class preceding the modal class = 7 (the class 1-3)
f₂ = frequency of class succeeding the modal class = 2 (the class 5-7)
Now, let's substitute these values in the formula:
Mode = 3 + ((8 - 7)/(2×8 - 7 - 2)) × 2 = 3 + (1/(16 - 9)) × 2 = 3 + (1/7) × 2 = 3 + 2/7 = 3 + 0.2857 = 3.2857, which we can round to 3.286
So, the mode is approximately 3.286. This means that the most common family size in this locality is about 3 or 4 members.
Now, let's look at another example. In Example 1, we had the marks of 30 students. The grouped frequency distribution table showed that 7 students got marks in the interval 40-55, which is the maximum. So, the modal class is 40-55.
Let's identify the values:
l = lower limit of modal class = 40
h = class size = 15
f₁ = frequency of modal class = 7
f₀ = frequency of class preceding the modal class = 3 (the class 25-40)
f₂ = frequency of class succeeding the modal class = 6 (the class 55-70)
Now, let's calculate the mode:
Mode = 40 + ((7 - 3)/(2×7 - 3 - 6)) × 15 = 40 + (4/(14 - 9)) × 15 = 40 + (4/5) × 15 = 40 + 12 = 52
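Both mode calculations can be checked with one small function. This is a sketch assuming equal class sizes; the name `grouped_mode` is ours, and treating a missing neighboring class as frequency 0 is an assumption that the lesson does not cover.

```python
def grouped_mode(lowers, freqs, h):
    """Mode = l + (f1 - f0) / (2*f1 - f0 - f2) * h for equal class sizes h.

    Assumption: if the modal class is the first or last class, the missing
    neighboring frequency is taken as 0. Ties for the maximum frequency
    are not handled; the first modal class found is used.
    """
    k = max(range(len(freqs)), key=lambda i: freqs[i])  # index of the modal class
    f1 = freqs[k]
    f0 = freqs[k - 1] if k > 0 else 0
    f2 = freqs[k + 1] if k + 1 < len(freqs) else 0
    return lowers[k] + (f1 - f0) * h / (2 * f1 - f0 - f2)


# Family sizes: classes 1-3, 3-5, ..., 9-11
family = grouped_mode([1, 3, 5, 7, 9], [7, 8, 2, 2, 1], h=2)
# Marks from Example 1: classes 10-25, 25-40, ..., 85-100
marks = grouped_mode([10, 25, 40, 55, 70, 85], [2, 3, 7, 6, 6, 6], h=15)

print(round(family, 3))  # 3.286
print(marks)             # 52.0
```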
So, the mode is 52. This means that the marks of these students are most heavily concentrated around 52.
Now, from our earlier calculation, we know that the mean of this data is 62. So, we can compare the mode and the mean. The mode (52) is less than the mean (62). This tells us that although the scores cluster most around 52 marks, the average score is higher (62 marks). This could be because some students scored very high marks, pulling the average up.
Now, students, I want you to note something important here. In some situations, the mode may be less than the mean, in others it may be equal to the mean, and in still others it may be greater than the mean. It all depends on the distribution of data. The choice of whether to use the mean or the mode depends on what information we need. If we want to know the most common or popular value, we use the mode. If we want to know the average value, we use the mean.
Now, let's move on to the Median of Grouped Data.
You remember that the median is the middle-most observation in the data. For ungrouped data, we first arrange the data in ascending order. Then, if n is odd, the median is the (n+1)/2 th observation. If n is even, the median is the average of the n/2 th and (n/2 + 1) th observations.
But for grouped data, we need a different approach. Let me explain this with an example.
Suppose we have the marks obtained by 100 students in a test out of 50. The frequency distribution is given. Let me show you the table:
Marks obtained: 20, 25, 28, 29, 33, 38, 42, 43
Number of students: 6, 20, 24, 28, 15, 4, 2, 1
First, let's arrange this data and prepare a cumulative frequency table. Cumulative frequency is the running total of frequencies. Let me show you:
For marks up to 20: cumulative frequency = 6
For marks up to 25: cumulative frequency = 6 + 20 = 26
For marks up to 28: cumulative frequency = 26 + 24 = 50
For marks up to 29: cumulative frequency = 50 + 28 = 78
For marks up to 33: cumulative frequency = 78 + 15 = 93
For marks up to 38: cumulative frequency = 93 + 4 = 97
For marks up to 42: cumulative frequency = 97 + 2 = 99
For marks up to 43: cumulative frequency = 99 + 1 = 100
Now, since n = 100 (which is even), the median will be the average of the 50th and 51st observations. Looking at the cumulative frequency table, we can see that the 50th observation is 28 (the cumulative frequency reaches exactly 50 at 28 marks), and the 51st observation is 29 (students number 51 to 78 all got 29 marks). So, the median = (28 + 29)/2 = 28.5.
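The same reasoning, finding the k-th observation by walking along the cumulative frequencies, can be sketched in Python. It assumes the values are listed in ascending order; `kth_value` is an illustrative helper name.

```python
from itertools import accumulate

marks = [20, 25, 28, 29, 33, 38, 42, 43]  # must be in ascending order
students = [6, 20, 24, 28, 15, 4, 2, 1]
cum = list(accumulate(students))          # cumulative frequencies


def kth_value(values, cum_freqs, k):
    """Return the k-th observation (counting from 1) using cumulative frequencies."""
    for v, cf in zip(values, cum_freqs):
        if cf >= k:
            return v
    raise ValueError("k is larger than the number of observations")


n = cum[-1]
if n % 2 == 1:
    median = kth_value(marks, cum, (n + 1) // 2)
else:
    median = (kth_value(marks, cum, n // 2) + kth_value(marks, cum, n // 2 + 1)) / 2

print(cum)     # [6, 26, 50, 78, 93, 97, 99, 100]
print(median)  # 28.5
```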
This means that 50% of the students scored less than 28.5 marks, and 50% scored more than 28.5 marks. The median gives us the middle value of the data.
Now, let's see how to find the median for grouped data. This is a bit more complex because in grouped data, we don't have individual observations - we only have class intervals.
Consider this grouped frequency distribution of marks obtained out of 100 by 53 students:
Class intervals: 0-10, 10-20, 20-30, 30-40, 40-50, 50-60, 60-70, 70-80, 80-90, 90-100
Number of students: 5, 3, 4, 3, 3, 4, 7, 9, 7, 8
First, let's create a cumulative frequency table. This table shows the number of students who scored less than a certain value. Let me calculate:
Less than 10: 5
Less than 20: 5 + 3 = 8
Less than 30: 8 + 4 = 12
Less than 40: 12 + 3 = 15
Less than 50: 15 + 3 = 18
Less than 60: 18 + 4 = 22
Less than 70: 22 + 7 = 29
Less than 80: 29 + 9 = 38
Less than 90: 38 + 7 = 45
Less than 100: 45 + 8 = 53
This is called a cumulative frequency distribution of the "less than" type. Here, 10, 20, 30, ..., 100 are the upper limits of the respective class intervals.
We can also create a "more than" type cumulative frequency distribution, which shows the number of students who scored more than or equal to a certain value.
More than or equal to 0: 53
More than or equal to 10: 53 - 5 = 48
More than or equal to 20: 48 - 3 = 45
More than or equal to 30: 45 - 4 = 41
More than or equal to 40: 41 - 3 = 38
More than or equal to 50: 38 - 3 = 35
More than or equal to 60: 35 - 4 = 31
More than or equal to 70: 31 - 7 = 24
More than or equal to 80: 24 - 9 = 15
More than or equal to 90: 15 - 7 = 8
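Both kinds of cumulative frequency table can be produced with a running total. This Python sketch uses the standard `itertools.accumulate`; the "more than or equal to" column is the total n minus everything below each lower limit. Variable names are illustrative.

```python
from itertools import accumulate

lower_limits = list(range(0, 100, 10))  # classes 0-10, 10-20, ..., 90-100
freqs = [5, 3, 4, 3, 3, 4, 7, 9, 7, 8]
h = 10
n = sum(freqs)

# "Less than" type: running total up to each upper limit
less_than = list(accumulate(freqs))

# "More than or equal to" type: n minus the count below each lower limit
below = [0] + less_than[:-1]
more_than = [n - b for b in below]

for lo, lt, mt in zip(lower_limits, less_than, more_than):
    print(f"less than {lo + h}: {lt:2d}   more than or equal to {lo}: {mt:2d}")
```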
Now, to find the median of grouped data, we need to find the median class. Here's how we do it:
First, we find n/2, where n is the total number of observations. In this case, n = 53, so n/2 = 26.5.
Then, we find the class whose cumulative frequency is greater than (and nearest to) n/2. This class is called the median class.
Looking at our cumulative frequency table, we can see that the class 60-70 has a cumulative frequency of 29, which is greater than (and nearest to) 26.5. So, the median class is 60-70.
Now, we use the following formula to calculate the median:
Median = l + ((n/2 - cf)/f) × h
where l is the lower limit of the median class, n is the number of observations, cf is the cumulative frequency of the class preceding the median class, f is the frequency of the median class, and h is the class size.
Let's substitute the values:
l = 60 (lower limit of 60-70)
n/2 = 26.5
cf = cumulative frequency of class preceding 60-70 = 22 (from the class 50-60)
f = frequency of median class = 7
h = class size = 10
So, Median = 60 + ((26.5 - 22)/7) × 10 = 60 + (4.5/7) × 10 = 60 + 6.4286 = 66.4286, which is approximately 66.4
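Here is a minimal Python sketch of the median formula for grouped data with equal class sizes. The function name `grouped_median` is our own, and it picks the median class as the first class whose cumulative frequency is greater than n/2, as described above.

```python
from itertools import accumulate


def grouped_median(lowers, freqs, h):
    """Median = l + (n/2 - cf) / f * h for equal class sizes h."""
    n = sum(freqs)
    half = n / 2
    cum = list(accumulate(freqs))
    # Median class: the first class whose cumulative frequency is greater than n/2
    k = next(i for i, c in enumerate(cum) if c > half)
    cf = cum[k - 1] if k > 0 else 0  # cumulative frequency of the preceding class
    return lowers[k] + (half - cf) * h / freqs[k]


lowers = list(range(0, 100, 10))  # classes 0-10, 10-20, ..., 90-100
freqs = [5, 3, 4, 3, 3, 4, 7, 9, 7, 8]
median = grouped_median(lowers, freqs, h=10)
print(round(median, 2))  # 66.43
```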
So, the median mark is about 66.4. This means that about half the students scored less than 66.4 marks, and the other half scored more than 66.4 marks.
Now, let's look at another example to make sure you understand this concept well.
Example 7: A survey regarding the heights (in cm) of 51 girls of Class X was conducted, and the following data was obtained:
Height (in cm): Number of girls
Less than 140: 4
Less than 145: 11
Less than 150: 29
Less than 155: 40
Less than 160: 46
Less than 165: 51
We need to find the median height.
This is a "less than" type cumulative frequency distribution. The heights 140, 145, 150, ..., 165 are the upper limits of the class intervals. So, the classes are: below 140, 140-145, 145-150, 150-155, 155-160, 160-165.
Now, we need to find the actual frequencies from the cumulative frequencies:
For below 140: frequency = 4
For 140-145: frequency = 11 - 4 = 7
For 145-150: frequency = 29 - 11 = 18
For 150-155: frequency = 40 - 29 = 11
For 155-160: frequency = 46 - 40 = 6
For 160-165: frequency = 51 - 46 = 5
Now, let's create the frequency table with cumulative frequencies:
Class intervals: Below 140, 140-145, 145-150, 150-155, 155-160, 160-165
Frequencies: 4, 7, 18, 11, 6, 5
Cumulative frequencies: 4, 11, 29, 40, 46, 51
Now, n = 51, so n/2 = 25.5. The median class is the first class whose cumulative frequency is greater than 25.5. Looking at the cumulative frequencies, 140-145 reaches only 11, while 145-150 reaches 29, the first value greater than 25.5. So, the median class is 145-150.
Now, let's identify the values:
l = lower limit of median class = 145
cf = cumulative frequency of class preceding 145-150 = 11
f = frequency of median class = 18
h = class size = 5
Now, let's calculate the median:
Median = 145 + ((25.5 - 11)/18) × 5 = 145 + (14.5/18) × 5 = 145 + 0.8056 × 5 = 145 + 4.028 = 149.028, which is approximately 149.03 cm
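When the data arrives as a "less than" table, as here, a sketch can first recover the class frequencies by subtracting consecutive cumulative frequencies and then apply the median formula. It assumes the median does not fall in the open-ended "below 140" class, whose lower limit is unknown, and raises an error if it does. Variable names are illustrative.

```python
upper_limits = [140, 145, 150, 155, 160, 165]
less_than = [4, 11, 29, 40, 46, 51]  # "less than" cumulative frequencies
h = 5

# Recover class frequencies by differencing the cumulative frequencies
freqs = [less_than[0]] + [b - a for a, b in zip(less_than, less_than[1:])]

n = less_than[-1]
half = n / 2
k = next(i for i, c in enumerate(less_than) if c > half)  # median class index
if k == 0:
    # The first class ("below 140") has no stated lower limit
    raise ValueError("median falls in the open-ended first class")
l = upper_limits[k] - h   # lower limit of the median class
cf = less_than[k - 1]     # cumulative frequency of the preceding class
median = l + (half - cf) * h / freqs[k]

print(freqs)             # [4, 7, 18, 11, 6, 5]
print(round(median, 2))  # 149.03
```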
So, the median height is 149.03 cm. This means that about 50% of the girls are shorter than 149.03 cm, and about 50% are taller.
Now, let's look at one more example where we need to find missing frequencies given the median.
Example 8: The median of the following data is 525. The total frequency is 100. We need to find the values of x and y.
Class intervals: 0-100, 100-200, 200-300, 300-400, 400-500, 500-600, 600-700, 700-800, 800-900, 900-1000
Frequencies: 2, 5, x, 12, 17, 20, y, 9, 7, 4
First, let's create a cumulative frequency table:
Class interval: cumulative frequency
0-100: 2
100-200: 2 + 5 = 7
200-300: 7 + x
300-400: 7 + x + 12 = 19 + x
400-500: 19 + x + 17 = 36 + x
500-600: 36 + x + 20 = 56 + x
600-700: 56 + x + y
700-800: 56 + x + y + 9 = 65 + x + y
800-900: 65 + x + y + 7 = 72 + x + y
900-1000: 72 + x + y + 4 = 76 + x + y
We know that the total frequency is 100, so:
76 + x + y = 100
x + y = 24 ...(1)
Now, the median is 525, which lies in the class 500-600. So, this is our median class.
Now, let's identify the values:
l = lower limit of median class = 500
f = frequency of median class = 20
cf = cumulative frequency of class preceding 500-600 = 36 + x
h = class size = 100
Now, using the median formula:
Median = l + ((n/2 - cf)/f) × h
525 = 500 + ((50 - (36 + x))/20) × 100
Let's solve this:
525 - 500 = ((50 - 36 - x)/20) × 100
25 = (14 - x) × 5
25/5 = 14 - x
5 = 14 - x
x = 14 - 5 = 9
So, x = 9.
Now, from equation (1), we have:
9 + y = 24
y = 24 - 9 = 15
So, y = 15.
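We can verify this. First, the frequencies now add up to 2 + 5 + 9 + 12 + 17 + 20 + 15 + 9 + 7 + 4 = 100, as required. Second, the cumulative frequency before the class 500-600 is 36 + 9 = 45, so Median = 500 + ((50 - 45)/20) × 100 = 500 + 25 = 525, which matches the given median. Correct!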
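The algebra for x and y can also be checked in Python. This sketch rearranges the median formula for x, uses the total frequency for y, then rebuilds the table and recomputes the median as a check. All variable names are illustrative.

```python
from itertools import accumulate

n, given_median = 100, 525
l, f, h = 500, 20, 100       # median class 500-600, its frequency, class size
cf_known = 2 + 5 + 12 + 17   # cumulative frequency before 500-600 is cf_known + x

# From median = l + (n/2 - (cf_known + x)) / f * h, solve for x
x = n / 2 - cf_known - (given_median - l) * f / h
# From the total frequency: (sum of known frequencies) + x + y = n
known_total = 2 + 5 + 12 + 17 + 20 + 9 + 7 + 4
y = n - known_total - x
print(x, y)  # 9.0 15.0

# Check: rebuild the table and recompute the median
freqs = [2, 5, int(x), 12, 17, 20, int(y), 9, 7, 4]
cum = list(accumulate(freqs))
k = next(i for i, c in enumerate(cum) if c > n / 2)  # median class index
check = k * h + (n / 2 - cum[k - 1]) * h / freqs[k]  # lower limit of class k is k * h
print(sum(freqs), check)  # 100 525.0
```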
Now, students, let me summarize what we've learned about the three measures of central tendency:
The mean is the most commonly used measure of central tendency. It takes into account all the observations and lies between the extremes. It is useful for comparing different distributions. However, it can be affected by extreme values.
The median is useful when we want to find the middle value and when there are extreme values that might distort the mean. For example, to describe a typical wage in a country, the median is often better than the mean, because a few very high wages can pull the mean up.
The mode is useful when we want to find the most common or popular value. For example, the most popular color, the most common shoe size, or the most watched TV program.
There is an empirical relationship between these three measures: 3 Median = Mode + 2 Mean. This relationship holds approximately for moderately skewed distributions.
Now, students, we have covered all the main concepts in this chapter. Let me give you a quick summary of everything we've learned:
In this chapter on Statistics, we learned:
1. Mean of Grouped Data: We learned three methods to calculate the mean of grouped data:
- Direct Method: x̄ = Σfᵢxᵢ / Σfᵢ
- Assumed Mean Method: x̄ = a + Σfᵢdᵢ / Σfᵢ, where dᵢ = xᵢ - a
- Step-Deviation Method: x̄ = a + (Σfᵢuᵢ/Σfᵢ) × h, where uᵢ = (xᵢ - a)/h
2. Mode of Grouped Data: We learned that for grouped data, the mode is found using the formula:
Mode = l + [(f₁ - f₀)/(2f₁ - f₀ - f₂)] × h
where l is the lower limit of the modal class, h is the class size, f₁ is the frequency of the modal class, f₀ is the frequency of the class preceding the modal class, and f₂ is the frequency of the class succeeding the modal class.
3. Median of Grouped Data: We learned about cumulative frequency and how to find the median using the formula:
Median = l + [(n/2 - cf)/f] × h
where l is the lower limit of the median class, n is the total number of observations, cf is the cumulative frequency of the class preceding the median class, f is the frequency of the median class, and h is the class size.
4. We also learned that the choice of which measure to use depends on the situation:
- Use mean when you want the average of all values
- Use median when you want the middle value and want to avoid the effect of extreme values
- Use mode when you want the most frequently occurring value
This concludes our lesson on Statistics. I hope you now have a clear understanding of mean, median, and mode for grouped data, and you can apply these concepts to solve problems. Remember, statistics is not just about numbers - it's about understanding data and making sense of the world around us. Thank you for your attention, and keep practicing!