Welcome dear students! Today we are going to learn about Statistics from Class 9 Maths.
The representation of data by tables has already been discussed. Now let us turn our attention to another representation of data, that is, the graphical representation. It is well said that one picture is better than a thousand words. Usually comparisons among the individual items are best shown by means of graphs. The representation then becomes easier to understand than the actual data. We shall study the following graphical representations in this section: First, bar graphs. Second, histograms of uniform width, and of varying widths. Third, frequency polygons.
[CHECKPOINT]
Let us begin with bar graphs. In earlier classes, you have already studied and constructed bar graphs. Here we shall discuss them through a more formal approach. Recall that a bar graph is a pictorial representation of data in which usually bars of uniform width are drawn with equal spacing between them on one axis, say the x-axis, depicting the variable. The values of the variable are shown on the other axis, say the y-axis, and the heights of the bars depend on the values of the variable.
Let us look at Example 1. In a particular section of Class 9, 40 students were asked about the months of their birth and a graph was prepared for the data. In this figure, the horizontal axis lists the months from January to December, and the vertical axis shows the number of students from 0 to 8. The bars have uniform width and equal spacing. The height of each bar corresponds to the number of students born in that month. Observe the bar graph and answer the following questions. First, how many students were born in the month of November? Second, in which month were the maximum number of students born?
[CHECKPOINT]
Solution. Note that the variable here is the month of birth, and the value of the variable is the number of students born. For the first question, 4 students were born in the month of November. For the second question, the maximum number of students were born in the month of August.
Let us now recall how a bar graph is constructed by considering Example 2. A family with a monthly income of 20,000 rupees had planned the following expenditures per month under various heads: Grocery is 4 thousand rupees. Rent is 5 thousand rupees. Education of children is 5 thousand rupees. Medicine is 2 thousand rupees. Fuel is 2 thousand rupees. Entertainment is 1 thousand rupees. Miscellaneous is 1 thousand rupees. Draw a bar graph for the data above.
[CHECKPOINT]
Solution. We draw the bar graph of this data in the following steps. Note that the unit in the second column is thousand rupees. So, 4 against grocery means 4,000 rupees. Step 1. We represent the heads, which is the variable, on the horizontal axis choosing any scale, since the width of the bar is not important. But for clarity, we take equal widths for all bars and maintain equal gaps in between. Let one Head be represented by one unit. Step 2. We represent the expenditure, which is the value, on the vertical axis. Since the maximum expenditure is 5,000 rupees, we can choose the scale as 1 unit equals 1,000 rupees. Step 3. To represent our first head, that is, grocery, we draw a rectangular bar with width 1 unit and height 4 units. Step 4. Similarly, other heads are represented leaving a gap of 1 unit in between two consecutive bars. The bar graph is drawn in Figure 12.2. In this figure, we see seven bars corresponding to each head, with heights proportional to their expenditure values. Here, you can easily visualise the relative characteristics of the data at a glance, for example, the expenditure on education is more than double that of medical expenses. Therefore, in some ways it serves as a better representation of data than the tabular form.
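If you would like to check Steps 1 to 4 on a computer, here is a minimal Python sketch. It only turns the amounts from Example 2 into bar heights using the scale 1 unit equals 1,000 rupees, and prints a rough text version of the graph. The names `expenditure`, `RUPEES_PER_UNIT` and `bar_height` are made up for this illustration and are not part of the textbook.

```python
# Planned monthly expenditure from Example 2, in thousand rupees.
expenditure = {
    "Grocery": 4,
    "Rent": 5,
    "Education of children": 5,
    "Medicine": 2,
    "Fuel": 2,
    "Entertainment": 1,
    "Miscellaneous": 1,
}

RUPEES_PER_UNIT = 1000  # scale from Step 2: 1 unit on the vertical axis = 1,000 rupees


def bar_height(thousand_rupees):
    """Return the bar height in graph units for an amount given in thousand rupees."""
    return thousand_rupees * 1000 / RUPEES_PER_UNIT


heights = {head: bar_height(amount) for head, amount in expenditure.items()}
total_rupees = sum(expenditure.values()) * 1000  # should match the monthly income

# Rough text version of the bar graph: one '#' per unit of height.
for head, height in heights.items():
    print(f"{head:<22} {'#' * int(height)} ({height:g} units)")
```

The total works out to the family's monthly income of 20,000 rupees, which is a quick way to confirm that no head has been left out before drawing.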
[CHECKPOINT]
Activity 1. Continuing with the same four groups of Activity 1, represent the data by suitable bar graphs. Let us now see how a frequency distribution table for continuous class intervals can be represented graphically.
This brings us to histograms. This is a form of representation like the bar graph, but it is used for continuous class intervals. For instance, consider the frequency distribution Table 12.2, representing the weights of 36 students of a class. The table shows weights in kilograms and the number of students. The intervals are 30.5 to 35.5 with 9 students, 35.5 to 40.5 with 6 students, 40.5 to 45.5 with 15 students, 45.5 to 50.5 with 3 students, 50.5 to 55.5 with 1 student, and 55.5 to 60.5 with 2 students. The total is 36. Let us represent the data given above graphically as follows. Step 1. We represent the weights on the horizontal axis on a suitable scale. We can choose the scale as 1 centimetre equals 5 kilograms. Also, since the first class interval starts from 30.5 and not zero, we show it on the graph by marking a kink or a break on the axis. Step 2. We represent the number of students, which is the frequency, on the vertical axis on a suitable scale. Since the maximum frequency is 15, we need to choose the scale to accommodate this maximum frequency. Step 3. We now draw rectangles, or rectangular bars, of width equal to the class-size and lengths according to the frequencies of the corresponding class intervals. For example, the rectangle for the class interval 30.5 to 35.5 will be of width 1 centimetre and length 4.5 centimetres, which corresponds to a vertical scale of 1 centimetre for every 2 students. Step 4. In this way, we obtain the graph as shown in Figure 12.3. In this figure, we see adjacent rectangles with no gaps between them, starting from 30.5 on the horizontal axis.
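The sizes of the rectangles in Step 3 can be worked out with a short Python sketch. The horizontal scale of 1 centimetre for 5 kilograms comes from Step 1. The vertical scale of 1 centimetre for 2 students is an assumption, inferred from the 4.5 centimetre length quoted for a frequency of 9. The names `weight_classes` and `rectangle_size` are made up for this illustration.

```python
# Table 12.2: (lower limit, upper limit, number of students), weights in kg.
weight_classes = [
    (30.5, 35.5, 9),
    (35.5, 40.5, 6),
    (40.5, 45.5, 15),
    (45.5, 50.5, 3),
    (50.5, 55.5, 1),
    (55.5, 60.5, 2),
]

KG_PER_CM = 5        # horizontal scale from Step 1
STUDENTS_PER_CM = 2  # ASSUMED vertical scale, inferred from 9 students -> 4.5 cm


def rectangle_size(lower, upper, frequency):
    """Return (width in cm, length in cm) of the histogram rectangle for one class."""
    width_cm = (upper - lower) / KG_PER_CM
    length_cm = frequency / STUDENTS_PER_CM
    return width_cm, length_cm


rectangles = [rectangle_size(*row) for row in weight_classes]

for (lower, upper, _), (width_cm, length_cm) in zip(weight_classes, rectangles):
    print(f"{lower} to {upper}: width {width_cm:g} cm, length {length_cm:g} cm")
```

With this assumed scale the tallest rectangle, for 40.5 to 45.5, is 7.5 centimetres long, so the whole histogram fits comfortably on a notebook page.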
[CHECKPOINT]
Observe that since there are no gaps in between consecutive rectangles, the resultant graph appears like a solid figure. This is called a histogram, which is a graphical representation of a grouped frequency distribution with continuous classes. Also, unlike a bar graph, the width of the bar plays a significant role in its construction. Here, in fact, areas of the rectangles erected are proportional to the corresponding frequencies. However, since the widths of the rectangles are all equal, the lengths of the rectangles are proportional to the frequencies. That is why we draw the lengths according to Step 3 above.
Now, consider a situation different from the one above. Example 3. A teacher wanted to analyse the performance of two sections of students in a mathematics test of 100 marks. Looking at their performances, she found that a few students got under 20 marks and a few got 70 marks or above. So she decided to group them into intervals of varying sizes as follows: 0 to 20, 20 to 30, up to 60 to 70, and 70 to 100. Then she formed the following table. Table 12.3 shows marks and number of students. 0 to 20 has 7 students. 20 to 30 has 10 students. 30 to 40 has 10 students. 40 to 50 has 20 students. 50 to 60 has 20 students. 60 to 70 has 15 students. 70 and above has 8 students. Total is 90.
[CHECKPOINT]
A histogram for this table was prepared by a student as shown in Figure 12.4. In this figure, we see rectangles of varying widths corresponding to the class intervals. Carefully examine this graphical representation. Do you think that it correctly represents the data? No, the graph is giving us a misleading picture. As we have mentioned earlier, the areas of the rectangles are proportional to the frequencies in a histogram. Earlier this problem did not arise, because the widths of all the rectangles were equal. But here, since the widths of the rectangles are varying, the histogram above does not give a correct picture. For example, it shows a greater frequency in the interval 70 to 100, than in 60 to 70, which is not the case. So, we need to make certain modifications in the lengths of the rectangles so that the areas are again proportional to the frequencies. The steps to be followed are as given below. Step 1. Select a class interval with the minimum class size. In the example above, the minimum class-size is 10. Step 2. The lengths of the rectangles are then modified to be proportionate to the class-size 10. For instance, when the class-size is 20, the length of the rectangle is 7. So when the class-size is 10, the length of the rectangle will be 7/20 × 10 = 3.5. Similarly, proceeding in this manner, we get the following table.
[CHECKPOINT]
Table 12.4 shows Marks, Frequency, Width of the class, and Length of the rectangle. For 0 to 20, frequency is 7, width is 20, length is 7/20 × 10 = 3.5. For 20 to 30, frequency is 10, width is 10, length is 10/10 × 10 = 10. For 30 to 40, frequency is 10, width is 10, length is 10/10 × 10 = 10. For 40 to 50, frequency is 20, width is 10, length is 20/10 × 10 = 20. For 50 to 60, frequency is 20, width is 10, length is 20/10 × 10 = 20. For 60 to 70, frequency is 15, width is 10, length is 15/10 × 10 = 15. For 70 and above, frequency is 8, width is 30, length is 8/30 × 10 = 2.67.
Since we have calculated these lengths for an interval of 10 marks in each case, we may call these lengths as “proportion of students per 10 marks interval”. So, the correct histogram with varying width is given in Figure 12.5. In this figure, we see rectangles with adjusted heights so that their areas correctly reflect the frequencies despite the varying widths.
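The calculation behind Table 12.4 can be sketched in Python as follows. The function picks the minimum class size and scales each frequency to it, exactly as in Steps 1 and 2. The names `mark_classes` and `adjusted_lengths` are made up for this illustration, and the class "70 and above" is taken as 70 to 100, since the test is out of 100 marks.

```python
# Table 12.3: (lower limit, upper limit, number of students).
# "70 and above" is treated as 70 to 100 because the test is out of 100 marks.
mark_classes = [
    (0, 20, 7),
    (20, 30, 10),
    (30, 40, 10),
    (40, 50, 20),
    (50, 60, 20),
    (60, 70, 15),
    (70, 100, 8),
]


def adjusted_lengths(classes):
    """Return rectangle lengths scaled to the minimum class size."""
    min_width = min(upper - lower for lower, upper, _ in classes)
    return [freq * min_width / (upper - lower) for lower, upper, freq in classes]


lengths = adjusted_lengths(mark_classes)

for (lower, upper, freq), length in zip(mark_classes, lengths):
    area = length * (upper - lower)
    print(f"{lower} to {upper}: frequency {freq}, length {length:.2f}, area {area:.0f}")
```

In every row the printed area is 10 times the frequency, which is what "areas proportional to frequencies" means here.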
[CHECKPOINT]
Now let us move to frequency polygons. There is yet another visual way of representing quantitative data and its frequencies. This is a polygon. To see what we mean, consider the histogram represented by Figure 12.3. Let us join the mid-points of the upper sides of the adjacent rectangles of this histogram by means of line segments. Let us call these mid-points B, C, D, E, F and G. When joined by line segments, we obtain the figure BCDEFG. To complete the polygon, we assume that there is a class interval with frequency zero before 30.5 to 35.5, and one after 55.5 to 60.5, and their mid-points are A and H, respectively. ABCDEFGH is the frequency polygon corresponding to the data shown in Figure 12.3. We have shown this in Figure 12.6. In this figure, we see a closed polygon formed by connecting the midpoints of the histogram bars, extended to zero frequency on both ends.
Although there exists no class preceding the lowest class and no class succeeding the highest class, addition of the two class intervals with zero frequency enables us to make the area of the frequency polygon the same as the area of the histogram. Why is this so? Use the properties of congruent triangles to think about it. Now, the question arises: how do we complete the polygon when there is no class preceding the first class? Let us consider such a situation.
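Before reasoning with congruent triangles, you may like to confirm the equal-area claim numerically. The Python sketch below uses the data of Table 12.2, adds the two imaginary zero-frequency classes, and compares the area of the histogram with the area under the polygon ABCDEFGH. It is only a numerical check on this one table, not a proof, and the variable names are made up for this illustration.

```python
# Table 12.2: (lower limit, upper limit, number of students), weights in kg.
weight_classes = [
    (30.5, 35.5, 9),
    (35.5, 40.5, 6),
    (40.5, 45.5, 15),
    (45.5, 50.5, 3),
    (50.5, 55.5, 1),
    (55.5, 60.5, 2),
]

class_size = weight_classes[0][1] - weight_classes[0][0]  # 5 kg for every class

# Area of the histogram: sum of width x length over all rectangles.
histogram_area = sum((upper - lower) * freq for lower, upper, freq in weight_classes)

# Vertices A, B, ..., H: class mid-points with their frequencies, plus the two
# imaginary zero-frequency classes on either side.
first_mark = (weight_classes[0][0] + weight_classes[0][1]) / 2
last_mark = (weight_classes[-1][0] + weight_classes[-1][1]) / 2
vertices = (
    [(first_mark - class_size, 0)]
    + [((lower + upper) / 2, freq) for lower, upper, freq in weight_classes]
    + [(last_mark + class_size, 0)]
)

# Area between the polygon and the horizontal axis, one trapezium per segment.
polygon_area = sum(
    (x2 - x1) * (y1 + y2) / 2
    for (x1, y1), (x2, y2) in zip(vertices, vertices[1:])
)

print(histogram_area, polygon_area)
```

Both areas come out equal. The small triangle cut off each rectangle by the polygon is matched by an equal triangle added next to it.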
[CHECKPOINT]
Example 4. Consider the marks, out of 100, obtained by 51 students of a class in a test, given in Table 12.5. The table shows marks and number of students. 0 to 10 has 5 students. 10 to 20 has 10 students. 20 to 30 has 4 students. 30 to 40 has 6 students. 40 to 50 has 7 students. 50 to 60 has 3 students. 60 to 70 has 2 students. 70 to 80 has 2 students. 80 to 90 has 3 students. 90 to 100 has 9 students. Total is 51. Draw a frequency polygon corresponding to this frequency distribution table.
Solution. Let us first draw a histogram for this data and mark the mid-points of the tops of the rectangles as B, C, D, E, F, G, H, I, J, K, respectively. Here, the first class is 0 to 10. So, to find the class preceding 0 to 10, we extend the horizontal axis in the negative direction and find the mid-point of the imaginary class-interval negative 10 to 0. The first end point, that is, B is joined to this mid-point with zero frequency on the negative direction of the horizontal axis. The point where this line segment meets the vertical axis is marked as A. Let L be the mid-point of the class succeeding the last class of the given data. Then OABCDEFGHIJKL is the frequency polygon, which is shown in Figure 12.7. In this figure, we see a frequency polygon starting from the origin, rising and falling across the class intervals, and returning to the horizontal axis at the end.
[CHECKPOINT]
Frequency polygons can also be drawn independently without drawing histograms. For this, we require the mid-points of the class-intervals used in the data. These mid-points of the class-intervals are called class-marks. To find the class-mark of a class interval, we find the sum of the upper limit and lower limit of a class and divide it by 2. Thus, Class-mark = (Upper limit + Lower limit) / 2. Let us consider an example.
Example 5. In a city, the weekly observations made in a study on the cost of living index are given in the following table. Table 12.6 shows cost of living index and number of weeks. 140 to 150 has 5 weeks. 150 to 160 has 10 weeks. 160 to 170 has 20 weeks. 170 to 180 has 9 weeks. 180 to 190 has 6 weeks. 190 to 200 has 2 weeks. Total is 52. Draw a frequency polygon for the data above without constructing a histogram.
Solution. Since we want to draw a frequency polygon without a histogram, let us find the class-marks of the classes given above, that is, of 140 to 150, 150 to 160, and so on. For 140 to 150, the upper limit equals 150, and the lower limit equals 140. So, the class-mark equals (150 + 140) / 2 = 290 / 2 = 145. Continuing in the same manner, we find the class-marks of the other classes as well. So, the new table obtained is as shown in Table 12.7. It shows classes, class-marks, and frequency. 140 to 150 has class-mark 145 and frequency 5. 150 to 160 has 155 and 10. 160 to 170 has 165 and 20. 170 to 180 has 175 and 9. 180 to 190 has 185 and 6. 190 to 200 has 195 and 2. Total is 52.
[CHECKPOINT]
We can now draw a frequency polygon by plotting the class-marks along the horizontal axis, the frequencies along the vertical-axis, and then plotting and joining the points B(145, 5), C(155, 10), D(165, 20), E(175, 9), F(185, 6) and G(195, 2) by line segments. We should not forget to plot the point corresponding to the class-mark of the class 130 to 140, just before the lowest class 140 to 150, with zero frequency, that is, A(135, 0), and the point H(205, 0) occurs immediately after G(195, 2). So, the resultant frequency polygon will be ABCDEFGH, as shown in Figure 12.8. In this figure, we see a polygon plotted directly using class-marks on the x-axis and frequencies on the y-axis, closing at both ends with zero frequency points.
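The points A to H can also be generated with a short Python sketch. It applies the class-mark formula to each class of Table 12.6 and then adds one zero-frequency point on each side. The helper names `class_mark` and `polygon_points` are made up for this illustration, and the sketch assumes that all classes have the same size.

```python
# Table 12.6: (lower limit, upper limit, number of weeks).
index_classes = [
    (140, 150, 5),
    (150, 160, 10),
    (160, 170, 20),
    (170, 180, 9),
    (180, 190, 6),
    (190, 200, 2),
]


def class_mark(lower, upper):
    """Class-mark = (upper limit + lower limit) / 2."""
    return (upper + lower) / 2


def polygon_points(classes):
    """Return the frequency polygon points, assuming all classes have equal size."""
    size = classes[0][1] - classes[0][0]
    points = [(class_mark(lower, upper), freq) for lower, upper, freq in classes]
    start = (points[0][0] - size, 0)  # class-mark of the imaginary class before the first
    end = (points[-1][0] + size, 0)   # class-mark of the imaginary class after the last
    return [start] + points + [end]


points = polygon_points(index_classes)
for label, point in zip("ABCDEFGH", points):
    print(label, point)
```

Joining the printed points in order, from A to H, gives the frequency polygon.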
Frequency polygons are used when the data is continuous and very large. They are very useful for comparing two different sets of data of the same nature, for example, comparing the performance of two different sections of the same class.
Now let us work through Exercise 12.1 completely. Question 1. A survey conducted by an organisation for the cause of illness and death among the women between the ages 15 to 44 in years worldwide, found the following figures in percentage. Reproductive health conditions is 31.8 percent. Neuropsychiatric conditions is 25.4 percent. Injuries is 12.4 percent. Cardiovascular conditions is 4.3 percent. Respiratory conditions is 4.1 percent. Other causes is 22.0 percent. Part i asks to represent the information given above graphically. We will draw a bar graph. The horizontal axis will list the six causes. The vertical axis will show the female fatality rate in percentage, scaled from 0 to 35. We draw bars of uniform width with equal spacing. The heights will be 31.8, 25.4, 12.4, 4.3, 4.1, and 22.0 respectively. Part ii asks which condition is the major cause of women's ill health and death worldwide. Looking at the percentages, reproductive health conditions has the highest value at 31.8 percent. Therefore, reproductive health conditions is the major cause. Part iii asks to find out, with the help of your teacher, any two factors which play a major role in the cause in part ii above being the major cause. Two major factors could be lack of access to proper medical facilities and lack of awareness about reproductive health care.
[CHECKPOINT]
Question 2. Given below are the data on the number of girls, to the nearest ten, per thousand boys in different sections of Indian society. Scheduled Caste is 940. Scheduled Tribe is 970. Non Scheduled Caste or Scheduled Tribe is 920. Backward districts is 950. Non-backward districts is 920. Rural is 930. Urban is 910. Part i asks to represent the information above by a bar graph. On the horizontal axis, we list the seven sections. On the vertical axis, we scale from 900 to 980, marking a kink or a break on the axis because the scale does not start from zero. We draw bars of equal width and spacing with heights corresponding to each value: 940, 970, 920, 950, 920, 930, and 910. Part ii asks to discuss what conclusions can be arrived at from the graph. From the graph, we can conclude that the number of girls per thousand boys is highest in Scheduled Tribes at 970 and lowest in Urban areas at 910. Backward districts have a higher ratio than non-backward districts, and rural areas have a higher ratio than urban areas.
Question 3. Given below are the seats won by different political parties in the polling outcome of a state assembly elections. Party A won 75. Party B won 55. Party C won 37. Party D won 29. Party E won 10. Party F won 37. Part i asks to draw a bar graph to represent the polling results. We place the political parties on the horizontal axis and seats won on the vertical axis, scaled from 0 to 80. We draw bars with heights 75, 55, 37, 29, 10, and 37 respectively. Part ii asks which political party won the maximum number of seats. Party A won 75 seats, which is the highest. Therefore, Party A won the maximum number of seats.
[CHECKPOINT]
Question 4. The lengths of 40 leaves of a plant are measured correct to one millimetre, and the data obtained is represented in the following table. Length in millimetres and number of leaves. 118 to 126 has 3 leaves. 127 to 135 has 5 leaves. 136 to 144 has 9 leaves. 145 to 153 has 12 leaves. 154 to 162 has 5 leaves. 163 to 171 has 4 leaves. 172 to 180 has 2 leaves. Part i asks to draw a histogram to represent the given data. The hint says first make the class intervals continuous. The given classes are discontinuous with a gap of 1 between them, because the difference between the lower limit of a class and the upper limit of the preceding class is 1. So, we subtract half of this gap, that is 0.5, from each lower limit and add 0.5 to each upper limit. The new continuous intervals become 117.5 to 126.5, 126.5 to 135.5, 135.5 to 144.5, 144.5 to 153.5, 153.5 to 162.5, 162.5 to 171.5, and 171.5 to 180.5. The frequencies remain 3, 5, 9, 12, 5, 4, 2. We draw a histogram with these continuous intervals on the x-axis and frequencies on the y-axis. Part ii asks if there is any other suitable graphical representation for the same data. Yes, a frequency polygon can also be drawn for this data by joining the mid-points of the tops of the histogram rectangles. Part iii asks if it is correct to conclude that the maximum number of leaves are 153 mm long, and why. No, it is not correct. The class interval 144.5 to 153.5 has the highest frequency of 12, but this only means that 12 leaves have lengths between 144.5 and 153.5 mm, not that they are all exactly 153 mm long.
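The hint in Question 4 can be sketched in Python as follows. The function measures the gap between the first two classes and moves every limit by half of that gap. The names `leaf_classes` and `make_continuous` are made up for this illustration, and the sketch assumes that the gap is the same between every pair of consecutive classes.

```python
# Question 4: (lower limit, upper limit, number of leaves), lengths in mm.
leaf_classes = [
    (118, 126, 3),
    (127, 135, 5),
    (136, 144, 9),
    (145, 153, 12),
    (154, 162, 5),
    (163, 171, 4),
    (172, 180, 2),
]


def make_continuous(classes):
    """Close the gap between consecutive classes by moving each limit half the gap."""
    # Gap = lower limit of a class minus upper limit of the class before it.
    gap = classes[1][0] - classes[0][1]
    half = gap / 2
    return [(lower - half, upper + half, freq) for lower, upper, freq in classes]


continuous = make_continuous(leaf_classes)
modal_class = max(continuous, key=lambda row: row[2])  # class with the highest frequency

for lower, upper, freq in continuous:
    print(f"{lower} to {upper}: {freq} leaves")
```

After the adjustment, the upper limit of each class equals the lower limit of the next, so the rectangles of the histogram touch one another.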
[CHECKPOINT]
Question 5. The following table gives the life times of 400 neon lamps. Life time in hours and number of lamps. 300 to 400 has 14. 400 to 500 has 56. 500 to 600 has 60. 600 to 700 has 86. 700 to 800 has 74. 800 to 900 has 62. 900 to 1000 has 48. Part i asks to represent the given information with the help of a histogram. The classes are already continuous. We plot life time on the x-axis and number of lamps on the y-axis. We draw rectangles for each interval with heights 14, 56, 60, 86, 74, 62, and 48 respectively. Part ii asks how many lamps have a life time of more than 700 hours. We add the frequencies for the intervals 700 to 800, 800 to 900, and 900 to 1000. That is 74 + 62 + 48 = 184. So, 184 lamps have a life time of more than 700 hours.
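The addition in Part ii can be checked with a few lines of Python. The sketch adds the frequencies of every class whose lower limit is 700 hours or more. The names `lamp_classes` and `count_at_or_above` are made up for this illustration.

```python
# Question 5: (lower limit, upper limit, number of lamps), life time in hours.
lamp_classes = [
    (300, 400, 14),
    (400, 500, 56),
    (500, 600, 60),
    (600, 700, 86),
    (700, 800, 74),
    (800, 900, 62),
    (900, 1000, 48),
]


def count_at_or_above(classes, threshold):
    """Add the frequencies of all classes whose lower limit is at least `threshold`."""
    return sum(freq for lower, _, freq in classes if lower >= threshold)


total_lamps = sum(freq for _, _, freq in lamp_classes)
long_life_lamps = count_at_or_above(lamp_classes, 700)

print(total_lamps, long_life_lamps)
```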
Question 6. The following table gives the distribution of students of two sections according to the marks obtained by them. Section A: 0 to 10 has 3, 10 to 20 has 9, 20 to 30 has 17, 30 to 40 has 12, 40 to 50 has 9. Section B: 0 to 10 has 5, 10 to 20 has 19, 20 to 30 has 15, 30 to 40 has 10, 40 to 50 has 1. The question asks to represent the marks of the students of both the sections on the same graph by two frequency polygons. From the two polygons compare the performance of the two sections. First, we find the class-marks for each interval. For 0 to 10, class-mark is 5. For 10 to 20, it is 15. For 20 to 30, it is 25. For 30 to 40, it is 35. For 40 to 50, it is 45. We plot points for Section A: (5, 3), (15, 9), (25, 17), (35, 12), (45, 9). We plot points for Section B: (5, 5), (15, 19), (25, 15), (35, 10), (45, 1). We also add zero frequency points at class-marks negative 5 and 55 to close the polygons. Joining these points gives two frequency polygons. Comparing them, Section A has more students in the higher mark ranges, particularly 20 to 30 and 30 to 40, while Section B peaks at 10 to 20. Overall, Section A performed better as its polygon is shifted towards higher marks.
[CHECKPOINT]
Question 7. The runs scored by two teams A and B on the first 60 balls in a cricket match are given below. Number of balls and runs for Team A and Team B. 1 to 6: A scored 2, B scored 5. 7 to 12: A scored 1, B scored 6. 13 to 18: A scored 8, B scored 2. 19 to 24: A scored 9, B scored 10. 25 to 30: A scored 4, B scored 5. 31 to 36: A scored 5, B scored 6. 37 to 42: A scored 6, B scored 3. 43 to 48: A scored 10, B scored 4. 49 to 54: A scored 6, B scored 8. 55 to 60: A scored 2, B scored 10. The question asks to represent the data of both the teams on the same graph by frequency polygons. The hint says first make the class intervals continuous. The gap between classes is 1. We subtract 0.5 from lower limits and add 0.5 to upper limits. New intervals: 0.5 to 6.5, 6.5 to 12.5, 12.5 to 18.5, 18.5 to 24.5, 24.5 to 30.5, 30.5 to 36.5, 36.5 to 42.5, 42.5 to 48.5, 48.5 to 54.5, 54.5 to 60.5. Class-marks are 3.5, 9.5, 15.5, 21.5, 27.5, 33.5, 39.5, 45.5, 51.5, 57.5. We plot these on the x-axis and runs on the y-axis. For Team A, points are (3.5, 2), (9.5, 1), (15.5, 8), (21.5, 9), (27.5, 4), (33.5, 5), (39.5, 6), (45.5, 10), (51.5, 6), (57.5, 2). For Team B, points are (3.5, 5), (9.5, 6), (15.5, 2), (21.5, 10), (27.5, 5), (33.5, 6), (39.5, 3), (45.5, 4), (51.5, 8), (57.5, 10). We add zero points at negative 2.5 and 63.5. Joining them gives two polygons. Team A's highest score is 10 runs at class-mark 45.5, that is, in balls 43 to 48, with another high of 9 runs at 21.5. Team B reaches its highest score of 10 runs twice, at class-marks 21.5 and 57.5, and scores more than Team A at both the beginning and the end of the 60 balls.
[CHECKPOINT]
Question 8. A random survey of the number of children of various age groups playing in a park was found as follows. Age in years and number of children. 1 to 2 has 5. 2 to 3 has 3. 3 to 5 has 6. 5 to 7 has 12. 7 to 10 has 9. 10 to 15 has 10. 15 to 17 has 4. The question asks to draw a histogram to represent the data above. The class intervals have varying widths. We must adjust the lengths so that areas are proportional to frequencies. The minimum class size is 1. We calculate proportion of children per 1 year interval. For 1 to 2, width is 1, frequency 5, adjusted length is 5/1 × 1 = 5. For 2 to 3, width 1, frequency 3, length 3. For 3 to 5, width 2, frequency 6, length 6/2 × 1 = 3. For 5 to 7, width 2, frequency 12, length 12/2 × 1 = 6. For 7 to 10, width 3, frequency 9, length 9/3 × 1 = 3. For 10 to 15, width 5, frequency 10, length 10/5 × 1 = 2. For 15 to 17, width 2, frequency 4, length 4/2 × 1 = 2. We draw the histogram with these adjusted lengths on the y-axis and age groups on the x-axis.
Question 9. 100 surnames were randomly picked up from a local telephone directory and a frequency distribution of the number of letters in the English alphabet in the surnames was found as follows. Number of letters and number of surnames. 1 to 4 has 6. 4 to 6 has 30. 6 to 8 has 44. 8 to 12 has 16. 12 to 20 has 4. Part i asks to draw a histogram to depict the given information. The class intervals have varying widths. Minimum width is 2. We adjust lengths for width 2. For 1 to 4, width 3, frequency 6, length is 6/3 × 2 = 4. For 4 to 6, width 2, frequency 30, length is 30/2 × 2 = 30. For 6 to 8, width 2, frequency 44, length is 44/2 × 2 = 44. For 8 to 12, width 4, frequency 16, length is 16/4 × 2 = 8. For 12 to 20, width 8, frequency 4, length is 4/8 × 2 = 1. We draw the histogram with these adjusted lengths. Part ii asks to write the class interval in which the maximum number of surnames lie. The highest frequency is 44, which corresponds to the class interval 6 to 8.
[CHECKPOINT]
We have now completed all exercise questions. Let us move to the chapter summary. In this chapter, you have studied the following points. First, how data can be presented graphically in the form of bar graphs, histograms and frequency polygons. Second, how to adjust class intervals if they are of varying sizes to obtain a modified frequency distribution for drawing histograms. Third, how to draw frequency polygons from histograms, or by using the midpoints of the class intervals. Fourth, the three measures of central tendency for ungrouped data are the mean, the median and the mode. Mean. It is found by adding all the values of the observations and dividing the sum by the total number of observations. It is denoted by x̄. For n observations x₁, x₂, x₃, ..., xₙ, mean x̄ = (x₁ + x₂ + ... + xₙ)/n. Median. It is the value of the middle-most observation or observations, once the observations are arranged in ascending or descending order. If n is an odd number, the median equals the value of the ((n+1)/2)th observation. If n is an even number, the median equals the mean of the values of the (n/2)th and ((n/2)+1)th observations. Mode. The mode is the most frequently occurring observation.
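The three measures in the fourth point can be tried out with a short Python sketch. The functions follow the definitions in the summary word for word. The two samples are made-up numbers chosen only for illustration, and the function `modes` returns a list because more than one value can share the highest count.

```python
from collections import Counter


def mean(observations):
    """Sum of all observations divided by the number of observations."""
    return sum(observations) / len(observations)


def median(observations):
    """Middle-most value after arranging the observations in ascending order."""
    ordered = sorted(observations)
    n = len(ordered)
    if n % 2 == 1:
        # ((n + 1) / 2)th observation; Python lists count from 0, hence the -1.
        return ordered[(n + 1) // 2 - 1]
    # Mean of the (n / 2)th and ((n / 2) + 1)th observations.
    return (ordered[n // 2 - 1] + ordered[n // 2]) / 2


def modes(observations):
    """All values that occur the greatest number of times."""
    counts = Counter(observations)
    highest = max(counts.values())
    return [value for value, count in counts.items() if count == highest]


odd_sample = [4, 6, 3, 6, 8, 6, 9]  # made-up data, n = 7
even_sample = [2, 4, 4, 10]         # made-up data, n = 4

for sample in (odd_sample, even_sample):
    print(mean(sample), median(sample), modes(sample))
```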
This concludes our detailed study of Chapter 12 on Statistics. Remember to practice drawing these graphs accurately and to always check whether the class intervals are continuous and of uniform width before plotting. Understanding these graphical representations will greatly help you in analysing data effectively in your exams and beyond.
Thank you for listening! Keep revising and practicing. Goodbye! [CHAPTER_COMPLETE]