Hello, and welcome to today's mathematics lesson. We are going to explore Chapter 18: Statistics. By the end of this session, you will understand what statistics really means, how we classify different types of data, how to organize information into frequency distributions, and how to represent data visually through histograms and frequency polygons.
Let us begin with the word itself — statistics. This term emerged in the middle of the eighteenth century, tracing its roots to Latin, Italian, and German words all meaning "political state." Originally, statistics served government purposes alone, but today its scope stretches across every field imaginable.
Now, statistics carries two distinct meanings. In the singular sense, it refers to the entire branch of knowledge — the principles and methods we use to collect, analyze, and interpret data. In the plural sense, statistics means the actual numerical facts themselves — systematic collections of numbers gathered with a specific purpose. For instance, the marks of students in an examination, or the number of unemployed persons across different states — these are statistics in the plural sense.
At its heart, mathematical statistics follows a clear path. First, we collect numerical facts with a definite objective. Then we organize and analyze these facts to illuminate whatever question we are investigating. A set of such collected numerical facts is simply called data. Imagine six students scoring 2, 8, 9, 7, 6, and 5 marks out of ten — this collection forms a set of data.
Now let us turn to variables. A variable is any quantity that can change from one individual to another. Height, weight, age — these are all variables because they differ from person to person.
Variables fall into two fundamental categories. First, the continuous variable. This type can assume any numerical value within a particular range. Wages of workers, heights of children — these can take countless values between any two points.
Second, the discrete variable, also called discontinuous. This type cannot take every possible value within a range. Consider the number of children in a family. A family might have one child or two, but never one and a half or two and three-quarters. The values jump from one whole number to the next without intermediate possibilities.
When we first gather information, it arrives as raw or ungrouped data — a jumble of numbers in no particular order. Suppose twenty students score 8, 15, 23, 16, 46, and so on in a fifty-mark test. This scattered collection is raw data.
To make sense of it, we arrange the numbers in ascending or descending order. This organized arrangement is called arrayed data. Ascending order runs from smallest to largest: 8, 11, 15, 16, and upward. Descending order reverses this: 50, 49, 46, 45, and downward. Either arrangement helps us see patterns that were invisible in the raw form.
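To see arraying in action, here is a minimal Python sketch. It uses only the five marks quoted earlier, since the other fifteen are not listed in this lesson, and the variable names are our own choice.

```python
# Only the five marks quoted in the lesson; the remaining fifteen are not given.
raw_marks = [8, 15, 23, 16, 46]

# Arrayed data: the same values, put in order.
ascending = sorted(raw_marks)
descending = sorted(raw_marks, reverse=True)

print(ascending)   # [8, 15, 16, 23, 46]
print(descending)  # [46, 23, 16, 15, 8]
```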
Tabulation brings further clarity. Imagine fifteen children distributed across three families. Family A has five children — three boys and two girls. Family B has four children — one boy and three girls. Family C has six children — four boys and two girls. When we arrange these facts in a structured table showing families, boys, girls, and totals, we create a tabulation. This format reveals relationships instantly: eight boys and seven girls among fifteen children total.
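The family table just described can be sketched in a few lines of Python. This is only one possible layout; the dictionary structure and the column widths are our own assumptions, while the counts come straight from the example.

```python
# Counts taken from the three families described in the lesson.
families = {
    "A": {"boys": 3, "girls": 2},
    "B": {"boys": 1, "girls": 3},
    "C": {"boys": 4, "girls": 2},
}

total_boys = sum(f["boys"] for f in families.values())
total_girls = sum(f["girls"] for f in families.values())
total_children = total_boys + total_girls

# Print a header, one row per family, then a totals row.
print(f"{'Family':<8}{'Boys':>6}{'Girls':>7}{'Total':>7}")
for name, f in families.items():
    print(f"{name:<8}{f['boys']:>6}{f['girls']:>7}{f['boys'] + f['girls']:>7}")
print(f"{'Total':<8}{total_boys:>6}{total_girls:>7}{total_children:>7}")
```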
Frequency is our next crucial concept. Frequency tells us how often a particular value appears in our data set. Consider the numbers: 1, 2, 0, 3, 2, 1, 5, 4, 3, 2, 1, 2. Here, the number 1 appears three times, so its frequency is three. The number 2 appears four times, giving it frequency four. Each value carries its own frequency count.
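If you would like to check these counts yourself, a short Python sketch can do the counting. It relies on Counter from the standard library; the variable names are our own.

```python
from collections import Counter

# The twelve numbers from the example above.
data = [1, 2, 0, 3, 2, 1, 5, 4, 3, 2, 1, 2]

# Counter maps each distinct value to the number of times it appears.
frequency = Counter(data)

for value in sorted(frequency):
    print(value, frequency[value])
```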
A frequency distribution is simply a table that pairs each data value with its corresponding frequency. We construct such tables using tally marks — short vertical strokes grouped in fives for easy counting. Four strokes stand alone; the fifth stroke crosses diagonally through them, forming a bundle. This ancient counting method remains remarkably effective.
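Tally marks can be imitated in plain text. In the sketch below, the helper name tally and the use of a slash to stand in for the diagonal fifth stroke are our own assumptions, not a standard notation. The data are the twelve numbers from the frequency example.

```python
from collections import Counter

def tally(count):
    """Return a text tally; '||||/' stands in for a bundle of five strokes."""
    bundles, rest = divmod(count, 5)
    parts = ["||||/"] * bundles
    if rest:
        parts.append("|" * rest)
    return " ".join(parts)

data = [1, 2, 0, 3, 2, 1, 5, 4, 3, 2, 1, 2]
frequency = Counter(data)

# One row per distinct value: value, tally marks, frequency.
for value in sorted(frequency):
    print(f"{value:>5}  {tally(frequency[value]):<8}{frequency[value]:>3}")
```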
When the data contain only a few distinct values, we create an ungrouped frequency distribution. Each distinct value gets its own row with its tally and frequency. But when the data span a wide range of values, grouping becomes essential.
Grouped frequency distributions organize data into class intervals — ranges like 1 to 10, 11 to 20, and so forth. All values falling within a range are counted together.
This compression reveals the overall shape of our data without overwhelming detail.
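Here is a rough Python sketch of grouping. The marks are just the handful of test scores quoted earlier in this lesson, not the full set of twenty, and the five class intervals out of fifty marks are our own assumption.

```python
# Only the marks this lesson happens to quote; the full data set is not given.
marks = [8, 15, 23, 16, 46, 11, 50, 49, 45]

# Assumed class intervals, written in inclusive form (both limits belong to the class).
classes = [(1, 10), (11, 20), (21, 30), (31, 40), (41, 50)]

# Count how many marks fall inside each interval.
grouped = {
    (lower, upper): sum(1 for m in marks if lower <= m <= upper)
    for lower, upper in classes
}

for (lower, upper), count in grouped.items():
    print(f"{lower:>2} to {upper:<2}  {count}")
```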
Two types of grouped distributions exist: inclusive and exclusive.
In an inclusive distribution, both limits belong to the class, and the upper limit of one class does not match the lower limit of the next. When we write 1 to 10, then 11 to 20, a value of 10 belongs to the first class, while 11 belongs to the second. The classes do not touch; a small gap exists between them, so a value like 10.5 would have no class to fall into.
In an exclusive distribution, the upper limit of one class coincides exactly with the lower limit of the next. Classes like 10 to 20, 20 to 30, 30 to 40 follow this pattern. Here, a value of 20 belongs to the second class, not the first. For the first class, we express this as 10 less than or equal to x, with x less than 20. The upper limit is excluded from each class.
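The exclusive rule is easy to test in code. The function name class_of below is hypothetical; the sketch simply applies the condition that a value is at least the lower limit and strictly below the upper limit.

```python
# The exclusive classes from the example.
classes = [(10, 20), (20, 30), (30, 40)]

def class_of(x):
    """Return the class containing x, or None if x lies outside every class."""
    for lower, upper in classes:
        if lower <= x < upper:  # lower limit included, upper limit excluded
            return (lower, upper)
    return None

print(class_of(20))  # (20, 30)
print(class_of(40))  # None
```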
Exclusive distributions ensure continuity and prevent ambiguity, so we generally prefer them. When faced with inclusive data, we convert it using an adjustment factor.
Here is how we make this adjustment. Find the gap between the upper limit of one class and the lower limit of the next. For classes 1 to 10 and 11 to 20, this gap is 1. Divide by 2 to get 0.5 — this is our adjustment factor.
Subtract this factor from every lower limit and add it to every upper limit. Our classes become 0.5 to 10.5, 10.5 to 20.5, 20.5 to 30.5, and so on. These adjusted boundaries are called actual or true class limits, also known as class boundaries. Now the classes touch perfectly without gaps or overlaps.
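The adjustment can be written out as a short Python sketch. It assumes the gap is the same between every pair of neighbouring classes, so it measures the gap only once; the variable names are ours.

```python
# Inclusive classes from the example.
inclusive = [(1, 10), (11, 20), (21, 30)]

# Gap between the upper limit of the first class and the lower limit of the second.
gap = inclusive[1][0] - inclusive[0][1]
adjustment = gap / 2

# Subtract from every lower limit, add to every upper limit.
boundaries = [(lower - adjustment, upper + adjustment) for lower, upper in inclusive]

print(adjustment)  # 0.5
print(boundaries)  # [(0.5, 10.5), (10.5, 20.5), (20.5, 30.5)]
```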
Several important terms describe our class intervals. The class size is the width of each interval — the difference between actual upper and actual lower limits. In our adjusted example, 10.5 minus 0.5 equals 10, so the class size is 10.
The class mark represents the center of each interval. We calculate it as the average of the lower and upper limits. For the interval 0.5 to 10.5, the class mark equals (0.5 + 10.5) / 2 = 5.5. This midpoint becomes crucial for many calculations.
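Both quantities follow directly from the class boundaries, as this small Python sketch suggests. The list of boundaries is the adjusted example from above, and the names class_sizes and class_marks are our own.

```python
# Actual class limits (class boundaries) from the adjusted example.
boundaries = [(0.5, 10.5), (10.5, 20.5), (20.5, 30.5)]

# Class size: actual upper limit minus actual lower limit.
class_sizes = [upper - lower for lower, upper in boundaries]

# Class mark: the midpoint of each interval.
class_marks = [(lower + upper) / 2 for lower, upper in boundaries]

print(class_sizes)  # [10.0, 10.0, 10.0]
print(class_marks)  # [5.5, 15.5, 25.5]
```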
Cumulative frequency adds another dimension to our analysis. It is the running total of frequencies up to and including each class.
Suppose we have four classes with frequencies 4, 9, 5, and 6. The cumulative frequency of the first class is simply 4. For the second class, we add 4 plus 9 to get 13. For the third, 4 plus 9 plus 5 equals 18. For the fourth, the total reaches 24.
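A running total like this is exactly what accumulate from Python's standard library produces. The sketch below is just one way to do it.

```python
from itertools import accumulate

# Frequencies of the four classes in the example.
frequencies = [4, 9, 5, 6]

# Each entry is the sum of all frequencies up to and including that class.
cumulative = list(accumulate(frequencies))

print(cumulative)  # [4, 13, 18, 24]
```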
We can also express this as a "less than" cumulative frequency table. Suppose the four classes are 10 to 20, 20 to 30, 30 to 40, and 40 to 50. Then less than 10 contains 0 observations, less than 20 contains 4, less than 30 contains 13, less than 40 contains 18, and less than 50 contains all 24. This format proves especially useful for certain graphical representations and calculations.
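A "less than" table can be built the same way. This sketch assumes the four classes are 10 to 20, 20 to 30, 30 to 40, and 40 to 50, which is what the counts quoted above imply; the lesson does not state the classes outright.

```python
# Assumed class limits: 10-20, 20-30, 30-40, 40-50.
limits = [10, 20, 30, 40, 50]
frequencies = [4, 9, 5, 6]

# Nothing lies below the first lower limit.
less_than = {limits[0]: 0}

# Each upper limit collects everything counted so far.
running = 0
for upper, f in zip(limits[1:], frequencies):
    running += f
    less_than[upper] = running

for limit, count in less_than.items():
    print(f"less than {limit}: {count}")
```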
Numbers alone can overwhelm the mind, but pictures speak clearly. Graphical representation transforms abstract data into visual form, making patterns immediately apparent and memorable. Two powerful tools dominate our study: the histogram and the frequency polygon.
A histogram displays continuous frequency distributions through rectangles. The base of each rectangle represents a class interval. When all classes have the same size, the height corresponds to the frequency of that class. Importantly, in a histogram, both width and height carry meaning, unlike bar charts where only height matters.
To construct a histogram, first ensure your data uses exclusive class intervals. If your data is inclusive, apply the adjustment we discussed earlier. Choose suitable scales for your axes — they need not match. Mark class intervals along the horizontal axis and frequencies vertically. Then draw rectangles for each class, touching one another to emphasize continuity.
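Before drawing anything, it can help to work out the rectangles as numbers. The Python sketch below does only that; it assumes equal-size exclusive classes 10 to 20 up to 40 to 50 with the frequencies 4, 9, 5, and 6, and the dictionary keys are our own naming.

```python
# Assumed exclusive classes and their frequencies.
classes = [(10, 20), (20, 30), (30, 40), (40, 50)]
frequencies = [4, 9, 5, 6]

# One rectangle per class: base on the class interval, height equal to the frequency.
rectangles = [
    {"left": lower, "width": upper - lower, "height": f}
    for (lower, upper), f in zip(classes, frequencies)
]

# In a histogram each rectangle must end exactly where the next one begins.
touching = all(
    a["left"] + a["width"] == b["left"]
    for a, b in zip(rectangles, rectangles[1:])
)

for r in rectangles:
    print(r)
print("rectangles touch:", touching)
```

These values could then be handed to whatever plotting tool you prefer, or drawn by hand on graph paper.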
Sometimes our data begins far from zero. If class intervals start at 40, leaving the stretch from 0 to 40 empty wastes room and distorts perception. Instead, we use a kink or zig-zag break near the origin to indicate that the scale begins at 40, not at zero. This convention keeps the graph honest while improving clarity.
The frequency polygon offers an alternative visualization. Instead of rectangles, we plot points at the midpoint of each class interval at the appropriate frequency height, then connect these points with straight lines.
We can draw frequency polygons two ways. First, using a histogram: mark the midpoint of each rectangle's top edge, then connect these midpoints. To complete the polygon, extend lines to the midpoints of two imagined classes, one before the first class and one after the last, both at zero frequency. This brings the figure down to the horizontal axis at both ends.
Second, without a histogram: calculate each class mark directly using (upper limit + lower limit) / 2. Plot these marks on the horizontal axis against their frequencies vertically. Connect the points, and again extend to zero-frequency points beyond the first and last classes. Both methods yield identical polygons.
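The second method amounts to listing the points to be joined. Here is a minimal Python sketch, again assuming equal-size classes 10 to 20 up to 40 to 50 with frequencies 4, 9, 5, and 6; those classes and all names in the code are our own illustration.

```python
# Assumed exclusive classes and their frequencies.
classes = [(10, 20), (20, 30), (30, 40), (40, 50)]
frequencies = [4, 9, 5, 6]

class_size = classes[0][1] - classes[0][0]

# Class mark = (upper limit + lower limit) / 2.
class_marks = [(upper + lower) / 2 for lower, upper in classes]

# Add one imagined class mark on each side, both at zero frequency.
points = (
    [(class_marks[0] - class_size, 0)]
    + list(zip(class_marks, frequencies))
    + [(class_marks[-1] + class_size, 0)]
)

print(points)
```

Joining these points in order with straight lines gives the polygon, which starts and ends on the horizontal axis.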
Let us recap the essential ideas from today's lesson.
First, statistics encompasses both the methods of analyzing data and the numerical data itself. We collect, organize, present, and analyze information to draw meaningful conclusions.
Second, variables are quantities that vary, classified as continuous — taking any value in a range — or discrete — jumping between specific values.
Third, we organize raw data through arraying, tabulation, and frequency distributions, using inclusive or exclusive class intervals with appropriate adjustments for continuity.
Fourth, key measures include class size, the width of each interval; class mark, the midpoint value; and cumulative frequency, the running total of observations.
Fifth, histograms use rectangles to show frequency distributions, where both width and height matter.
Sixth, frequency polygons connect midpoints of classes at their frequency heights, creating a line graph that reveals the data's shape and trends.
Statistics opens a window into understanding the world through numbers. With these foundations firmly in place, you are ready to go deeper into measures of central tendency and further statistical analysis. Keep practicing, stay curious, and remember that every data set tells a story waiting to be discovered. Until next time, happy learning!