Hello, my dear students! Welcome to today's mathematics lesson. I am so happy to be here with you to learn about something really interesting and useful — Chapter 5 of your Ganita Prakash textbook, titled "Connecting the Dots."
Now, isn't that a funny title? Connecting the dots — what do you think that means? Well, in mathematics, especially in statistics, we collect information, which we call data, and then we try to make sense of it. We connect the dots to see patterns, to find answers, and to tell stories with numbers. So let's begin our journey into the world of data and statistics.
In this chapter, we are going to learn about statistical questions and statements, representative values like average and median, how to visualize data using dot plots and bar graphs, and how to become data detectives to uncover interesting stories hidden in numbers. Are you excited? I am sure you are!
Let's start with Section 5.1, which is called "Of Questions and Statements."
Now, students, think about this scenario. Your teacher tells you that they are meeting two of their childhood friends this evening. One friend is 5 feet tall and the other is 6 feet tall. What is your guess as to each friend's gender based on this information?
You might have guessed that the 5-foot-tall person is a woman and the 6-foot-tall person is a man. There is a chance that you are wrong — maybe the 5-foot-tall person is a man or the 6-foot-tall person is a woman. But experience tells us that 5-foot-tall men and 6-foot-tall women are rare. We have seen that, more often, men are taller than women.
This is a simple example of what we call statistical thinking. We use our past experience and observations to make reasonable guesses about new situations.
Now, students, let's think about some statements we come across in our daily lives. Your teacher might say, "Jemimah's batting has been very consistent over the past year. We can expect a century from her in tomorrow's match." That's a statistical statement — it's making a prediction based on past performance.
Or think about this — "I take about 15 minutes to cycle from school to home." That's another statistical statement. It's giving an approximate time based on regular experience.
Or maybe, "I think my pen might last for 2 more weeks; it is time to get a new one soon." That's a prediction based on how the pen has been behaving.
Or consider, "The population of their village has reduced by about 100 in the last decade." That's a statement about change over time.
And here's another one — "Since I started to eat fruits and vegetables more frequently, I am able to run 2 km more each day." That's showing a possible relationship between two things.
And finally, "David spends about 7 hours daily in the school." That's an estimate of time spent.
All these are what we call statistical statements. Simply put, a statistical statement is a claim or summary about some phenomenon, expressed in terms of numerical values, proportions, probabilities, or predictions.
Now, students, what about questions? What is a statistical question? A statistical question is a question that can be answered by collecting data. For example, "How tall are Grade 7 students in our school?" is a statistical question. We expect that not all Grade 7 students have the same height — there will be variation. But we can collect data, analyze it, and make conclusions about the heights that occur.
Another statistical question could be, "Typically, are onions costlier in Yahapur or Wahapur?" Prices can vary over time, so to answer this question, we need to look at data, analyze it, and come to conclusions by making suitable statistical statements.
Now, let me give you some examples, and I want you to tell me which ones are statistical questions. Are you ready?
First one — (a) "What is the price of a tennis ball in India?" Is this a statistical question? Well, students, think about it. If we ask about the price of a tennis ball, is there variation in the answer? Actually, no — the price of a tennis ball in India is likely to be fairly standard, at least at a given point in time. So this is not really a statistical question because we don't need to collect data from many sources to answer it. It's more of a factual question.
Next — (b) "How old are the dogs that live on this street?" Now, this is a statistical question because different dogs on the street will have different ages. We need to collect data to answer this question.
Third — (c) "What fraction of the students in your class like walking up a hill?" This is also a statistical question because different students will have different preferences, and we need to collect data from the class to find the fraction.
Fourth — (d) "Do you like reading?" Now, this is a simple yes or no question. It doesn't really require collecting and analyzing data in the way we've been discussing. So this is not a statistical question.
Fifth — (e) "Approximately how many bricks are in this wall?" This is interesting — we would need to count or estimate the bricks, and there might be some variation in our estimate. So yes, this could be considered a statistical question because we need to collect data or make observations to answer it.
Sixth — (f) "Who was the best bowler in the match yesterday?" This is asking for a specific fact about a particular match. It's not really a statistical question because we're not collecting data from many sources.
Seventh — (g) "What was the rainfall pattern in Barmer last year?" Now, this is a statistical question because rainfall varies from month to month, day to day, and we need to collect and analyze data to describe the pattern.
So the statistical questions from this list are (b), (c), (e), and (g).
Now, students, the term statistics refers to the study of collecting, organizing, analyzing, interpreting, and presenting data. In this chapter, we shall encounter some statistical questions and learn how analyzing data and graphs can help answer them.
Now let's move on to Section 5.2, which is called "Representative Values."
This is a very important section, so pay close attention.
Imagine, students, that we have the runs scored by two cricket players — Shubman and Yashasvi — in a cricket series. The data is given in a table. Let me tell you what the table shows.
In Match 1, Shubman scored 0 runs and Yashasvi scored 67 runs. In Match 2, Shubman scored 17 runs and Yashasvi scored 55 runs. In Match 3, Shubman scored 21 runs and Yashasvi scored 18 runs. In Match 4, Shubman scored 90 runs and Yashasvi scored 35 runs.
Now the question is — who do you think performed better?
Let's see what different students think. Shreyas says, "Both their performances are similar since Yashasvi scored more in the first and second matches, whereas Shubman scored more in the third and fourth."
Vaishnavi says, "I think Shubman performed better because he scored the highest number of runs in a match — 90!"
Shreyas responds, "No! Yashasvi batted better since the total number of runs he made is 175, while Shubman made only 128."
Vaishnavi then says, "Oh! Also, Yashasvi's batting is more consistent — the difference between his maximum score and minimum score is lower."
Now, students, you can see that different people have different ways of looking at the same data. Some look at the highest score, some look at the total, some look at consistency. So how do we decide who performed better? It's often not simple to compare two groups of numbers and clearly say that one is better than the other.
Now, let's look at another series. Here is another table showing their performances in another series.
For Shubman: Match 1 — 23 runs, Match 2 — 7 runs, Match 3 — 10 runs, Match 4 — 52 runs, Match 5 — 18 runs. For Yashasvi: Match 1 — 26 runs, Match 2 — 53 runs, Match 3 — 2 runs, Match 4 — (no data, perhaps he didn't play), Match 5 — 15 runs.
Vaishnavi says, "Here, Shubman performed better since his total is 110 runs, while Yashasvi's total is 96 runs."
But wait, students — what do you think of Vaishnavi's statement? Is it fair to compare 110 runs in 5 matches with 96 runs in 4 matches? Shreyas points out exactly this: "But Yashasvi made 96 runs in 4 matches and Shubman made 110 runs in 5 matches."
So, how do we say who performed better? This is where the concept of average comes in very handy.
Can a single number act as a representative of a group of numbers? For example, can we represent Shubman's or Yashasvi's batting in this series with one number?
We already saw one way — the total of the values in the group! But as we just discovered, if the group sizes are different, then the total may not be an appropriate measure to compare.
In some matches, a player could have scored more and in other matches less. A representative number for the group can be found by balancing out these highs and lows. For example, we can add up the runs scored in all the matches and divide the total by the number of matches played. We call this value the "average" or "arithmetic mean" of the given data.
Here, the average number of runs scored by a player in a match equals the total runs scored by the player in all the matches divided by the number of matches played.
So, for Shubman, the average number of runs scored in a match equals 110 divided by 5, which is 22 runs.
For Yashasvi, the average number of runs scored in a match equals 96 divided by 4, which is 24 runs.
So in this series, Yashasvi's average number of runs is higher than Shubman's. This gives us a fair way to compare their performances even though they played different numbers of matches.
Now, students, let me formally define what we mean by average or arithmetic mean.
The Average or Arithmetic Mean, or simply Mean, is calculated as follows:
Mean equals the sum of all the values in the data divided by the number of values in the data.
So if we have the numbers 10, 20, 30, 40, and 50, the mean would be (10 + 20 + 30 + 40 + 50) divided by 5, which equals 150 divided by 5, which is 30.
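If you enjoy working with a computer, you can check these calculations with a small Python sketch. The function name `mean` is our own choice for illustration. It does exactly what the definition says: it adds the values and divides by how many there are. The last two lines use Shubman's and Yashasvi's runs from the second series. Yashasvi's missing Match 4 is simply left out.

```python
def mean(values):
    """Return the arithmetic mean: the sum of the values divided by how many there are."""
    if len(values) == 0:
        raise ValueError("the mean of an empty list is undefined")
    return sum(values) / len(values)


print(mean([10, 20, 30, 40, 50]))   # 150 / 5 = 30.0
print(mean([23, 7, 10, 52, 18]))    # Shubman, second series: 110 / 5 = 22.0
print(mean([26, 53, 2, 15]))        # Yashasvi, second series (Match 4 left out): 96 / 4 = 24.0
```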
Now, let me explain the average in another way — as a fair-share or equal-share concept.
Suppose Shreyas and 4 of his friends have collected the following numbers of guavas: 3, 8, 10, 5, and 4. That's 5 people in total. Parag and 5 of his friends have collected the following numbers of guavas: 5, 4, 6, 3, 4, and 8. That's 6 people in total.
Each group will share their guavas equally amongst themselves. In which group will each member get a bigger share of guavas?
To find this out, we first find out how many guavas each group has collected. Then we divide this total by the number of people in the group to get each member's share.
Shreyas's group has collected 3 plus 8 plus 10 plus 5 plus 4 equals 30 guavas. Each member of Shreyas's group gets 30 divided by 5, which is 6 guavas.
Parag's group has collected 5 plus 4 plus 6 plus 3 plus 4 plus 8 equals 30 guavas. Each member of Parag's group gets 30 divided by 6, which is 5 guavas.
So, the members of Shreyas's group get 1 more guava each than the members of Parag's group. Even though both groups collected the same total number of guavas, because Shreyas's group had fewer people, each person got a bigger share.
This is the fair-share idea behind the average — if we were to distribute the total equally among all members, what would each one get?
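Here is a small Python sketch of the fair-share idea, using the guava numbers above. The function names are our own, chosen for illustration. The second function shows how far each person is above or below the fair share. These differences always add up to zero, and that is exactly the "balancing out of highs and lows" we talked about earlier.

```python
def fair_share(collected):
    """Each person's share if the total is split equally; this is the mean."""
    return sum(collected) / len(collected)


def differences_from_share(collected):
    """How far each person's collection is above (+) or below (-) the fair share."""
    share = fair_share(collected)
    return [c - share for c in collected]


shreyas_group = [3, 8, 10, 5, 4]
parag_group = [5, 4, 6, 3, 4, 8]

print(fair_share(shreyas_group))                   # 6.0
print(fair_share(parag_group))                     # 5.0
print(differences_from_share(shreyas_group))       # [-3.0, 2.0, 4.0, -1.0, -2.0]
print(sum(differences_from_share(shreyas_group)))  # 0.0: the highs and lows balance out
```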
Now, let's look at another example. Vaishnavi tracks the number of Hibiscus flowers blooming in her garden each day. The data for the last few days is 2, 7, 9, 4, and 3. What is the average number of Hibiscus flowers blooming per day in Vaishnavi's garden?
The average equals the total number of Hibiscus flowers bloomed divided by the number of days. So that's (2 + 7 + 9 + 4 + 3) divided by 5, which equals 25 divided by 5, which is 5.
So, on average, 5 Hibiscus flowers bloom daily. In this case, the average tells us the number of flowers blooming each day if an equal number of flowers bloomed daily. It's a way of representing the whole set of data with a single number.
Now, students, isn't it fascinating to know that ancient Indian mathematicians also studied this concept? One of the terms used for the Arithmetic Mean in ancient Indian mathematics is samamiti, which means "mean measure." The word "sama" means equal. Some terms used for the Arithmetic Mean in Indian texts include samarajju, which means mean measure of a line segment, used by Brahmagupta in 628 CE. Mahāvīrācārya in 850 CE used the term samīkaraṇa, which means levelling or equalising. Śrīpati in 1039 CE used the term sāmya, which means equality, impartiality, or equability. And Bhāskarācārya in 1150 CE and Gaṇeṣa in 1545 CE used the term samamiti. The terminology shows that ancient Indian scholars perceived the Arithmetic Mean as the "common" or "equalising" value that is a representative measure of a collection of values.
Now, let's look at another very important example involving onion prices in two towns called Yahapur and Wahapur.
The table shows the monthly price of onions, in rupees per kilogram, at these two towns for the entire year.
For Yahapur: January — 25 rupees, February — 24 rupees, March — 26 rupees, April — 28 rupees, May — 30 rupees, June — 35 rupees, July — 39 rupees, August — 43 rupees, September — 49 rupees, October — 56 rupees, November — 59 rupees, December — 44 rupees.
For Wahapur: January — 19 rupees, February — 17 rupees, March — 23 rupees, April — 30 rupees, May — 38 rupees, June — 35 rupees, July — 42 rupees, August — 39 rupees, September — 53 rupees, October — 60 rupees, November — 52 rupees, December — 42 rupees.
Now, students, where are onions costlier? Let's see what different students think.
Khushboo says, "I think Wahapur is costlier because it has the highest price of 60 rupees." Khushboo is looking at the maximum price.
Nafisa says, "I added the prices of all months in each location — Yahapur's total is 458, whereas Wahapur's total is 450." Nafisa is looking at the total sum.
Vishal says, "Wahapur is costlier since it has 3 prices of 50 rupees or more." Vishal is looking at how many prices fall in a certain range.
Sampat says, "I compared the prices in each month in both locations. Prices in Yahapur are higher for 6 months, prices in Wahapur are higher for 5 months, and the prices are the same for 1 month. So, I feel Yahapur is costlier." Sampat is doing a month-by-month comparison.
Jithin says, "I noticed that the difference between the highest and lowest prices in Yahapur is 59 minus 24 equals 35, and in Wahapur it is 60 minus 17 equals 43." Jithin is looking at the range or spread of the data.
So you see, students, data can be described and compared by referring to its minimum value, maximum value, the average value, the total sum of all its values, and the difference between the maximum and minimum values, which we call the range.
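Here is a short Python sketch that computes each of these descriptions for the onion prices, along with Sampat's month-by-month comparison. The function names `describe` and `month_by_month` are our own, chosen for illustration. The price lists are copied from the tables above, in order from January to December.

```python
yahapur = [25, 24, 26, 28, 30, 35, 39, 43, 49, 56, 59, 44]  # Jan..Dec, rupees per kg
wahapur = [19, 17, 23, 30, 38, 35, 42, 39, 53, 60, 52, 42]


def describe(prices):
    """Summaries the students used: minimum, maximum, total, range, and average."""
    return {
        "min": min(prices),
        "max": max(prices),
        "total": sum(prices),
        "range": max(prices) - min(prices),
        "average": round(sum(prices) / len(prices), 2),
    }


def month_by_month(a, b):
    """Sampat's method: count months where a is higher, b is higher, or both are equal."""
    a_higher = sum(1 for x, y in zip(a, b) if x > y)
    b_higher = sum(1 for x, y in zip(a, b) if x < y)
    same = sum(1 for x, y in zip(a, b) if x == y)
    return a_higher, b_higher, same


print(describe(yahapur))  # {'min': 24, 'max': 59, 'total': 458, 'range': 35, 'average': 38.17}
print(describe(wahapur))  # {'min': 17, 'max': 60, 'total': 450, 'range': 43, 'average': 37.5}
print(month_by_month(yahapur, wahapur))  # (6, 5, 1)
```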
Now, let's think about how we can visualize this data to understand it better. One way is called a dot plot. Dot plots show data points as dots on a line, helping us visualize variability and patterns in data.
In a dot plot for Yahapur, each dot represents the onion price in one month. The horizontal line shows prices from 10 to 60. It starts at 10 instead of 0 because no price is below 10 or above 60. The number of dots stacked above a value shows how many times that value occurs, and the vertical line is marked with these counts. Notice the equal spacing between the units along both the horizontal and the vertical lines.
Similarly, for Wahapur, we have dots representing the monthly prices.
Now, students, does this visualization capture all the data presented in the tables earlier? Well, it shows us the distribution of prices, but it loses the month-wise order of the values. On the other hand, it lets us group the values into whatever ranges we like. For instance, there are 2 data values between 11 and 20 for Wahapur, while Yahapur has none. This representation makes it easier to observe the variation in the data, that is, where and how the data is clustered or spread out. We can easily see that the prices in Wahapur are more spread out than those in Yahapur. It is also easy to spot the highest and lowest values.
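A computer can draw a rough dot plot using text. The sketch below is our own simplified version, turned on its side so that it fits on a screen. Each row is one price, and each `*` is one month with that price. The helper `count_between` is a hypothetical name we chose. It counts the values that fall in a chosen range, which is the kind of grouping a dot plot makes easy.

```python
from collections import Counter


def dot_plot_rows(values, start, stop):
    """A dot plot turned on its side: one row per value, one '*' per occurrence."""
    counts = Counter(values)
    return [f"{v:>3} | " + "*" * counts[v] for v in range(start, stop + 1)]


def count_between(values, low, high):
    """How many values lie between low and high, both included."""
    return sum(1 for v in values if low <= v <= high)


wahapur = [19, 17, 23, 30, 38, 35, 42, 39, 53, 60, 52, 42]
yahapur = [25, 24, 26, 28, 30, 35, 39, 43, 49, 56, 59, 44]

for row in dot_plot_rows(wahapur, 10, 60):
    print(row)

print(count_between(wahapur, 11, 20))  # 2
print(count_between(yahapur, 11, 20))  # 0
```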
Now, can we also use the average as one of the ways to compare the prices at these two places? Let's calculate the average price of onions at Yahapur and Wahapur.
For Yahapur, the total is 458 rupees, and there are 12 months, so the average is 458 divided by 12, which is approximately 38.17 rupees.
For Wahapur, the total is 450 rupees, and there are 12 months, so the average is 450 divided by 12, which is 37.5 rupees.
So on average, onions are slightly costlier in Yahapur than in Wahapur.
Now, students, looking at variations in data like the prices of onions over a year in Yahapur and Wahapur can spark our curiosity. For example, we might wonder — do the seasons affect the price of onions? Where are these two locations? Are they close to each other or far apart? What are the factors that determine the price of onions? How much do onion prices vary across shops in the same area? What other commodities might have similar patterns? How do the price fluctuations impact farmers, consumers, and the industry?
These are all great questions to explore. Observing and trying to make sense of data can reveal interesting things. It can also trigger our curiosity in different directions.
Now, the Arithmetic Mean is frequently used in statistics, mathematics, experimental sciences, economics, sociology, sports, biology, and diverse disciplines as a representative of data. It is popular partly because the definition of the arithmetic mean is simple and easy to understand.
Here are some statements involving averages in different scenarios:
- The average rainfall per day in Jharkhand in the month of July is 37.2 millimeters.
- My scooty's average mileage this year is about 45 kilometers per liter.
- Wheat yield averages 4.7 tonnes per hectare in Punjab versus 2.9 tonnes per hectare in Bihar.
- Smartphone users check their phone 58 times a day on average.
- An average Indian citizen generates 0.45 kilograms of waste per day.
- 3126 is the average number of Indian long films released annually between 2017 and 2024.
Now, let's think about something important. Does the average always give a reasonable summary of the values in a collection? If not, what is an alternative? Let's find out.
Consider the heights of the family members of Yaangba and Poovizhi.
Yaangba's family heights: 169 cm, 173 cm, 155 cm, 165 cm, 160 cm, and 164 cm.
Poovizhi's family heights: 170 cm, 173 cm, 165 cm, 118 cm, and 175 cm.
Now, let's find the average height of each family. Can we say that Yaangba's family is taller than Poovizhi's family?
First, let's calculate the average height of Yaangba's family. The sum is 169 + 173 + 155 + 165 + 160 + 164, which equals 986. There are 6 family members, so the average is 986 divided by 6, which is approximately 164.3 cm.
Now for Poovizhi's family. The sum is 170 + 173 + 165 + 118 + 175, which equals 801. There are 5 family members, so the average is 801 divided by 5, which is 160.2 cm.
So the average height of Poovizhi's family (160.2 cm) is less than that of Yaangba's family (164.3 cm). Yet 4 of the 5 members of Poovizhi's family are 165 cm or taller, which is more than Yaangba's family average of 164.3 cm. Their family's average is lower only because one child is much younger and much shorter (118 cm) than the rest. In fact, their average height of 160.2 cm is less than the heights of 4 out of 5 members. Here, the average doesn't represent the data very well.
In this case, can you think of any other number that can represent the data better?
One way is to sort the data and pick the number in the middle. This number is called the median.
To find the median height of Poovizhi's family, we first sort the heights: 118, 165, 170, 173, 175. The middle number in this sorted data is 170. Therefore, the median height is 170 cm.
Now let's find the median height of Yaangba's family. Sorting the heights, we get: 155, 160, 164, 165, 169, 173. Since the median is the number in the middle, it will have an equal number of values less than it and greater than it. This data does not have a single middle number because it has an even number of values (6). In such cases, we take the average of the two middle numbers in the sorted data. The two middle numbers are 164 and 165. So the median height of Yaangba's family is (164 + 165) divided by 2, which is 164.5 cm.
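Here is a Python sketch of the sorting rule for the median, covering both the odd case and the even case. The function name is our own. Python's standard `statistics` module also provides a `statistics.median` function that follows the same rule.

```python
def median(values):
    """Middle value of the sorted data; with an even count, the mean of the two middle values."""
    if len(values) == 0:
        raise ValueError("the median of an empty list is undefined")
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                          # odd count: one middle value
    return (ordered[mid - 1] + ordered[mid]) / 2     # even count: average the two middle values


print(median([170, 173, 165, 118, 175]))       # Poovizhi's family: 170
print(median([169, 173, 155, 165, 160, 164]))  # Yaangba's family: 164.5
```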
So for Yaangba's family, the mean is 164.3 cm and the median is 164.5 cm — they are very close to each other.
For Poovizhi's family, the mean is 160.2 cm and the median is 170 cm — they are quite different!
In this case, does the median represent the heights of the families better than the average? Yes, it does! In Poovizhi's family, the height of the youngest child (118 cm) is quite different from the heights of the rest of the family. We call such a value an outlier. Outliers are values which significantly deviate from the rest of the values in the data.
Notice how the mean and the median are close to each other in Yaangba's data, in the absence of any outlier. In Poovizhi's data, because of the outlier, the mean is much lower than the median.
Now, students, what would happen if we removed the outlier from Poovizhi's data? Let's try finding the mean and median without the outlier value 118. The remaining heights are 170, 173, 165, and 175. The sum is 683, and the mean is 683 divided by 4, which is 170.75 cm. Sorted, these heights are 165, 170, 173, 175, so the median is the average of the two middle values, 170 and 173: (170 + 173) divided by 2 equals 171.5 cm. Now the mean and median are much closer! Removing the outlier raised the mean by more than 10 cm but the median by only 1.5 cm. This shows how much the outlier was affecting the mean.
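To see the effect of the outlier on a computer, here is a sketch that uses Python's built-in `statistics.mean` and `statistics.median`. The list name `without_outlier` is our own. We simply drop the 118 cm value and compare how far each measure moves.

```python
from statistics import mean, median

poovizhi = [170, 173, 165, 118, 175]
without_outlier = [h for h in poovizhi if h != 118]  # drop the 118 cm outlier

print(mean(poovizhi), median(poovizhi))                # 160.2 170
print(mean(without_outlier), median(without_outlier))  # 170.75 171.5

# How far each measure moved when the outlier was removed
print(round(mean(without_outlier) - mean(poovizhi), 2))  # 10.55
print(median(without_outlier) - median(poovizhi))        # 1.5
```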
Now, let's look at another example. After the summer vacation, a class teacher asked his class how many short stories they had read. Each student answered the number of stories read on a piece of paper. The data values are: 0, 5, 10, 15, 20, 25, 30, 35, 40.
Now, we need to find the mean and median number of short stories read. Before calculating them, can you guess whether the mean will be less than or greater than the median?
Let's calculate. First, the sum is 0 + 5 + 10 + 15 + 20 + 25 + 30 + 35 + 40, which equals 180. There are 9 values, so the mean is 180 divided by 9, which is 20.
Now for the median — we need to sort the data, but it's already in order: 0, 5, 10, 15, 20, 25, 30, 35, 40. The middle value is the 5th value, which is 20. So the median is also 20.
Now, is any of these values an outlier? Here the values are evenly spaced, 5 apart, so neither 0 nor 40 stands far away from the rest. There is no clear outlier in this data. The values are perfectly balanced around 20, which is why the mean and the median are equal.
Still, let's see what happens if we remove the largest value, 40. Then the data would be 0, 5, 10, 15, 20, 25, 30, 35. The sum would be 140, and the mean would be 140 divided by 8, which is 17.5. The median would be the average of the 4th and 5th values, which is (15 + 20) divided by 2, equals 17.5. Because the remaining data is still evenly spread, the mean and median move together, and both become 17.5.
The average may not always be an appropriate representative of data that has outliers. A very high or a very low outlier can significantly change the sum, and so the average. In the heights example, removing the outlier changed the mean by more than 10 cm but changed the median only slightly. The median is affected much less by outliers than the mean is.
Now, let's look at another example. Do you read newspapers? Have you noticed how many pages a newspaper has on different days of the week — is it the same or different?
The list below shows the number of pages for a particular newspaper from Monday to Sunday: 16, 18, 20, 22, 26, 16, 10.
Let's find the mean and median. First, the sum is 16 + 18 + 20 + 22 + 26 + 16 + 10, which equals 128. There are 7 values, so the mean is 128 divided by 7, which is approximately 18.29.
For the median, we sort the data: 10, 16, 16, 18, 20, 22, 26. The middle value is the 4th value, which is 18. So the median is 18.
Now, looking back at these examples (the heights, the short stories, and the newspaper pages), and thinking about data in general, observe the variability in data when:
(a) the mean and median are close to each other: this usually happens when the data is balanced or evenly spread out, without significant outliers. Yaangba's heights, the short stories, and the newspaper pages are like this.
(b) the mean and median are comparatively far apart, with mean less than median: this often happens when there is an outlier at the lower end of the data, which pulls the mean down. Poovizhi's heights are like this.
(c) the mean and median are comparatively far apart, with mean greater than median: this often happens when there is an outlier at the higher end of the data, which pulls the mean up.
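Here is a Python sketch that applies this comparison automatically. The lesson gives no fixed rule for when the mean and median count as "close", so the cut-off `close_within=1.0` is a hypothetical choice of ours. The last list is made-up data with one high value, included only to show case (c).

```python
from statistics import mean, median


def compare_mean_median(values, close_within=1.0):
    """Say whether the mean is close to, below, or above the median.

    close_within is a hypothetical cut-off for 'close'; the lesson does not fix one.
    """
    m, med = mean(values), median(values)
    if abs(m - med) <= close_within:
        return "close"
    return "mean < median" if m < med else "mean > median"


print(compare_mean_median([169, 173, 155, 165, 160, 164]))  # Yaangba's heights: close
print(compare_mean_median([170, 173, 165, 118, 175]))       # Poovizhi's heights: mean < median
print(compare_mean_median([16, 18, 20, 22, 26, 16, 10]))    # newspaper pages: close
print(compare_mean_median([10, 12, 13, 15, 60]))            # made-up data, high outlier: mean > median
```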
Mean and Median are called measures of central tendency, that is, the tendency of the values to pile up around a particular value. In other words, they represent the "centre" of the data.
Now, let's look at another interesting example. Suppose you are asked the question, "How tall is your class?" What would you say?
The table shows the heights of students in a Grade 5 class in centimeters.
For boys: 147, 135, 130, 154, 128, 135, 134, 158, 155, 146, 146, 142, 140, 141, 144, 145, 150.
For girls: 143, 136, 150, 144, 154, 140, 145, 148, 156, 150, 150.
Now, we can visualize the data using a dot plot, identify the ends and patterns, and look at the variability. We can also find the measures of central tendency.
For the whole class, the mean height is 144.5 cm and the median is 145 cm.
For the boys, the mean height is 142.94 cm and the median is 144 cm.
For the girls, the mean height is 146.9 cm and the median is 148 cm.
Now, what can we infer from the dot plots and the central tendency measures?
First, the boys' heights are more spread out and are between 128 and 158 cm. The girls' heights lie between 136 and 156 cm. Both the tallest and shortest in the class are boys.
Yet the boys' average height is less than the whole class average, and also less than the girls' average height. We can say that, on average, girls are taller than boys in this class. Of course, this doesn't mean every girl is taller than every boy!
For boys' heights, mean is less than median (142.94 < 144), indicating a small influence of values on the lower side. For girls' heights too, mean is less than median (146.9 < 148), indicating a small influence of values on the lower side.
Now, here's an interesting question: how many students are taller than the class's average height? Since the class average is very close to the median (145 cm), we would expect about half the students to be taller than the average. Counting from the data, 15 of the 28 students are taller than the class average. Similarly, how many boys are taller than the class's average height? Again, about half: 8 of the 17 boys.
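Here is a Python sketch that checks these class-height numbers using the standard `statistics` module. The list names are our own, and the data is copied from the table above. It also counts how many heights are strictly above the class average.

```python
from statistics import mean, median

boys = [147, 135, 130, 154, 128, 135, 134, 158, 155, 146, 146, 142, 140, 141, 144, 145, 150]
girls = [143, 136, 150, 144, 154, 140, 145, 148, 156, 150, 150]
whole_class = boys + girls

class_mean = mean(whole_class)
print(round(class_mean, 2), median(whole_class))  # 144.5 145.0
print(round(mean(boys), 2), median(boys))         # 142.94 144
print(round(mean(girls), 2), median(girls))       # 146.91 148

# Count heights strictly above the class average
taller_students = sum(1 for h in whole_class if h > class_mean)
taller_boys = sum(1 for h in boys if h > class_mean)
print(taller_students, "of", len(whole_class))  # 15 of 28
print(taller_boys, "of", len(boys))             # 8 of 17
```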
Now, let's look at another fun activity. Two groups of children were asked to estimate the length of 1 minute. They start by closing their eyes and then open when they think 1 minute has passed. Of course, they are not supposed to count while their eyes are closed. The dot plots show after how many seconds the children opened their eyes.
For Group A, the mean is 58.21 seconds and the median is 60 seconds.
For Group B, the mean is 59.28 seconds and the median is 59.5 seconds.
Now, let's discuss how well both the groups fared at this activity. Describe and compare the variability in data and their central tendency.
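Both groups have means and medians close to 60 seconds, which is the actual length of 1 minute. This shows that the children in both groups were quite good at estimating 1 minute. Both means are a little below 60 seconds, so on average the children opened their eyes slightly early. In Group A, the mean (58.21 seconds) is noticeably lower than the median (60 seconds). This suggests that a few children in Group A opened their eyes much too early and pulled the mean down, even though the group's median is exactly 60 seconds. In Group B, the mean (59.28 seconds) and the median (59.5 seconds) are closer together, which suggests its estimates are more evenly spread around the centre.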
Now, here's an interesting cricket example. In a cricket innings, can the median of the runs scored by a team's players be 0 while the team scores 407 for 10? Yes, it can! In yesterday's match, the median runs scored by England's players was 0, and yet the team scored 407 for the loss of 10 wickets.
How is this possible? With 11 batters, the median is the 6th score when the scores are sorted, so a median of 0 means that at least 6 players scored 0. The remaining few players must then have scored a lot of runs. Leaving out the 19 extras (runs not credited to any batter), the average runs scored by a player in this innings is (407 minus 19) divided by 11, which is about 35.27. So the average is much higher than the median because of the high-scoring players.
Now, students, it's important to understand the difference between 0 and no value. Suppose a player scores 57, 13, 0, 84, did not play, 51, 27 in a series. Notice that the player played Match 3 and scored 0 runs, whereas the player did not play Match 5. So we consider the total number of matches to be 6 and not 7. We calculate their average runs scored per match as (57 + 13 + 0 + 84 + 51 + 27) divided by 6, which is 232 divided by 6, or about 38.67 runs.
This is an important distinction — 0 means the player played and scored nothing, while "did not play" means we don't include that match in the count.
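When we store scores on a computer, we need some way to mark "did not play". In the Python sketch below, we assume `None` is used for this. That is our own choice, and any marker that is not a number would also work. A score of 0 is kept and counted as a match played. The last line shows what goes wrong if "did not play" is treated as 0.

```python
def batting_average(scores):
    """Average runs per match actually played.

    None is our (assumed) marker for 'did not play' and is left out;
    0 is a real score and is counted as a match played.
    """
    played = [s for s in scores if s is not None]
    return sum(played) / len(played)


series = [57, 13, 0, 84, None, 51, 27]
print(round(batting_average(series), 2))  # 232 / 6 = 38.67

# For contrast, wrongly treating 'did not play' as a score of 0 divides by 7 instead of 6
print(round(sum(s or 0 for s in series) / len(series), 2))  # 232 / 7 = 33.14
```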
Now, here's a historical example that shows how the concept of mean was used in real life. In the early 1500s in Europe, the basic unit of land measurement was the rod, defined as 16 feet long. At that time, a foot meant the length of a human foot! But foot sizes vary, so whose foot could they measure? To solve this, 16 adult males were asked to stand in a line, toe to heel, and the length of that line was considered the 16-foot rod. After the rod was determined, it was split into 16 equal sections, each representing the measure of a single foot. In essence, this was the arithmetic mean of the 16 individual feet, even though the term "mean" was not mentioned anywhere. So you see, the concept of average has been used for a very long time, even before it had a formal name!
Now, let's move on to Section 5.3, which is called "Visualising Data."
We can often understand data more clearly if it is presented as a picture. This is called data visualisation. Last year, we saw how to visualize data using graphs. Let us explore visualisation further.
Earlier, we looked at the monthly onion prices in Yahapur and Wahapur. We can represent this data using column graphs. Two column graphs for this data are given — one for Yahapur and one for Wahapur.
The two graphs can also be combined into a single graph. We just draw the bars side by side! This is called a clustered column graph. Since it has two columns in each cluster, we also call it a double column graph.
In this graph, we use different colors to clearly separate the data from the two places. The relative heights of the bars tell us where onions are costlier in each month. We can also visually estimate the difference by referring to the markings along the vertical line.
The dots and slanted lines within the bars help people who find it difficult to distinguish colors. They are also useful when graphs are printed in greyscale, that is, only in black, white, and shades of grey.
Now, let's look at another example involving rocket launches. You might have heard about scientific probes like Chandrayaan-3 launched in 2023 by ISRO or the Voyager-1 launched in 1977 by NASA, observational satellites like Aryabhata launched in 1975 by ISRO or Sputnik-1 launched in 1957 by the Soviet Space program, or about human spaceflights to the International Space Station. All space missions are launched using rockets.
The graph shows the number of worldwide rocket launches by different organizations for the years 2021, 2022, and 2023. The organizations include SpaceX, CASC, Roscosmos, Arianespace, Rocket Lab, United Launch Alliance, ISRO, Galactic Energy, Expace, and Others.
Often there is a lot of information in a graph, and it may be difficult to understand. We can follow a 2-step process to make sense of the data in graphs more easily.
Step 1: Identify what is given. Notice how the graph is organized, what scale is used, and what patterns the data shows.
For each organization, the numbers of rocket launches for the years 2021, 2022, and 2023 are shown as three adjacent bars. The scale used is 1 unit length equals 20 rockets. Notice the numbers at the bottom. The "Others" category indicates multiple organizations worldwide that are clubbed together to keep the graph short. Note that in the double bar graph of onion prices, the months are shown in order, that is, January to December, to observe the change over time, whereas in this case, a change in the order of organizations does not affect the meaning.
Step 2: Infer from what is given. Analyse and interpret each of your observations.
- We can say that the USA, China, and Russia are the leading rocket-launching countries in the given time period, since SpaceX and United Launch Alliance are based in the USA, CASC in China, and Roscosmos in Russia.
- SpaceX launched about twice as many rockets in 2022 as in 2021, and about 35 more rockets in 2023 than in 2022.
- The number of rockets launched by Arianespace decreased every year.
- United Launch Alliance launched more rockets in 2022 than in 2021. They launched fewer rockets in 2023 than in either 2022 or 2021.
- Other organizations launched about 25 rockets in 2023.
Now, let's look at another interesting example involving daylight hours in two cities. The tables show data related to weather in two cities in different countries. The numbers given are in hours.
City 1: January — 210 hours, February — 257 hours, March — 372 hours, April — 441 hours, May — 536 hours, June — 564 hours, July — 555 hours, August — 465 hours, September — 394 hours, October — 310 hours, November — 222 hours, December — 186 hours.
City 2: January — 459 hours, February — 384 hours, March — 381 hours, April — 327 hours, May — 304 hours, June — 276 hours, July — 295 hours, August — 318 hours, September — 369 hours, October — 409 hours, November — 435 hours, December — 468 hours.
Can you guess what the data might be related to?
The data shows the monthly hours of daylight, that is, the number of hours in each month when the Sun is at least partly above the horizon, in these two cities over the year.
Based on this data, a clustered bar graph showing the average daylight hours per day in each month is given. This average is obtained by dividing the monthly daylight hours by the number of days in the month.
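For those of you who know a little Python, here is a minimal sketch of that averaging. It assumes a non-leap year (28 days in February), and the function names are only illustrative. It divides each monthly total by the number of days in that month and reports the months with the largest and smallest averages.

```python
# Monthly daylight hours, copied from the two tables above.
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]
# Days in each month; assumes a non-leap year (28 days in February).
DAYS = [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]

CITY_1 = [210, 257, 372, 441, 536, 564, 555, 465, 394, 310, 222, 186]
CITY_2 = [459, 384, 381, 327, 304, 276, 295, 318, 369, 409, 435, 468]


def daylight_per_day(monthly_hours):
    """Average daylight hours per day: monthly total divided by days in the month."""
    return [round(hours / days, 1) for hours, days in zip(monthly_hours, DAYS)]


def extreme_months(averages):
    """Return (month with the maximum, month with the minimum) average daylight."""
    max_month = MONTHS[averages.index(max(averages))]
    min_month = MONTHS[averages.index(min(averages))]
    return max_month, min_month


if __name__ == "__main__":
    for name, data in [("City 1", CITY_1), ("City 2", CITY_2)]:
        averages = daylight_per_day(data)
        print(name, dict(zip(MONTHS, averages)))
        print("  max and min months:", extreme_months(averages))
```

Running it gives June as the maximum (18.8 hours) and December as the minimum (6.0 hours) for City 1, and the reverse pattern for City 2.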
Let's follow the 2-step process again.
Step 1: Identify what is given. Notice how the graph is organized, what scale is used, and what patterns the data shows. The horizontal line shows the months of the year. The vertical line shows the average daylight hours per day, using the scale 1 unit equals 5 hours. The month of June has the maximum value for City 1 and the minimum value for City 2.
Step 2: Infer from what is given. Analyse and interpret each of your observations.
- The average number of daylight hours per day in City 1 increases from January, reaching a maximum of almost 19 hours in June (564 ÷ 30 = 18.8). It then decreases, reaching a minimum of about 6 hours in December.
- The average number of daylight hours per day in City 2 decreases from January, reaching a minimum of about 9 hours in June. It then increases, reaching a maximum of about 15 hours in December.
- The maximum and minimum values in City 1 are more extreme than those of City 2. That is, the maximum number of daylight hours per day of City 1 is more than that of City 2, and the minimum number of daylight hours per day of City 1 is less than that of City 2.
- In June, City 1 experiences daylight for about three-quarters of the full day (24 hours), whereas during December to January, it only experiences daylight for about one-quarter of the full day.
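We can check the "three-quarters" and "one-quarter" observations with a little arithmetic. This small sketch (the function name is illustrative) divides the average daylight per day by 24 hours, using City 1's June, December, and January values from the table.

```python
HOURS_IN_DAY = 24


def fraction_of_day(monthly_hours, days_in_month):
    """Share of a 24-hour day that is daylight, on average, in one month."""
    return monthly_hours / days_in_month / HOURS_IN_DAY


# City 1 values from the table above.
june = fraction_of_day(564, 30)      # about 0.78, close to three-quarters
december = fraction_of_day(186, 31)  # exactly 0.25, one-quarter
january = fraction_of_day(210, 31)   # about 0.28
print(round(june, 2), round(december, 2), round(january, 2))  # 0.78 0.25 0.28
```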
Does this give some idea of where these two cities are located?
City 1 and City 2 are located away from the Equator in the Northern and Southern hemispheres, respectively. City 1 is Helsinki, Finland, and City 2 is Wellington, New Zealand. In June, the Northern Hemisphere is tilted towards the Sun, resulting in longer days; it is summer there. Meanwhile, the Southern Hemisphere is tilted away from the Sun, leading to shorter days; it is winter there. The inverted seasonal daylight pattern is due to the cities being in opposite hemispheres, and the large variation through the year is because both cities are far from the Equator.
Now, let's look at a cricket match graph. Have you ever missed watching a cricket match? You can catch up in a minute by looking at a graph. The horizontal line lists the overs starting from 1, and the vertical line indicates the runs scored in each over. The graph shows the number of runs scored per over as a double bar graph — each bar corresponding to a team. Let us call them the blue team and the red team. The scale used for the runs per over is 1 unit equals 5 runs. The circles shown on top of the bars indicate that a wicket fell in that over.
Based on this graph, we can answer questions like: Can we tell who batted first? Who won the match? How many runs did the blue team score in over 12? In which over did the red team score the least number of runs? Is it easy to tell the target set by the team batting first?
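About the last question: the target is not easy to read off the graph, because you have to add up the heights of all the bars of the team batting first. Here is a minimal Python sketch of that bookkeeping. The runs per over below are made-up numbers, not read from the graph, and the target rule assumes an ordinary limited-overs match with no rain-affected revision.

```python
from itertools import accumulate


def innings_summary(runs_per_over):
    """Summarise one innings from the height of each bar (runs in each over)."""
    total = sum(runs_per_over)
    # Overs are numbered from 1, so add 1 to the list index.
    lowest_over = runs_per_over.index(min(runs_per_over)) + 1
    running_totals = list(accumulate(runs_per_over))
    return total, lowest_over, running_totals


def target(first_innings_total):
    """Runs the team batting second needs: one more than the first team's total."""
    return first_innings_total + 1


# Hypothetical runs for a 5-over innings, NOT taken from the lesson's graph.
example = [6, 11, 4, 9, 3]
total, lowest, running = innings_summary(example)
print(total, lowest, running)  # 33 5 [6, 17, 21, 30, 33]
print(target(total))           # 34
```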
Now, let's move on to Section 5.4, which is called "Data Detective."
We put well-formed sentences one after the other to make a beautiful story. In the same way, well-organized and well-presented data can tell interesting stories, and can also expose new mysteries or help solve mysteries!
Earlier, we saw data of two Grade 5 classrooms with heights of boys and girls in each class. There, the average height of girls was more than boys in one class and vice versa in the other class.
Now, let's look at some data based on a survey of the heights of boys and girls of different ages in India over time. The following table shows the average heights of boys and girls in centimeters across ages 5 to 19 in the years 1989, 1999, 2009, and 2019. In each year, the first column shows boys' heights and the second column shows girls' heights.
Let me share some of this data with you:
For age 5: In 1989, boys were 101.3 cm and girls were 100 cm. In 1999, boys were 102.4 cm and girls were 101.7 cm. In 2009, boys were 105.1 cm and girls were 104 cm. In 2019, boys were 107.1 cm and girls were 107.2 cm.
For age 10: In 1989, boys were 127.5 cm and girls were 127.3 cm. In 1999, boys were 128.3 cm and girls were 127.8 cm. In 2009, boys were 129.4 cm and girls were 129.9 cm. In 2019, boys were 132.6 cm and girls were 132.8 cm.
For age 15: In 1989, boys were 155.4 cm and girls were 148.5 cm. In 1999, boys were 155.2 cm and girls were 148.4 cm. In 2009, boys were 156.3 cm and girls were 150.1 cm. In 2019, boys were 159 cm and girls were 152.4 cm.
Spend sufficient time observing the data presented in this table. Share your findings with the class.
Now, let's think about some prompts for investigation:
- Changes in the heights of boys or girls of a certain age from 1989 to 2019.
- The heights of boys versus girls at different ages in a particular year.
- Changes in height between successive ages in boys and girls in 2019.
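The first two prompts can be explored with a short Python sketch. It uses only the ages 5, 10, and 15 quoted above (the full table would need the remaining ages), and the function names are illustrative.

```python
# Average heights (cm) from the table above: {age: {year: (boys, girls)}}.
HEIGHTS = {
    5: {1989: (101.3, 100.0), 1999: (102.4, 101.7),
        2009: (105.1, 104.0), 2019: (107.1, 107.2)},
    10: {1989: (127.5, 127.3), 1999: (128.3, 127.8),
         2009: (129.4, 129.9), 2019: (132.6, 132.8)},
    15: {1989: (155.4, 148.5), 1999: (155.2, 148.4),
         2009: (156.3, 150.1), 2019: (159.0, 152.4)},
}


def change_1989_to_2019(age):
    """Increase in average height (boys, girls) from 1989 to 2019 at one age."""
    boys_then, girls_then = HEIGHTS[age][1989]
    boys_now, girls_now = HEIGHTS[age][2019]
    return round(boys_now - boys_then, 1), round(girls_now - girls_then, 1)


def taller_on_average(age, year):
    """Which group has the greater average height at this age and year."""
    boys, girls = HEIGHTS[age][year]
    if boys == girls:
        return "equal"
    return "boys" if boys > girls else "girls"


for age in HEIGHTS:
    print(age, change_1989_to_2019(age),
          [taller_on_average(age, year) for year in (1989, 1999, 2009, 2019)])
```

Notice that the output already shows girls taller on average at age 10 in 2009 and 2019, and at age 5 in 2019.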
Now, let's check which of the following statements can be justified using the data:
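1. "The average heights of both boys and girls at every age increased from 1989 to 2019." Is this true? For every age shown above, yes. For example, at age 5, boys went from 101.3 cm to 107.1 cm and girls from 100 cm to 107.2 cm; at age 15, boys went from 155.4 cm to 159 cm and girls from 148.5 cm to 152.4 cm. Because the statement says "every age", we must check all ages from 5 to 19 in the full table before we can call it justified.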
2. "The average height of 13-year-old girls in 1989 is more than the average height of 14-year-old girls in 2009." Let's check. In 1989, 13-year-old girls were 143.2 cm. In 2009, 14-year-old girls were 148 cm. So this statement is false.
3. "The average height of 15-year-old boys in 2019 is more than the average height of 16-year-old boys in 1989." In 2019, 15-year-old boys were 159 cm. In 1989, 16-year-old boys were 158.9 cm. So yes, this statement is true!
4. "All girls aged 13 are taller than all girls aged 11." This is not necessarily true. The data shows averages, which means some 13-year-old girls might be shorter than some 11-year-old girls. The statement says "all," which is too strong.
5. "Throughout the age period 5 to 19, the average boy's height is more than the average girl's height." This is true for most ages, but not for all. At age 11 in 2009, girls averaged 135.7 cm and boys 133.7 cm, so girls were taller on average! Even among the ages shown above, girls were slightly taller on average at age 10 in 2009 (129.9 cm versus 129.4 cm) and at ages 5 and 10 in 2019. So this statement cannot be justified.
6. "Boys keep growing even beyond age 19." This cannot be justified from the table, because the data stops at age 19. The increase from age 17 to 19 suggests that boys might keep growing a little, but we have no data beyond 19 to confirm it.
Now, here's another question: In 2019, between which two successive ages from 5 to 19 did boys grow the most? Between which two successive ages from 5 to 19 did girls grow the most?
Looking at the data, we can calculate the difference in height between successive ages. For boys in 2019, the biggest jump seems to be between ages 14 and 15, where they grew from 154.4 cm to 159 cm, a growth of 4.6 cm. For girls in 2019, the biggest jump seems to be between ages 11 and 12, where they grew from 138.6 cm to 143.8 cm, a growth of 5.2 cm.
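Finding the biggest jump means subtracting each age's height from the next age's height and picking the largest difference. Here is a minimal Python sketch; the function name is illustrative, and only the 2019 values quoted in this lesson are filled in, so you would add the other ages from the table to check the full answer.

```python
def largest_yearly_jump(height_by_age):
    """Find the pair of successive ages with the biggest increase in average height.

    height_by_age maps age -> average height in cm. Only ages that differ
    by exactly one year are compared. Returns (age, next_age, gain) or None.
    """
    best = None
    for age in sorted(height_by_age):
        if age + 1 in height_by_age:
            gain = round(height_by_age[age + 1] - height_by_age[age], 1)
            if best is None or gain > best[2]:
                best = (age, age + 1, gain)
    return best


# Only the 2019 values quoted in this lesson; fill in ages 5 to 19 from the table.
boys_2019 = {14: 154.4, 15: 159.0}
girls_2019 = {11: 138.6, 12: 143.8}
print(largest_yearly_jump(boys_2019))   # (14, 15, 4.6)
print(largest_yearly_jump(girls_2019))  # (11, 12, 5.2)
```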
Now, suppose the average height of a newborn is 50 cm. Estimate the average height of young children of ages 1 to 4. The table does not cover these ages, but it shows that at age 5 the average height in 2019 is about 107 cm. So the averages for ages 1 to 4 should lie between 50 cm and 107 cm, increasing with age.
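One simple way to make such an estimate is to assume children grow by the same amount every year between birth and age 5. That straight-line assumption is ours, not the textbook's. Children actually grow fastest in their first year or two, so these estimates are probably too low for the youngest ages. A minimal Python sketch:

```python
def linear_estimates(start_height, end_height, end_age):
    """Straight-line height estimates for ages 1 .. end_age - 1.

    Assumes growth is spread evenly over the years between birth (age 0)
    and end_age, which is a simplifying assumption.
    """
    per_year = (end_height - start_height) / end_age
    return {age: round(start_height + per_year * age, 1)
            for age in range(1, end_age)}


# Newborn about 50 cm (given); age 5 about 107 cm (2019 values above).
print(linear_estimates(50, 107, 5))  # {1: 61.4, 2: 72.8, 3: 84.2, 4: 95.6}
```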
Based on the trend observed in the table, write your estimates of the heights of boys and girls for ages 5 to 19 in the year 2029. Heights have mostly increased over time (with a few small dips, such as 15-year-old boys from 1989 to 1999), so we might expect them to be slightly higher in 2029 than in 2019.
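Here is one possible way to turn "the trend" into a number, sketched in Python. It assumes the average change per decade from 1989 to 2019 continues for one more decade; that is our assumption, and other reasonable methods (for example, using only the last decade's change) would give slightly different estimates.

```python
def extrapolate_next_decade(heights_by_year):
    """Estimate the next decade's value by adding the average change per decade.

    heights_by_year maps year -> average height (cm), in 10-year steps.
    """
    years = sorted(heights_by_year)
    first, last = years[0], years[-1]
    decades = (last - first) / 10
    avg_change = (heights_by_year[last] - heights_by_year[first]) / decades
    return round(heights_by_year[last] + avg_change, 1)


# Age-5 values from the table above.
boys_age5 = {1989: 101.3, 1999: 102.4, 2009: 105.1, 2019: 107.1}
girls_age5 = {1989: 100.0, 1999: 101.7, 2009: 104.0, 2019: 107.2}
print(extrapolate_next_decade(boys_age5), extrapolate_next_decade(girls_age5))  # 109.0 109.6
```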
Now, students, always remember — whenever you see data or some graph, look closely to know the story it has to say and the mysteries it may hold.
Now, let's look at one more example involving pockets on clothing. The dot plots show the distribution of the number of pockets on clothing for a group of boys and for a group of girls.
For boys, the data seems to be more spread out. For girls, the data seems to be more clustered. Based on the dot plots, we can determine which statements are true:
(a) "The data varies more for the boys than for the girls." This seems to be true based on the dot plot.
(b) "The median number of pockets for the boys is more than that for the girls." We would need to calculate the medians to confirm this.
(c) "The mean number of pockets for the girls is more than that for the boys." Again, we would need to calculate the means.
(d) "The maximum number of pockets for boys is greater than that for the girls." This seems to be true from the dot plot.
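A dot plot is really a count of how many dots sit above each value, so we can rebuild the list of values and then compute the median and mean. This Python sketch uses the standard `statistics` module; the counts below are made up for illustration and are NOT read from the lesson's dot plots, so you would replace them with the counts you see.

```python
from statistics import mean, median


def expand_dot_plot(counts):
    """Turn a dot plot {value: number of dots} back into a list of values."""
    values = []
    for value, dots in sorted(counts.items()):
        values.extend([value] * dots)
    return values


def summarise(counts):
    """Minimum, maximum, mean and median of the data shown in a dot plot."""
    values = expand_dot_plot(counts)
    return {"min": min(values), "max": max(values),
            "mean": round(mean(values), 2), "median": median(values)}


# Hypothetical counts: e.g. 3 children with 0 pockets, 5 with 2 pockets, and so on.
boys = {0: 3, 2: 5, 4: 2, 6: 1, 9: 1}
girls = {0: 4, 1: 5, 2: 3, 3: 1}
print(summarise(boys))
print(summarise(girls))
```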
Now, let's look at another example involving sports preferences in a colony. The table shows favorite sports and the number of people who watch and who participate in each sport.
We can choose an appropriate scale and draw a double-bar graph to visualize this data and write down our observations.
Now, students, we have covered a lot of important concepts in this chapter. Let me now give you a complete summary of everything we have learned.
In this chapter on "Connecting the Dots," we learned about statistics and how to work with data.
First, we learned about statistical statements and statistical questions. A statistical statement is a claim or summary about some phenomenon, expressed in terms of numerical values, proportions, probabilities, or predictions. A statistical question is a question that can be answered by collecting data, where we expect variation in the answers.
Then, we learned about representative values. The Arithmetic Mean, or average, is calculated as the sum of all values divided by the number of values. It represents the fair-share or equal-share value if the total were distributed equally among all members. We also learned about the Median, which is the middle value in a sorted set of data. For an even number of values, the median is the average of the two middle values.
We learned that the mean and median are both measures of central tendency — they tell us where the center of the data lies. We also learned about outliers — values that significantly deviate from the rest of the data. When there are outliers, the median is often a better representative of the data than the mean, because outliers can pull the mean towards them.
We learned about different ways to describe and compare data: using the minimum value, maximum value, total, range (difference between maximum and minimum), arithmetic mean, and median.
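To tie these ideas together, here is a minimal Python sketch that computes all six summaries, writing the median rule out by hand (the average of the two middle values when the count is even). The data values are hypothetical. Adding one outlier shows how the mean is pulled towards it while the median barely moves.

```python
def median(values):
    """Middle value of the sorted data; average of the two middle values if the count is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def describe(values):
    """Minimum, maximum, total, range, arithmetic mean and median of a list of numbers."""
    total = sum(values)
    return {
        "min": min(values),
        "max": max(values),
        "total": total,
        "range": max(values) - min(values),
        "mean": total / len(values),
        "median": median(values),
    }


# Hypothetical data: five typical values, then the same data with one outlier added.
typical = [10, 12, 13, 15, 20]
with_outlier = typical + [100]
print(describe(typical))       # mean 14.0, median 13
print(describe(with_outlier))  # mean about 28.3, median 14.0
```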
We learned about data visualization using dot plots and clustered bar graphs. Dot plots help us see the distribution and spread of data, while bar graphs help us compare values across categories or over time.
We learned a 2-step process for reading graphs: first, identify what is given by noticing the organization, scale, and patterns; second, infer from what is given by analyzing and interpreting the observations.
We learned how to be data detectives — looking at data carefully to find interesting patterns, make comparisons, and generate new questions for further investigation.
We saw examples from real life, including cricket scores, onion prices, family heights, school enrollments, rocket launches, daylight hours in different cities, and the heights of boys and girls over time in India.
We learned that statistics is not just about numbers — it's about telling stories with data, making decisions, and understanding the world around us better.
So, students, I hope you enjoyed this lesson on "Connecting the Dots." Remember, data is everywhere, and by learning to collect, organize, analyze, and interpret data, you are developing important skills that will help you throughout your life. Keep asking questions, keep exploring, and keep connecting the dots!
Thank you for listening so attentively. See you next time!