Complete notes on data collection, representation, and measures of central tendency
Statistics is the branch of mathematics that deals with collection, organization, analysis, interpretation, and presentation of data. It helps us make sense of large amounts of information and draw meaningful conclusions.
Statistics is the science of collecting, organizing, analyzing, and interpreting numerical data to make decisions or predictions.
Data: Facts or pieces of information collected for analysis.
| Type | Description | Example |
|---|---|---|
| Primary Data | Data collected directly by the investigator for the first time | Survey, Questionnaire, Interview |
| Secondary Data | Data collected from published or unpublished sources | Census reports, Books, Websites |
A frequency distribution is a table that shows how data is distributed across different values or class intervals.
Q: Marks obtained by 20 students: 5, 8, 5, 6, 7, 8, 5, 9, 6, 7, 8, 5, 6, 7, 8, 9, 5, 6, 7, 8
| Marks | Tally | Frequency |
|---|---|---|
| 5 | \|\|\|\|\| | 5 |
| 6 | \|\|\|\| | 4 |
| 7 | \|\|\|\| | 4 |
| 8 | \|\|\|\|\| | 5 |
| 9 | \|\| | 2 |
| Total | | 20 |
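The tally can be cross-checked with a few lines of code. This is a minimal Python sketch, assuming the marks are entered as a list; the names `marks` and `frequency_table` are illustrative and not part of the notes.

```python
from collections import Counter

# Marks of the 20 students from the question above
marks = [5, 8, 5, 6, 7, 8, 5, 9, 6, 7, 8, 5, 6, 7, 8, 9, 5, 6, 7, 8]

def frequency_table(data):
    """Return (value, frequency) pairs sorted by value."""
    counts = Counter(data)
    return sorted(counts.items())

table = frequency_table(marks)
for value, freq in table:
    # One bar per observation stands in for the tally column
    print(value, "|" * freq, freq)
print("Total", sum(freq for _, freq in table))
```

The frequencies should add up to the number of observations (20 here), which is a quick check that no value was missed while tallying.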
Class Interval: A range of values in which data is grouped. Example: 0-10, 10-20
Class Size/Width: Difference between upper and lower class limits. Class size = Upper limit - Lower limit
Class Mark: Mid-value of a class. Class Mark = (Lower limit + Upper limit) / 2
Class Limits: The end values of a class interval (Lower and Upper limits)
Q: Marks of 30 students in a test (out of 50):
| Class Interval (Marks) | Class Mark | Frequency |
|---|---|---|
| 0-10 | 5 | 3 |
| 10-20 | 15 | 5 |
| 20-30 | 25 | 8 |
| 30-40 | 35 | 9 |
| 40-50 | 45 | 5 |
| Total | | 30 |
Class Size = 10 - 0 = 10 (for first class)
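The class mark column can be reproduced from the class limits. A short Python sketch, assuming each class is stored as a `(lower, upper)` pair; the function and variable names are illustrative.

```python
def class_size(lower, upper):
    """Class size = upper limit - lower limit."""
    return upper - lower

def class_mark(lower, upper):
    """Class mark = (lower limit + upper limit) / 2."""
    return (lower + upper) / 2

# Class intervals from the table above
intervals = [(0, 10), (10, 20), (20, 30), (30, 40), (40, 50)]

class_marks = [class_mark(lo, hi) for lo, hi in intervals]
print(class_marks)                 # [5.0, 15.0, 25.0, 35.0, 45.0]
print(class_size(*intervals[0]))   # 10
```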
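Steps to draw a frequency polygon:
1. Calculate the class mark of each class interval.
2. Plot the class marks on the X-axis and the frequencies on the Y-axis.
3. Join consecutive points with straight lines.
4. Close the polygon by joining the first and last points to the X-axis, at the class marks of the imaginary zero-frequency classes just before the first class and just after the last class.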
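The points to plot can be listed before drawing. The sketch below is one way to do it in Python, assuming equal class widths and the common convention of closing the polygon at the class marks of imaginary zero-frequency classes on either side; `polygon_vertices` is a hypothetical helper name.

```python
def polygon_vertices(intervals, frequencies):
    """Return the (class mark, frequency) points of a frequency polygon,
    including the two closing points on the X-axis."""
    width = intervals[0][1] - intervals[0][0]  # assumes all classes share this width
    marks = [(lo + hi) / 2 for lo, hi in intervals]
    points = list(zip(marks, frequencies))
    start = (marks[0] - width, 0)   # imaginary class before the first one
    end = (marks[-1] + width, 0)    # imaginary class after the last one
    return [start] + points + [end]

# Data from the marks-of-30-students table
intervals = [(0, 10), (10, 20), (20, 30), (30, 40), (40, 50)]
frequencies = [3, 5, 8, 9, 5]

vertices = polygon_vertices(intervals, frequencies)
for x, y in vertices:
    print(x, y)
```

Joining these points in order with straight lines gives the polygon. The first closing point falls at -5, which is outside the possible range of marks; it is used only to close the figure.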
A measure of central tendency is a single value that represents the center or typical value of a data set. The three main measures are Mean, Median, and Mode.
Mean is the sum of all observations divided by the total number of observations.
Mean (x̄) = Sum of all observations / Number of observations = Σxᵢ / n
Q: Find the mean of: 4, 6, 8, 10, 12
Solution:
Sum = 4 + 6 + 8 + 10 + 12 = 40
Number of observations (n) = 5
Mean = 40/5 = 8
Therefore, Mean = 8
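The same calculation as a minimal Python sketch; the function name `mean` is illustrative.

```python
def mean(observations):
    """Sum of all observations divided by the number of observations."""
    if not observations:
        raise ValueError("mean of an empty data set is undefined")
    return sum(observations) / len(observations)

print(mean([4, 6, 8, 10, 12]))  # 8.0
```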
Mean (x̄) = Σfᵢxᵢ / Σfᵢ
Where fᵢ = frequency and xᵢ = observation
Q: Find the mean from the following data:
| x | f | fx |
|---|---|---|
| 5 | 4 | 20 |
| 10 | 6 | 60 |
| 15 | 8 | 120 |
| 20 | 2 | 40 |
| Total | 20 | 240 |
Solution:
Mean = Σfx / Σf = 240/20 = 12
Therefore, Mean = 12
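The fx column and the final division can be checked in code. A small Python sketch, assuming the values and frequencies are given as two lists of equal length; `mean_from_frequencies` is an illustrative name.

```python
def mean_from_frequencies(values, frequencies):
    """Mean = sum of f*x divided by sum of f."""
    total_fx = sum(f * x for x, f in zip(values, frequencies))
    total_f = sum(frequencies)
    return total_fx / total_f

x = [5, 10, 15, 20]
f = [4, 6, 8, 2]

print([fi * xi for xi, fi in zip(x, f)])  # [20, 60, 120, 40]
print(mean_from_frequencies(x, f))        # 12.0
```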
Median is the middle value of a data set when arranged in ascending or descending order.
For an odd number of observations (n):
Median = ((n+1)/2)th observation
For an even number of observations (n):
Median = [(n/2)th + ((n/2)+1)th observations] / 2
Q: Find the median of: 3, 7, 2, 9, 5
Solution:
Arrange in ascending order: 2, 3, 5, 7, 9
n = 5 (odd)
Median = ((5+1)/2)th = 3rd observation = 5
Therefore, Median = 5
Q: Find the median of: 4, 8, 2, 10, 6, 12
Solution:
Arrange in ascending order: 2, 4, 6, 8, 10, 12
n = 6 (even)
Median = [(6/2)th + ((6/2)+1)th] / 2
Median = (3rd + 4th observations) / 2 = (6 + 8) / 2 = 7
Therefore, Median = 7
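Both cases can be handled by one function. This is a minimal Python sketch (the name `median` is illustrative); note that Python lists are indexed from 0, so the kth observation sits at index k - 1.

```python
def median(observations):
    """Middle value of the sorted data (average of the two middle values if n is even)."""
    ordered = sorted(observations)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty data set is undefined")
    mid = n // 2
    if n % 2 == 1:
        # ((n+1)/2)th observation is at 0-based index n // 2
        return ordered[mid]
    # (n/2)th and ((n/2)+1)th observations are at indices mid - 1 and mid
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median([3, 7, 2, 9, 5]))       # 5
print(median([4, 8, 2, 10, 6, 12]))  # 7.0
```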
Mode is the value that occurs most frequently in a data set.
A data set can have no mode, one mode (unimodal), two modes (bimodal), or more (multimodal).
Q: Find the mode of: 2, 4, 4, 6, 4, 8, 6, 4
Solution:
Frequency of 2 = 1
Frequency of 4 = 4 (highest)
Frequency of 6 = 2
Frequency of 8 = 1
Therefore, Mode = 4
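Counting frequencies by program gives the same result. The Python sketch below returns a list so that bimodal and multimodal data are covered; treating data in which every distinct value occurs equally often as having no mode is an assumed convention, and `modes` is an illustrative name.

```python
from collections import Counter

def modes(observations):
    """Return all values that share the highest frequency, in ascending order."""
    counts = Counter(observations)
    if not counts:
        return []
    # Assumed convention: if several distinct values all occur equally often, there is no mode
    if len(counts) > 1 and len(set(counts.values())) == 1:
        return []
    highest = max(counts.values())
    return sorted(v for v, c in counts.items() if c == highest)

print(modes([2, 4, 4, 6, 4, 8, 6, 4]))  # [4]
print(modes([1, 1, 2, 2, 3]))           # [1, 2]
print(modes([1, 2, 3]))                 # []
```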
| Measure | Advantages | Disadvantages |
|---|---|---|
| Mean | Uses all data values; unique value | Affected by extreme values (outliers) |
| Median | Not affected by extreme values; good for skewed data | Does not use the magnitude of every value; data must be sorted first |
| Mode | Easy to find; shows most common value | May not exist; may have multiple modes |
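The effect of an outlier on the mean and the median can be seen with a small experiment. The sketch below uses Python's standard `statistics` module; the second data set is hypothetical, made by replacing 12 with 120 in the earlier example.

```python
import statistics

original = [4, 6, 8, 10, 12]
with_outlier = [4, 6, 8, 10, 120]  # hypothetical: 12 replaced by an extreme value

mean_before = statistics.mean(original)
mean_after = statistics.mean(with_outlier)
median_before = statistics.median(original)
median_after = statistics.median(with_outlier)

print("Mean:", mean_before, "->", mean_after)
print("Median:", median_before, "->", median_after)
```

The mean moves from 8 to 29.6 while the median stays at 8, which is why the median is preferred for skewed data.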
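For a moderately skewed distribution, the three measures are approximately related by the empirical formula:
Mode ≈ 3 × Median - 2 × Mean
This relationship gives an estimate of one measure from the other two when it is difficult to calculate directly.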
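The formula can be rearranged to estimate whichever measure is missing. A Python sketch under the assumption that the distribution is moderately skewed; the function names and the sample values (mean 20, median 22) are hypothetical.

```python
def empirical_mode(mean, median):
    """Mode = 3 * Median - 2 * Mean."""
    return 3 * median - 2 * mean

def empirical_mean(median, mode):
    """Rearranged: Mean = (3 * Median - Mode) / 2."""
    return (3 * median - mode) / 2

def empirical_median(mean, mode):
    """Rearranged: Median = (Mode + 2 * Mean) / 3."""
    return (mode + 2 * mean) / 3

# Hypothetical values for illustration
print(empirical_mode(mean=20, median=22))    # 26
print(empirical_mean(median=22, mode=26))    # 20.0
print(empirical_median(mean=20, mode=26))    # 22.0
```

These are estimates only; for strongly skewed or very small data sets the three measures may not follow the relationship closely.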
| Measure | Formula | In short |
|---|---|---|
| Mean | Raw data: x̄ = Σxᵢ / n; Frequency data: x̄ = Σfᵢxᵢ / Σfᵢ | Sum divided by count |
| Median | Odd n: ((n+1)/2)th term; Even n: average of (n/2)th and ((n/2)+1)th terms | Middle value when arranged |
| Mode | Value with highest frequency; Mode = 3×Median - 2×Mean (moderately skewed data) | Most common value |
| Class terms | Class Mark = (Upper + Lower) / 2; Class Size = Upper - Lower | For grouped data |
• Always arrange data in order before finding median.
• For median problems on a frequency distribution, calculate the cumulative frequencies first.
• Estimate the mean before calculating; the answer must lie between the smallest and largest observations.
• Remember: Mean is affected by outliers, median is not.
• For grouped data, use class marks as representative values.
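The tip about cumulative frequency can be illustrated on an ungrouped frequency table. The Python sketch below locates the middle observation(s) by running total; it assumes the values are listed in ascending order, and `median_from_frequencies` is a hypothetical helper name.

```python
from itertools import accumulate

def median_from_frequencies(values, frequencies):
    """Median of a discrete frequency table; values must be in ascending order."""
    cumulative = list(accumulate(frequencies))  # running totals of the frequencies
    n = cumulative[-1]                          # total number of observations

    def kth(k):
        # The kth observation is the first value whose cumulative frequency reaches k
        for value, cf in zip(values, cumulative):
            if cf >= k:
                return value

    if n % 2 == 1:
        return kth((n + 1) // 2)
    return (kth(n // 2) + kth(n // 2 + 1)) / 2

# Marks of 20 students from the first frequency table
values = [5, 6, 7, 8, 9]
frequencies = [5, 4, 4, 5, 2]

print(list(accumulate(frequencies)))                 # [5, 9, 13, 18, 20]
print(median_from_frequencies(values, frequencies))  # 7.0
```

With n = 20, the 10th and 11th observations are needed; the cumulative frequency first reaches 10 and 11 at the mark 7, so the median is 7.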