What is the name of your dataset?
Female health dataset. The dataset records information about women, including variables such as weight, height, waist length, and pulse rate, together with the month of birth of their firstborns.
Choose and write down the name of any quantitative and continuous variable from your dataset and use Excel to calculate the mean, median, mode, min, max, range, variance, and standard deviation for that variable. Paste the results into your main post (or use Add/Remove to attach them).
In the female health dataset, a quantitative and continuous variable is any measurable quantity that can take an infinite number of possible values within a range, so it can be expressed in decimal form. The continuous variables in the dataset include height, weight, waist, BMI, wrist, and body temperature.
In this case I chose Height as my quantitative and continuous variable.
Using the Excel Data Analysis tool, I calculated the mean, median, mode, min, max, range, variance, and standard deviation for height. The results are in SHEET2 of the attached Excel document.
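For anyone who wants to cross-check the Excel output, the same eight statistics can be sketched with Python's standard `statistics` module. This is only an illustration: the `sample_heights` list is a small hypothetical sample, not the values from the dataset, and the `describe` helper is a name made up for this example. It uses the sample variance and sample standard deviation (dividing by n - 1), which I assume matches what the Excel tool reports.

```python
import statistics


def describe(values):
    """Return the descriptive statistics asked for in the question."""
    lowest, highest = min(values), max(values)
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        "min": lowest,
        "max": highest,
        "range": highest - lowest,
        # Sample variance and standard deviation (divide by n - 1).
        "variance": statistics.variance(values),
        "std_dev": statistics.stdev(values),
    }


# Hypothetical heights in inches, for illustration only (not the real data).
sample_heights = [60.0, 62.0, 62.0, 64.0, 67.0]

summary = describe(sample_heights)
for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```

To run it on the real column, the hypothetical list would be replaced with the height values exported from the spreadsheet.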
Which measure of centre (mean, median, or mode) does the best job in describing your variable and why?
In describing height, the mean does the best job since it provides the best central location. The mean is the most commonly used measure of centre and it minimises error, because every value is taken into account in its calculation (Weed, 1979). It best describes the height variable because it includes all the female heights in the calculation, so a change in any single value changes the mean. This is not the case with the mode or the median.
Which measure of centre (mean, median, or mode) does the worst job in describing your variable and why?
In describing height, the mode does the worst job. Using the mode to describe the central tendency of this dataset would give misleading information because it represents only a small number of individuals in the dataset. It is important to use a measure that takes into account the values of all the individuals in the dataset (Weed, 1979).
Include (type in) the value 50000 into your selected variable data. Now, recalculate the mean, median, and the variance for your variable (now that it includes the extra value of 50000). How did this change the mean, median, and variance from their original values?
The adjusted descriptive summary statistics are in SHEET3 of the Excel document. The changes were as follows:
Measure     Original    New (after including 50000)
Mean        63.195      1281.165854
Median      63.35       63.4
Variance    7.51        60821580.34
Is the change what you expected?
The change is exactly what I expected. 50,000 is an extremely large value that can be regarded as an outlier. Because the mean is the arithmetic average of all the values in a dataset, it is strongly affected by any outlier (extreme observation) that is included in the dataset (Weed, 1979). This explains the shift from 63.195 to 1281.165854 after 50000 was added to the data.
The median was only slightly affected, changing from 63.35 to 63.4, because it is resistant to outliers in most cases. This explains why the change was minimal, almost negligible.
As for the variance, an outlier affects it strongly since it measures the variability of the values within the dataset. The variance shifted from 7.51 to 60821580.34 because the new value was extremely large and therefore greatly increased the variation among the height values.
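The same pattern can be reproduced on a small scale. The sketch below is an illustration only: it uses a short hypothetical list of heights rather than the dataset itself, and the `summarize` helper is a name made up for this example. It compares the mean, median, and sample variance before and after the value 50000 is appended.

```python
import statistics


def summarize(values):
    """Return (mean, median, sample variance) for a list of numbers."""
    return (
        statistics.mean(values),
        statistics.median(values),
        statistics.variance(values),
    )


# Hypothetical heights in inches, for illustration only (not the real data).
heights = [60.0, 62.0, 62.0, 64.0, 67.0]
with_outlier = heights + [50000.0]  # same extreme value as in the question

before = summarize(heights)
after = summarize(with_outlier)

for label, old, new in zip(("mean", "median", "variance"), before, after):
    print(f"{label}: {old:.2f} -> {new:.2f}")
```

With this hypothetical sample the median moves by only one unit, while the mean and the variance both jump by several orders of magnitude, which is the same behaviour described above for the real height data.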
Choose any two quantitative variables from your dataset.
The two quantitative variables that I chose from the female health dataset are weight and waist.
Three numerical measures to compare these two variables.
Next, create a bar graph that includes the mean and median for both variables (to visualize their comparison).
Weed, H. D. (1979). Descriptive statistics. Wentworth, NH: COMPress.