Thursday, February 28, 2013

Normal Distribution and giving the business user an ability to use 95% Confidence Interval



Assuming that your data follows Normal Distribution, if you are presented with a business case to present the numbers with 95% Confidence Interval, you could take the following approach.

Sample size < 5 – Ignore Perhaps a message that we do not have enough data is useful here. This is because in the statistical world, you ignore any sample size less than 5 as the data is highly volatile and cannot be presented with confidence.

You could bucket sample sizes from 6 to 29 into 3 or more categories depending on your business’ need for precision. Below is an example where we categorized into 3 buckets. The values highlighted in yellow are taken from a Statistical table approximating the value for different buckets for a 95% Confidence Interval and could be reused for your use case.

Lower Threshold Value
=SWITCH(
Fields!SampleSize.Value <= 5, “Not enough Data”
       , Fields!SampleSize.Value > 5 AND Fields!SampleSize.Value <= 10  , ROUND(Fields!SampleMean.Value - (2.36 * Fields!StdDev.Value/Fields!Sqrt_SampleSize.Value),2)
       , Fields!SampleSize.Value > 10 AND Fields!SampleSize.Value <= 20  , ROUND(Fields!SampleMean.Value - (2.16 * Fields!StdDev.Value/Fields!Sqrt_SampleSize.Value),2)
       , Fields!SampleSize.Value > 20 AND Fields!SampleSize.Value <= 30  , ROUND(Fields!SampleMean.Value - (2.06 * Fields!StdDev.Value/Fields!Sqrt_SampleSize.Value),2)
       , Fields!SampleSize.Value > 30, ROUND(Fields!SampleMean.Value - (1.96 * Fields!StdDev.Value/Fields!Sqrt_SampleSize.Value),2)
)

Upper Threshold Value
=SWITCH(
Fields!SampleSize.Value <= 5, “Not enough Data”
, Fields!SampleSize.Value > 5 AND Fields!SampleSize.Value < 10  , ROUND(Fields!SampleMean.Value + (2.36 * Fields!StdDev.Value/Fields!Sqrt_SampleSize.Value),2)
       , Fields!SampleSize.Value > 10 AND Fields!SampleSize.Value < 20  , ROUND(Fields!SampleMean.Value + (2.16 * Fields!StdDev.Value/Fields!Sqrt_SampleSize.Value),2)
       , Fields!SampleSize.Value > 21 AND Fields!SampleSize.Value < 30  , ROUND(Fields!SampleMean.Value + (2.06 * Fields!StdDev.Value/Fields!Sqrt_SampleSize.Value),2)
       , Fields!SampleSize.Value > 30, ROUND(Fields!SampleMean.Value + (1.96 * Fields!StdDev.Value/Fields!Sqrt_SampleSize.Value),2)
)

So the 95% Confidence Interval for your data would be the range between “Lower Threshold Value and Upper Threshold Value”.