- Population: The entire group of individuals or items that you want to study or draw conclusions about.
- Sample: A subset of the population, selected for analysis.
- Parameter: A numerical value that describes a characteristic of a population, such as the population mean (μ) or variance (σ²).
- Statistic: A numerical value calculated from a sample, such as the sample mean (x̄) or sample variance (s²).
- Mean (Arithmetic Mean): The sum of all values divided by the number of values.
- Median: The middle value when data is sorted; if even number of values, the average of the two middle ones.
- Mode: The most frequently occurring value in a data set.
- Variance: The average of the squared differences from the mean; measures data spread.
- Standard Deviation: The square root of the variance; represents typical distance from the mean.
- Range: The difference between the maximum and minimum values.
- Interquartile Range (IQR): The difference between the third quartile (Q3) and the first quartile (Q1); measures middle 50% spread.
- Skewness: A measure of asymmetry in the distribution of data.
- Kurtosis: A measure of the “tailedness” of the distribution.
- Probability: A number between 0 and 1 representing the likelihood of an event occurring.
- Random Variable: A variable that takes on values based on the outcome of a random event.
- Discrete Variable: A variable that can take on a finite or countable number of values.
- Continuous Variable: A variable that can take on any value within an interval.
- Probability Distribution: A function that describes the likelihood of each outcome for a random variable.
- Probability Mass Function (PMF): Gives probabilities for discrete random variables.
- Probability Density Function (PDF): Describes the relative likelihood of a continuous random variable.
- Cumulative Distribution Function (CDF): Gives the probability that a random variable is less than or equal to a given value.
- Expected Value (Mean): The long-run average value of repetitions of the experiment it represents.
- Moment: A quantitative measure related to the shape of a distribution (e.g., mean is the first moment).
- Central Moment: A moment calculated about the mean (e.g., variance is the second central moment).
- Estimator: A rule or formula for calculating an estimate of a population parameter based on sample data.
- Estimate: A specific numerical value obtained from applying an estimator to a sample.
- Bias (of an Estimator): The difference between the expected value of the estimator and the true parameter value.
- Unbiased Estimator: An estimator whose expected value equals the true parameter value.
- Efficiency: An estimator is more efficient if it has lower variance among unbiased estimators.
- Consistency: An estimator is consistent if it converges to the true value as sample size increases.
- Sufficiency: A statistic is sufficient if it captures all the information in the data about a parameter.
- Confidence Interval: A range of values, derived from the sample, that is likely to contain the population parameter.
- Confidence Level: The proportion of confidence intervals, constructed from repeated samples, that would contain the parameter.
- Hypothesis Testing: A formal procedure for testing a claim about a population parameter.
- Null Hypothesis (H₀): The default assumption or claim to be tested.
- Alternative Hypothesis (H₁): The rival claim, tested against the null hypothesis.
- p-value: The probability of observing the test statistic or something more extreme under the null hypothesis.
- Significance Level (α): The threshold below which the null hypothesis is rejected (commonly 0.05).
- Type I Error: Rejecting the null hypothesis when it is actually true.
- Type II Error: Failing to reject the null hypothesis when it is actually false.
- Power (of a Test): The probability of correctly rejecting a false null hypothesis.
- Test Statistic: A function of the sample data used to decide whether to reject H₀.
- T-distribution: A distribution used for inference when population variance is unknown and sample size is small.
- Chi-square Distribution: Used in tests for variance and for categorical data (e.g., goodness-of-fit).
- F-distribution: Used in analysis of variance (ANOVA) and regression testing.
- Correlation: A measure of linear relationship between two variables (ranges from –1 to 1).
- Covariance: A measure of how two variables change together; not standardized like correlation.
- Regression: A method for modeling the relationship between a dependent variable and one or more independent variables.
- Simple Linear Regression: Regression with one independent variable.
- Multiple Linear Regression: Regression with two or more independent variables.
- Residual: The difference between an observed value and the value predicted by a model.
- Homoscedasticity: The assumption that residuals have constant variance.
- Heteroscedasticity: When residuals have non-constant variance.
- Multicollinearity: A situation in regression when independent variables are highly correlated with each other.
- Bootstrap: A resampling method used to estimate the distribution of a statistic by sampling with replacement.
- Permutation Test: A nonparametric method to test hypotheses by rearranging labels in the data.
- Bayesian Inference: Statistical inference that updates beliefs about a parameter using Bayes’ Theorem.
- Prior Distribution: In Bayesian analysis, the distribution expressing beliefs about a parameter before seeing the data.
- Posterior Distribution: The updated distribution of a parameter after observing data.
- Likelihood Function: A function of the parameter given the data, used for estimation (e.g., in Maximum Likelihood Estimation).
- Maximum Likelihood Estimation (MLE): A method of estimating parameters by maximizing the likelihood function.
- Sampling Distribution: The probability distribution of a statistic over all possible samples.
- Degrees of Freedom: The number of values in a calculation that are free to vary.
- Outlier: A value far from the center of the data; may indicate variability or data issues.
- Z-score: The number of standard deviations a data point is from the mean.
- Normal Distribution: A symmetric, bell-shaped distribution that arises frequently in statistics.
- Standard Normal Distribution: A normal distribution with mean 0 and standard deviation 1.
- Uniform Distribution: A distribution where all outcomes are equally likely over an interval.
- Exponential Distribution: A distribution used to model time between independent events.
- Poisson Distribution: A discrete distribution used to model the count of events in a fixed interval.
- Binomial Distribution: A discrete distribution describing the number of successes in a fixed number of independent trials.
- Bernoulli Distribution: The simplest discrete distribution, with only two outcomes: success (1) or failure (0).
- Geometric Distribution: Describes the number of trials until the first success.
- Hypergeometric Distribution: Describes successes in a sample drawn without replacement.
Statistics Definitions

Discover more from SodakAI: Bespoke AI Solutions
Subscribe to get the latest posts sent to your email.
