
An Introduction to Statistical Terms

Introduction

This post introduces terms and abbreviations that come up often in probability theory.

Cumulative distribution function (CDF)

Probability distribution(PD)

Probability mass function (PMF)

Similar to the PDF described next, but used for a discrete variable. The probabilities of all possible values sum to 1.

Probability density function (PDF)
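
As a small illustration of the "sums to 1" property, here is a minimal sketch that builds the PMF of the sum of two fair dice and accumulates it into a CDF. The dice example and the names `pmf` and `cdf` are assumptions chosen for illustration, not something from the original notes.

```python
from fractions import Fraction
from itertools import product

# PMF of the sum of two fair six-sided dice (a discrete variable).
# Each of the 36 ordered outcomes has probability 1/36.
pmf = {}
for a, b in product(range(1, 7), repeat=2):
    pmf[a + b] = pmf.get(a + b, Fraction(0)) + Fraction(1, 36)

# The probabilities over all possible values sum to exactly 1.
total = sum(pmf.values())


def cdf(x):
    """CDF at x: accumulate the PMF over all values <= x."""
    return sum(p for value, p in pmf.items() if value <= x)


print(total, pmf[7], cdf(7))
```

Using `Fraction` keeps the arithmetic exact, so the total is exactly 1 rather than a floating-point approximation.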

frequentist inference and Bayesian inference

mean squared error, or MSE

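Bayesian model

First comes the prior: assign a probability to each candidate model, for example to each of the values $P[0.1, 0.2, \cdots, 0.9]$.
Then obtain the likelihood function of each model: $P(data | Model)$
Then compute the posterior, $P(Model | data)$. Note that the posterior probabilities sum to 1 over all the models.

Going from the prior to the posterior is Bayesian inference. The posterior probabilities obtained from the initial Bayesian model can then be used to predict new data; see Question 5.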
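
The prior, likelihood, and posterior steps can be sketched in a few lines. This is a minimal sketch under stated assumptions: the uniform prior, the binomial likelihood, and the data (7 successes in 10 trials) are all hypothetical choices made for illustration.

```python
from math import comb

# Candidate models: the success probability is one of 0.1, 0.2, ..., 0.9.
models = [i / 10 for i in range(1, 10)]

# Prior: assumed uniform over the nine models (illustrative choice).
prior = [1 / len(models)] * len(models)

# Hypothetical data: k successes in n trials.
n, k = 10, 7

# Likelihood P(data | Model), assuming a binomial model.
likelihood = [comb(n, k) * p**k * (1 - p) ** (n - k) for p in models]

# Posterior P(Model | data) is proportional to prior * likelihood;
# dividing by the total makes it sum to 1 over the models.
unnormalized = [pr * lk for pr, lk in zip(prior, likelihood)]
evidence = sum(unnormalized)
posterior = [u / evidence for u in unnormalized]

# Posterior predictive: probability that the next trial is a success.
predictive = sum(p * post for p, post in zip(models, posterior))

best = models[posterior.index(max(posterior))]
print(f"most probable model: p = {best}")
print(f"P(next success | data) = {predictive:.3f}")
```

The last step averages each model's prediction by its posterior weight, which is one way to use the posterior to predict new data.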

Uniform distribution

Its PDF is flat.

Beta Function

Wiki

Beta distribution

In probability theory and statistics, the beta distribution is a family of continuous probability distributions defined on the interval [0, 1] parametrized by two positive shape parameters, denoted by α and β, that appear as exponents of the random variable and control the shape of the distribution.
How the Bayesian prior and posterior relate to the beta function: http://stats.stackexchange.com/questions/58564/help-me-understand-bayesian-prior-and-posterior-distributions
Wiki

Conjugacy

analogous (similar)
The posterior is updated according to the observations.
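
A common instance of conjugacy is the beta prior with binomial data, where updating the posterior only means adding counts to the two shape parameters. The sketch below assumes that setup; the function names `beta_update` and `beta_mean` and the observed counts are hypothetical.

```python
def beta_update(alpha, beta, successes, failures):
    """Posterior Beta parameters after observing binomial data.

    With a Beta(alpha, beta) prior, the posterior is
    Beta(alpha + successes, beta + failures).
    """
    return alpha + successes, beta + failures


def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)


# Beta(1, 1) is the uniform (flat) prior on [0, 1].
a0, b0 = 1, 1

# Update once with all the data: 7 successes, 3 failures.
a1, b1 = beta_update(a0, b0, successes=7, failures=3)

# Updating in two batches gives the same posterior.
a_step, b_step = beta_update(a0, b0, successes=4, failures=1)
a_step, b_step = beta_update(a_step, b_step, successes=3, failures=2)

print(a1, b1, beta_mean(a1, b1))
```

Because the posterior stays in the beta family, each new batch of observations can be folded in the same way.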

P-value

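Used to test the null hypothesis $H_0$: the p-value is the probability, assuming $H_0$ is true, of observing a result at least as extreme as the one actually observed.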
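
As a concrete case, here is a minimal sketch of an exact one-sided p-value for a coin, assuming $H_0$ says the coin is fair. The function name `binomial_p_value` and the data (9 heads in 10 flips) are hypothetical.

```python
from math import comb


def binomial_p_value(k, n, p0=0.5):
    """One-sided p-value: P(X >= k) for X ~ Binomial(n, p0) under H0."""
    return sum(
        comb(n, i) * p0**i * (1 - p0) ** (n - i)
        for i in range(k, n + 1)
    )


# Hypothetical data: 9 heads in 10 flips, tested against a fair coin.
p_value = binomial_p_value(9, 10)
print(f"p-value = {p_value:.4f}")
```

A small p-value means data this extreme would be unusual if $H_0$ were true; it is not the probability that $H_0$ is true.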

Poisson

Gamma distribution

In probability theory and statistics, the gamma distribution is a two-parameter family of continuous probability distributions. The common exponential distribution and chi-squared distribution are special cases of the gamma distribution. There are three different parametrizations in common use:

  1. With a shape parameter k and a scale parameter θ.
  2. With a shape parameter α = k and an inverse scale parameter β = 1/θ, called a rate parameter.
  3. With a shape parameter k and a mean parameter μ = k/β.
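
The three parametrizations describe the same distribution, so converting between them is simple arithmetic. A minimal sketch, where the function name `gamma_parametrizations` and the example values are assumptions for illustration:

```python
def gamma_parametrizations(k, theta):
    """Given shape k and scale theta, return all three parametrizations."""
    alpha = k            # shape, same value under a different name
    beta = 1 / theta     # rate is the inverse of the scale
    mu = k / beta        # mean, which also equals k * theta
    return {
        "shape_scale": (k, theta),
        "shape_rate": (alpha, beta),
        "shape_mean": (k, mu),
    }


params = gamma_parametrizations(k=2.0, theta=0.5)
print(params)
```

When using a library, it is worth checking which of the three conventions it expects, because passing a rate where a scale is expected silently changes the distribution.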

confidence interval

Credible Intervals

The difference between a confidence interval and a credible interval

Bayesian -> decision theory

Posterior distribution = prior + data
Bayesian -> decision theory -> loss function

  • Linear loss: the loss function is L1, and the expected loss is minimized at the posterior median.
  • Squared loss: the loss function is L2, and the optimum is the posterior mean.
  • 0/1 loss: the loss function is L0, and the optimum is the posterior mode.
    Given the posterior distribution, we can compute the expected loss of each decision.
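
The link between each loss function and its optimal decision can be checked numerically. This is a rough sketch that uses a handful of hypothetical posterior samples and a brute-force grid search; the sample values and helper names are assumptions for illustration.

```python
from statistics import mean, median

# Hypothetical posterior samples (illustrative numbers only).
samples = [1.0, 2.0, 2.0, 3.0, 10.0]


def l1(d, s):
    return abs(d - s)


def l2(d, s):
    return (d - s) ** 2


def l0(d, s):
    return 0.0 if d == s else 1.0


def expected_loss(d, loss):
    """Average loss of decision d over the posterior samples."""
    return sum(loss(d, s) for s in samples) / len(samples)


# Candidate decisions: a grid from 0.00 to 10.00 in steps of 0.01.
grid = [i / 100 for i in range(0, 1001)]

best_l1 = min(grid, key=lambda d: expected_loss(d, l1))
best_l2 = min(grid, key=lambda d: expected_loss(d, l2))
best_l0 = min(grid, key=lambda d: expected_loss(d, l0))

print(best_l1, median(samples))  # L1 minimizer vs. median
print(best_l2, mean(samples))    # L2 minimizer vs. mean
print(best_l0)                   # L0 minimizer is the most frequent value
```

The outlier at 10.0 pulls the L2 decision toward it but leaves the L1 and L0 decisions unchanged, which is one practical reason to care about the choice of loss.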

For example, suppose there are two competing hypotheses: $H_1$ and $H_2$.

$L(d \vert H_i)$ is the loss incurred by making decision $d$ when $H_i$ is true.
Expected loss: $E(L(d)) = \sum_i P(H_i \vert data) L(d \vert H_i)$
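
To make the expected-loss calculation concrete, here is a minimal sketch for two hypotheses, assuming the loss depends on both the decision and on which hypothesis is actually true. The posterior probabilities and the loss table are hypothetical numbers.

```python
# Hypothetical posterior probabilities of the two hypotheses.
posterior = {"H1": 0.7, "H2": 0.3}

# Hypothetical loss table: loss[decision][true hypothesis].
# Wrongly choosing H1 is assumed to be ten times as costly
# as wrongly choosing H2.
loss = {
    "choose_H1": {"H1": 0.0, "H2": 10.0},
    "choose_H2": {"H1": 1.0, "H2": 0.0},
}


def expected_loss(decision):
    """Posterior-weighted loss of a decision."""
    return sum(posterior[h] * loss[decision][h] for h in posterior)


for decision in loss:
    print(decision, expected_loss(decision))

# Pick the decision with the smallest expected loss.
best = min(loss, key=expected_loss)
print("best decision:", best)
```

With these numbers the best decision is to choose $H_2$ even though $H_1$ is more probable, because the losses are asymmetric.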

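Covariance

A statistic that describes the relationship between two random variables.
The variance can be written as: $Var(X) = E[(X - E[X])(X - E[X])]$

By analogy, the covariance is: $Cov(X, Y) = E[(X - E[X])(Y - E[Y])]$

A positive value means the two variables are positively correlated; a negative value means the opposite.
The covariance matrix arises in the higher-dimensional case: n-dimensional data has $\dbinom{n}{2}$ pairwise covariances.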
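
A short sketch of covariance and the covariance matrix in plain Python. It uses the population form (dividing by n) and made-up data; the function names are assumptions for illustration.

```python
def mean(xs):
    return sum(xs) / len(xs)


def covariance(xs, ys):
    """Population covariance E[(X - E[X])(Y - E[Y])]."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)


def covariance_matrix(columns):
    """Entry (i, j) is the covariance of column i with column j."""
    return [[covariance(a, b) for b in columns] for a in columns]


# Hypothetical data: y rises with x, z falls as x rises.
x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.0, 6.0, 8.0]
z = [4.0, 3.0, 2.0, 1.0]

print(covariance(x, y))  # positive
print(covariance(x, z))  # negative

for row in covariance_matrix([x, y, z]):
    print(row)
```

The diagonal of the matrix holds the variances, and the matrix is symmetric, so a 3-dimensional example has 3 distinct off-diagonal covariances.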

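The three major distributions in statistics

Multinomial distribution

Maximum likelihood estimation (MLE)

The likelihood function is $L_n(\theta) = \prod_{i=1}^n f(X_i;\theta)$.
The maximum likelihood estimate, denoted $\hat{\theta}_n$, is the value of $\theta$ that maximizes the likelihood function.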
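
As a worked example, here is a minimal sketch of maximum likelihood estimation for Bernoulli data, where $f(X_i;\theta) = \theta^{X_i}(1-\theta)^{1-X_i}$. The data and the grid search are assumptions for illustration; the grid result is compared with the known closed form, the sample proportion.

```python
from math import log

# Hypothetical Bernoulli observations (1 = success, 0 = failure).
data = [1, 0, 1, 1, 0, 1, 1, 1, 0, 1]


def log_likelihood(theta):
    """Log of the likelihood: the product becomes a sum of logs."""
    return sum(log(theta) if x == 1 else log(1 - theta) for x in data)


# Search theta over a grid strictly inside (0, 1).
grid = [i / 1000 for i in range(1, 1000)]
theta_hat = max(grid, key=log_likelihood)

# For Bernoulli data the maximizer is the sample proportion.
closed_form = sum(data) / len(data)

print(theta_hat, closed_form)
```

Maximizing the log-likelihood gives the same $\hat{\theta}_n$ as maximizing the likelihood itself, since the logarithm is increasing, and it avoids multiplying many small numbers.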

References

[1] Covariance matrix, Wikipedia: https://en.wikipedia.org/wiki/Covariance_matrix


Because we are friends, you may use my writing, but please credit the source: http://alwa.info