
Understanding the beta coefficient

A function to do pairs bootstrap | Python

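These notes are not yet organized. The important material is in the linked pages, which can be opened by clicking the titles; I have excerpted the key implementation code here.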

import numpy as np

def draw_bs_pairs_linreg(x, y, size=1):
    """Perform pairs bootstrap for linear regression."""

    # Set up array of indices to sample from: inds
    inds = np.arange(len(x))

    # Initialize replicates: bs_slope_reps, bs_intercept_reps
    bs_slope_reps = np.empty(size)
    bs_intercept_reps = np.empty(size)

    # Generate replicates
    for i in range(size):
        bs_inds = np.random.choice(inds, size=len(inds))
        bs_x, bs_y = x[bs_inds], y[bs_inds]
        bs_slope_reps[i], bs_intercept_reps[i] = np.polyfit(bs_x, bs_y, 1)

    return bs_slope_reps, bs_intercept_reps

Pairs bootstrap of literacy/fertility data | Python

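import matplotlib.pyplot as plt

# illiteracy and fertility are assumed to be NumPy arrays of equal length
# Generate replicates of slope and intercept using pairs bootstrap
bs_slope_reps, bs_intercept_reps = draw_bs_pairs_linreg(illiteracy, fertility, size=1000)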

# Compute and print 95% CI for slope
print(np.percentile(bs_slope_reps, [2.5,97.5]))

# Plot the histogram
# density=True replaces the normed=True argument removed from recent Matplotlib
_ = plt.hist(bs_slope_reps, bins=50, density=True)
_ = plt.xlabel('slope')
_ = plt.ylabel('PDF')
plt.show()

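This made it clear why \(\beta\) should be understood as a distribution. The bootstrap procedure yields a different \(\beta\) for each resample, and \(\beta\) is itself a statistic. As the number of bootstrap replicates grows, the histogram of replicates settles toward the sampling distribution of \(\beta\), which for a reasonably large sample is approximately normal (or a t distribution once the estimated standard error is taken into account). That is why the parameters of the t distribution can be used directly for inference on \(\beta\).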
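
To make the link between the bootstrap distribution and the t distribution concrete, here is a minimal sketch that computes a 95% interval for the slope both ways. The data are synthetic and purely an assumption for illustration (they are not the literacy/fertility data), and the helper name `bootstrap_slopes` is hypothetical. With well-behaved noise like this, the two intervals should come out close to each other.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Synthetic data: an assumption for illustration only
n = 200
x = rng.uniform(0, 10, size=n)
y = 2.0 * x + 1.0 + rng.normal(0, 2.0, size=n)

def bootstrap_slopes(x, y, size, rng):
    """Pairs bootstrap: resample (x, y) pairs with replacement, refit the slope."""
    n_obs = len(x)
    slopes = np.empty(size)
    for i in range(size):
        idx = rng.integers(0, n_obs, size=n_obs)
        slopes[i], _ = np.polyfit(x[idx], y[idx], 1)
    return slopes

# Percentile interval from the bootstrap replicates
slopes = bootstrap_slopes(x, y, size=2000, rng=rng)
boot_ci = np.percentile(slopes, [2.5, 97.5])

# Classical interval: estimate +/- t critical value * standard error
res = stats.linregress(x, y)
t_crit = stats.t.ppf(0.975, df=n - 2)
t_ci = np.array([res.slope - t_crit * res.stderr,
                 res.slope + t_crit * res.stderr])

print("bootstrap 95% CI:", boot_ci)
print("t-based 95% CI:  ", t_ci)
```

The bootstrap interval makes no distributional assumption about the noise, so the two can drift apart when the residuals are heavy-tailed or heteroskedastic; in that case the bootstrap result is usually the safer one to report.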