oranges的数据集结构如下:
dim(oranges)
## [1] 36 6
head(oranges)
## store day price1 price2 sales1 sales2
## 1 1 1 37 61 11.3208 0.0047
## 2 1 2 37 37 12.9151 0.0037
## 3 1 3 45 53 18.8947 7.5429
## 4 1 4 41 41 14.6739 7.0652
## 5 1 5 57 41 8.6493 21.2085
## 6 1 6 49 33 9.5238 16.6667
str(oranges)
## 'data.frame': 36 obs. of 6 variables:
## $ store : Factor w/ 6 levels "1","2","3","4",..: 1 1 1 1 1 1 2 2 2 2 ...
## $ day : Factor w/ 6 levels "1","2","3","4",..: 1 2 3 4 5 6 1 2 3 4 ...
## $ price1: int 37 37 45 41 57 49 49 53 53 53 ...
## $ price2: int 61 37 53 41 41 33 49 53 45 53 ...
## $ sales1: num 11.32 12.92 18.89 14.67 8.65 ...
## $ sales2: num 0.0047 0.0037 7.5429 7.0652 21.2085 ...
接下来,我们看一下影响品种1销量(Sales1)的主要因素。
oranges.lm1 <- lm(sales1 ~ price1 + price2 + store + day , data = oranges)
anova(oranges.lm1)
## Analysis of Variance Table
##
## Response: sales1
## Df Sum Sq Mean Sq F value Pr(>F)
## price1 1 516.59 516.59 29.0996 1.763e-05 ***
## price2 1 62.73 62.73 3.5334 0.072873 .
## store 5 212.95 42.59 2.3991 0.068548 .
## day 5 433.10 86.62 4.8793 0.003456 **
## Residuals 23 408.31 17.75
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
其中lm()
函数为R自带的函数,包括在stats包中。anova()
给出了方差分析表。
从表中可以看出,price2对sales1的影响未达到显著水平。
lsmeans包中的ref.grid()
可以用来建立参考表格,代码如下:
oranges.rg1 <- ref.grid(oranges.lm1)
oranges.rg1
## 'ref.grid' object with variables:
## price1 = 51.222
## price2 = 48.556
## store = 1, 2, 3, 4, 5, 6
## day = 1, 2, 3, 4, 5, 6
从上边结果中可以看出,两个价格协变量price1和price2用它们的均值作为参考水平。
两个因子用它们各自的6个水平作为参考水平。因此参考表格共计包括\(1×1×6×6=36\)个水平组合。
LS means基于参考表格的预测值进行计算。参考表格的预测值可以通过summary()
或者predict()
获得。
oranges.rg1.prediction <- summary(oranges.rg1)
oranges.rg1.prediction
## price1 price2 store day prediction SE df
## 51.22222 48.55556 1 1 2.918413 2.717559 23
## 51.22222 48.55556 2 1 4.961475 2.377742 23
## 51.22222 48.55556 3 1 3.200891 2.377742 23
## 51.22222 48.55556 4 1 6.198757 2.363673 23
## 51.22222 48.55556 5 1 5.543218 2.363116 23
## 51.22222 48.55556 6 1 10.563739 2.366683 23
## 51.22222 48.55556 1 2 3.848804 2.701335 23
## 51.22222 48.55556 2 2 5.891866 2.335579 23
## 51.22222 48.55556 3 2 4.131282 2.335579 23
## 51.22222 48.55556 4 2 7.129148 2.352186 23
## 51.22222 48.55556 5 2 6.473609 2.330670 23
## 51.22222 48.55556 6 2 11.494130 2.339254 23
## 51.22222 48.55556 1 3 11.018569 2.534556 23
## 51.22222 48.55556 2 3 13.061630 2.416451 23
## 51.22222 48.55556 3 3 11.301047 2.416451 23
## 51.22222 48.55556 4 3 14.298913 2.431679 23
## 51.22222 48.55556 5 3 13.643374 2.363673 23
## 51.22222 48.55556 6 3 18.663895 2.347839 23
## 51.22222 48.55556 1 4 6.096286 2.651370 23
## 51.22222 48.55556 2 4 8.139348 2.352186 23
## 51.22222 48.55556 3 4 6.378765 2.352186 23
## 51.22222 48.55556 4 4 9.376630 2.388653 23
## 51.22222 48.55556 5 4 8.721091 2.337599 23
## 51.22222 48.55556 6 4 13.741613 2.341304 23
## 51.22222 48.55556 1 5 12.795800 2.444597 23
## 51.22222 48.55556 2 5 14.838862 2.466155 23
## 51.22222 48.55556 3 5 13.078278 2.466155 23
## 51.22222 48.55556 4 5 16.076144 2.519089 23
## 51.22222 48.55556 5 5 15.420605 2.395544 23
## 51.22222 48.55556 6 5 20.441126 2.370343 23
## 51.22222 48.55556 1 6 8.748779 2.786176 23
## 51.22222 48.55556 2 6 10.791841 2.337599 23
## 51.22222 48.55556 3 6 9.031258 2.337599 23
## 51.22222 48.55556 4 6 12.029123 2.364688 23
## 51.22222 48.55556 5 6 11.373584 2.352318 23
## 51.22222 48.55556 6 6 16.394106 2.370539 23
有了基于参考表格的预测值后,通过lsmeans包中的lsmeans()
函数可以直接获得某一个因子各水平的最小二乘均值。
lsmeans(oranges.rg1,"day")
## day lsmean SE df lower.CL upper.CL
## 1 5.564415 1.768083 23 1.906856 9.221974
## 2 6.494807 1.728959 23 2.918183 10.071430
## 3 13.664571 1.751505 23 10.041308 17.287835
## 4 8.742289 1.733920 23 5.155403 12.329175
## 5 15.441803 1.785809 23 11.747576 19.136029
## 6 11.394782 1.766726 23 7.740031 15.049533
##
## Results are averaged over the levels of: store
## Confidence level used: 0.95
上面结果,给出了每一天品种1销量(Sales1)的最小二乘均值。
可以从看出,周三、周五的销量最高,周六次之。
上述结果实际上是对oranges.rg1.prediction预测数据集,按照day不同水平进行均值汇总输出的结果。
利用dplyr包相关函数手动计算day六个水平的最小二乘均值:
suppressMessages(require(dplyr))
oranges.rg1.day.lsmeans <- oranges.rg1.prediction %>%
group_by(day) %>%
summarise(lsmean=mean(prediction))
oranges.rg1.day.lsmeans
## # A tibble: 6 x 2
## day lsmean
## <fctr> <dbl>
## 1 1 5.564415
## 2 2 6.494807
## 3 3 13.664571
## 4 4 8.742289
## 5 5 15.441803
## 6 6 11.394782
进一步利用ASReml-R中的相关函数,计算day六个水平的最小二乘均值,相互验证。
suppressMessages(require(asreml))
## Licensed to: YSFRI
## Serial Number: 402568354 Expires: 30-nov-2017 (168 days)
oranges.asreml <-
asreml(
fixed = sales1 ~ -1+store + day + price1 + price2 ,
rcov = ~ units,
family = asreml.gaussian(),
data = oranges,
maxiter = 100
)
## ASReml: Thu Jun 15 21:52:31 2017
##
## LogLik S2 DF wall cpu
## -60.6324 17.7525 23 21:52:31 0.0
## -60.6324 17.7525 23 21:52:31 0.0
##
## Finished on: Thu Jun 15 21:52:31 2017
##
## LogLikelihood Converged
oranges.asreml.lsmeans <- predict(oranges.asreml,classify = "day")
## ASReml: Thu Jun 15 21:52:35 2017
##
## LogLik S2 DF wall cpu
## -60.6324 17.7525 23 21:52:35 0.0
## -60.6324 17.7525 23 21:52:35 0.0
##
## Finished on: Thu Jun 15 21:52:35 2017
##
## LogLikelihood Converged
oranges.asreml.lsmeans$predictions$pvals
##
## Notes:
## - The predictions are obtained by averaging across the hypertable
## calculated from model terms constructed solely from factors in
## the averaging and classify sets.
## - Use "average" to move ignored factors into the averaging set.
##
## - price1 evaluated at average value of 51.222222
## - price2 evaluated at average value of 48.555556
## - The SIMPLE averaging set: store
##
## day predicted.value standard.error est.status
## 1 1 5.564415 1.768083 Estimable
## 2 2 6.494807 1.728959 Estimable
## 3 3 13.664571 1.751505 Estimable
## 4 4 8.742289 1.733920 Estimable
## 5 5 15.441803 1.785808 Estimable
## 6 6 11.394782 1.766726 Estimable
从结果来看,利用lsmeans,手动汇总和asreml给出的结果都是一致的。上述结果有助于加深对于LS means的理解。
在计算LS means的时候,可以考虑设置协变量的值,以及要预测的因子的水平。
譬如,我们可以设置在price1=50,price2=40以及price2=60水平上,只预测2,3,4三天的最小二乘均值。
org.lsm <- lsmeans(oranges.lm1,"day",by="price2",at=list(price1=50,price2=c(40,60),day=c("2","3","4")))
org.lsm
## price2 = 40:
## day lsmean SE df lower.CL upper.CL
## 2 6.236227 1.887106 23 2.332452 10.14000
## 3 13.405992 2.119376 23 9.021730 17.79025
## 4 8.483710 1.866510 23 4.622540 12.34488
##
## price2 = 60:
## day lsmean SE df lower.CL upper.CL
## 2 9.213169 2.109448 23 4.849443 13.57689
## 3 16.382933 1.905216 23 12.441693 20.32417
## 4 11.460651 2.178054 23 6.955003 15.96630
##
## Results are averaged over the levels of: store
## Confidence level used: 0.95