lsmeans

oranges的数据集结构如下：

两个品种，对应两个价格（price1和price2）和两个销量（sales1和sales2）；
然后还有store和day两个列变量，分别表示售卖的商店和时间；
store和day两个变量为因子类型，价格为整数类型，销量为数字类型。

dim(oranges)

## [1] 36  6

head(oranges)

##   store day price1 price2  sales1  sales2
## 1     1   1     37     61 11.3208  0.0047
## 2     1   2     37     37 12.9151  0.0037
## 3     1   3     45     53 18.8947  7.5429
## 4     1   4     41     41 14.6739  7.0652
## 5     1   5     57     41  8.6493 21.2085
## 6     1   6     49     33  9.5238 16.6667

str(oranges)

## 'data.frame':    36 obs. of  6 variables:
##  $ store : Factor w/ 6 levels "1","2","3","4",..: 1 1 1 1 1 1 2 2 2 2 ...
##  $ day   : Factor w/ 6 levels "1","2","3","4",..: 1 2 3 4 5 6 1 2 3 4 ...
##  $ price1: int  37 37 45 41 57 49 49 53 53 53 ...
##  $ price2: int  61 37 53 41 41 33 49 53 45 53 ...
##  $ sales1: num  11.32 12.92 18.89 14.67 8.65 ...
##  $ sales2: num  0.0047 0.0037 7.5429 7.0652 21.2085 ...

接下来，我们看一下影响品种1销量（Sales1）的主要因素。

在模型中包括store,day, price1和price2等4个因素。
其中store和day由于是因子变量，因此作为固定效应；price1和price2作为协变量。
这里把price2纳入模型，是因为品种2的价格也会影响品种1的销量。

oranges.lm1 <- lm(sales1 ~ price1 + price2 + store + day , data = oranges)
anova(oranges.lm1)

## Analysis of Variance Table
## 
## Response: sales1
##           Df Sum Sq Mean Sq F value    Pr(>F)    
## price1     1 516.59  516.59 29.0996 1.763e-05 ***
## price2     1  62.73   62.73  3.5334  0.072873 .  
## store      5 212.95   42.59  2.3991  0.068548 .  
## day        5 433.10   86.62  4.8793  0.003456 ** 
## Residuals 23 408.31   17.75                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

其中lm()函数为R自带的函数，包括在stats包中。anova()给出了方差分析表。

从表中可以看出，price2对sales1的影响未达到显著水平。

lsmeans包中的ref.grid()可以用来建立参考表格，代码如下：

oranges.rg1 <- ref.grid(oranges.lm1)
oranges.rg1

## 'ref.grid' object with variables:
##     price1 = 51.222
##     price2 = 48.556
##     store = 1, 2, 3, 4, 5, 6
##     day = 1, 2, 3, 4, 5, 6

从上边结果中可以看出，两个价格协变量price1和price2用它们的均值作为参考水平。

两个因子用它们各自的6个水平作为参考水平。因此参考表格共计包括\(1×1×6×6=36\)个水平组合。

LS means基于参考表格的预测值进行计算。参考表格的预测值可以通过summary()或者predict()获得。

oranges.rg1.prediction <- summary(oranges.rg1)
oranges.rg1.prediction

##    price1   price2 store day prediction       SE df
##  51.22222 48.55556 1     1     2.918413 2.717559 23
##  51.22222 48.55556 2     1     4.961475 2.377742 23
##  51.22222 48.55556 3     1     3.200891 2.377742 23
##  51.22222 48.55556 4     1     6.198757 2.363673 23
##  51.22222 48.55556 5     1     5.543218 2.363116 23
##  51.22222 48.55556 6     1    10.563739 2.366683 23
##  51.22222 48.55556 1     2     3.848804 2.701335 23
##  51.22222 48.55556 2     2     5.891866 2.335579 23
##  51.22222 48.55556 3     2     4.131282 2.335579 23
##  51.22222 48.55556 4     2     7.129148 2.352186 23
##  51.22222 48.55556 5     2     6.473609 2.330670 23
##  51.22222 48.55556 6     2    11.494130 2.339254 23
##  51.22222 48.55556 1     3    11.018569 2.534556 23
##  51.22222 48.55556 2     3    13.061630 2.416451 23
##  51.22222 48.55556 3     3    11.301047 2.416451 23
##  51.22222 48.55556 4     3    14.298913 2.431679 23
##  51.22222 48.55556 5     3    13.643374 2.363673 23
##  51.22222 48.55556 6     3    18.663895 2.347839 23
##  51.22222 48.55556 1     4     6.096286 2.651370 23
##  51.22222 48.55556 2     4     8.139348 2.352186 23
##  51.22222 48.55556 3     4     6.378765 2.352186 23
##  51.22222 48.55556 4     4     9.376630 2.388653 23
##  51.22222 48.55556 5     4     8.721091 2.337599 23
##  51.22222 48.55556 6     4    13.741613 2.341304 23
##  51.22222 48.55556 1     5    12.795800 2.444597 23
##  51.22222 48.55556 2     5    14.838862 2.466155 23
##  51.22222 48.55556 3     5    13.078278 2.466155 23
##  51.22222 48.55556 4     5    16.076144 2.519089 23
##  51.22222 48.55556 5     5    15.420605 2.395544 23
##  51.22222 48.55556 6     5    20.441126 2.370343 23
##  51.22222 48.55556 1     6     8.748779 2.786176 23
##  51.22222 48.55556 2     6    10.791841 2.337599 23
##  51.22222 48.55556 3     6     9.031258 2.337599 23
##  51.22222 48.55556 4     6    12.029123 2.364688 23
##  51.22222 48.55556 5     6    11.373584 2.352318 23
##  51.22222 48.55556 6     6    16.394106 2.370539 23

有了基于参考表格的预测值后，通过lsmeans包中的lsmeans()函数可以直接获得某一个因子各水平的最小二乘均值。

lsmeans(oranges.rg1,"day")

##  day    lsmean       SE df  lower.CL  upper.CL
##  1    5.564415 1.768083 23  1.906856  9.221974
##  2    6.494807 1.728959 23  2.918183 10.071430
##  3   13.664571 1.751505 23 10.041308 17.287835
##  4    8.742289 1.733920 23  5.155403 12.329175
##  5   15.441803 1.785809 23 11.747576 19.136029
##  6   11.394782 1.766726 23  7.740031 15.049533
## 
## Results are averaged over the levels of: store 
## Confidence level used: 0.95

上面结果，给出了每一天品种1销量(Sales1)的最小二乘均值。

可以从看出，周三、周五的销量最高，周六次之。

上述结果实际上是对oranges.rg1.prediction预测数据集，按照day不同水平进行均值汇总输出的结果。

利用dplyr包相关函数手动计算day六个水平的最小二乘均值：

suppressMessages(require(dplyr))
oranges.rg1.day.lsmeans <- oranges.rg1.prediction %>% 
  group_by(day) %>% 
  summarise(lsmean=mean(prediction))
oranges.rg1.day.lsmeans

## # A tibble: 6 x 2
##      day    lsmean
##   <fctr>     <dbl>
## 1      1  5.564415
## 2      2  6.494807
## 3      3 13.664571
## 4      4  8.742289
## 5      5 15.441803
## 6      6 11.394782

进一步利用ASReml-R中的相关函数，计算day六个水平的最小二乘均值，相互验证。

suppressMessages(require(asreml))

## Licensed to: YSFRI
## Serial Number: 402568354 Expires: 30-nov-2017 (168 days)

oranges.asreml <-
  asreml(
    fixed = sales1 ~ -1+store + day + price1 + price2 ,
    rcov = ~ units,
    family = asreml.gaussian(),
    data = oranges,
    maxiter = 100
  )

## ASReml: Thu Jun 15 21:52:31 2017
## 
##      LogLik         S2      DF      wall     cpu
##     -60.6324     17.7525    23  21:52:31     0.0
##     -60.6324     17.7525    23  21:52:31     0.0
## 
## Finished on: Thu Jun 15 21:52:31 2017
##  
## LogLikelihood Converged

oranges.asreml.lsmeans <- predict(oranges.asreml,classify = "day")

## ASReml: Thu Jun 15 21:52:35 2017
## 
##      LogLik         S2      DF      wall     cpu
##     -60.6324     17.7525    23  21:52:35     0.0
##     -60.6324     17.7525    23  21:52:35     0.0
## 
## Finished on: Thu Jun 15 21:52:35 2017
##  
## LogLikelihood Converged

oranges.asreml.lsmeans$predictions$pvals

## 
## Notes:
## - The predictions are obtained by averaging across the hypertable
##   calculated from model terms constructed solely from factors in
##   the averaging and classify sets.
## - Use "average" to move ignored factors into the averaging set.
## 
## - price1 evaluated at average value of 51.222222
## - price2 evaluated at average value of 48.555556
## - The SIMPLE averaging set:  store 
## 
##   day predicted.value standard.error est.status
## 1   1        5.564415       1.768083  Estimable
## 2   2        6.494807       1.728959  Estimable
## 3   3       13.664571       1.751505  Estimable
## 4   4        8.742289       1.733920  Estimable
## 5   5       15.441803       1.785808  Estimable
## 6   6       11.394782       1.766726  Estimable

从结果来看，利用lsmeans，手动汇总和asreml给出的结果都是一致的。上述结果有助于加深对于LS means的理解。

1.2.3 调整参考表格

在计算LS means的时候，可以考虑设置协变量的值，以及要预测的因子的水平。
譬如，我们可以设置在price1=50，price2=40以及price2=60水平上，只预测2,3,4三天的最小二乘均值。

org.lsm <- lsmeans(oranges.lm1,"day",by="price2",at=list(price1=50,price2=c(40,60),day=c("2","3","4")))
org.lsm

## price2 = 40:
##  day    lsmean       SE df  lower.CL upper.CL
##  2    6.236227 1.887106 23  2.332452 10.14000
##  3   13.405992 2.119376 23  9.021730 17.79025
##  4    8.483710 1.866510 23  4.622540 12.34488
## 
## price2 = 60:
##  day    lsmean       SE df  lower.CL upper.CL
##  2    9.213169 2.109448 23  4.849443 13.57689
##  3   16.382933 1.905216 23 12.441693 20.32417
##  4   11.460651 2.178054 23  6.955003 15.96630
## 
## Results are averaged over the levels of: store 
## Confidence level used: 0.95

lsmeans

Sheng Luan

2017年6月14日

1.2.3 调整参考表格