In these lab exercises you will explore some different workflows that you can use to implement an MI-based analysis.
Unless otherwise specified, you will be analyzing the boys dataset from the mice package in all following exercises.
Load the mice, miceadds, and mitools packages.
Multiply impute the boys data using passive imputation for bmi.
Use passive imputation to maintain the known relation between bmi, wgt, and hgt.
Specify the imputation method for bmi as "~ I(wgt / (hgt / 100)^2)".
Name the resulting mids object imp1.
Create trace plots, density plots, and strip plots from the mids object you created in Question 2.1.
What do you conclude vis-a-vis convergence and the validity of these imputations?
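One way the passive imputation and the diagnostic plots could be set up is sketched below. The values of m, maxit, and seed, the helper names meth and pred, and the choice to stop bmi from predicting wgt and hgt are assumptions made for illustration rather than requirements of the exercise; m = 10 is chosen only to match the 10 fitted models mentioned later in this lab.

```r
library(mice)

# Start from the default methods and overwrite the entry for bmi with a
# passive-imputation formula, so bmi is recomputed from wgt and hgt.
meth <- make.method(boys)
meth["bmi"] <- "~ I(wgt / (hgt / 100)^2)"

# Assumption: remove bmi as a predictor of wgt and hgt to avoid feeding the
# derived variable back into the variables that define it.
pred <- make.predictorMatrix(boys)
pred[c("wgt", "hgt"), "bmi"] <- 0

# Assumption: m = 10 imputations, 10 iterations, arbitrary seed.
imp1 <- mice(boys, m = 10, maxit = 10, method = meth, predictorMatrix = pred,
             seed = 235711, printFlag = FALSE)

# Trace plots of the chain means and SDs across iterations.
plot(imp1)

# Density plots comparing imputed and observed distributions.
densityplot(imp1)

# Strip plots showing individual imputed values next to the observed values.
stripplot(imp1)
```

When reading the trace plots, look for chains that intermingle without a visible trend; in the density and strip plots, look for imputed values that fall in a plausible range relative to the observed data.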
First, we will continue to explore the workflow we’ve already been using wherein we fit the analysis models using with.mids(). This workflow applies under two conditions:
1. The imputed data need no further processing before the analysis model is fit.
2. The modeling function provides coef() and vcov() methods.
Use the imputed data you created in Question 2.1 to fit the following regression model.
\(Y_{bmi} = \beta_0 + \beta_1 X_{age} + \beta_2 X_{region=east} + \beta_3 X_{region=west} + \beta_4 X_{region=south} + \beta_5 X_{region = city} + \varepsilon\)
Pool the MI estimates.
You may have noticed that the output above only includes information about the
coefficients and their significance tests but no model fit information. The \(R^2\)
statistic is not normally distributed, so we should use a slightly more complex
pooling method for the \(R^2\). More information on the specifics of pooling the
\(R^2\) is available in this section
of FIMD and in Harel (2009). The
correct pooling rule is implemented by the mice::pool.r.squared() function.
Check the documentation for the mice::pool.r.squared() function.
Use pool.r.squared() to pool the \(R^2\) and the adjusted \(R^2\) for the model you estimated in Question 3.1.
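The with.mids() workflow for this model, including the pooled \(R^2\), might look like the following sketch. The imputation settings (m, seed, predictor matrix adjustment) and the object names fit1, pooled, and r2 are assumptions standing in for whatever you used in Question 2.1.

```r
library(mice)

# Stand-in for the mids object from Question 2.1 (all settings are assumptions).
meth <- make.method(boys)
meth["bmi"] <- "~ I(wgt / (hgt / 100)^2)"
pred <- make.predictorMatrix(boys)
pred[c("wgt", "hgt"), "bmi"] <- 0
imp1 <- mice(boys, m = 10, method = meth, predictorMatrix = pred,
             seed = 235711, printFlag = FALSE)

# Fit the regression to each imputed dataset. reg is a factor, so lm()
# generates the four region dummy codes (reference level: north).
fit1 <- with(imp1, lm(bmi ~ age + reg))

# Pool the coefficients and their standard errors.
pooled <- pool(fit1)
summary(pooled)

# Pool the R^2 and the adjusted R^2 across the m fitted models.
r2 <- pool.r.squared(fit1)
r2
pool.r.squared(fit1, adjusted = TRUE)
```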
We also need special techniques to pool the \(m\) estimated \(F\) statistics in an MI-based analysis. The technical details of these pooling rules are too complex to detail here. The general method was outlined by Rubin (1987) and extended by Li, Raghunathan, and Rubin (1991), Li, Meng, Raghunathan, and Rubin (1991), and Meng and Rubin (1992).
The mice package implements these pooling rules in three functions: mice::D1(), mice::D2(), and mice::D3().
Use the D1(), D2(), and D3() functions to pool the \(F\) for the model you estimated in Question 3.1.
You should notice some differences between the three pooled statistics. Each statistic uses a different pooling formula based on different assumptions.
When the appropriate estimates are available, the \(D1\) statistic is usually a good choice. A more detailed discussion and comparison of these three statistics is available in this section of FIMD.
We can also use these functions (as well as the anova() function) to do significance testing for model comparisons using MI data.
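A minimal sketch of both uses follows: the pooled overall test of a single model (calling each function with one fitted mira object) and the comparison of two nested models. The imputation settings and the object names fit1, fit0, waldAll, and waldCompare are assumptions for illustration.

```r
library(mice)

# Stand-in for the mids object from Question 2.1 (all settings are assumptions).
meth <- make.method(boys)
meth["bmi"] <- "~ I(wgt / (hgt / 100)^2)"
pred <- make.predictorMatrix(boys)
pred[c("wgt", "hgt"), "bmi"] <- 0
imp1 <- mice(boys, m = 10, method = meth, predictorMatrix = pred,
             seed = 235711, printFlag = FALSE)

# Full and restricted models, each fitted to every imputed dataset.
fit1 <- with(imp1, lm(bmi ~ age + reg))
fit0 <- with(imp1, lm(bmi ~ age))

# Overall test of the full model: with no second model supplied, each
# function tests the full model's coefficients jointly.
waldAll <- D1(fit1)
waldAll
D2(fit1)
D3(fit1)

# Model comparison: does adding reg improve on the age-only model?
waldCompare <- D1(fit1, fit0)
waldCompare
D2(fit1, fit0)
D3(fit1, fit0)
anova(fit1, fit0)
```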
Use the imputed data from Question 2.1 to estimate a restricted model wherein bmi is predicted by only age.
Use the D1(), D2(), D3(), and anova() functions to compare this restricted model with the full model you estimated in Question 3.1.
anova()?
Sometimes (rather often, actually), we need to process the imputed data before we can fit an analysis model. In such cases, we usually implement something like the following workflow.
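1. Impute the missing data with mice()
2. Use mice::complete() to create a list of multiply imputed datasets
3. Process each imputed dataset in the list
4. Fit the analysis model to each processed dataset
5. Pool the results
Use mice::complete() to create a list of imputed datasets from the mids object you created in Question 2.1.
Name the resulting list impData.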
Center age on 18 in each of the imputed datasets you created in Question 4.1.
TIP: You can use lapply() to broadcast the data transformation across all elements in impData.
Use lapply() to fit the model from Question 3.1 to each of the transformed datasets produced in Question 4.2.
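The three steps above (extract the imputed datasets, transform them, fit the model) could be sketched as follows. The imputation settings and the name fits are assumptions; impData is the name requested in Question 4.1.

```r
library(mice)

# Stand-in for the mids object from Question 2.1 (all settings are assumptions).
meth <- make.method(boys)
meth["bmi"] <- "~ I(wgt / (hgt / 100)^2)"
pred <- make.predictorMatrix(boys)
pred[c("wgt", "hgt"), "bmi"] <- 0
imp1 <- mice(boys, m = 10, method = meth, predictorMatrix = pred,
             seed = 235711, printFlag = FALSE)

# action = "all" returns a list with one completed data frame per imputation.
impData <- complete(imp1, action = "all")

# Center age on 18 in every completed dataset.
impData <- lapply(impData, function(x) {
  x$age <- x$age - 18
  x
})

# Fit the regression model to each transformed dataset.
fits <- lapply(impData, function(x) lm(bmi ~ age + reg, data = x))
```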
At this point, you should have a standard R list containing the 10 fitted lm objects. You have a few options for pooling these results.
1. Use the mitools::MIcombine() function, and directly submit the list of model fits from Question 4.3 as input to the function.
2. Use the mice::as.mira() function to first cast the list from Question 4.3 as a mira object. You can then pool the results using all the methods you’ve already learned.
Check the documentation for mitools::MIcombine() and mice::as.mira().
Pool the fitted models from Question 4.3 using mitools::MIcombine().
Pool the fitted models from Question 4.3 using mice::as.mira() and mice::pool().
What do you notice vis-a-vis the FMI/\(\lambda\) produced by these two pooling approaches?
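Both pooling options can be tried side by side, as in the sketch below. The imputation settings and the names fits, miPooled, and micePooled are assumptions for illustration. The two approaches should agree on the pooled point estimates, so any differences worth inspecting are in the missing-information columns of the printed output.

```r
library(mice)
library(mitools)

# Stand-in for the mids object from Question 2.1 (all settings are assumptions).
meth <- make.method(boys)
meth["bmi"] <- "~ I(wgt / (hgt / 100)^2)"
pred <- make.predictorMatrix(boys)
pred[c("wgt", "hgt"), "bmi"] <- 0
imp1 <- mice(boys, m = 10, method = meth, predictorMatrix = pred,
             seed = 235711, printFlag = FALSE)

# Completed datasets with age centered on 18, and one lm fit per dataset.
impData <- lapply(complete(imp1, "all"), function(x) {
  x$age <- x$age - 18
  x
})
fits <- lapply(impData, function(x) lm(bmi ~ age + reg, data = x))

# Option 1: pool the plain list of fits with mitools.
miPooled <- MIcombine(fits)
summary(miPooled)

# Option 2: cast the list to a mira object, then pool with mice.
micePooled <- pool(as.mira(fits))
summary(micePooled)

# Printing the pooled object shows the lambda and fmi columns.
micePooled
```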
If we want to pool parameters from a modeling function that does not provide coef() and vcov() methods, we cannot use mice::pool() or mitools::MIcombine() to do so.
Fortunately, as long as we can estimate the parameters of interest and their standard errors from each imputed dataset, we can still pool the results. We can use the mice::pool.scalar() function to do so.
Check the documentation for mice::pool.scalar().
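For reference, the scalar pooling rules (Rubin, 1987) that pool.scalar() applies can be written as below, where \(\hat{Q}_l\) and \(U_l\) are the estimate and its squared standard error from the \(l\)th of \(m\) imputed datasets. The subscript \(l\) is notation introduced here for illustration; the symbols \(Q\) and \(U\) follow the function's argument names.

\[
\bar{Q} = \frac{1}{m} \sum_{l=1}^{m} \hat{Q}_l, \qquad
\bar{U} = \frac{1}{m} \sum_{l=1}^{m} U_l, \qquad
B = \frac{1}{m - 1} \sum_{l=1}^{m} \left( \hat{Q}_l - \bar{Q} \right)^2, \qquad
T = \bar{U} + \left( 1 + \frac{1}{m} \right) B
\]

These quantities correspond to the qbar, ubar, b, and t elements of the object returned by pool.scalar(), so a pooled test statistic can be formed as \(\bar{Q} / \sqrt{T}\).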
The t.test() function is one popular function for which we cannot use the standard pooling workflow. The following code shows one possible workflow for conducting a t-test using multiply imputed data.
We’ll conduct an independent samples t-test for the average testicular volume of boys who are younger than 13 and boys who are 13 or older.
library(magrittr)
## Run the t-test on each imputed dataset:
tests <- lapply(impData, function(x) t.test(tv ~ I(age < 13), data = x))
## Extract the estimated parameters (i.e., mean differences):
d <- sapply(tests, function(x) diff(x$estimate) %>% abs())
## Extract the standard errors:
se <- sapply(tests, "[[", x = "stderr")
## Pool the estimates:
pooled <- pool.scalar(Q = d, U = se^2, n = nrow(impData[[1]]))
## View the pooled parameter estimate:
pooled$qbar
## [1] 14.15356
## Compute the t-statistic using the pooled estimates
## (pooled$t is the total variance of the pooled estimate):
(tStat <- pooled %$% (qbar / sqrt(t)))
## [1] 28.57915
## Compute the two-tailed p-value:
2 * pt(tStat, df = pooled$df, lower.tail = FALSE)
## [1] 7.117193e-30
Conduct the same t-test as above using listwise deletion.
Compare the MI-based results to the deletion-based results.
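A deletion-based version of the test could be sketched as follows. It relies on the default behavior of the formula interface to t.test(), which drops rows with a missing tv before testing; the name ldTest is an assumption for illustration.

```r
library(mice)  # provides the boys data

# Listwise deletion: rows with missing tv are omitted by the default na.action.
ldTest <- t.test(tv ~ I(age < 13), data = boys)
ldTest

# Quantities comparable to the MI-based results:
abs(diff(ldTest$estimate))  # absolute mean difference
ldTest$stderr               # standard error of the difference
ldTest$parameter            # degrees of freedom
```

Comparing the mean difference, standard error, and degrees of freedom with their pooled counterparts shows how much the two approaches disagree.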
If we want to do an ANOVA with MI data, the pooling techniques we’ve discussed so far can be a bit of a pain. We can easily estimate and pool the underlying linear model (since that’s just a linear regression model), but getting a pooled version of the standard ANOVA table would require quite a lot of work.
Thankfully, the miceadds::mi.anova() function does all of the heavy lifting for us.
Check the documentation for the miceadds::mi.anova() function.
Use mi.anova() to estimate a factorial ANOVA wherein bmi is the DV and reg and gen are the IVs.
Use the mids object you created in Question 2.1.
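A call along the following lines is one way to set this up. The imputation settings and the name aovOut are assumptions, as is the decision to include the reg-by-gen interaction (taken here from the word "factorial"); mi.anova() expects the model formula as a character string.

```r
library(mice)
library(miceadds)

# Stand-in for the mids object from Question 2.1 (all settings are assumptions).
meth <- make.method(boys)
meth["bmi"] <- "~ I(wgt / (hgt / 100)^2)"
pred <- make.predictorMatrix(boys)
pred[c("wgt", "hgt"), "bmi"] <- 0
imp1 <- mice(boys, m = 10, method = meth, predictorMatrix = pred,
             seed = 235711, printFlag = FALSE)

# Two-way factorial ANOVA with both main effects and their interaction.
aovOut <- mi.anova(mi.res = imp1, formula = "bmi ~ reg * gen")
```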
End of Lab 2c