In these lab exercises, you will explore how certain pathological data characteristics can affect imputation results. You will also learn a very useful technique for imputing variables with known relations: Passive Imputation.
Load the mice package.
library(mice)
The mammalsleep dataset is part of mice. This dataset contains the Allison and Cicchetti (1976) data for mammalian species.
Check the documentation for the mammalsleep dataset.
?mammalsleep
Summarize the data to get an overview of their structure.
head(mammalsleep)
## species bw brw sws ps ts mls gt pi sei odi
## 1 African elephant 6654.000 5712.0 NA NA 3.3 38.6 645 3 5 3
## 2 African giant pouched rat 1.000 6.6 6.3 2.0 8.3 4.5 42 3 1 3
## 3 Arctic Fox 3.385 44.5 NA NA 12.5 14.0 60 1 1 1
## 4 Arctic ground squirrel 0.920 5.7 NA NA 16.5 NA 25 5 2 3
## 5 Asian elephant 2547.000 4603.0 2.1 1.8 3.9 69.0 624 3 5 4
## 6 Baboon 10.550 179.5 9.1 0.7 9.8 27.0 180 4 4 4
summary(mammalsleep)
## species bw brw
## African elephant : 1 Min. : 0.005 Min. : 0.14
## African giant pouched rat: 1 1st Qu.: 0.600 1st Qu.: 4.25
## Arctic Fox : 1 Median : 3.342 Median : 17.25
## Arctic ground squirrel : 1 Mean : 198.790 Mean : 283.13
## Asian elephant : 1 3rd Qu.: 48.202 3rd Qu.: 166.00
## Baboon : 1 Max. :6654.000 Max. :5712.00
## (Other) :56
## sws ps ts mls
## Min. : 2.100 Min. :0.000 Min. : 2.60 Min. : 2.000
## 1st Qu.: 6.250 1st Qu.:0.900 1st Qu.: 8.05 1st Qu.: 6.625
## Median : 8.350 Median :1.800 Median :10.45 Median : 15.100
## Mean : 8.673 Mean :1.972 Mean :10.53 Mean : 19.878
## 3rd Qu.:11.000 3rd Qu.:2.550 3rd Qu.:13.20 3rd Qu.: 27.750
## Max. :17.900 Max. :6.600 Max. :19.90 Max. :100.000
## NA's :14 NA's :12 NA's :4 NA's :4
## gt pi sei odi
## Min. : 12.00 Min. :1.000 Min. :1.000 Min. :1.000
## 1st Qu.: 35.75 1st Qu.:2.000 1st Qu.:1.000 1st Qu.:1.000
## Median : 79.00 Median :3.000 Median :2.000 Median :2.000
## Mean :142.35 Mean :2.871 Mean :2.419 Mean :2.613
## 3rd Qu.:207.50 3rd Qu.:4.000 3rd Qu.:4.000 3rd Qu.:4.000
## Max. :645.00 Max. :5.000 Max. :5.000 Max. :5.000
## NA's :4
str(mammalsleep)
## 'data.frame': 62 obs. of 11 variables:
## $ species: Factor w/ 62 levels "African elephant",..: 1 2 3 4 5 6 7 8 9 10 ...
## $ bw : num 6654 1 3.38 0.92 2547 ...
## $ brw : num 5712 6.6 44.5 5.7 4603 ...
## $ sws : num NA 6.3 NA NA 2.1 9.1 15.8 5.2 10.9 8.3 ...
## $ ps : num NA 2 NA NA 1.8 0.7 3.9 1 3.6 1.4 ...
## $ ts : num 3.3 8.3 12.5 16.5 3.9 9.8 19.7 6.2 14.5 9.7 ...
## $ mls : num 38.6 4.5 14 NA 69 27 19 30.4 28 50 ...
## $ gt : num 645 42 60 25 624 180 35 392 63 230 ...
## $ pi : int 3 3 1 5 3 4 1 4 1 1 ...
## $ sei : int 5 1 1 2 5 4 1 5 2 1 ...
## $ odi : int 3 3 1 3 4 4 1 4 1 1 ...
Use mice() to multiply impute the mammalsleep data.
imp <- mice(mammalsleep, maxit = 10, seed = 235711, print = FALSE)
When you run this imputation, you will probably see a warning about "logged events". When mice() encounters certain computational difficulties (e.g., extreme collinearity), it will take automatic remedial action and log that action in the loggedEvents slot of the mids object.
If you ever see a warning about logged events, you should check the loggedEvents slot to see what actions were taken. You want to assess whether the actions were appropriate and judge the likely impact that the actions will have on your results. If the actions mice() has taken seem too extreme, you need to address the underlying data issues and rerun the imputation with the cleaned data.
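A quick programmatic check (a small sketch): the loggedEvents slot is NULL when nothing was logged, and a data frame with one row per event otherwise.
## NULL means no events were logged
is.null(imp$loggedEvents)
## If events were logged, count them
if (!is.null(imp$loggedEvents)) nrow(imp$loggedEvents)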
Check the contents of the loggedEvents slot in the mids object you created in Question 2.1. What is mice() logging here?
imp$loggedEvents %>% head()
## it im dep meth
## 1 1 1 sws pmm
## 2 1 1 sws pmm
## 3 1 1 sws pmm
## 4 1 1 ps pmm
## 5 1 1 ps pmm
## 6 1 1 ps pmm
## out
## 1 df set to 1. # observed cases: 48 # predictors: 71
## 2 speciesArctic Fox, speciesArctic ground squirrel, speciesAsian elephant, speciesBaboon, speciesDonkey, speciesGiraffe, speciesGorilla, speciesGray wolf, speciesJaguar, speciesKangaroo, speciesOkapi, speciesRaccoon, speciesRoe deer, speciesSlow loris, speciesYellow-bellied marmot, brw, ts, gt, odi
## 3 mice detected that your data are (nearly) multi-collinear.\nIt applied a ridge penalty to continue calculations, but the results can be unstable.\nDoes your dataset contain duplicates, linear transformation, or factors with unique respondent names?
## 4 df set to 1. # observed cases: 50 # predictors: 71
## 5 speciesArctic Fox, speciesArctic ground squirrel, speciesDonkey, speciesGorilla, speciesGray wolf, speciesJaguar, speciesKangaroo, speciesRaccoon, speciesRoe deer, speciesSlow loris, speciesYellow-bellied marmot, bw, sws, ts, gt, sei, odi
## 6 mice detected that your data are (nearly) multi-collinear.\nIt applied a ridge penalty to continue calculations, but the results can be unstable.\nDoes your dataset contain duplicates, linear transformation, or factors with unique respondent names?
The imputation models seem to have more predictors (\(P = 71\)) than observations (\(N \approx 50\)). It looks like something may have gone "wrong" with the species variable. To get around the issue, mice() has applied a ridge penalty.
We can get a sense of how impactful the issues noted in the loggedEvents log have been through our usual diagnostic plots.
Create trace plots, density plots, and strip plots from the mids object you created in Question 2.1.
What do you conclude vis-a-vis convergence and the validity of these imputations?
plot(imp)
densityplot(imp)
stripplot(imp)
The trace plots look OK. So, it appears that the imputation models have converged onto some equilibrium. The density plots and the strip plots, however, suggest very poor imputations. The density plots, in particular, clearly show far too little variability in the imputed values. Nearly all of the imputations collapse onto a small range of values (hence the "sharp" spikes in the density plots). Although the imputation models may have converged, it appears they have converged onto the wrong solution. These imputations are certainly not reasonable.
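If you want to focus the diagnostics on the worst-behaved variables, both plot() and densityplot() accept a variable selection (a quick sketch; the three sleep variables are just an example):
## Trace plots for the sleep variables only
plot(imp, c("sws", "ps", "ts"))
## Density plots for the sleep variables only
densityplot(imp, ~ sws + ps + ts)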
For didactic purposes, let’s “play the fool”, ignore any information we may have gleaned from the diagnostic plots, and analyze these imputed data via the usual process.
Use the imputed data you created in Question 2.1 to fit the following regression model.
\(Y_{sws} = \beta_0 + \beta_1 X_{ln(bw)} + \beta_2 X_{odi} + \varepsilon\)
Pool the MI estimates, and check the RIV, \(\lambda\), and FMI.
est <- with(imp, lm(sws ~ log(bw) + odi)) %>% pool()
summary(est)
## term estimate std.error statistic df p.value
## 1 (Intercept) 9.9944925 1.3428477 7.442759 7.795874 8.360758e-05
## 2 log(bw) -0.6146995 0.2856365 -2.152034 4.650517 8.815594e-02
## 3 odi -0.5202660 0.3426943 -1.518163 30.636875 1.392260e-01
est
## Class: mipo m = 5
## term m estimate ubar b t dfcom df
## 1 (Intercept) 5 9.9944925 0.74599710 0.88103582 1.80324008 59 7.795874
## 2 log(bw) 5 -0.6146995 0.01986307 0.05143763 0.08158823 59 4.650517
## 3 odi 5 -0.5202660 0.09327086 0.02014045 0.11743941 59 30.636875
## riv lambda fmi
## 1 1.4172213 0.5863018 0.6629419
## 2 3.1075326 0.7565448 0.8201889
## 3 0.2591221 0.2057959 0.2530181
Although the estimates seem sensible, the RIV, \(\lambda\), and FMI values all suggest that the missing data have had a very large influence on the results. For example, the \(\lambda\) for \(\hat{\beta}_1\) tells us that roughly 76% of the sampling variance in \(\hat{\beta}_1\) is attributable to the missing data and our treatment thereof. Although we already knew these imputations were suspect, these regression results further confirm the poor quality of the imputations.
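As a sanity check, we can reproduce these pooled diagnostics by hand via Rubin's rules (a small sketch; the numbers for the log(bw) coefficient are copied from the mipo output printed above):
m    <- 5            # number of imputations
ubar <- 0.01986307   # average within-imputation variance for log(bw)
b    <- 0.05143763   # between-imputation variance for log(bw)
df   <- 4.650517     # Barnard-Rubin adjusted degrees of freedom

tot    <- ubar + (1 + 1 / m) * b            # total variance (t)
riv    <- (1 + 1 / m) * b / ubar            # relative increase in variance
lambda <- riv / (1 + riv)                   # proportion of variance due to nonresponse
fmi    <- (riv + 2 / (df + 3)) / (1 + riv)  # fraction of missing information

round(c(t = tot, riv = riv, lambda = lambda, fmi = fmi), 4)
These hand-computed values match the t, riv, lambda, and fmi columns reported by pool() above.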
The poor performance you should have noted above is largely driven by the species variable. This variable is a factor with 62 levels. So, when we include this variable as a predictor in the imputation models, it enters the model as a set of 61 dummy codes. These dummy codes produce the \(P > N\) problem noted in the loggedEvents log, which leads to poor imputations.
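We can verify this directly (a quick check; model.matrix() is used here only to count the columns that species would contribute to a design matrix):
## species is a factor with 62 levels...
nlevels(mammalsleep$species)
## ...so it expands to 61 dummy codes (plus an intercept) in a design matrix
ncol(model.matrix(~ species, data = mammalsleep))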
Use mice() to re-impute the mammalsleep data. Use the same settings from Question 2.1, but do not use species as a predictor in any of the imputation models.
Name the resulting mids object imp1.
pred <- imp$predictorMatrix
pred[ , "species"] <- 0
imp1 <- mice(mammalsleep,
predictorMatrix = pred,
maxit = 10,
seed = 235711,
print = FALSE)
If you get a warning about logged events, check the loggedEvents slot of the mids object you created in Question 2.5.
imp1$loggedEvents
## it im dep meth out
## 1 1 2 mls pmm ts
## 2 2 2 mls pmm ts
## 3 2 2 gt pmm ts
## 4 3 5 mls pmm ts
## 5 4 1 mls pmm ts
## 6 4 3 mls pmm ts
## 7 4 3 gt pmm ts
## 8 4 4 mls pmm ts
## 9 4 4 gt pmm ts
## 10 4 5 mls pmm ts
## 11 4 5 gt pmm ts
## 12 5 1 mls pmm ts
## 13 5 4 mls pmm ts
## 14 5 4 gt pmm ts
## 15 5 5 mls pmm ts
## 16 5 5 gt pmm ts
## 17 7 3 mls pmm ts
## 18 7 3 gt pmm ts
## 19 7 5 mls pmm ts
## 20 7 5 gt pmm ts
## 21 8 2 mls pmm ts
## 22 8 3 mls pmm ts
## 23 9 2 mls pmm ts
## 24 10 5 mls pmm ts
This time, the logged events are telling us about collinearity problems and the actions taken to remedy the collinearity. Specifically, when imputing mls and gt, ts was collinear with other predictors, so it was removed from the model.
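If you want a compact overview of these events rather than the raw log, you can cross-tabulate the affected variables (a quick sketch):
## Which predictor was removed (out) while imputing which variable (dep)?
with(imp1$loggedEvents, table(dep, out))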
Create trace plots, density plots, and strip plots from the mids object you created in Question 2.5.
What do you conclude vis-a-vis convergence and the validity of these imputations?
plot(imp1)
densityplot(imp1)
stripplot(imp1)
This time, the trace plots suggest some serious identification issues. Notice how the individual lines in the plots of the means for ps and ts are stable but do not mix. This pattern is indicative of an under-identified model. Basically, the data do not contain enough information to define a unique solution for some parameters.
The imputed values themselves may look more-or-less reasonable, but that does not matter at this point: the imputation model must converge before we can move on to considering the plausibility of the imputed values.
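One way to see the non-mixing more clearly (a sketch; mice.mids() continues the existing chains, and the object name imp1_extra is arbitrary):
## Run 10 additional iterations on the same chains
imp1_extra <- mice.mids(imp1, maxit = 10, printFlag = FALSE)
## Re-inspect the trace plots for the problematic variables only
plot(imp1_extra, c("sws", "ps", "ts"))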
The convergence issues you should have noticed above are caused by structural features of the data. Total sleep (ts) is the sum of paradoxical sleep (ps) and slow wave sleep (sws). The imputation model treats ts as distinct from, and stochastically related to, ps and sws, but ts is actually a deterministic function of ps and sws. This deterministic relation is ignored in the imputations, and the resulting circularity in the imputations keeps the model from finding a unique solution.
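You can confirm the compositional relation directly in the observed data (a quick sanity check; small discrepancies are possible because the sleep variables are recorded to one decimal):
## ts should equal ps + sws wherever all three are observed
with(mammalsleep, summary(ts - (ps + sws)))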
Thankfully, mice() offers a convenient routine for addressing exactly these types of known relations among variables in the imputation model: passive imputation. With passive imputation, we can account for transformations, combinations, and recoded variables when imputing their missing data.
Frequently, we need to transform, combine, or recode variables. When such a need arises with incomplete variables that we'd like to impute, we have a few options:
1. Impute-then-transform: impute the raw variable and compute the transformed version from the completed data afterwards.
2. Just-another-variable (JAV): calculate the transformed version wherever possible and impute it as simply another incomplete variable.
Both of these approaches have an important limitation, though. In neither case does the imputation model have access to both the original and the transformed versions of the variable in question. The imputations are either generated using the information in the raw version of the variable (in the impute-then-transform approach) or using the information in the transformation (in the just-another-variable approach), but not both. Note that keeping both the raw and transformed versions of a variable in the model is not an option since doing so induces perfectly collinear variables.
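As a point of reference, here is a minimal sketch of the two limited approaches applied to the boys data (illustrations only, not the solution used below; the object names and the seed are arbitrary):
## 1) Impute-then-transform: impute without bmi, then compute bmi afterwards.
##    The imputations for hgt and wgt never "see" bmi.
imp_itt  <- mice(boys[, setdiff(names(boys), "bmi")], seed = 1, print = FALSE)
long_itt <- complete(imp_itt, "long")
long_itt$bmi <- long_itt$wgt / (long_itt$hgt / 100)^2

## 2) Just-another-variable (JAV): impute bmi as an ordinary incomplete variable.
##    The deterministic relation bmi = wgt / (hgt / 100)^2 is ignored, so the
##    imputed bmi values need not match the imputed hgt and wgt values.
imp_jav <- mice(boys, seed = 1, print = FALSE)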
To solve this problem, mice() implements a third approach called passive imputation. The goal of passive imputation is to maintain known, deterministic relations among incomplete variables throughout the imputation process and to allow the imputation model to use the transformed variables as predictors when imputing other variables (other than the raw versions of the transformed variables themselves).
For example, we can use passive imputation to maintain the following deterministic function in the boys data
\[\text{BMI} = \frac{\text{Weight}}{\text{Height}^2}\]
or this compositional relation in the mammalsleep data
\[\text{ts} = \text{ps} + \text{sws}.\]
To implement passive imputation, we need to adjust two features of the mice() setup:
1. The vector of elementary imputation methods
2. The predictor matrix
The following code will adjust the method vector from Question 2.5 to implement passive imputation for ts.
(meth <- imp1$method)
## species bw brw sws ps ts mls gt pi sei
## "" "" "" "pmm" "pmm" "pmm" "pmm" "pmm" "" ""
## odi
## ""
meth["ts"]<- "~ I(sws + ps)"
meth
## species bw brw sws ps
## "" "" "" "pmm" "pmm"
## ts mls gt pi sei
## "~ I(sws + ps)" "pmm" "pmm" "" ""
## odi
## ""
Now, ts will not be independently imputed along with the other variables. Rather, in each iteration, the most recently completed versions of sws and ps will be added together to define the updated version of ts.
The updated version of ts, defined according to the deterministic relation described above, can then be used as a predictor when imputing other variables. However, we do not want to use ts to impute either sws or ps (to avoid circularity). So, we need to adjust the predictor matrix to satisfy this restriction.
(pred <- imp1$predictorMatrix)
## species bw brw sws ps ts mls gt pi sei odi
## species 0 1 1 1 1 1 1 1 1 1 1
## bw 0 0 1 1 1 1 1 1 1 1 1
## brw 0 1 0 1 1 1 1 1 1 1 1
## sws 0 1 1 0 1 1 1 1 1 1 1
## ps 0 1 1 1 0 1 1 1 1 1 1
## ts 0 1 1 1 1 0 1 1 1 1 1
## mls 0 1 1 1 1 1 0 1 1 1 1
## gt 0 1 1 1 1 1 1 0 1 1 1
## pi 0 1 1 1 1 1 1 1 0 1 1
## sei 0 1 1 1 1 1 1 1 1 0 1
## odi 0 1 1 1 1 1 1 1 1 1 0
pred[c("sws", "ps"), "ts"] <- 0
pred
## species bw brw sws ps ts mls gt pi sei odi
## species 0 1 1 1 1 1 1 1 1 1 1
## bw 0 0 1 1 1 1 1 1 1 1 1
## brw 0 1 0 1 1 1 1 1 1 1 1
## sws 0 1 1 0 1 0 1 1 1 1 1
## ps 0 1 1 1 0 0 1 1 1 1 1
## ts 0 1 1 1 1 0 1 1 1 1 1
## mls 0 1 1 1 1 1 0 1 1 1 1
## gt 0 1 1 1 1 1 1 0 1 1 1
## pi 0 1 1 1 1 1 1 1 0 1 1
## sei 0 1 1 1 1 1 1 1 1 0 1
## odi 0 1 1 1 1 1 1 1 1 1 0
Now, we can re-impute the mammalsleep data using passive imputation to account for the deterministic relation between ts, sws, and ps. We do so simply by using the updated method vector and predictor matrix in a regular run of mice().
imp <- mice(mammalsleep,
method = meth,
predictorMatrix = pred,
maxit = 20,
seed = 235711,
print = FALSE)
If we inspect the diagnostic plots for these imputations, we see much better performance than we achieved in Questions 2.1 or 2.5.
plot(imp)
densityplot(imp)
stripplot(imp)
We can see that the pathological non-convergence of Question 2.5 has been corrected by the passive imputation.
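As an extra check (a sketch), we can verify that the compositional relation holds in every completed dataset:
## In the completed data, ts should equal sws + ps (up to rounding in the observed values)
comp <- complete(imp, "long")
summary(with(comp, ts - (sws + ps)))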
You will now implement passive imputation yourself using the boys dataset. The boys dataset is distributed with mice, so you will be able to access these data once you’ve loaded the mice package. The boys data are a subset of a large Dutch dataset containing growth measures from the Fourth Dutch Growth Study.
Unless otherwise specified, all questions in this section refer to the boys dataset.
Check the documentation for the boys data.
?boys
Summarize the boys data to get a sense of their characteristics.
head(boys)
## age hgt wgt bmi hc gen phb tv reg
## 3 0.035 50.1 3.650 14.54 33.7 <NA> <NA> NA south
## 4 0.038 53.5 3.370 11.77 35.0 <NA> <NA> NA south
## 18 0.057 50.0 3.140 12.56 35.2 <NA> <NA> NA south
## 23 0.060 54.5 4.270 14.37 36.7 <NA> <NA> NA south
## 28 0.062 57.5 5.030 15.21 37.3 <NA> <NA> NA south
## 36 0.068 55.5 4.655 15.11 37.0 <NA> <NA> NA south
summary(boys)
## age hgt wgt bmi
## Min. : 0.035 Min. : 50.00 Min. : 3.14 Min. :11.77
## 1st Qu.: 1.581 1st Qu.: 84.88 1st Qu.: 11.70 1st Qu.:15.90
## Median :10.505 Median :147.30 Median : 34.65 Median :17.45
## Mean : 9.159 Mean :132.15 Mean : 37.15 Mean :18.07
## 3rd Qu.:15.267 3rd Qu.:175.22 3rd Qu.: 59.58 3rd Qu.:19.53
## Max. :21.177 Max. :198.00 Max. :117.40 Max. :31.74
## NA's :20 NA's :4 NA's :21
## hc gen phb tv reg
## Min. :33.70 G1 : 56 P1 : 63 Min. : 1.00 north: 81
## 1st Qu.:48.12 G2 : 50 P2 : 40 1st Qu.: 4.00 east :161
## Median :53.00 G3 : 22 P3 : 19 Median :12.00 west :239
## Mean :51.51 G4 : 42 P4 : 32 Mean :11.89 south:191
## 3rd Qu.:56.00 G5 : 75 P5 : 50 3rd Qu.:20.00 city : 73
## Max. :65.00 NA's:503 P6 : 41 Max. :25.00 NA's : 3
## NA's :46 NA's:503 NA's :522
str(boys)
## 'data.frame': 748 obs. of 9 variables:
## $ age: num 0.035 0.038 0.057 0.06 0.062 0.068 0.068 0.071 0.071 0.073 ...
## $ hgt: num 50.1 53.5 50 54.5 57.5 55.5 52.5 53 55.1 54.5 ...
## $ wgt: num 3.65 3.37 3.14 4.27 5.03 ...
## $ bmi: num 14.5 11.8 12.6 14.4 15.2 ...
## $ hc : num 33.7 35 35.2 36.7 37.3 37 34.9 35.8 36.8 38 ...
## $ gen: Ord.factor w/ 5 levels "G1"<"G2"<"G3"<..: NA NA NA NA NA NA NA NA NA NA ...
## $ phb: Ord.factor w/ 6 levels "P1"<"P2"<"P3"<..: NA NA NA NA NA NA NA NA NA NA ...
## $ tv : int NA NA NA NA NA NA NA NA NA NA ...
## $ reg: Factor w/ 5 levels "north","east",..: 4 4 4 4 4 4 4 3 3 2 ...
Use the mice::md.pattern() function to summarize the response patterns.
(pats <- md.pattern(boys))
## age reg wgt hgt bmi hc gen phb tv
## 223 1 1 1 1 1 1 1 1 1 0
## 19 1 1 1 1 1 1 1 1 0 1
## 1 1 1 1 1 1 1 1 0 1 1
## 1 1 1 1 1 1 1 0 1 0 2
## 437 1 1 1 1 1 1 0 0 0 3
## 43 1 1 1 1 1 0 0 0 0 4
## 16 1 1 1 0 0 1 0 0 0 5
## 1 1 1 1 0 0 0 0 0 0 6
## 1 1 1 0 1 0 1 0 0 0 5
## 1 1 1 0 0 0 1 1 1 1 3
## 1 1 1 0 0 0 0 1 1 1 4
## 1 1 1 0 0 0 0 0 0 0 7
## 3 1 0 1 1 1 1 0 0 0 4
## 0 3 4 20 21 46 503 503 522 1622
There are 13 total patterns. The pattern wherein gen, phb, and tv are missing occurs most frequently.
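If you prefer to extract these summaries programmatically, the pattern frequencies are stored as the row names of the matrix returned by md.pattern() (a small sketch; the final row, which tabulates missing values per variable, is dropped):
## Pattern frequencies live in the row names (all rows except the last)
freqs <- as.numeric(rownames(pats)[-nrow(pats)])
length(freqs)   # number of distinct response patterns (13)
max(freqs)      # size of the most common pattern (437: gen, phb, and tv missing)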
Multiply impute the boys data using passive imputation for bmi.
Use passive imputation to maintain the known relation between bmi, wgt, and hgt. Define the imputation method for bmi as "~ I(wgt / (hgt / 100)^2)". Name the resulting mids object imp1.
## Use the mice::make.method() function to generate a default method vector:
(meth <- make.method(boys))
## age hgt wgt bmi hc gen phb tv
## "" "pmm" "pmm" "pmm" "pmm" "polr" "polr" "pmm"
## reg
## "polyreg"
meth["bmi"] <- "~ I(wgt / (hgt / 100)^2)"
meth
## age hgt
## "" "pmm"
## wgt bmi
## "pmm" "~ I(wgt / (hgt / 100)^2)"
## hc gen
## "pmm" "polr"
## phb tv
## "polr" "pmm"
## reg
## "polyreg"
imp1 <- mice(boys, method = meth, maxit = 20, seed = 235711, print = FALSE)
Run the following code to inspect the relation between the imputed BMI and the BMI calculated from the imputed height and weight. If the passive imputation was successful, these points should fall along a perfect line.
xyplot(imp1,
bmi ~ I(wgt / (hgt / 100)^2),
ylab = "Imputed BMI",
xlab = "Calculated BMI")
Create trace plots, density plots, and strip plots from the mids object you created in Question 3.4.
What do you conclude vis-a-vis convergence and the validity of these imputations?
plot(imp1)
densityplot(imp1)
stripplot(imp1)
Although the deterministic definition of bmi is now preserved in the completed data, the trace plots indicate some pathological behavior for bmi, hgt, and wgt. We also get some absurd imputations for bmi.
Of course, the issues you should have spotted in the above imputations are to be expected, since we have purposefully omitted the second part of passive imputation. We have not adjusted the predictor matrix, so we have circularity in the imputations. We used passive imputation to create the imputations for bmi, but bmi is still used as a predictor for wgt and hgt.
Adjust the predictor matrix to remove the circularity described above. Re-impute the boys data using the updated predictor matrix. Name the resulting mids object imp1.
(pred <- imp1$predictorMatrix)
## age hgt wgt bmi hc gen phb tv reg
## age 0 1 1 1 1 1 1 1 1
## hgt 1 0 1 1 1 1 1 1 1
## wgt 1 1 0 1 1 1 1 1 1
## bmi 1 1 1 0 1 1 1 1 1
## hc 1 1 1 1 0 1 1 1 1
## gen 1 1 1 1 1 0 1 1 1
## phb 1 1 1 1 1 1 0 1 1
## tv 1 1 1 1 1 1 1 0 1
## reg 1 1 1 1 1 1 1 1 0
pred[c("hgt", "wgt"), "bmi"] <- 0
pred
## age hgt wgt bmi hc gen phb tv reg
## age 0 1 1 1 1 1 1 1 1
## hgt 1 0 1 0 1 1 1 1 1
## wgt 1 1 0 0 1 1 1 1 1
## bmi 1 1 1 0 1 1 1 1 1
## hc 1 1 1 1 0 1 1 1 1
## gen 1 1 1 1 1 0 1 1 1
## phb 1 1 1 1 1 1 0 1 1
## tv 1 1 1 1 1 1 1 0 1
## reg 1 1 1 1 1 1 1 1 0
imp1 <- mice(boys,
method = meth,
predictorMatrix = pred,
maxit = 20,
seed = 235711,
print = FALSE)
Recreate the xyplot() from above using the imputations from Question 3.6. Is the deterministic definition of bmi maintained in the imputed data?
xyplot(imp1,
       bmi ~ I(wgt / (hgt / 100)^2),
       ylab = "Imputed BMI",
       xlab = "Calculated BMI")
Yes, the relation is maintained. All points fall along the \(Y = X\) line.
Create trace plots, density plots, and strip plots from the mids object you created in Question 3.6.
What do you conclude vis-a-vis convergence and the validity of these imputations?
plot(imp1)
densityplot(imp1)
stripplot(imp1)
Everything looks good now. The trace plots indicate good convergence (though we clearly need more than the default 5 iterations for the model to stabilize). Judging from the density plots and strip plots, the imputations also seem sensible.
Just for fun: What you shouldn’t do with passive imputation
Never fix all of the relations. If every variable in a deterministic system is defined passively from the others, the algorithm can never escape the starting values.
meth <- make.method(boys)
meth["bmi"] <- "~ I(wgt / (hgt / 100)^2)"
meth["wgt"] <- "~ I(bmi * (hgt / 100)^2)"
meth["hgt"] <- "~ I(sqrt(wgt / bmi) * 100)"
imp <- mice(boys, method = meth, seed = 235711, print = FALSE)
plot(imp, c("hgt", "wgt", "bmi"))
End of Lab 2b