NAEP

NCES research program on efficient multilevel modeling of NAEP data

This program has run formally since 2003. Related work began earlier in two projects: one during my 2000-2002 appointment as Chief Statistician at ESSI, the other in 2002-2003.

Background to NAEP analyses

The current analyses for NAEP publications rest on a set of tools covering the survey designs used, the psychometric models, and the imputation of student ability using very large regression models. These can be described briefly as follows (for the national math surveys):

Survey designs

In the early surveys, students taking the test were selected by three-stage cluster sampling: national primary sampling units (PSUs), then schools within PSUs, then students within schools. In later surveys the national sample comprised a set of state samples, in which the design was two-stage sampling: schools within states, then students within schools. The school sampling was stratified with oversampling of minority schools, and the student sampling oversampled minority students.

Adjustment of estimates for the survey design is by jackknifing of PSUs and reweighting for the oversampling of minorities. No adjustment is made for the school sampling in the national surveys, other than including school fixed effects in the very large regression model described below.
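The jackknife adjustment can be illustrated with a minimal sketch. It is a generic delete-one-PSU jackknife for a weighted mean, not the operational NAEP procedure; the function names and the inputs (y, w, psu) are hypothetical and chosen only for illustration.

```python
import numpy as np

def weighted_mean(y, w):
    """Weighted mean of y with sampling weights w."""
    return float(np.sum(w * y) / np.sum(w))

def jackknife_se(y, w, psu):
    """Delete-one-PSU jackknife standard error of a weighted mean.

    y   : outcome per student
    w   : sampling weight per student (reweighting for oversampling)
    psu : PSU label per student (hypothetical input for this sketch)
    """
    y, w, psu = (np.asarray(v) for v in (y, w, psu))
    labels = np.unique(psu)
    g = len(labels)
    theta_full = weighted_mean(y, w)
    # Recompute the estimate with each PSU removed in turn.
    replicates = np.array(
        [weighted_mean(y[psu != k], w[psu != k]) for k in labels]
    )
    variance = (g - 1) / g * np.sum((replicates - theta_full) ** 2)
    return theta_full, float(np.sqrt(variance))
```

Because whole PSUs are dropped, the resulting standard error reflects clustering at the PSU level only; clustering of students within schools inside a PSU is not separately represented.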

Psychometric models

The 2PL, 3PL and ordered categorical response models are used. The item parameters are obtained by fitting a null (no explanatory variables) item model to the item responses, ignoring the survey design and the variables recorded in the survey. The estimated item parameters are then held fixed, and the item responses are regressed on the latent student abilities using a very large "conditioning" regression model. This model has about 200 principal variables derived from about 1000 survey explanatory variables, including the school fixed effects from the survey design, and main effects and some two-way interactions of a large number of important survey variables. The latent student abilities are then multiply imputed by generating five values ("plausible values") from the posterior distribution of each student's ability, given the student's item responses and explanatory variable values, and the item and regression model parameter estimates.
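As an illustration of the imputation step, the following sketch draws plausible values for one student from a grid approximation to the posterior. It assumes binary items, fixed item parameters (a, b, c), and a normal prior whose mean mu and standard deviation sigma stand in for the output of the conditioning regression; all names are hypothetical and the operational NAEP procedure differs in detail.

```python
import numpy as np

def p_correct(theta, a, b, c=0.0):
    """3PL probability of a correct response; c = 0 gives the 2PL."""
    return c + (1.0 - c) / (1.0 + np.exp(-a * (theta - b)))

def draw_plausible_values(resp, a, b, c, mu, sigma, n_draws=5, rng=None):
    """Draw ability values from a grid approximation to the posterior.

    resp      : 0/1 responses of one student, one entry per item
    a, b, c   : fixed item parameters, one entry per item
    mu, sigma : prior mean and sd for this student (assumed to come
                from the conditioning regression)
    """
    rng = np.random.default_rng(rng)
    resp = np.asarray(resp, dtype=float)
    grid = np.linspace(mu - 6 * sigma, mu + 6 * sigma, 601)
    p = p_correct(grid[:, None], a, b, c)            # shape (grid, items)
    loglik = np.sum(resp * np.log(p) + (1 - resp) * np.log1p(-p), axis=1)
    logpost = loglik - 0.5 * ((grid - mu) / sigma) ** 2
    post = np.exp(logpost - logpost.max())           # avoid underflow
    post /= post.sum()
    return rng.choice(grid, size=n_draws, p=post)
```

The default of five draws mirrors the five plausible values described above; the grid width of six prior standard deviations is an arbitrary choice for the sketch.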

Group differences in ability, and other tabulations by important variables, are then computed five times, once for each of the five plausible values, and the results are combined by the standard rules for multiple imputation to give a single analysis for each variable, or pair of variables for cross-tabulations.
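The standard combining rules for multiple imputation (Rubin's rules) can be sketched as follows. The function name and the example numbers are made up for illustration; they are not NAEP results.

```python
import numpy as np

def combine_rubin(estimates, variances):
    """Combine m completed-data analyses by the standard MI rules.

    estimates : point estimate from each imputed data set
    variances : squared standard error from each imputed data set
    Returns the combined estimate and its total variance.
    """
    q = np.asarray(estimates, dtype=float)
    u = np.asarray(variances, dtype=float)
    m = len(q)
    q_bar = q.mean()                  # combined point estimate
    within = u.mean()                 # average within-imputation variance
    between = q.var(ddof=1)           # between-imputation variance
    total = within + (1.0 + 1.0 / m) * between
    return float(q_bar), float(total)

# Illustrative values only: five group means, each with variance 4.0.
estimate, total_variance = combine_rubin([250, 252, 251, 249, 253], [4.0] * 5)
```

The between-imputation term is what carries the uncertainty about the unobserved abilities into the final standard error.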

Summary

The current NAEP analysis methods are based on the psychometric models developed in the 1980s (originally by Bock and Aitkin 1981) and take no account of the school sampling in the survey design. The imputation of abilities reflects the limitations of 1980s computer power, in not being able to handle simultaneously the item parameters and explanatory variables in a single model. The adjustment to standard errors for the PSU design effect (in the early surveys) does not account for the much larger design effect resulting from the school sampling, and neither does the inclusion of school fixed effects in the conditioning model.

Philosophy of our NAEP research program

The aim of our program is to develop a unified, efficient, high-level statistical modeling system for NAEP data, which will enable a detailed and rich analysis of NAEP surveys, through:

The representation of the three-stage national survey design by two additional levels (and the state sampling design by one additional level) in a multilevel IRT model.
The replacement of the separate estimation of the item and the regression model parameters by the simultaneous estimation of both sets of parameters.
The replacement of plausible value imputation by direct maximum likelihood analysis of all parameters in a single analysis.
The replacement of weighting methods for stratification and non-response by appropriate model-based methods.
The replacement of complete-case, single or multiple imputation methods for correcting standard errors for incomplete data by appropriate model-based methods.
The integration of these tools and structures in a very efficient computing framework.
The dissemination of this approach through journal article and monograph publication, seminars and short course training, and collaborative work with educational and statistical research groups.

Program history
Past projects

The two projects preceding our NAEP program are:

1) Imputation and Data Quality (June 2002, M. Aitkin and Y.-Y. Shieh)

This project at ESSI examined the computation of ML parameter estimates and their standard errors for simple linear regression models with missing covariate data. The estimates and the observed-data information matrix were computed using the EM algorithm for maximum likelihood with incomplete data. The ML estimates were nearly unbiased and had smaller (sometimes much smaller) mean square errors than the complete case estimates.

The importance of this project was that the additional information in the incomplete cases could be obtained relatively easily (in this simple model), and the resulting standard errors were uniformly smaller than those for the complete case estimates.
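A minimal sketch of the EM idea for this setting is given below. It assumes that (x, y) are jointly normal, that y is always observed, and that missing x values are coded as NaN and are missing at random; it returns only the point estimates and does not compute the observed-data information matrix. The function name is hypothetical and this is not the project's code.

```python
import numpy as np

def em_regression_missing_x(x, y, n_iter=200):
    """ML estimates of (b0, b1) in y = b0 + b1*x + e when some x are NaN."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    miss = np.isnan(x)
    # y is fully observed, so its ML mean and variance are fixed.
    my, syy = y.mean(), y.var()
    # Start the x moments from the complete cases.
    mx, sxx = x[~miss].mean(), x[~miss].var()
    sxy = np.mean((x[~miss] - mx) * (y[~miss] - y[~miss].mean()))
    for _ in range(n_iter):
        # E-step: conditional mean and second moment of each missing x given y.
        ex = np.where(miss, mx + sxy / syy * (y - my), x)
        ex2 = ex ** 2 + np.where(miss, sxx - sxy ** 2 / syy, 0.0)
        # M-step: update the moments from the expected sufficient statistics.
        mx = ex.mean()
        sxx = ex2.mean() - mx ** 2
        sxy = np.mean(ex * y) - mx * my
    b1 = sxy / sxx
    b0 = my - b1 * mx
    return float(b0), float(b1)
```

With no missing values the loop reproduces the ordinary least squares estimates; with missing values the incomplete cases still contribute through their observed y.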

2) Standard Errors from the Information Matrix with Missing Covariate Data (September 2003, M. Aitkin and T. Chadwick)

This project extended the approach above to two-variable regression models using the EM algorithm, and compared parameter estimates and standard errors with those from multiple imputation (MI). The MI and ML estimates required assuming a joint normal distribution for the covariates; biases and standard errors for the MI estimates were slightly larger than those for the ML estimates. If the covariate distribution was binary rather than normal, the parameter estimates were almost unaffected, but the information matrix gave serious biases in the standard errors.

The importance of this project was that it suggested a general method for standard errors for parameter estimates in models with missing covariate data, by computing the information matrix using the actual (empirical) covariate distribution, rather than a multivariate normal distribution, for the contributions of the incomplete observations to this matrix.

Projects completed under the NAEP program (and their main contributions)

Identification of Ability Distributions in IRT Models for NAEP Items (August 2004, Aitkin and Aitkin)

This project began the NAEP series. It set out the generalized linear model framework for IRT models, and its extension to multilevel models for clustered survey designs.

The importance of this project was that it showed that the estimates of upper-level (individual-level) parameters by Gaussian quadrature, currently used in the NAEP analysis for 2PL and other models, were very robust to various degrees of non-normality of the ability distribution. More complex semi-nonparametric and fully nonparametric forms of estimation did not improve the upper-level parameter estimates, and were much more computer-intensive.
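To make the role of Gaussian quadrature concrete, the sketch below evaluates the marginal log-likelihood of a 2PL model by Gauss-Hermite quadrature, assuming a standard normal ability distribution and binary items. The function name and the choice of 21 quadrature points are assumptions made for illustration.

```python
import numpy as np

def marginal_loglik_2pl(resp, a, b, n_points=21):
    """Log marginal likelihood of 0/1 responses under a 2PL model.

    resp : array of shape (students, items)
    a, b : item discriminations and difficulties, one entry per item
    Ability is integrated out against N(0, 1) by Gauss-Hermite quadrature.
    """
    resp = np.asarray(resp, dtype=float)
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    nodes, weights = np.polynomial.hermite.hermgauss(n_points)
    theta = np.sqrt(2.0) * nodes          # rescale nodes to N(0, 1)
    w = weights / np.sqrt(np.pi)          # rescaled weights sum to 1
    p = 1.0 / (1.0 + np.exp(-a * (theta[:, None] - b)))   # (points, items)
    # Log-likelihood of each response pattern at each node: (students, points)
    ll = resp @ np.log(p).T + (1.0 - resp) @ np.log1p(-p).T
    return float(np.sum(np.log(np.exp(ll) @ w)))
```

Maximizing this function over a and b gives marginal ML item estimates; a nonparametric alternative would estimate the mass points and masses rather than fixing them at the Gauss-Hermite values.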