How are INSPECT® items developed?
INSPECT® items are not back-aligned to the California Content Standards; instead, each item is written from the outset to address a specific standard. INSPECT® employs a professional team of item writers made up of current and former teachers and administrators. These highly skilled writers are based throughout California and write every INSPECT® item to target the California Content Standards. Throughout this process, each item writer maintains a focus on content accuracy, grade-level standard alignment, and difficulty level (low, medium, or high). The item writers strive to supply the most likely wrong answers as distractors, so that the most common cognitive disconnects can be revealed and targeted for intervention. They also provide a clear rationale for each wrong answer so that these cognitive disconnects can be easily identified.
Once an item has been completed and submitted for evaluation, it goes through an intensive review process. The INSPECT® Coordinator evaluates each item's content, standard alignment, and difficulty level, and either accepts it into the next level of evaluation or rejects it. Rejected items are sent back to the writer; accepted items are sent to two additional Content Area Specialists for further scrutiny. Only items that pass all three levels of review receive final acceptance and are processed into the system for use in testing.
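The distractor rationales described above lend themselves to a simple tally. The Python sketch below is a hypothetical illustration, not INSPECT®'s actual scoring logic: the item, its answer key, its rationales, the function name `most_common_disconnect`, and the student responses are all made up. It counts how often each wrong answer was chosen and returns the writer's rationale for the most popular one, which points to the most common cognitive disconnect for that item.

```python
from collections import Counter

# Hypothetical multiple-choice item: the answer key plus the item writer's
# rationale for each wrong answer (distractor). Illustrative data only.
ITEM = {
    "key": "B",
    "rationales": {
        "A": "added numerators and denominators separately",
        "C": "found a common denominator but did not rename the numerators",
        "D": "multiplied the fractions instead of adding them",
    },
}


def most_common_disconnect(item, responses):
    """Return (choice, count, rationale) for the most frequently chosen
    wrong answer, or None if every response matched the key."""
    # Count only the responses that differ from the key.
    wrong = Counter(r for r in responses if r != item["key"])
    if not wrong:
        return None
    choice, count = wrong.most_common(1)[0]
    return choice, count, item["rationales"][choice]


if __name__ == "__main__":
    responses = ["B", "A", "A", "C", "B", "A", "D"]  # made-up student answers
    print(most_common_disconnect(ITEM, responses))
```

Because each distractor is tied to a written rationale, the tally reports a likely misconception rather than just a letter choice.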
How can INSPECT® items be administered and delivered?
INSPECT® items can be delivered in two ways.
1. On-line Delivery and Scoring.
This is the most efficient, immediate, and economical means of delivery. Choosing the on-line delivery option allows the fullest use of INSPECT® and the WHAT® process.
2. Paper-Pencil Delivery, Local Scanning Mode (School Site or District).
Test booklets and answer sheets are printed on-site. Completed answer sheets are scanned into the server (at the school site or district office) so that standards-based reports of results can be displayed via OARS. Scanning can be performed with most manufacturers’ scanners.
Reporting is done standard-by-standard at the individual, classroom, site, and district levels.
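As a rough illustration of standard-by-standard reporting at several levels, the following Python sketch computes percent correct per standard for a chosen level. It is not how OARS computes its reports; the record layout, the `report` function, the `LEVELS` table, the standard labels, and the data are assumptions made for illustration. Each record is one item response tagged with its district, site, classroom, student, and the standard the item addresses.

```python
from collections import defaultdict

# Hypothetical response records, one per item response:
# (district, site, classroom, student, standard, correct)
RESPONSES = [
    ("D1", "S1", "C1", "s01", "Std A", 1),
    ("D1", "S1", "C1", "s01", "Std B", 0),
    ("D1", "S1", "C1", "s02", "Std A", 1),
    ("D1", "S1", "C1", "s02", "Std B", 1),
    ("D1", "S1", "C2", "s03", "Std A", 0),
    ("D1", "S1", "C2", "s03", "Std B", 1),
    ("D1", "S2", "C3", "s04", "Std A", 1),
    ("D1", "S2", "C3", "s04", "Std B", 0),
]

# How many leading fields of a record identify a group at each level.
LEVELS = {"district": 1, "site": 2, "classroom": 3, "individual": 4}


def report(responses, level):
    """Percent correct per (group, standard) at the requested level."""
    depth = LEVELS[level]
    tally = defaultdict(lambda: [0, 0])  # [correct, attempted]
    for *path, standard, correct in responses:
        key = (tuple(path[:depth]), standard)
        tally[key][0] += correct
        tally[key][1] += 1
    return {key: 100 * right / total for key, (right, total) in tally.items()}


if __name__ == "__main__":
    for (group, standard), pct in sorted(report(RESPONSES, "classroom").items()):
        print("/".join(group), standard, f"{pct:.0f}%")
```

Keying each group on the full path prefix (district, site, classroom) keeps classrooms with the same name at different sites separate.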
How are reliability and validity of the items established?
Several steps are taken to ensure the reliability and validity of each INSPECT® item. This quality control begins with the initial writing of items and continues well after each item has been administered. Our treatment of these issues includes both qualitative (expert review and teacher focus groups) and statistical approaches. When a problem with an item is detected, steps are taken to remedy it.
Our first efforts at establishing validity center on content validity. This is sometimes referred to as content definition (Messick, 1989) and is critical for score interpretation and item response validation (Haladyna, 1999). The best way to establish content validity is through expert judgment: for each INSPECT® item, three content experts judge whether the item is aligned to the standard its writer has chosen.
In addition to having content experts review each item, we conduct focus groups with grade-level teams of teachers. Information from these teams is compiled and used to revise problem items and address other areas of concern.
Item discrimination is also assessed to ensure that each item can sensitively measure individual differences in knowledge and ability among test-takers. Each item should discriminate between higher-performing and lower-performing students; that is, a higher percentage of students with more knowledge in a given subject area should answer the item correctly than students with less knowledge or ability in that area. Item discrimination is estimated using the point-biserial correlation, which estimates the relationship between performance on a given item (correct or incorrect) and overall test performance. The point-biserial is directly related to alpha (coefficient alpha, also called Cronbach's alpha, the reliability coefficient used to assess the reliability of a given test): other things being equal, items with higher point-biserials yield a higher alpha.
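The point-biserial and its link to alpha can be checked numerically. The Python sketch below uses a small, made-up matrix of 0/1 item scores (rows are students, columns are items) and hypothetical function names; it is an illustration, not INSPECT®'s analysis code. It computes each item's point-biserial against the total score, computes coefficient alpha directly, and then recomputes alpha from the item standard deviations and point-biserials alone. The second route works because the total-score variance equals the sum over items of r_i · sd_i · sd_total, so sd_total = Σ r_i · sd_i, and the two alpha values agree.

```python
from math import sqrt

# Made-up 0/1 scores: rows are students, columns are items.
SCORES = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 1],
]


def mean(values):
    return sum(values) / len(values)


def pvariance(values):
    """Population variance (divide by n), used consistently throughout."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / len(values)


def point_biserial(item, totals):
    """Point-biserial between a 0/1 item and total score:
    r_pb = (M1 - M0) / s * sqrt(p * q)."""
    right = [t for x, t in zip(item, totals) if x == 1]
    wrong = [t for x, t in zip(item, totals) if x == 0]
    if not right or not wrong:
        raise ValueError("undefined when everyone (or no one) answers correctly")
    p = len(right) / len(item)
    s = sqrt(pvariance(totals))
    return (mean(right) - mean(wrong)) / s * sqrt(p * (1 - p))


def cronbach_alpha(scores):
    """alpha = k/(k-1) * (1 - sum of item variances / total-score variance)."""
    k = len(scores[0])
    totals = [sum(row) for row in scores]
    item_var = sum(pvariance(col) for col in zip(*scores))
    return k / (k - 1) * (1 - item_var / pvariance(totals))


def alpha_from_point_biserials(scores):
    """The same alpha, using sd_total = sum(r_i * sd_i)."""
    k = len(scores[0])
    totals = [sum(row) for row in scores]
    columns = list(zip(*scores))
    sds = [sqrt(pvariance(col)) for col in columns]
    rs = [point_biserial(col, totals) for col in columns]
    sd_total = sum(r * sd for r, sd in zip(rs, sds))
    return k / (k - 1) * (1 - sum(sd ** 2 for sd in sds) / sd_total ** 2)


if __name__ == "__main__":
    totals = [sum(row) for row in SCORES]
    for i, col in enumerate(zip(*SCORES), start=1):
        print(f"item {i}: r_pb = {point_biserial(col, totals):.3f}")
    print(f"alpha (direct):   {cronbach_alpha(SCORES):.3f}")
    print(f"alpha (via r_pb): {alpha_from_point_biserials(SCORES):.3f}")
```

In this toy data, item 1 has a clearly higher point-biserial than item 4, so it separates stronger from weaker students better. The sketch uses the uncorrected point-biserial (the item is included in the total); some analyses use a corrected version that excludes the item from the total.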