The Test Validity Trojan Horse: Utah and Florida’s Dangerous Game of Education Poker With Our Public School Children

Public controversy deals with stereotypes, never in subtleties. The Luddites can smash a device but to improve a system requires calm study . . . . Sound policy is not for tests or against tests, but how tests are used. But all the public hears is endless angry clamor from extremists who see no good in any test, or no evil.

-Dr. Lee Cronbach (1975)-

Foreword:

In response to educator/activist Christel Swasey’s very public email to the Utah Board of Psychology regarding her blog post “Utah and Florida’s AIR/SAGE Test Not Valid” (October, 2015), Utah State Board of Education Vice-Chairman Dave Thomas responded with the highly inaccurate and dangerous assumption that the high stakes Utah Common Core SAGE test has been subjected to an individualized, independent, rigorous validity study (partial excerpt):

[Screenshot: partial excerpt of Vice-Chairman Thomas’s response]

As can be expected from any lay educator, Ms. Swasey made a few innocent assumptions and overgeneralizations in asserting that Utah’s SAGE test was entirely invalidated via an independent study focusing on Florida schools (using Utah’s data from the private corporation that designed both states’ tests. You can’t make this stuff up.) However, Vice-Chairman Thomas’s use of the same logical fallacy and overgeneralizations puts Ms. Swasey’s innocent mistakes to shame. He does so by stating that the Florida validity study “expressly validated the SAGE test.”

Vice-Chair Thomas’s response to Ms. Swasey’s letter failed to answer many important issues vital to the economic, educational, financial, and moral health of our community. His non-response was an attempt to get stakeholders in education to focus on irrelevant “trees” at the expense of the “forest” comprised of our children. That is unacceptable to me as a citizen, father, and local clinical community scientist.

This blog post is about the “forest”:

1.  What exactly IS validity?  (See below)

2.  Did the Utah SAGE test undergo a validity study? (No. See below) 

3.  How important are validity issues in educational testing to your children? (Extremely.  See below)

4.  Will the next 9 pages be the most important education information considered for parents of Utah and Florida’s “divergent learning” students?  (Probably.  See below)

These issues are answered under the constructs of peer reviewed science, parental common sense, and “best practices” ethics in the joint fields of education, assessment psychology, and psychometrics (the field of test design).

My focus is on the parents in our joint communities. What the respective Boards of Education and politicians in Utah and Florida do, or do not do, with this information is not a concern of mine. The world of education politics is lucrative, and brutal. I want no part of that world for me, my family or our clinic. This post is by a parent, for a parent…and any parent can write me directly at drgary@earlylifepsych.com with any questions. Press and politicians need not enquire.

-Gary Thompson, Psy.D.-

________________________

Chapter 1: 

-Arrogance, Ethics & Psychological Testing: How To (Almost) Get Kicked Out Of Doctoral Training-

Clinical Psychologists are the only mental health care professionals in the country trained and educated to provide both therapy and psychological testing. As a graduate student in this field, securing the multiple testing-focused clinical Clerkships, Internships and Post Doctoral trainings needed is a matter of both luck and skill. Clinical training openings are available for about 1 out of every 20 applicants. The few lucky ones who get to work side by side with an expert level assessment psychologist are subjected to thousands of hours of training, observation and evaluation in the areas of test validity, administration, and assessment ethics as they apply to human subjects. (Doctoral level psychometricians, however, do not have training and experience with administering the tests they create to live clinical clients. Crudely stated, psychometricians create the tests; psychologists give the tests and interpret the results.)

As a third year doctoral student, I was one of the lucky few in Southern California to be trained at the California Family Counseling Center Testing Clerkship under the direction of clinical psychologist Dr. “Jones”.

Four months into the training, Dr. Jones called me into her office holding my latest testing report submission on a 14-year-old girl. To the best of my recollection, she shocked me with the following statement:

“Gary, I have decided to place you on formal clinical probation for 30 days. You will not be allowed to test or see live clients under my license during this time. You show great promise as a future clinician in the field of psychological assessment, however, I believe your current level of intellectual arrogance is hindering your professional development, and I’m uncomfortable granting you the privilege of working with clients under my professional license.”

Dr. “Jones” then went over my test report, which was riddled with “red pen corrections”. She expertly walked through multiple examples of how I relied too much on testing result numbers, and arrogantly and definitively based multiple potentially life changing clinical conclusions on these numbers alone. For example, in the “Clinical Summary” section of the report, I wrote the following:

“Client’s test results clearly show she has ADHD, as well as multiple learning disabilities as outlined in the Diagnosis section of this report.”

Dr. “Jones” replaced that line with the following:

“Although test results indicate strong tendencies towards multiple cognitive and behavioral disorders, multiple background issues of this client strongly suggest that testing results be viewed and interpreted with great caution.”

I was given 30 days to contemplate and write about the only two things regarding testing, and test validity, that any parent, lawmaker, or educator will ever need to know about the subject:

1. Every cognitive, emotional and academic test has strengths and weaknesses. Knowing what a test cannot do, and what populations it fails to measure accurately and validly with an acceptable degree of statistical precision, is of paramount importance ethically and clinically.

2. No machine (e.g., psychological test) or data obtained from a machine, can replace common sense knowledge of parents, clinical observation/instinct/training, knowledge of cultural factors, and professional ethical values.  

The respective offices of education in the States of Florida and Utah are currently exhibiting the same dangerous levels of arrogance I exhibited as a new trainee in clinical psychology, by placing an almost “religious-like” reliance on numbers, and failing to understand the basic concepts of test validity and the ethics surrounding academic and achievement assessment and test development.

No test is perfect. All tests have limitations. One size cannot fit all.

Chapter 2: 

-Test Validity: Keeping It Simple-

Sans discussions of the ridiculously advanced science behind computer adaptive testing (CAT), Utah’s SAGE & Florida’s FSA Common Core Achievement Tests simply perform the following task:

The measurement of predetermined variables and/or educational constructs, that results in a data set that can be accurately interpreted.  

That basic definition applies to any academic, cognitive, or psychological test.

Simply stated, the SAGE/FSA tests measure “something.” Per publications from both the Utah State Department of Education and the Florida Department of Education, that measured “something” is the construct of “career and college readiness” in math and language arts. For parents, lawmakers, teachers and activists, it is imperative to know how accurate and reliable the SAGE/FSA tests are in determining “career and college readiness” for public school children.

This process of determining the level of accuracy and reliability is called “validity”.

Very simply put, “validity” is the extent to which a test measures what it claims to measure.

It refers to the ability of the Utah SAGE & Florida FSA to accurately and fairly measure the construct of “career and college readiness” in math and language arts in over 3.2 million combined children in the States of Utah and Florida.

[Image]

Ethics and best practices demand that test designers present evidence concerning multiple types of validity, and that they present and summarize these statistical findings in a test/technical manual. As the above picture illustrates, activist emphasis on validity should NOT be focused entirely on the test itself. It is imperative that parent stakeholders examine the ridiculous and shamefully misleading validity claims and public statements that these tests absolutely measure career and college readiness… in ALL students:

[Screenshot]
The boldest claim ever originated via a State Office administration in this century…..

Chapter 3:

-Test Validity: “There Are Lies, Damn Lies, Then There Are Statistics”-

We have bought into the notion of strict academic accountability via the use of tests designed by the American Institutes for Research, with the presumption that testing and more testing, coupled with the threat of not being promoted from grade to grade and of not graduating from high school, will be the engine that drives improvement in instruction and student achievement.

Yet this approach has many inherent flaws: it threatens to leave behind the very students that the legislation and testing movement purport to be helping. Strict adherence to ethical standards in educational testing can help eliminate the influence of political agendas on the science of tests and measurements.

One size does not fit all. No one measure of academic achievement can be the basis of such high stakes consequences in the States of Utah and Florida.

Rothstein (2000) used a baseball example to question the veracity of using a single high stakes test to measure a student’s knowledge:

“Mike Piazza, batting .332, could win this year’s Most Valuable Player award. He has been good every year, with a .330 career batting average…and a member of each All Star team since his rookie season. The Mets reward Piazza for this high achievement, at the rate of $13 million a year.

But what if the team decided to pay him based not on overall performance but on how he hit during one arbitrarily chosen week? How well would one week’s at-bats describe the ability of a true .330 hitter?

Not very. Last week Piazza batted only .200. But in the second week of August he batted .538. If you picked a random week this season, you would have only a 7-in-10 chance of choosing one in which he hit .250 or higher.”(p. B11)

Rothstein questioned the validity of assessing a student’s knowledge at one point in time.  
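
To put rough numbers on Rothstein’s point, here is a minimal sketch in Python. It treats every at-bat as an independent event with a .330 chance of a hit and assumes a week holds 25 at-bats. Both are simplifying assumptions of mine, not figures from Rothstein’s article, and the function name is hypothetical.

```python
from math import comb


def prob_weekly_average_at_least(true_avg, at_bats, threshold):
    """Exact binomial probability that a hitter with the given true average
    posts a batting average of at least `threshold` over `at_bats` at-bats.

    Assumes independent at-bats with a constant hit probability (a
    simplification, not a claim about real baseball).
    """
    total = 0.0
    for hits in range(at_bats + 1):
        if hits / at_bats >= threshold:
            # Probability of exactly `hits` hits in `at_bats` tries.
            total += (
                comb(at_bats, hits)
                * true_avg ** hits
                * (1 - true_avg) ** (at_bats - hits)
            )
    return total


# Assumed inputs: a true .330 hitter and a 25 at-bat week.
chance = prob_weekly_average_at_least(0.330, 25, 0.250)
print(f"Chance of a .250-or-better week: {chance:.2f}")
```

Under those assumptions the result lands in the same general range as Rothstein’s 7-in-10 figure. A single short sample is a noisy read on a true .330 hitter, just as a single test sitting is a noisy read on a student.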

Over 75 years’ worth of peer-reviewed studies have documented that how students fare on standardized tests can be greatly influenced by a host of external factors, including stress over taking the test, amount of sleep, distractions at the testing site, time of day, emotional state, trauma, and others, including a child’s zip code. This reality was not missed by a 17-year-old high school student who spoke about the limits of high stakes testing during her graduation speech:

“So I’m the valedictorian. Number one. But, what separates me from number two, three, four, five, six, 50, or 120? Nothing but meaningless numbers. All these randomly assigned numbers reflect nothing about the true character of an individual. They say nothing…about desire or will. Nothing about values or morals. Nothing about intelligence. Nothing about creativity. Nothing about heart. Numbers cannot and will not ever be able to tell you who a person really is. Yet in today’s society we are sadly becoming more and more number oriented. Schools today are being forced to teach to the numbers…. The MCAS serves as just another set of meaningless numbers that add one more reason to focus on scores and forget learning…. Judging us by our competency on a biased test is perhaps the biggest injustice that the state could ever inflict upon us…. Does anyone care about the human beings behind the numbers?”

– (Annelise Schantz, the valedictorian of the 2000 graduating class at Hudson High School in Massachusetts)-

Chapter 4: 

– Relevant Content and Ethical Issues In Alpine Testing’s Validity Report-

“The most effective way to increase student achievement involves improving classroom instruction and student support services, not the use of high-stakes testing.”
-The Ethical Dilemmas of High-Stakes Testing and Issues for Teacher Preparation Programs-

Issue 1:

Validity Technical Manuals For AIR Produced Tests in Utah & Florida, Have Yet To Be Completed And Delivered To Stakeholders:

“A primary source for evidence of development and validation activities for assessment programs is the documentation provided in a program’s technical manual and supporting technical reports… some of the development and validation activities are ongoing and a comprehensive technical manual was not yet available.” (Alpine Testing Solutions, Inc. Validity Report, p. 30)

The American Psychological Association’s Ethics Code, Standard 9, clearly prohibits the use of tests on human subjects that have yet to undergo a completed validation process as outlined, traditionally, in published technical manuals:

APA Ethics Code 9.02: Use of assessments:

(b) Psychologists use assessment instruments whose validity and reliability have been established for use with members of the population tested. When such validity or reliability has not been established, psychologists describe the strengths and limitations of test results and interpretation.

Issue 2:

Fairness & Bias Issues Regarding Vulnerable Populations Have Yet To Be Ethically Validated In Florida Or Utah:

“…Due to the limited time frame for developing the FSA, item reviews related to content, cognitive complexity, bias/sensitivity, etc. were not conducted by Florida stakeholders.” (Alpine Testing Solutions, Inc. Validity Report, p. 35)

Consequences For Not Validating Test For Vulnerable Populations:

“Given the interpretation of “reading” by FLDOE, use of a human reader is not an allowable accommodation to ensure the construct remains intact. Students who have mild-moderate intellectual disabilities and limited reading skills will have limited access to the passages without the use of a human reader. Students with vision or hearing impairments, who also have limited ability to read, including reading braille, will have limited access to the passages without the use of a human reader. When required to read independently, these groups of students will not have the ability to demonstrate their understanding of the text beyond the ability to decode and read fluently. For example, without access to the passage, the students will be unable to demonstrate their ability to draw conclusions, compare texts, or identify the central/main idea.” (Alpine Testing Solutions, Inc. Validity Report, p. 35)

Neither Utah nor Florida has produced validity documents suggesting that either the SAGE or FSA high stakes academic achievement tests can validly measure achievement in vulnerable student populations, or that the current testing accommodations, allowed or banned, are appropriate or fair. The State education entities in both Florida and Utah, as well as the test designer AIR, make claims that these tests accurately measure “career and college readiness” in math and language arts. This is in clear violation of education test/measurement ethics as established by the National Council on Measurement in Education’s Code of Professional Responsibilities in Educational Measurement (NCME Ethics Code):

NCME Code of Ethics:

Section 1: Responsibilities of Those Who Develop Assessment Products and Services:

1.2 “Develop assessment products and services that are as free as possible from bias due to characteristics irrelevant to the construct being measured, such as gender, ethnicity, race, socioeconomic status, disability, religion, age, or national origin.”

1.9 “Avoid false or unsubstantiated claims in test preparation and program support materials and services about an assessment or its use and interpretation.”

Section 6: Responsibilities of Those Who Interpret, Use, and Communicate Assessment Results:

“The interpretation, use, and communication of assessment results should promote valid inferences and minimize invalid ones. Persons who interpret, use, and communicate assessment results have a professional responsibility to:
6.8 Avoid making, and actively discourage others from making, inaccurate reports, unsubstantiated claims, inappropriate interpretations, or otherwise false and misleading statements about assessment results.”

[Image: USOE 2014 SAGE Test Press Release]

This claim has no basis in objective fact. It is a dangerous commission of the truth......

Issue 3:

Alpine Testing Solutions, Inc., & Partner edCount, Inc., Failure To Disclose Conflicts of Interests In Their Validity Report

edCount lists the contractor that designed the SAGE and FSA tests (American Institutes for Research) as a partner on its corporate website (edCount Corporate Webpage):

[Screenshot: edCount corporate web page listing AIR as a partner, captured 2015-10-15]

(Note: Within 2 hours after I took the above screenshot, which lists AIR as a partner, edCount, Inc. deleted the AIR reference from its corporate web page.)

This lack of disclosure violates multiple professional ethical codes, as outlined by the American Educational Research Association (AERA Ethics Code) and the National Council on Measurement in Education: Code of Professional Responsibilities in Educational Measurement (NCME Ethics Code):

AERA Code of Ethics:

10.02 Disclosure

Education researchers disclose relevant sources of financial support and relevant personal or professional relationships that may have the appearance of or potential for a conflict of interest to an employer or client, to the sponsors of their professional work, and to the public in written and verbal reports.

14.05 Reporting on Research

(h) In reporting on research, education researchers address any potential conflicts of interest that may have influenced or have the appearance of influencing the research, along with a statement of how these were managed in the conduct of the research.

17.0 Responsibilities of Reviewers

Education researchers adhere to the highest ethical standards, including standards of competence, when serving as reviewers for publication, grant support, or other evaluation purposes.

(b) Education researchers disclose conflicts of interest or decline requests for reviews of the work of others where conflicts of interest are involved.

NCME Code of Ethics:

Section 8: Responsibilities of Those Who Evaluate Educational Programs and Conduct Research on Assessments:
8.2 Disclose any associations that they have with authors, test publishers, or others involved with the assessment and refrain from participation if such associations might affect the objectivity of the research or evaluation.

Chapter 5: 

-11 Findings of Facts & General Conclusions-

  1. Utah has yet to have its own AIR produced SAGE test “independently” evaluated for validity.

  2. The test designer, AIR, has yet to supply either Florida or Utah stakeholders with validity “technical manuals” as required by professional educational psychology ethics and practice.

  3. There are no validity documents currently available that meet ethical and industry standards regarding the viability of either the SAGE or FSA test with vulnerable student populations, “divergent learners”, African American, or Latino students.

  4. Substantial evidence strongly suggests that both the SAGE and FSA are currently works in progress, and as such, it is reasonable and proper to strongly infer that both tests are still in their experimental phase of development.

  5. There is no objective evidence currently available that supports Florida and Utah education administrative claims that the Utah SAGE or Florida FSA test can validly measure “career and college readiness” in any of the approximately 3.1 million combined public school children.

  6. There is no objective evidence currently available that supports Florida and Utah education administrative claims that the Utah SAGE or Florida FSA test can validly measure “career and college readiness” in a wide range of “vulnerable” student populations, nor has the SAGE and FSA test designer provided evidence that the accommodations provided (or rejected) for these students reduce or eliminate the bias and fairness issues established via 100 years of peer-reviewed research.

  7. There is currently no independently validated, or peer reviewed research to support that the new, rigorous academic standards being measured by Florida and Utah educators are developmentally and cognitively appropriate for the students being tested by the SAGE and FSA.

  8. The corporations hired to independently evaluate the psychometric qualities of the Florida FSA (using Utah SAGE technical documents as a key source of information) failed to disclose their conflicts of interest, thus calling into question the reliability of the entire $600,000.00 validity report.

  9. Alpine Testing did not conduct an independent validity review on fairness, bias, and sensitivity for student populations with learning disabilities, English language learners, or African American students.

  10. Alpine and Florida relied entirely upon Utah’s department of education “review” of these issues. (A “review” by paid employees of a State education entity does not, on any level, meet the standards of unbiased psychometric analysis required in the private sector world of test validity).

  11. Utah’s “review” of these issues did not include ANY actual validity data regarding the fairness, bias or sensitivity of the SAGE test for these populations, for the simple reason that no such studies were done. USOE’s “review” not only fails to meet ethical standards; it can hardly be deemed either independent or free of potential bias.

In layman’s terms, because of the current experimental nature of both the SAGE and FSA tests, neither the Utah State Office of Education, the Florida Department of Education, Alpine Testing, nor the American Institutes for Research has provided ANY psychometric data which indicates that the academic achievement of vulnerable student populations has been measured accurately or fairly. In addition, the offices of education in the states of Utah and Florida have yet to publish any independent research which refutes the claims of over 500 of the nation’s leading developmental and research psychologists that the standards being measured are, in fact, developmentally inappropriate to teach (let alone measure with any test, no matter how valid).

In summary, it is reasonable and proper to state that a substantial amount of evidence supports a conclusion that both Utah and Florida Offices of Education, via their joint contractual participation with behavioral research corporation American Institutes for Research, are in the midst of a massive academic & psychological experiment using public school children as their independent variable of analysis….

…without informed written parental consent.

Until these issues have been admitted, clarified, and resolved, it may be in the best interests of parents to heed the instructions of Alpine School Board Member Brian Halliday to opt their children out of Utah’s SAGE test, or any other test that does not have published documents of validity as they may relate to their unique children.

“Parents are, and must always be, the resident experts of their own children.”

 

-Gary Thompson, Psy.D.-

Early Life Child Psychology & Education Center, Inc.

http://www.earlylifepsych.com