Effective Assessment Practices: Examples for Reading and Writing
Abstract
This chapter focuses on key ideas for understanding literacy assessment to assist with educational decision making. Included is an overview of different literacy assessments, along with common assessment procedures used in schools and applications of assessment practices to support effective teaching. Readers of the chapter will gain an understanding of different types of assessments, how assessment techniques are used in schools, and how assessment results can inform instruction.
Learning Objectives
After reading this chapter, readers will be able to
- explain how testing fits into the larger category of assessment;
- describe different literacy assessments and how they are commonly used in schools;
- discuss why assessment findings are judged based on their validity for answering educational questions and making decisions;
- explain the importance of reliability and validity of test scores and why psychometric properties are important for interpreting certain types of assessment results;
- critique literacy assessments in terms of how they can be used or misused.
Introduction
When the topic of educational assessment is brought up, most educators immediately think of high-stakes tests used to gauge students' progress in meeting a set of educational standards. It makes sense that much of the dialogue concerning educational assessment centers on high-stakes testing because it is this kind of assessment that is most controversial in the American education system, particularly since the vast majority of states have adopted the Common Core State Standards for English Language Arts & Literacy in History/Social Studies, Science, and Technical Subjects (CCSS; National Governors Association Center for Best Practices & Council of Chief State School Officers [NGA & CCSSO], 2010), along with high-stakes tests intended to assess students' proficiency in meeting them. But high-stakes tests are actually just a fraction of the assessment procedures used in schools, and many other assessments are just as important in influencing instructional decisions. This chapter discusses a wide scope of literacy assessments commonly used in kindergarten through twelfth grade classrooms, along with ways to use results to make educational decisions.
Literacy Assessment
To understand literacy assessment, we first need to think about the term "literacy," which is discussed throughout the chapters in this textbook. Literacy has traditionally been regarded as having to do with the ability to read and write. More recently, literacy has evolved to encompass multidimensional abilities such as listening, speaking, viewing, and performing (NGA & CCSSO, 2010), along with cultural and societal factors (Snow, 2002) that can facilitate or constrain literacy development. This multidimensional definition of literacy requires educators and policy makers to conceptualize literacy in complex ways. Controversies arise when the richness of literacy is overly simplified by assessments that are not multidimensional or authentic, such as the overuse of multiple-choice questions. Educators may find the lack of authenticity of these assessments frustrating when results do not appear to represent what their students know and can do. On the other hand, more authentic assessment methods, such as observing students who are deliberating the meaning of texts during group discussions, do not precisely measure literacy skills, which can limit the kinds of decisions that can be made.
Even though the assessment of literacy using multiple-choice items versus more authentic procedures seems like a contrast of opposites, they do have an important characteristic in common: they both can provide answers to educational questions. Whether one approach is more valuable than the other, or whether both are needed, depends entirely on the kind of questions being asked. So if someone asks you whether a multiple-choice test is a good test or whether observing a student's reading is a better assessment process, your answer will depend on many different factors, such as the purpose of the assessment, along with the quality of the assessment tool, the skills of the person who is using it, and the educational decisions needing to be made. This chapter will help you learn more about how to make decisions about using literacy assessments and how to use them to improve teaching and learning.
Taxonomy of Literacy Assessments
To understand the purposes of different types of literacy assessment, it is helpful to categorize them based on their purposes. It should be noted that there is much more research on the assessment of reading compared to the assessment of other literacy skills, making examples in the chapter somewhat weighted toward reading assessments. Examples of assessments not limited to reading have also been included, where appropriate, as a reminder that literacy includes reading, writing, listening, speaking, viewing, and performing, consistent with the definition of literacy provided in Chapter 1 of this textbook.
Formal Assessments
One way to categorize literacy assessments is whether they are formal or informal. Formal literacy assessments usually involve the use of some kind of standardized procedures that require administering and scoring the assessment in the same way for all students. An example of formal assessments is state tests, which evaluate proficiency in one or more literacy domains, such as reading, writing, and listening. During the administration of state tests, students are all given the same test at their given grade levels, teachers read the same directions in the same way to all students, the students are given the same amount of time to complete the test (unless a student receives test accommodations due to a disability), and the tests are scored and reported using the same procedures. Standardization allows control over factors that can unintentionally influence students' scores, such as how directions are given, how teachers answer students' questions, and how teachers score students' responses. Certain state test scores are also usually classified as criterion-referenced because they measure how students achieve in reference to "a fixed set of predetermined criteria or learning standards" (edglossary.org, 2014). Each state specifies standards students should meet at each grade level, and state test scores reflect how well students achieved in relation to these standards. For example, on a scale of 1 to 4, if a student achieved a score of "2," this score would typically reflect that the student is not yet meeting the standards for their grade, and he or she may be eligible for extra assistance toward meeting them.
Another example of a criterion-referenced score is the score achieved on a permit test to drive a car. A predetermined cut score is used to decide who is ready to get behind the wheel of a car, and it is possible for all test takers to meet the criterion (e.g., 80% of items correct or higher). Criterion-referenced test scores are contrasted with normatively referenced (i.e., norm-referenced) test scores, such as an SAT score. How a student does depends on how the other students who take the test score, so there is no criterion score to meet or exceed. To score high, all a student has to do is do better than most everyone else. Norm-referenced scores are often associated with diagnostic tests, which will be described in further detail in the section of this chapter under the heading "Diagnostic Literacy Assessments."
Informal Assessments
Informal literacy assessments are more flexible than formal assessments because they can be adjusted according to the student being assessed or a particular assessment context. Teachers make decisions regarding with whom informal assessments are used, how the assessments are done, and how to interpret findings. Informal literacy assessments can easily incorporate all areas of literacy such as speaking, listening, viewing, and performing rather than focusing more exclusively on reading and writing. For example, a teacher who observes and records behaviors of a group of students who view and discuss a video is likely engaging in informal assessment of the students' reading, writing, speaking, listening, and/or performing behaviors.
Teachers engage in a multitude of informal assessments each time they interact with their students. Asking students to write down something they learned during an English language arts (ELA) class or something they are confused about is a form of informal assessment. Observing students engaging in cooperative learning group discussions, taking notes while they plan a project, and even observing the expressions on students' faces during a group activity are all types of informal assessment. Likewise, observing students' level of engagement during literacy tasks is informal assessment when procedures are flexible and individualized. Informal classroom-based self-assessments and student inventories used to determine students' attitudes about reading may be useful toward planning and adjusting teaching as well (Afflerbach & Cho, 2011).
Methods for assessing literacy that fall somewhere between informal and formal include reading inventories, such as the Qualitative Reading Inventory-5 (QRI-5; Leslie & Caldwell, 2010). Reading inventories require students to read word lists and passages and answer questions, and although there are specific directions for how to administer and score them, they offer flexibility in observing how students engage in literacy tasks. Reading inventories are often used to record observations of reading behaviors rather than to simply measure reading achievement.
Formative Assessments
Another useful way to categorize literacy assessments is whether they are formative or summative. Formative assessments are used to "form" a program to improve learning. An example of formative literacy assessment might involve a classroom teacher checking how many letters and sounds her students know as she plans decoding lessons. Students knowing only a few letter sounds could be given texts that do not include letters and words they cannot decode, to prevent them from guessing at words. Students who know most of their letter sounds could be given texts that contain more letters and letter combinations that they can practice sounding out (e.g., the words in their texts might include all the short vowels and some digraphs they have learned, such as sh, th, ck). In this example, using a formative letter-sound assessment helped the teacher to select what to teach rather than simply evaluate what the student knows. Formative assessment is intended to provide teachers with information to improve students' learning, based on what students need.
Summative Assessments
Summative assessments are used to "sum up" whether students have met a specified level of proficiency or learning objective. State tests fall under the category of summative assessments because they are generally given to see which students have met a critical level of proficiency, as defined by standards adopted by a particular state. Unit tests are also summative when they sum up how students did in meeting particular literacy objectives by using their knowledge related to reading, writing, listening, speaking, viewing, and performing. A spelling test can be both formative and summative. It is formative when the teacher is using the information to plan lessons, such as what to reteach, and it is summative if used to decide whether students showed mastery of a spelling rule such as "dropping the 'e' and adding '-ing.'" So the goal of formative assessment is generally to inform instruction, whereas the goal of summative assessment is to summarize the extent to which students surpass a certain level of proficiency at an end point of instruction, such as at the end of an instructional unit or at the end of a school year.
Literacy Screenings
Another way to categorize assessments is whether they are used for screening or diagnostic purposes. Literacy screenings share characteristics with medical screenings, such as hearing and vision checks in the nurse's office or when a patient's blood pressure is checked at the beginning of a visit to the doctor's office. Screenings are typically quick and given to all members of a population (e.g., all students, all patients) to identify potential problems that may not be recognized during day-to-day interactions. See Table 1 for examples of commonly used universal literacy screeners, along with links to information about their use.
Universal Literacy Screeners | Links to additional information |
AIMSweb | http://www.aimsweb.com/ |
Dynamic Indicators of Basic Early Literacy Skills—Next | https://dibels.uoregon.edu/ |
STAR Reading | http://www.renaissance.com/assess |
Phonological Awareness Literacy Screening (PALS) | https://pals.virginia.edu/ |
Among the most popular literacy screeners used in schools are the Dynamic Indicators of Basic Early Literacy Skills, Next Edition (DIBELS Next; Good & Kaminski, 2011) and AIMSweb (Pearson, 2012). These screeners include sets of items administered to all children at certain grade levels (which is why they are often called "universal" literacy screeners) to do quick checks of their literacy development and identify potential problems that may not be visible using less formal means. Literacy screenings require young children to complete one-minute tasks such as naming sounds they hear in spoken words (e.g., "cat" has the sounds /c/ /a/ /t/), naming the sounds of letters they see (e.g., the letter "p" says /p/), and, starting in first grade, reading words in brief passages. Universal literacy screenings such as DIBELS Next and AIMSweb are often characterized as "fluency" assessments because they measure both accuracy and efficiency in completing tasks. For these assessments, the correct number of sounds, letters, or words is recorded and compared to a research-established cut point (i.e., benchmark) to decide which students are not likely to be successful in developing literacy skills without extra help. If a student scores below the benchmark, it indicates that the task was too difficult, and detection of this difficulty can signal a need for intervention to prevent future academic problems. Intervention typically involves more intensive means of teaching, such as extra instruction delivered to small groups of students.
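As an illustration of how a screening benchmark works as a decision rule, the short Python sketch below flags students whose oral reading fluency scores fall below a cut point. The student names and scores are invented, and the 52-words-per-minute value simply echoes the second-grade benchmark cited later in this chapter; real screeners such as DIBELS Next and AIMSweb apply their own published benchmarks and scoring software.

```python
# Hypothetical oral reading fluency screening scores (correct words per minute).
# The benchmark below is illustrative, not an official cut point from any screener.
BENCHMARK_WCPM = 52

screening_scores = {
    "Student A": 61,
    "Student B": 48,
    "Student C": 23,
    "Student D": 55,
}

def flag_at_risk(scores, benchmark):
    """Return the students scoring below the benchmark cut point."""
    return {name: score for name, score in scores.items() if score < benchmark}

at_risk = flag_at_risk(screening_scores, BENCHMARK_WCPM)
for name, score in at_risk.items():
    print(f"{name}: {score} wcpm is below the benchmark of {BENCHMARK_WCPM}; "
          "consider intensive intervention and progress monitoring.")
```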
To learn more about commercially available screenings such as DIBELS Next and AIMSweb, or to learn about how to create your own personalized screenings, please visit http://interventioncentral.org. This site enables teachers to create their own individualized screening probes to assess a variety of basic literacy skills, such as identifying letters and sounds, segmenting sounds in spoken words, sounding out nonsense words, reading real words in connected text, and filling in blanks in reading passages (called "maze" procedures). Teachers can select the letters, words, and passages to be included on these individualized assessments. Probes to assess students' math and writing skills can also be created; however, any customized screening probes should be used with caution, since they do not share the same measurement properties as well-researched screenings such as DIBELS Next and AIMSweb.
Diagnostic Literacy Assessments
The purposes of universal literacy screenings can be contrasted with those of diagnostic literacy assessments. Unlike literacy screeners, diagnostic tests are generally not administered to all students but are reserved for students whose learning needs continue to be unmet, despite their receiving intensive intervention. Diagnostic literacy assessments typically involve the use of standardized tests administered individually to students by highly trained educational specialists, such as reading teachers, special educators, speech and language pathologists, and school psychologists. Diagnostic literacy assessments include subtests focusing on specific components of literacy, such as word recognition, decoding, reading comprehension, and both spoken and written language. Results from diagnostic assessments may be used formatively to help plan more targeted interventions for students who do not appear to be responding adequately, or results can be combined with those from other assessments to determine whether students may have an educational disability requiring special education services.
An example of a widely used diagnostic literacy test is the Wechsler Individual Achievement Test-Third Edition (WIAT-III; Wechsler, 2009). The WIAT-III is typically used to assess the achievement of students experiencing academic difficulties who have not responded to research-based interventions. The WIAT-III includes reading, math, and language items administered according to the age of the student and his or her current skill level. The number of items the student gets correct (the raw score) is converted to a standard score, which is then interpreted according to where the student's score falls on a bell curve (see Figure 1) among other students of the same age and grade level who took the same test (i.e., the normative or "norm" sample).
Figure 1. Bell curve showing the percentage of students who fall above and below the average score of 100 on a diagnostic achievement test.
Most students will score in the middle of the distribution, but some students will achieve extreme scores, either higher or lower than most other students. This is why the "tails" at either side of the bell curve slope downward from the large hump in the middle: the tails illustrate the decreasing frequency of scores that are especially low or high. In other words, the more extreme the score, the fewer students are likely to attain it. When students achieve at either extreme, it can signal the need for more specialized instruction related to the individual needs of the student (e.g., intervention or gifted services).
Diagnostic achievement tests are frequently referred to as "norm-referenced" (edglossary.org, 2013) because their scores are compared to scores of students from a norm sample. A norm sample is a group of individuals who were administered the same test items in the same manner (i.e., using standardized procedures) while the test was being developed. Students who take the test have their performance compared to that of students from the norm sample to make meaning of the score. For example, if a student were given a diagnostic assessment and the score fell within the same range as most of the students in the norm sample, then his or her score would be considered "average." If the student's score fell much higher or lower than other students in the norm sample, then the score would not be considered average or typical because most of the other students did not score at either of these extremes.
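The arithmetic behind a norm-referenced standard score can be sketched briefly. The Python snippet below is an illustration only: the norm-sample mean and standard deviation for the raw score are invented, and the mean-100, SD-15 scale simply mirrors the bell curve in Figure 1. It is not the actual norming procedure of the WIAT-III or any other published test.

```python
import math

def standard_score(raw_score, norm_mean, norm_sd, score_mean=100, score_sd=15):
    """Convert a raw score to a standard score on a mean-100, SD-15 scale,
    using the mean and standard deviation of an (illustrative) norm sample."""
    z = (raw_score - norm_mean) / norm_sd
    return score_mean + score_sd * z

def percentile_rank(std_score, score_mean=100, score_sd=15):
    """Approximate percentage of the norm sample scoring at or below std_score,
    assuming scores follow the bell curve shown in Figure 1."""
    z = (std_score - score_mean) / score_sd
    return 100 * 0.5 * (1 + math.erf(z / math.sqrt(2)))

# Invented norm values: suppose same-age peers averaged 38 raw-score points (SD 6).
ss = standard_score(raw_score=29, norm_mean=38, norm_sd=6)
print(round(ss), round(percentile_rank(ss)))  # roughly 78, about the 7th percentile
```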
Comparing students' scores to a norm sample helps identify strengths and needs. Then again, just knowing where students' scores fall on a bell curve does nothing to explain why they scored that way. An extremely low score may indicate a learning problem, or it may indicate a lack of motivation on the part of the student while taking the test. Perhaps a low score could even be due to a scoring error made by the tester. Even though a score from a diagnostic assessment may be quite precise, understanding why a student scored at a particular level requires additional information. Did observations during testing show that the student was distracted, uncooperative, or squinting at items? It is often a combination of assessment information that helps identify why a student may have scored a certain way, and this is why testers often use their observations during testing to interpret the meaning of scores.
Group achievement tests that include literacy subtests, such as the Iowa Tests of Basic Skills (ITBS; Hoover, Dunbar, & Frisbie, 2003), have properties that make them function somewhat like a screening and somewhat like a diagnostic test. Like screeners, they are administered to all students at a particular grade level, but unlike most screeners, they take more time to complete and are administered to entire classrooms rather than having at least some sections administered individually. Like diagnostic tests, they tend to produce scores that are norm-referenced. Students' performance is compared to a norm group to see how they compare among peers, but unlike diagnostic tests, the tester is not able to discern how well scores represent students' abilities because testers are not able to observe all of the students' testing behaviors that may affect the interpretation of scores (e.g., levels of engagement, motivation).
For many diagnostic literacy tests, reviews are available through sources such as the Mental Measurements Yearbook (MMY). Versions of the MMY are available in hard copy at many libraries, as well as online for free for students at colleges and universities whose libraries pay a fee for access. Reviews are typically completed by experts in diverse fields, including literacy and measurement experts. Reviews also include complete descriptions of the test or assessment process, who publishes it, how long it takes to administer and score, a review of psychometric properties, and a critique of the test in reference to decisions people plan to make based on findings. It is important for teachers and other educators who use tests to understand the benefits and problems associated with selecting one test over another, and resources such as the MMY offer reviews that are quick to locate, relatively easy to comprehend (when one has some background knowledge in assessment), and written by people who do not profit from the publication and sale of the assessment.
Single Point Estimates
Literacy assessments that are completed only one time provide a single point estimate of a student's abilities. An example of a single point estimate is a student's word identification score from a diagnostic achievement test. If the student's score is far below what is expected for his or her age or grade level, then the score signals a need to determine what is at the root of the low performance. However, a single low score does not necessarily indicate a lack of ability to learn, since with a change in instruction, the student might begin to progress much faster and eventually catch up to his or her typical age-based peers. To assess a student's rate of learning, progress-monitoring assessments are needed.
Progress-Monitoring Literacy Assessments
To monitor a student's progress in literacy, assessments are needed that actually measure growth. Rather than just taking a snapshot of the student's achievement at a single point in time, progress-monitoring assessments provide a baseline (i.e., the starting point) of a student's achievement, along with periodic reassessment as he or she is progressing toward learning outcomes. Such outcomes might include achieving a benchmark score of correctly reading 52 words per minute on oral reading fluency passages or a goal of learning to "ask and answer key details in a text" (CCSS.ELA-Literacy.RL.1.2) when prompted, with 85% accuracy. The first outcome of correctly reading 52 words per minute would likely be measured using progress-monitoring assessments, such as DIBELS Next and AIMSweb. These screeners are not just designed to measure the extent to which students are at risk for future literacy-related problems at the beginning of the school year but also to monitor changes in progress over time, sometimes as often as every one or two weeks, depending on individual student factors. The second outcome of being able to "ask and answer key details in a text" could be monitored over time using assessments such as state tests or responses on a qualitative reading inventory. Being able to work with key details in a text could also be informally assessed by observing students engaged in classroom activities where this task is practiced.
Unlike assessments that are completed only once, progress-monitoring assessments such as DIBELS Next and AIMSweb feature multiple, equivalent versions of the same tasks, such as having 20 oral reading fluency passages that can be used for reassessments. Using different but equivalent passages prevents artificial increases in scores that would result from students rereading the same passage. Progress-monitoring assessments can be contrasted with diagnostic assessments, which are not designed to be administered frequently. Administering the same subtests repeatedly would not be an effective way to monitor progress. Some diagnostic tests have two equivalent versions of subtests to monitor progress infrequently, perhaps on a yearly basis, but they are simply not designed for frequent reassessments. This limitation of diagnostic assessments is one reason why screeners like DIBELS Next and AIMSweb are so useful for determining how students respond to intervention and why diagnostic tests are often reserved for making other educational decisions, such as whether a student may have an educational disability.
Progress-monitoring assessments have transformed how schools decide how a student is responding to intervention. For example, consider the hypothetical example of Jaime's progress-monitoring assessment results in second grade, shown in Figure 2. Jaime was given oral reading fluency passages from a universal literacy screener, and then his progress was monitored to determine his response to a small-group literacy intervention started in mid-October. Data points show the number of words Jaime read correctly on each of the one-minute reading passages. Notice how at the beginning of the school year, his baseline scores were extremely low, and when compared to the beginning-of-year second grade benchmark (Dynamic Measurement Group, 2010) of 52 words per minute (Good & Kaminski, 2011), they signaled he was "at risk" of not reaching later benchmarks without receiving intensive intervention. Based on Jaime's baseline scores, intervention team members decided that he should receive a research-based literacy intervention to help him read words more easily so that his oral reading fluency would increase at least one word per week. This learning goal is represented by the "target slope" seen in Figure 2. During the intervention phase, progress-monitoring data points show that Jaime began making improvements toward this goal, and the line labeled "slope during intervention" shows that he was gaining at a rate slightly faster than his one-word-per-week goal.
Figure 2. Progress-monitoring graph of response to a reading intervention.
When looking at Jaime's baseline data, notice how the data points form a plateau. If his progress continued at this same rate, by the end of the school year he would be even farther behind his peers and at even greater risk for future reading problems. When interpreting the graph in Figure 2, it becomes clear that intensive reading intervention was needed. Notice how, after the intervention began, Jaime's growth began to climb steeply. Although he appeared to be responding positively to intervention, in reality, by the end of second grade, students whose reading ability is progressing adequately should be reading approximately 90 words correctly per minute (Good & Kaminski, 2011). Based on this information, Jaime is not likely to reach the level of reading 90 words correctly by the end of second grade and will probably only reach the benchmark expected for a student at the beginning of second grade. These assessment data suggest that Jaime's intervention should be intensified for the remainder of second grade to accelerate his progress further. It is also likely that Jaime will need to continue receiving intervention into third grade, and progress monitoring can determine, along with other assessment data, when his oral reading fluency improves to the point where intervention may be changed, reduced, or even discontinued. You may wonder how the intervention team would determine whether Jaime is progressing at an adequate pace when he is in third grade. Team members would continue to monitor Jaime's progress and check to make sure his growth line shows that he will meet the criterion at the end of third grade (i.e., correctly reading approximately 100 words per minute; Good & Kaminski, 2011). If his slope shows a lack of adequate progress, his teachers can revisit the need for intervention to ensure that Jaime does not fall behind again.
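The "slope" reasoning in Jaime's example can be sketched in a few lines of Python. The weekly scores below are invented to roughly mirror Figure 2, and the projection is a simple straight-line extrapolation; actual progress-monitoring systems apply their own decision rules, so treat this only as an illustration of the arithmetic.

```python
# A minimal sketch of the slope logic behind a graph like Figure 2.
# The weekly scores are invented; 90 wcpm is the end-of-second-grade
# benchmark cited in the chapter (Good & Kaminski, 2011).

def least_squares_slope(scores):
    """Words gained per week, estimated by ordinary least squares over weekly scores."""
    n = len(scores)
    weeks = range(n)
    mean_x = sum(weeks) / n
    mean_y = sum(scores) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(weeks, scores))
    den = sum((x - mean_x) ** 2 for x in weeks)
    return num / den

wcpm = [24, 24, 26, 27, 27, 29, 30, 31]   # hypothetical weekly progress-monitoring scores
slope = least_squares_slope(wcpm)          # observed growth rate (words per week)
weeks_left = 20                            # assumed weeks remaining in the school year
projected = wcpm[-1] + slope * weeks_left

print(f"Observed slope: {slope:.2f} words/week (target: 1.00)")
print(f"Projected end-of-year score: {projected:.0f} wcpm (benchmark: 90)")
```

Run on these invented data, the projection lands near 52 words per minute, which illustrates why a student can be "on target slope" and still fall well short of the end-of-year benchmark.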
Some schools monitor their students' progress using computer-adapted assessments, which involve students responding to test items delivered on a computer. Computer-adapted assessments are designed to deliver specific test items to students and then adapt the number and difficulty of items administered according to how students respond (Mitchell, Truckenmiller, & Petscher, 2015). Computer-adapted assessments are increasing in popularity in schools, in part, because they do not require a lot of time or effort to administer and score, but they do require schools to have an adequate technology infrastructure. The reasoning behind using these assessments is similar to other literacy screeners and progress-monitoring assessments: to provide effective teaching and intervention to meet all students' needs (Mitchell et al., 2015).
Although many literacy screening and progress-monitoring assessment scores have been shown to be well correlated with a variety of measures of reading comprehension (see, for example, Goffreda & DiPerna, 2010) and serve as reasonably good indicators of which students are at risk for reading difficulties, a persistent problem with these assessments is that they provide little guidance to teachers about what kind of literacy instruction and/or intervention a student actually needs. A student who scores low at baseline and makes inadequate progress on oral reading fluency tasks may need an intervention designed to increase reading fluency, but there is also a chance that the student lacks the ability to decode words and really needs a decoding intervention (Murray, Munger, & Clonan, 2012). Or it could be that the student does not know the meaning of many vocabulary words and needs to build background knowledge to read fluently (Adams, 2010-2011), which would require the use of different assessment procedures specifically designed to assess and monitor progress related to these skills. Even more vexing is when low oral reading fluency scores are caused by multiple, intermingling factors that need to be identified before intervention begins. When the problem is more complex, more specialized assessments are needed to disentangle the factors contributing to it.
A final note related to progress-monitoring procedures is the emergence of studies suggesting that there may be better ways to measure students' progress on instruments such as DIBELS Next than using slope (Good, Powell-Smith, & Dewey, 2015), which was depicted in the example using Jaime's data. In a recent conference presentation, Good (2015) argued that the slope of a student's progress may be too inconsistent to use for monitoring and adjusting teaching, and he suggested a new (and somewhat mathematically complex) alternative using an index called a student growth percentile. A student growth percentile compares the rate at which a student's achievement is improving in reference to how other students with the same baseline score are improving. For example, a student reading 10 correct words per minute on an oral reading fluency measure whose growth is at the fifth percentile is improving much more slowly compared to the other children who also started out reading only 10 words correctly per minute. In this case, a growth percentile of 5 means that the student is progressing only as well as or better than 5 percent of peers who started at the same score, and it also means that the current teaching is not meeting the student's needs. Preliminary research shows some promise in using growth percentiles to measure progress as an alternative to slope, and teachers should be on the lookout for more research related to improving ways to monitor student progress.
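Good's (2015) student growth percentile model is estimated with more sophisticated statistics than this, but the underlying comparison can be sketched as follows; the peer growth rates and the student's rate below are invented for illustration.

```python
# Illustrative only: invented growth rates (words gained per week) for peers who all
# began the year at the same baseline score. The real student growth percentile
# model (Good, 2015) is considerably more complex than this simple rank comparison.

def growth_percentile(student_rate, peer_rates):
    """Percentage of same-baseline peers growing at or below the student's rate."""
    at_or_below = sum(1 for r in peer_rates if r <= student_rate)
    return 100 * at_or_below / len(peer_rates)

peer_rates = [0.2, 0.4, 0.5, 0.7, 0.8, 0.9, 1.0, 1.1, 1.2, 1.3,
              1.4, 1.5, 1.5, 1.6, 1.7, 1.8, 1.9, 2.0, 2.2, 2.5]
student_rate = 0.3   # hypothetical student's gain in words per week

gp = growth_percentile(student_rate, peer_rates)
print(f"Growth percentile: {gp:.0f}")  # about 5, suggesting instruction is not meeting needs
```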
Linking Assessment to Intervention
How can teachers figure out the details of what a student needs in terms of intervention? They would likely use a variety of informal and formal assessment techniques to determine the student's strengths and needs. The situation might require the use of diagnostic assessments, a reading or writing inventory, the use of observations to determine whether the student is engaged during instruction, and/or the use of assessments to better understand the student's problem-solving and other thinking skills. It may be a combination of assessment techniques that is needed to match research-based interventions to the student's needs.
You may be starting to recognize some overlap among different types of assessments across categories. For example, state tests are usually both formal and summative. Literacy screeners and progress-monitoring assessments are often formal and formative. And some assessments, such as portfolio assessments, have many overlapping qualities across the various assessment categories (e.g., portfolios can be used formatively to guide teaching and used summatively to determine whether students met an academic outcome).
Bringing up portfolio assessments takes us back to points raised at the beginning of this chapter related to the authenticity of literacy assessments. So why do multiple-choice tests exist if options such as portfolio assessment, which are so much more authentic, are available? High-quality multiple-choice tests tend to have stronger psychometric properties (discussed in the next section) than performance assessments like portfolios, which makes multiple-choice tests desirable when assessment time is limited and scores need to have strong measurement properties. Multiple-choice test items are often easy to score and do not require a great deal of inference to interpret (i.e., they are "objective"), which are some of the reasons why they are popularly used. Portfolio assessments often take longer to do but also reflect the use of many important literacy skills that multiple-choice items simply cannot assess. Based on this discussion, you may wonder whether portfolio assessments are superior to multiple-choice tests, or whether the reverse is true. As always, an answer about a preferred format depends on the purpose of the assessment and what kinds of decisions will be made based on findings.
Psychometric Principles of Literacy Assessment
A chapter about literacy assessment would not be complete without some discussion of the psychometric properties of assessment scores, such as reliability and validity (Trochim, 2006). Reliable assessment means that the information gathered is consistent and dependable; that is, the same or similar results would be obtained if the student were assessed on a different day, by a different person, or using a similar version of the same assessment (Trochim, 2006). To think about reliability in practice, imagine you were observing a student's reading behaviors and determined that the student was struggling with paying attention to punctuation marks used in a storybook. You rate the student's proficiency as a "1" on a 1-to-4 scale, meaning he or she reads as though no punctuation marks were noticed. Your colleague observed the student reading the same book at the same time you were observing, and he rated the student's proficiency as a "3," meaning that the student was paying attention to most of the punctuation in the story, but not all. The difference between your rating and your colleague's rating signals a lack of reliability among raters using that scale. If these same inconsistencies in ratings arose across other items on the reading behavior scale or with other students, you would conclude that the scale has problems. These problems could include that the scale is poorly constructed, or there may simply be inter-rater reliability problems related to a lack of training or experience on the part of the people doing the ratings.
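One simple way to quantify the inter-rater reliability described above is to calculate how often two raters agree across a set of ratings; published instruments typically report more formal indexes, such as correlation coefficients or kappa. The ratings below are invented for illustration.

```python
# Invented 1-4 ratings from two observers across ten items on a reading behavior scale.
rater_a = [1, 2, 2, 3, 4, 1, 3, 2, 4, 3]
rater_b = [3, 2, 1, 3, 4, 2, 3, 1, 4, 3]

def agreement(r1, r2, tolerance=0):
    """Percentage of items on which two raters agree within the given tolerance."""
    matches = sum(1 for a, b in zip(r1, r2) if abs(a - b) <= tolerance)
    return 100 * matches / len(r1)

print(f"Exact agreement: {agreement(rater_a, rater_b):.0f}%")                 # 60%
print(f"Agreement within one point: {agreement(rater_a, rater_b, 1):.0f}%")   # 90%
```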
Reliability of formal assessment instruments, such as tests, inventories, or surveys, is usually investigated through research that is published in academic journal articles or test manuals. This kind of research involves administering the instrument to a sample of individuals, and findings are reported based on how those individuals scored. These findings provide "estimates" of the test's reliability, since indexes of reliability will vary to a certain degree depending on the sample used in the research. The more stable reliability estimates are across multiple diverse samples, the more teachers can count on scores or ratings being reliable for their students. When reliability is unknown, then decisions made based on assessment information may not be trustworthy. The need for strong reliability versus the need for authenticity (i.e., how well the assessment matches real-life literacy situations) is a rivalry that underlies many testing debates.
In addition to assessments needing to be reliable, information gathered from assessments must also be valid for making decisions. A test has evidence of validity when research shows that it measures what it is supposed to measure (Trochim, 2006). For example, when a test that is supposed to identify students at risk for writing problems identifies students with actual writing problems, this is evidence of the test's validity. A weekly spelling test score may lack evidence of validity for applied spelling ability because some students may just be skillful memorizers and not be able to spell the same words accurately or use the words in their writing. When assessment information is not reliable, it cannot be valid, so reliability is a keystone for the evaluation of assessments.
Sometimes a test that seems to test what it is supposed to test will have issues with validity that are not apparent. For example, if students are tested on math application problems to see who may need math intervention, a problem could arise if the children are not able to read the words in the problems. In this case, the students may get many items wrong, making the math test function more like a reading test for these students. It is research on validity, along with observations by astute educators, that helps uncover these sorts of problems and prevent the delivery of a math intervention when what may really be needed is a reading intervention.
The validity issue described above is one reason why some students may receive accommodations (e.g., having a test read to them): accommodations can actually increase the validity of a test score for certain students. If students with reading disabilities had the above math test read to them, then their resulting scores would likely be a truer indicator of math ability because the accommodation ruled out their reading difficulties. This same logic applies to English language learners (ELLs) who can understand spoken English much better than they can read it. If a high school test assessing knowledge of biology is administered and ELL students are unable to pass it, is it because they do not know biology or is it because they do not know how to read English? If the goal is to assess their knowledge of biology, then the test scores may not be valid.
Another example of a validity issue occurs if a student with a visual impairment were assessed using a reading task featuring print in 12-point font. If the student scored poorly, would you refer him or her for reading intervention? Hopefully not. The student might actually need reading intervention, but there is a validity problem with the assessment results, so in reality you would need more information before making any decisions. Consider that when you reassess the student's reading using large print, the student's score increases dramatically. You then know that it was a print-size problem and not a reading problem that impacted the student's initial score. On the other hand, if the student still scored low even with appropriately enlarged print, you would conclude that the student may have a visual impairment and a reading problem, in which case providing reading intervention, along with the accommodation of large-print material, would be needed.
Some Controversies in Literacy Assessment
While there is little controversy surrounding literacy assessments that are informal and part of normal classroom practices, formal assessments generate huge controversy in schools, in research communities, on Internet discussion boards, and in textbooks like this. When considering the scope of educational assessment, one thing is clear: many school districts give far too many tests to far too many students and waste far too many hours of instruction gathering information that may or may not prove to have any value (Nelson, 2013). The over-testing problem is especially troublesome when so much time and effort go into gathering information that does not even end up being used. Whether a school is overwhelmed with testing is not universal. School districts have a great deal of influence over the use of assessments, but all too often when new assessments are adopted, they are added to a collection of previously adopted assessments, and the district becomes unsure about which assessments are still needed and which should be eliminated. Assessments also are added based on policy changes at federal and state levels. For example, the passing of the No Child Left Behind Act of 2001 (NCLB, 2002) expanded state testing to occur in all grades three through eight, compared to previous mandates, which were much less stringent.
Some tests are mandated for schools to receive funding, such as state tests; however, the use of other assessments is largely up to school districts. It is important for educators and school leaders to periodically inventory the procedures being used, discuss the extent to which they are needed, and make decisions that will provide answers without over-testing students. In other words, the validity of assessments is not only limited to how they are used with individual students but must be evaluated at a larger system level in which benefits to the whole student body are also considered. When assessments provide data that are helpful in making instructional decisions but also take away weeks of instructional time, educators and school leaders must work toward solutions that maximize the value of assessments while minimizing potential negative effects. Not liking test findings is a different issue than test findings not being valid. For example, if a test designed to identify students behind in reading is used to change instruction, then it may be quite valuable, even if it is unpleasant to find out that many students are having difficulty.
As a society, we tend to want indicators of student accountability, such as evidence that a minimum standard has been met for students to earn a high school diploma. Often, earning a diploma requires students to pass high-stakes exit exams; however, this seemingly straightforward use of test scores can easily lead to social injustice, particularly for students from culturally and linguistically diverse backgrounds. Because high-stakes tests may be inadequate at providing complete information about what many students know and can do, the International Reading Association (IRA, 2014) released a position statement that included the following recommendation:
High school graduation decisions must be based on a more complete picture of a student's literacy performance, obtained from a variety of systematic assessments, including informal observations, formative assessments of schoolwork, and consideration of out-of-school literacies, as well as results on standardized formal measures. (p. 2)
The IRA recommends that "teacher professional judgment, results from formative assessments, and student and family input, as well as results from standardized literacy assessments" (p. 5) serve as adequate additions in making graduation decisions. There is no easy answer for how to use assessments to precisely communicate how well students are prepared for college, careers, and life, and we are likely many reform movements away from designing a suitable plan. Nevertheless, the more educators, families, and policy makers know about assessments, including the inherent benefits and problems that accompany their use, the more progress can be made in refining techniques to make informed decisions designed to enhance students' futures. Literacy assessments can only be used to improve outcomes for students if educators have deep knowledge of research-based instruction, assessment, and intervention and can apply that knowledge in their classrooms. For this reason, information from this chapter should be combined with other chapters from this book and other texts outlining the use of effective literacy strategies, including strategies for students who are at risk for developing reading problems or who are English language learners.
Summary
Although literacy assessment is often associated with high-stakes standardized tests, in reality, literacy assessments cover an assortment of procedures to help teachers make instructional decisions. This chapter highlighted how teachers can use literacy assessments to improve instruction, but in practice, assessment results are also frequently used to communicate about literacy with a variety of individuals, including teams of educators, specialists, and family and/or community members. Knowing about the different kinds of assessments and their purposes will allow you to be a valuable addition to these important conversations.
Literacy assessments can be informal or formal, formative or summative, screenings or diagnostic tests. They can provide data at single points in time or be used to monitor progress over time. Regardless of their intended purpose, it is important that assessment information be trustworthy. It is also important that teachers who use assessments understand the associated benefits and difficulties of different procedures. An assessment that is ideal for use in one circumstance may be inappropriate in another. For this reason, teachers who have a background in assessment will be better equipped to select appropriate assessments that have the potential to benefit their students, and they also will be able to critique the use of assessments in ways that can improve assessment practices that are more system-wide. Literacy assessments are an important part of educational decision making, and therefore, it is essential that teachers gain a thorough understanding of their uses and misuses, gain experience interpreting information obtained through assessment, and actively participate in reform movements designed not just to eliminate testing but to use assessments in thoughtful and meaningful ways.
Questions and Activities
- Using some of the terms learned from this chapter, discuss some commonly used high-stakes literacy assessments, such as state-mandated tests or other tests used in schools.
- Explain ways in which some forms of literacy assessment are more controversial than others and how the more controversial assessments are impacting teachers, students, and the education system.
- What are the differences between formative and summative assessments? List some examples of each and how you currently use, or plan to use, these assessments in your teaching.
- A colleague of yours decides that she would like to use a diagnostic literacy test to assess all students in her middle school to see who has reading, spelling, and/or writing problems. The test must be administered individually and will take approximately 45 minutes per student. Although there is only one form of the assessment, your colleague would like to administer the test three times per year. After listening carefully to your colleague's ideas, what other ideas do you have that might help meet your colleague's goal besides the use of a diagnostic literacy test?
References
Adams, M. J. (2010-2011, Winter). Advancing our students' language and literacy: The challenge of complex texts. American Educator, 34, 3-11, 53. Retrieved from http://www.aft.org/sites/default/files/periodicals/Adams.pdf
Afflerbach, P., & Cho, B. Y. (2011). The classroom assessment of reading. In M. L. Kamil, P. D. Pearson, E. B. Moje, & P. P. Afflerbach (Eds.), Handbook of reading research (Vol. 4, pp. 487-514). New York, NY: Routledge.
Dynamic Measurement Group. (2010, December 1). DIBELS Next benchmark goals and composite scores. Retrieved from https://dibels.uoregon.edu/docs/DIBELSNextFormerBenchmarkGoals.pdf
Edglossary. (2013, August 29). Norm-referenced test [online]. Retrieved from http://edglossary.org/norm-referenced-test/
Edglossary. (2014, April 30). Criterion-referenced test [online]. Retrieved from http://edglossary.org/criterion-referenced-test/
Goffreda, C. T., & DiPerna, J. C. (2010). An empirical review of psychometric evidence for the Dynamic Indicators of Basic Early Literacy Skills. School Psychology Review, 39, 463-483. Available at http://www.nasponline.org/publications/periodicals/spr/volume-39/volume-39-issue-3/an-empirical-review-of-psychometric-evidence-for-the-dynamic-indicators-of-basic-early-literacy-skills
Good, R. H. (2015, May 19). Improving the efficiency and effectiveness of instruction with progress monitoring and formative evaluation in the outcomes driven model. Invited presentation at the International Conference on Cognitive and Neurocognitive Aspects of Learning: Abilities and Disabilities, Haifa, Israel. Retrieved from https://dibels.org/papers/Roland_Good_Haifa_Israel_2015_Handout.pdf
Good, R. H., & Kaminski, R. A. (Eds.). (2011). DIBELS Next assessment manual. Eugene, OR: Dynamic Measurement Group, Inc. Retrieved from http://www.d11.org/edss/assessment/DIBELS%20NextAmplify%20Resources/DIBELSNext_AssessmentManual.pdf
Good, R. H., Powell-Smith, K. A., & Dewey, E. (2015, February). Making reliable and stable progress decisions: Slope or pathways of progress? Poster presented at the Annual Pacific Coast Research Conference, Coronado, CA.
Hoover, H. D., Dunbar, S. B., & Frisbie, D. A. (2003). The Iowa Tests: Guide to research and development. Chicago, IL: Riverside Publishing.
International Reading Association. (2014). Using high-stakes assessments for grade retention and graduation decisions: A position statement of the International Reading Association. Retrieved from http://www.literacyworldwide.org/docs/default-source/where-we-stand/high-stakes-assessments-position-statement.pdf
Leslie, L., & Caldwell, J. S. (2010). Qualitative reading inventory-5. Boston, MA: Pearson.
Mitchell, A. M., Truckenmiller, A., & Petscher, Y. (2015, June). Computer-adapted assessments: Fundamentals and considerations. Communiqué, 43(8), 1, 22-24.
Murray, M. S., Munger, K. A., & Clonan, S. M. (2012). Assessment as a strategy to increase oral reading fluency. Intervention in School and Clinic, 47, 144-151. doi:10.1177/1053451211423812
National Governors Association Center for Best Practices & Council of Chief State School Officers. (2010). Common Core State Standards for English Language Arts & Literacy in History/Social Studies, Science, and Technical Subjects. Washington, DC: Author. Retrieved from http://www.corestandards.org/assets/CCSSI_ELA%20Standards.pdf
Nelson, H. (2013). Testing more, teaching less: What America's obsession with student testing costs in money and lost instructional time. Retrieved from http://www.aft.org/sites/default/files/news/testingmore2013.pdf
No Child Left Behind Act of 2001, Pub. L. No. 107-110, 115 Stat. 1425 (2002).
Pearson. (2012). AIMSweb technical manual (R-CBM and TEL). NCS Pearson, Inc. Retrieved from http://www.aimsweb.com/wp-content/uploads/aimsweb-Technical-Manual.pdf
Snow, C. (Chair). (2002). RAND Reading Study Group: Reading for understanding, toward an R&D program in reading comprehension. Santa Monica, CA: RAND. Retrieved from http://www.rand.org/content/dam/rand/pubs/monograph_reports/2005/MR1465.pdf
Trochim, W. K. (2006). Research methods knowledge base: Construct validity. Retrieved from http://www.socialresearchmethods.net/kb/relandval.php
Wechsler, D. (2009). Wechsler Individual Achievement Test (3rd ed.). San Antonio, TX: Pearson.
Photo Credit
- Image in Figure 1 from Wikimedia Commons, CC BY-SA 3.0: https://upload.wikimedia.org/wikipedia/commons/3/39/IQ_distribution.svg
Endnotes
1: The benchmark of 52 words per minute is considered a "criterion-referenced" score because a student's performance is judged against a criterion, in this case, the benchmark. Recall that scores obtained on diagnostic literacy assessments are norm-referenced because they are judged against how others in a norm group scored. Some progress-monitoring assessments provide both criterion-referenced and norm-referenced scores to aid in decision making when more than one type of score is needed.
Source: https://courses.lumenlearning.com/literacypractice/chapter/5-types-of-literacy-assessment-principles-procedures-and-applications/