Sunshine State TESOL Journal






Volume 6, Number 1
Spring 2007


 

Understanding the CELLA:

ESOL Educators’ Perspectives

Candace Harper, Lauren Gibson, YaYu Ho, Karla LaCayo, and Jiao Li

University of Florida

Gainesville, Florida

 

Abstract

Title III of No Child Left Behind (NCLB, 2002) requires schools to demonstrate that English language learners are making continuous progress in English language development. The Comprehensive English Language Learning Assessment (CELLA, 2006) is a new language proficiency test designed to measure the growth in English that ELLs need to succeed in school and that schools must document in order to meet NCLB program accountability objectives. In Fall 2006, a group of graduate students in a Florida teacher education course on ESOL Testing decided to conduct a study to learn ESL educators’ perceptions of the CELLA following its first statewide administration.  Interview questions and analysis were framed around two important properties of formal language assessment: validity and reliability. Although teachers were generally positive, they identified issues of language, culture, and test format that could affect the validity of results, and they identified reliability issues related to test administration and scoring.
 

Introduction

The stated goal of the No Child Left Behind (1) legislation (NCLB, 2002) is to eliminate the achievement gap for minority and disadvantaged students. English language learners (ELLs) are specifically identified as a subgroup that has fallen behind academically. Under Title I of NCLB, K-12 schools, districts, and states are held accountable for an annual increase in academic achievement for ELLs. Title III of NCLB establishes the additional requirement to demonstrate that ELLs are making continuous progress in English language development. The NCLB accountability expectations for academic achievement are measured through standardized tests such as the Florida Comprehensive Assessment Test (FCAT). In order to meet the NCLB accountability objective of increasing the English proficiency of ELLs, the Florida Department of Education (FDOE) entered into a consortium of five states (2) and contracted with Educational Testing Service (ETS) to develop and implement the Comprehensive English Language Learning Assessment (CELLA).

The CELLA is a language proficiency test designed to measure the English language growth that ELLs need to succeed in school (FDOE, 2007). The CELLA is expected to provide data for schools and districts to chart student progress over time and allow them to meet the NCLB program accountability objectives. The CELLA tests four language skills (listening, speaking, reading, and writing) separately in each of the four test levels: Level A (Grades K-2), Level B (Grades 3-5), Level C (Grades 6-8), and Level D (Grades 9-12). The Listening, Reading, and Writing sections of the test are administered to students in groups; the Speaking section of the test is administered as individual interviews with students.

Because of the significance of this new test, and because it is still relatively unknown to Florida’s ESOL profession, in Fall 2006 a group of students in a teacher education course on ESOL Testing decided that they wanted to learn more about the CELLA. Toward this end, they designed a study and interviewed a small sample of ELL students and educators in order to learn their perceptions of the CELLA following its first statewide administration. Interview questions were framed around two important properties of formal language assessments that had been emphasized in the course: reliability and validity (Hughes, 1989).

Background

Validity refers to the accuracy of an assessment in measuring what is intended; reliability refers to the consistency of an assessment in its measurement. Both validity and reliability are important characteristics of standardized tests, and their importance increases as the stakes rise for test takers. The accountability requirements of NCLB have raised the stakes considerably, particularly with regard to ELLs, for whom any test in English is a test of English. Abedi (2002; 2004), Crawford (2004), Garcia (1994), Kuhlman (2005), and Wright (2005) are among those who have raised specific concerns regarding validity and reliability in assessing the language development and content learning of ELLs, citing linguistic and cultural bias in tests and failure to include ELLs in test norming populations.

Bias can exist in both cultural and linguistic dimensions (Cummins, 2001). Culture influences how we behave, what we believe, what we value, how we socialize, and how we make sense of our experiences. Lack of awareness of cultural differences can easily lead to incorrect assumptions and unfair assessments. For example, a test item may assume the specific knowledge and experience of a particular cultural group and therefore benefit test takers from that group while putting those from other groups at a disadvantage.

Another possible source of bias lies in the language of a test. Linguistic bias can occur when the language used on a test is more familiar to some groups than to others, not because of differences in students’ individual abilities but because some students hold a systematic or experiential advantage. For example, a timed achievement test in history that makes frequent use of references in another language, such as French, that only a few students in the class can speak has clear language bias. Linguistic bias, like cultural bias, can interfere with the validity of an assessment, and other linguistic issues, such as linguistic complexity or unfamiliarity, can also undermine the validity of a test.


Methodology

Given the challenges inherent in large-scale assessment in general and the complexity of assessing ELLs in particular, members of the class began to prepare for the interviews through background reading on language proficiency assessment and by examining other ESL proficiency tests currently used in Florida (e.g., Idea Proficiency Test, Language Assessment Scales, Language Assessment Battery). The class members then decided to limit the interview pool to four districts (two in south Florida and two in central Florida) representing large and small urban and rural communities with growing ELL student populations. And, to better understand the CELLA from informed practitioner perspectives, they decided to interview only ESOL teachers who had been directly involved in administering the CELLA in September. The students identified 12 ESOL teachers (two in each of three districts, and six in the fourth district) who agreed to be interviewed. The class decided to include ESOL student perspectives in the study, and focus group interviews were conducted with 16 ESOL secondary students from two of the districts. This paper will focus on ESOL teacher perspectives, with ESOL students’ voices used primarily as a means of triangulation in the data analysis.

The student researchers and the instructor conducted the face-to-face interviews. Interviews lasted from 45 to 60 minutes, were recorded, and were later transcribed for analysis. Class members were paired as research teams, and each interview was assigned to two teams for independent analysis. Research teams first identified primary conceptual categories in the data (Miles & Huberman, 1984), conferred with classmates working on the same interviews, refined their analyses, and then presented their findings to the class for discussion. There the coding categories were expanded and collapsed to fit the larger data set. Themes that emerged from the class-level analyses of the teacher interviews included cultural and linguistic issues, format, administration, and scoring of the test. Presentation of the findings is organized according to educators’ views on how these themes may have affected the validity (culture, language, and format issues) and reliability (administration and scoring) of the CELLA in measuring students’ English language proficiency.
 

Findings

Threats to Validity

Validity refers to the accuracy of an assessment in measuring what is intended. Teachers reported several concerns with the CELLA that could have produced less valid results for their students. These concerns related to issues of culture, language, and format of the test.


Culture issues

Teachers interviewed in this study identified one item in the CELLA that they felt might unfairly draw on cultural knowledge ELLs might not possess. They reported a question in the Speaking section of the test asking for students’ opinions on the optimal age for young people to begin dating. (3) Although dating is common among teenagers in the United States and other developed countries, it is not the norm for young people in all cultural groups. In some countries, adolescents are not allowed to date. One teacher commented, “I’m not saying we have students this year from the Middle East, but we did, and it’s not even an issue for a woman. They can’t date in their country.”

 

Language issues

The most frequently cited example of linguistic bias in the CELLA referred to an item on the Speaking section in which students are shown two parallel lines and asked to name their geometric property. Because the word parallel has a Spanish cognate (paralelo), ELLs who speak Spanish are likely to find this item easier than ELLs from other language backgrounds. One ESOL teacher stated: “Spanish speakers know parallel and perpendicular because they are essentially the same as Spanish.”

Another teacher objected to the same item but for a different reason. She believed that this item was problematic because the word parallel may be used passively but not actively by students in real school settings. She stated:

 . . . if they’re in their math class, if it’s in their passive vocabulary, when the teacher is at the blackboard saying, ‘these are parallel lines…’ when do students really use these words? They might never become their part of their active vocabulary. So they might know what it is. And if I say, ‘which one is the parallel line?’ they might know that, but for them to come out with the word parallel . . . Did that show they know English? No!

An upper elementary ESOL teacher felt that the language required in response to some of the questions on the CELLA was unfamiliar to her students at the discourse level. As examples of inauthentic language functions, she cited students being directed to ask their teacher questions to which they already knew the answers and to repeat questions rather than answer them. Another teacher commented on the unfamiliar use of language required by an item on the Writing section of the CELLA:

…they had to respond to pictures. It seems like a simple task, but it took forever to do that part because you would have to stop and say, ‘Everybody stop and make a sentence about this picture.’ They could not get it, that they were supposed to make/write a sentence about this picture because they’re used to having to write more than just a sentence…

A secondary teacher also criticized the authenticity of the discourse of one of the Speaking tasks requiring her students to ask a hypothetical guest speaker at their school (a detective) about the ethical aspects of her job (rather than something they might realistically ask, such as what kinds of cases she worked on and if she carried a gun). Another secondary teacher expressed concern that responding to a recorded prompt on the Listening section of the CELLA was not an authentic task for her students: “a group of people, sitting in the room, listening to a tape or CD, it’s never going to happen in their real lives.”

In each of the examples provided by these teachers, the language sample elicited by the CELLA may not have reflected students’ true proficiency in English. As a result, the validity of the test results may have been compromised. In spite of their concerns, however, most of the teachers had positive things to say about the CELLA. Related specifically to content validity of the test, two teachers commented on the fact that, more than other English proficiency tests used in the past, the CELLA focused on academic settings that reflected the kind of school-based language students needed to develop.


Format issues: ETS goes to elementary school

One of the most prominent themes in the interview data suggesting that the validity of the CELLA test scores may have been threatened was the format of the test. Several elementary teachers described problems their students had in matching questions to the appropriate places in the answer booklet. Students became confused about what section they should be working in and where to mark their answers. One of the teachers explained:

There were areas where there was a little sample box; they looked like college exam answer sheets and they really did not look like anything that any third grader should be asked to use. We would go over and over again, ‘This is the section that we are working on. Everyone put your finger on the reading section.’ And I would go around the room and make sure that they all had their finger on the reading section. ‘Now look at sample A.’ And we would discover that the answer to sample A was b, and they would find sample B and bubble in over there! I had a couple of kids who filled out a whole section in the wrong area. It was very confusing.

Two elementary ESOL teachers explained that the bubbles on the answer sheets were too small and difficult for the students to shade in. One said:

Third graders had a test booklet and an answer sheet booklet, and they were required to bubble in on the answer sheet booklet and the bubbles were small. These kids are what? Eight! It was very difficult. I was very frustrated, and so were the kids. It made the test administration much longer than it needed to take.

The elementary teachers expressed concern over the length of the CELLA, particularly in the Reading and Writing sections of the test, which caused some of their students to cry from fatigue and frustration. Another aspect related to test format that students found to be problematic was the response mode in which they had to mark a cross on the correct answer. An elementary teacher explained:

I think they had to cross off the one that was the correct answer and that confused them because that is not what we typically do. It’s the reverse of what we typically do in those instructions. Usually you get something that is your choice and you circle it, but they were asked to put an ‘X’ across it. It was hard to convince them that yes, indeed, I wanted them to cross off the correct answer.

Two elementary teachers also commented that unclear instructions likely interfered with their students’ performance on the test. One teacher stated, “I feel like for us to get the best out of them, a lot is lost in not understanding directions.” If students must struggle to understand test directions, record their answers, and simply complete the test, the validity of the test may be compromised.

In spite of concerns that formatting issues, test length, and unclear instructions complicated their students’ ability to demonstrate their English proficiency, the elementary ESOL teachers seemed generally satisfied with the CELLA, especially with the tasks in the Writing section. The secondary teachers were more critical, however, particularly with respect to the language issues identified in the test.


Threats to Reliability

Reliability refers to the consistency of the test in eliciting and measuring a sample. Teachers reported several concerns with the CELLA that could have produced unreliable results for their students. These concerns were related to the test administration and scoring.


Test administration issues

Variability in test administration conditions and procedures was the most widely reported source of inconsistency. These differences existed across all four districts. (4) For example, most of the teachers reported that the CELLA was administered to students in their own classrooms, where they were assisted by their teachers who read the Listening test section aloud, encouraged and prompted them on the Speaking section, and provided additional time as needed on the Reading and Writing sections. However, one teacher confided that students in her middle school were “herded into the cafeteria where the AP barked test directions at them with a bullhorn.” They sat in confusion as the audio-recorded Listening test was played through an inadequate public address system. Those unfortunate students received neither the guidance nor the time needed to do their best on the test.

Teachers who had personally administered the CELLA in smaller groups of students felt strongly that this attention had been critical to their students’ success in completing the test. One teacher stated, “I felt like I had to, or I was going to have mistakes just filling it out.” Interviews with ELL students confirmed teachers’ suspicions regarding the effects these different test administration conditions can have on students’ levels of confidence and motivation to do well. One student credited the support of the ESOL classroom with his feelings of success on the test, “…because of the teacher and classmates and everything. We’re more friendly here.” Teachers also had strong feelings about the advantage for students in hearing their own teacher’s voice and familiar accent over the unfamiliar recording. One teacher noted, “No, I would never [use a tape recorder]. The kids have to look at our lips. They have to look at our intonation.”

Clearly small group test settings supervised by teachers who know the students and can attend to their individual needs are preferable to large, impersonal group administrations. Nevertheless, one secondary teacher spelled out the cost in both student and teacher terms. “Because of the CELLA, I lost eight full days of instructional time with my lowest quartile of kids. And that’s not all—they got pulled for Reading Intervention assessments too.”

At the secondary level, two teachers expressed concern that because the same test was administered at different times, students had talked among themselves and shared information about the test with others who had not yet taken it. One teacher confirmed this: “You hear kids asking another one in Spanish, ‘What’s the word for __?’” This type of breach in test security also occurs on make-up days for the FCAT. However, because some of the relatively discrete units on the CELLA (e.g., key vocabulary such as parallel) can be remembered easily and passed on to others, information shared about the CELLA may have a bigger impact on test outcomes.

All of the teachers interviewed reported that their district or school had provided some “minimal” orientation to administering and scoring the Speaking section of the CELLA, though teachers differed in their assessments of its effectiveness. Elementary teachers were more satisfied with the level of preparation they had received than were the secondary teachers, with the following exception: 

Yeah, I don’t feel like I was terribly well prepared. I think that the orientation to it was very surface overview except for the practice of the scoring which was important. We also did look at some of the listening activities, but that was less significant really. Going through items and scoring is not as important as the actual administration. I was really not prepared for that whole [test format] problem with the third graders. I just had no clue that that was how it was going to be, but I’m not sure the people conducting the workshop were aware of that either.


Scoring issues

The test administrators were also responsible for scoring the Speaking portion of the CELLA. Although the elementary teachers claimed that the orientation and the scoring rubrics helped them to reliably score the Speaking portion of the test, this was not the case at the secondary level. Two secondary teachers disagreed with the CELLA scoring system and rubric, claiming that it overlooked students’ grammar mistakes and allowed them to receive full credit for a response if they knew the vocabulary. One explained: 

I looked [at the scoring criteria] and we listened to the examples and we disagreed what the scoring should have been. The scoring guidelines allowed students to make errors that we thought, absolutely, would not receive full points. If the student performs that way in our classes, he/she would not get an ‘A.’

Another secondary teacher complained that the CELLA scoring criteria did not consider the sociocultural context of a response. For example, several items on the Speaking section of the test asked students to perform different functions with a friend (such as asking to borrow some money). But students who asked “You got a dollar I can borrow?” could not receive credit for their response, which needed to conform to a more standard form for questions.

Two teachers expressed concern over a lack of consistency in the actual scoring of the CELLA Speaking tests due to differences in test administrators, possibly affecting the (inter-rater) reliability of the test. They also mentioned variables that could affect (intra-rater) reliability, such as a teacher evaluating students differently due to empathy or dislike for a particular student, or a teacher’s potential desire for students to show stronger learning gains at the end of the year. Although both supported teachers having input to the assessment process, they were concerned that teachers’ personal relationships with students or their professional desire to appear successful could prevent their impartial assessment of their own students.

One teacher noted that in administering the CELLA, an ESOL professional would understand what it means to “prompt” a student (prompting is allowed), but she warned, “someone who’s never had contact with non-native speakers might just talk louder or keep repeating.” This teacher explained that she would likely prompt her advanced students less actively than her beginners (implying that this was unfair), and that she would have predetermined ideas about these students’ proficiency levels. As a result, she might not be completely objective in this role, whereas impartial school personnel would be able to administer the test according to CELLA guidelines. ELL student responses to the question of whether they were actually prompted or helped by their teacher/test administrator were mixed.

In summary, there were significant differences in the administration and scoring conditions for the CELLA in Fall 2006. It is not currently possible to calculate the effect size of this variability on the reliability of CELLA scores. However, as the stakes rise for successful performance on the CELLA it will become increasingly important to acknowledge this variability and to prepare additional school personnel thoroughly to administer and score the test in a consistent manner.
 

Discussion

The research reported here was conducted in order to better understand the CELLA through the perspectives of ESOL educators. Overall, these teachers felt that the CELLA was a valid test, with some potentially serious threats to its reliability. Along with their concerns, teachers provided recommendations based on their initial experiences administering the CELLA:

  • evaluate the CELLA psychometrically for cultural and linguistic bias with Florida ELL students and revise any problematic items,
  • shorten the Reading and Writing sections of the test, and/or recommend that they be administered on separate days,
  • revise the test administration timetable to be more detailed and realistic,
  • reformat the answer booklet by providing a one-page answer sheet for each test section and increasing the size of the answer blanks and bubbles for younger students,
  • standardize the training, administration, and scoring procedures,
  • allow districts to replace current ESOL placement tests with the CELLA, and
  • allow districts to replace the FCAT Reading and Writing tests with the CELLA for low proficiency ELLs.

 

In addition to educators’ concerns related to the test itself, their questions regarding possible uses for CELLA scores are worth mentioning here. The CELLA has been promoted by ETS as a source of information on individual students’ English proficiency that can be used as a basis for ESOL program entry, exit, and placement decisions and as a source of diagnostic information to inform classroom instruction. However, results from the Fall 2006 CELLA administration were not released to schools until January 2007, so CELLA scores could not be used for program entry, exit, or diagnostic purposes this year. Although the first year of administration for any new standardized assessment is complicated by the need for procedures to be implemented and official cut scores established, in future years ETS and the FDOE will need to ensure that CELLA test results are available to schools much more quickly. Indeed, the entire process of administration, scoring, and reporting will need to be streamlined in order to meet the real-time needs of schools and serve the programmatic, diagnostic, and instructional purposes for which the CELLA has been promised. 


Conclusion

Finally, as our teacher colleagues and their students struggle to meet the NCLB adequate yearly progress (AYP) targets for academic achievement on the Spring 2007 FCAT, we hope that similar high-stakes conditions will not be imposed on ELLs’ performance on the CELLA. In addition to its accountability mission, the CELLA has been marketed and sold as an assessment tool to measure, document, and facilitate the development of English language proficiency for ELLs. If the CELLA can be used for the multiple purposes of entry/exit and placement into ESOL programs and diagnosis for targeted instruction, and if CELLA scores can replace FCAT Reading and Writing scores for ELLs with very limited English proficiency, the CELLA will indeed make an important contribution to identifying and serving students’ English language development needs. Like the ESOL educators we interviewed, we are cautiously optimistic. Yet we worry that in the quest for accountability the CELLA may ultimately prove to be a Trojan Horse: a vehicle for the introduction of yet another layer of standardized testing with unrealistic expectations and performance pressures for ELLs and their teachers, who are currently, in the words of one teacher, being “tested to death.”

 

References

Abedi, J. (2002). Assessment and accommodations of English language learners: Issues, concerns, and recommendations. Journal of School Improvement, 3(1), 83-89.

Abedi, J. (2004). The No Child Left Behind Act and English language learners: Assessment and accountability issues. Educational Researcher, 33, 4-14.

Brown, H. D. (2004). Language assessment: Principles and classroom practices. White Plains, NY: Pearson Education.

Crawford, J. (2004, September). No Child Left Behind: Misguided approach to school accountability for English language learners. Forum on Ideas to Improve the NCLB Accountability Provisions for Students with Disabilities and English Language Learners. Center on Education Policy.

Florida Department of Education (2007). Florida-Comprehensive English Language Learning Assessment. Retrieved March 10, 2007, from http://www.firn.edu/doe/aala/cella.htm

Garcia, E. (1994). Understanding and meeting the challenge of student cultural diversity. Boston, MA: Houghton-Mifflin.

Hughes, A. (1989). Testing for language teachers. Cambridge University Press.

Kuhlman, N. (2005, March/April). The language assessment conundrum: What tests claim to assess and what teachers need to know. The ELL Outlook. Retrieved July 8, 2006, from http://www.coursecrafters.com/ELL-Outlook/2005/mar_apr/ELLOutlookITIArticle1.htm

Miles, M. B., & Huberman, A. M. (1984). Qualitative data analysis: A sourcebook of new methods. Beverly Hills, CA: Sage.

Wright, W. E. (2005). English language learners left behind in Arizona: The nullification of accommodations in the intersection of federal and state policies. Bilingual Research Journal, 29.

 

Author Bios

Lauren Gibson, YaYu Ho, Karla LaCayo, and Jiao Li are graduate students specializing in ESOL/Bilingual Education in the College of Education at the University of Florida in Gainesville. Candace Harper is the faculty coordinator of the program.

1. See 115 Stat. 1425, Public Law 107–110, January 8, 2002, found at http://www.ed.gov/policy/elsec/leg/esea02/107-110.pdf

2. These states are Florida, Pennsylvania, Maryland, Michigan, and Tennessee (AALA Presentation, December 2006)

3. This item and other specific items from the CELLA cited here have been altered to protect the security of the test. The changes retain relevant characteristics of the item.

4. Districts used non-instructional personnel as well as teachers to administer and score the tests.




Sunshine State TESOL Journal
ISSN 1934-7030
Copyright rests with authors