Sunshine State TESOL Journal
Volume 6, Number 1
Spring 2007
Understanding the CELLA: ESOL Educators’ Perspectives
Candace Harper, Lauren Gibson, YaYu Ho, Karla LaCayo, and Jiao Li
University of Florida, Gainesville, Florida
Abstract
Title III of No Child Left Behind (NCLB, 2002) requires schools to demonstrate that English language learners (ELLs) are making continuous progress in English language development. The Comprehensive English Language Learning Assessment (CELLA, 2006) is a new language proficiency test designed to measure the growth in English that ELLs need to succeed in school and that schools must document in order to meet NCLB program accountability objectives. In Fall 2006, a group of graduate students in a Florida teacher education course on ESOL Testing conducted a study to learn about ESL educators’ perceptions of the CELLA following its first statewide administration. Interview questions and analysis were framed around two important properties of formal language assessment: validity and reliability. Although teachers were generally positive, they identified issues of language, culture, and test format that could affect the validity of results, as well as reliability issues related to test administration and scoring.
Introduction
The stated goal of the No Child Left Behind legislation (NCLB, 2002) is to eliminate the achievement gap for minority and disadvantaged students. English language learners (ELLs) are specifically identified as a subgroup that has fallen behind academically. Under Title I of NCLB, K-12 schools, districts, and states are held accountable for an annual increase in academic achievement for ELLs. Title III of NCLB establishes the additional requirement to demonstrate that ELLs are making continuous progress in English language development. The NCLB accountability expectations for academic achievement are measured through standardized tests such as the Florida Comprehensive Assessment Test (FCAT). In order to meet the NCLB accountability objective of increasing the English proficiency of ELLs, the Florida Department of Education (FDOE) entered into a consortium of five states (2) and contracted with Educational Testing Service (ETS) to develop and implement the Comprehensive English Language Learning Assessment (CELLA).
The CELLA is a language proficiency test designed to measure the English language growth that ELLs need to succeed in school (FDOE, 2007). The CELLA is expected to provide data that schools and districts can use to chart student progress over time and to meet the NCLB program accountability objectives. The CELLA tests four language skills (listening, speaking, reading, and writing) separately at each of four test levels: Level A (Grades K-2), Level B (Grades 3-5), Level C (Grades 6-8), and Level D (Grades 9-12). The Listening, Reading, and Writing sections of the test are administered to students in groups; the Speaking section is administered as an individual interview with each student.
Because of the significance of this new test, and because it is still relatively unknown to Florida’s ESOL profession, in Fall 2006 a group of students in a teacher education course on ESOL Testing decided that they wanted to learn more about the CELLA. Toward this end, they designed a study and interviewed a small sample of ELL students and educators in order to learn their perceptions of the CELLA following its first statewide administration. Interview questions were framed around two important properties of formal language assessments that had been emphasized in the course: reliability and validity (Hughes, 1989).
Background
Validity refers to the accuracy of an assessment in measuring what is intended; reliability refers to the consistency of an assessment in its measurement. Both validity and reliability are important characteristics of standardized tests, and their importance increases as the stakes rise for test takers. The accountability requirements of NCLB have raised the stakes considerably, particularly for ELLs, for whom any test in English is a test of English. Abedi (2002, 2004), Crawford (2004), Garcia (1994), Kuhlman (2005), and Wright (2005) are among those who have raised specific concerns regarding validity and reliability in assessing the language development and content learning of ELLs, citing linguistic and cultural bias in tests and the failure to include ELLs in test norming populations.
Bias can exist in both cultural and linguistic dimensions (Cummins, 2001). Culture influences how we behave, what we believe, what we value, how we socialize, and how we make sense of our experiences. Lack of awareness of cultural differences can easily lead to incorrect assumptions and unfair assessments. For example, a test item may assume the specific knowledge and experience of a particular cultural group and therefore benefit test takers from that group while putting those from other groups at a disadvantage.
Another possible source of bias lies in the language of a test. Linguistic bias can occur when the language used on a test is more familiar to some groups of students, giving them an advantage that stems not from differences in individual ability but from systematic or experiential differences. For example, a timed achievement test in history that makes frequent use of references in another language, such as French, that only a few students in the class can speak has clear language bias. Like cultural bias, linguistic bias can interfere with the validity of an assessment, and other linguistic issues, such as linguistic complexity or unfamiliarity, can also undermine the validity of a test.
Methodology
Given the challenges inherent in large-scale assessment in general and the complexity of assessing ELLs in particular, members of the class prepared for the interviews through background reading on language proficiency assessment and by examining other ESL proficiency tests currently used in Florida (e.g., IDEA Proficiency Test, Language Assessment Scales, Language Assessment Battery). The class members then decided to limit the interview pool to four districts (two in south Florida and two in central Florida) representing large and small urban and rural communities with growing ELL student populations. To better understand the CELLA from informed practitioner perspectives, they decided to interview only ESOL teachers who had been directly involved in administering the CELLA in September. The students identified 12 ESOL teachers (two in each of three districts, and six in the fourth district) who agreed to be interviewed. The class also decided to include ESOL student perspectives in the study, and focus group interviews were conducted with 16 ESOL secondary students from two of the districts. This paper focuses on ESOL teacher perspectives, with ESOL students’ voices used primarily as a means of triangulation in the data analysis.
The student researchers and the instructor conducted the face-to-face interviews. Interviews lasted from 45 to 60 minutes, were recorded, and were later transcribed for analysis. Class members were paired as research teams, and each interview was assigned to two teams for independent analysis. Research teams first identified primary conceptual categories in the data (Miles & Huberman, 1984), conferred with classmates working on the same interviews, refined their analyses, and then presented their findings to the class for discussion. There the coding categories were expanded and collapsed to fit the larger data set. Themes that emerged from the class-level analyses of the teacher interviews included cultural and linguistic issues, format, administration, and scoring of the test. The findings are organized according to educators’ views on how these themes may have affected the validity (culture, language, and format issues) and reliability (administration and scoring issues) of the CELLA in measuring students’ English language proficiency.
Findings
Threats to Validity
Validity refers to the accuracy of an assessment in measuring what is intended. Teachers reported several concerns with the CELLA that could have produced less valid results for their students. These concerns related to issues of culture, language, and format of the test.
Culture issues
Teachers interviewed in this study identified one item in the CELLA that they felt might unfairly draw on cultural knowledge that ELLs may not possess. They described a question in the Speaking section of the test asking for students’ opinions on the optimal age for young people to begin dating. (3) Although dating is common among teenagers in the United States and other developed countries, it is not the norm for young people in all cultural groups. In some countries, adolescents are not allowed to date. One teacher commented, “I’m not saying we have students this year from the Middle East, but we did, and it’s not even an issue for a woman. They can’t date in their country.”
Language issues
The most frequently cited example of linguistic bias in the CELLA referred to an item on the Speaking section in which students are shown two parallel lines and asked to name their geometric property. Because the word parallel has a Spanish cognate (paralelo), ELLs who speak Spanish are likely to find this item easier than ELLs from other language backgrounds. One ESOL teacher stated: “Spanish speakers know parallel and perpendicular because they are essentially the same as Spanish.”
Another teacher objected to the same item but for a different reason. She believed that this item was problematic because the word parallel may be part of students’ passive vocabulary without being part of their active vocabulary in real school settings. She stated:
. . . if they’re in their math class, if it’s in their passive vocabulary, when the teacher is at the blackboard saying, ‘these are parallel lines…’ when do students really use these words? They might never become their part of their active vocabulary. So they might know what it is. And if I say, ‘which one is the parallel line?’ they might know that, but for them to come out with the word parallel . . . Did that show they know English? No!
Another upper elementary ESOL teacher felt that the language required in response to some of the questions on the CELLA was unfamiliar to her students at the discourse level. As examples of inauthentic language functions, she cited tasks that directed students to ask their teacher questions to which they already knew the answers and to repeat questions rather than answer them. Another teacher commented on the unfamiliar use of language required by an item on the Writing section of the CELLA:
…they had to respond to pictures. It seems like a simple task, but it took forever to do that part because you would have to stop and say, ‘Everybody stop and make a sentence about this picture.’ They could not get it, that they were supposed to make/write a sentence about this picture because they’re used to having to write more than just a sentence…
A secondary teacher also criticized the authenticity of the discourse in one of the Speaking tasks, which required her students to ask a hypothetical guest speaker at their school (a detective) about the ethical aspects of her job, rather than about something they might realistically ask, such as what kinds of cases she worked on and whether she carried a gun.
Another secondary teacher expressed concern that responding to a recorded prompt on the Listening section of the CELLA was not an authentic task for her students: “a group of people, sitting in the room, listening to a tape or CD, it’s never going to happen in their real lives.”
In each of the examples provided by these teachers, the language sample elicited by the CELLA may not have reflected students’ true proficiency in English. As a result, the validity of the test results may have been compromised. In spite of their concerns, however, most of the teachers had positive things to say about the CELLA. With respect to the content validity of the test, two teachers noted that, more than other English proficiency tests used in the past, the CELLA focused on academic settings that reflected the kind of school-based language students needed to develop.
Format issues: ETS goes to elementary school
The format of the test was one of the most prominent themes in the interview data suggesting that the validity of CELLA scores may have been threatened. Several elementary teachers described problems their students had in matching questions to the appropriate places in the answer booklet. Students became confused about which section they should be working in and where to mark their answers. One of the teachers explained:
There were areas where there was a little sample box; they looked like college exam answer sheets and they really did not look like anything that any third grader should be asked to use. We would go over and over again, ‘This is the section that we are working on. Everyone put your finger on the reading section.’ And I would go around the room and make sure that they all had their finger on the reading section. ‘Now look at sample A.’ And we would discover that the answer to sample A was b, and they would find sample B and bubble in over there! I had a couple of kids who filled out a whole section in the wrong area. It was very confusing.
Two elementary ESOL teachers explained that the bubbles on the answer sheets were too small for students to shade in easily. One said:
Third graders had a test booklet and an answer sheet booklet, and they were required to bubble in on the answer sheet booklet and the bubbles were small. These kids are what? Eight! It was very difficult. I was very frustrated, and so were the kids. It made the test administration much longer than it needed to take.
The elementary teachers expressed concern over the length of the CELLA, particularly the Reading and Writing sections, which caused some of their students to cry from fatigue and frustration. Another aspect of the test format that students found problematic was a response mode that required them to mark an X across the correct answer. An elementary teacher explained:
I think they had to cross off the one that was the correct answer and that confused them because that is not what we typically do. It’s the reverse of what we typically do in those instructions. Usually you get something that is your choice and you circle it, but they were asked to put an ‘X’ across it. It was hard to convince them that yes, indeed, I wanted them to cross off the correct answer.
Two elementary teachers also commented that unclear instructions likely interfered with their students’ performance on the test. One teacher stated, “I feel like for us to get the best out of them, a lot is lost in not understanding directions.” If students must struggle simply to understand test directions, record their answers, and complete the test, the validity of the results may be compromised.
In spite of concerns that formatting issues, test length, and unclear instructions complicated their students’ ability to demonstrate their English proficiency, the elementary ESOL teachers seemed generally satisfied with the CELLA, especially with the tasks in the Writing section. The secondary teachers, however, were more critical, particularly with respect to the language issues described earlier.
Threats to Reliability
Reliability refers to the consistency of a test in eliciting and measuring a language sample. Teachers reported several concerns with the CELLA that could have produced unreliable results for their students. These concerns were related to test administration and scoring.
Test administration issues
Variability in test administration conditions and procedures was the most widely reported source of inconsistency. Such differences were found in all four districts. (4) For example, most of the teachers reported that the CELLA was administered to students in their own classrooms, where their teachers read the Listening section aloud, encouraged and prompted them on the Speaking section, and provided additional time as needed on the Reading and Writing sections. However, one teacher confided that students in her middle school were “herded into the cafeteria where the AP barked test directions at them with a bullhorn.” They sat in confusion as the audio-recorded Listening test was played through an inadequate public address system. Those unfortunate students received neither the guidance nor the time needed to do their best on the test.
Teachers who had personally administered the CELLA to smaller groups of students felt strongly that this attention had been critical to their students’ success in completing the test. One teacher stated, “I felt like I had to, or I was going to have mistakes just filling it out.” Interviews with ELL students confirmed teachers’ suspicions regarding the effects these different test administration conditions can have on students’ confidence and motivation to do well. One student credited the support of the ESOL classroom for his feeling of success on the test: “…because of the teacher and classmates and everything. We’re more friendly here.” Teachers also felt strongly that students benefited from hearing their own teacher’s voice and familiar accent rather than an unfamiliar recording. One teacher noted, “No, I would never [use a tape recorder]. The kids have to look at our lips. They have to look at our intonation.”
Clearly, small-group test settings supervised by teachers who know the students and can attend to their individual needs are preferable to large, impersonal group administrations. Nevertheless, one secondary teacher spelled out what this approach cost both her students and herself: “Because of the CELLA, I lost eight
full days
of instructional time with my lowest quartile of kids. And that’s not
all—they
got pulled for Reading Intervention assessments too.”
At the secondary level, two teachers expressed concern that because the same test was administered at different times, students had talked among themselves and shared information about the test with others who had not yet taken it. One teacher described this: “You hear kids asking another one in Spanish, ‘What’s the word for __?’” This type of breach in test security also occurs on make-up days for the FCAT. However, because some of the relatively discrete items on the CELLA (e.g., key vocabulary such as parallel) can be easily remembered and passed on to others, shared information may have a greater impact on CELLA outcomes.
All of the teachers interviewed reported that their district or school had provided some “minimal” orientation to administering and scoring the Speaking section of the CELLA, though teachers differed in their assessments of its effectiveness. Elementary teachers were generally more satisfied with the preparation they had received than were the secondary teachers, with one exception, an elementary teacher who explained:
Yeah, I don’t feel like I was terribly well prepared. I think that the orientation to it was very surface overview except for the practice of the scoring which was important. We also did look at some of the listening activities, but that was less significant really. Going through items and scoring is not as important as the actual administration. I was really not prepared for that whole [test format] problem with the third graders. I just had no clue that that was how it was going to be, but I’m not sure the people conducting the workshop were aware of that either.
Scoring issues
The test administrators were also responsible for scoring the Speaking portion of the CELLA. Although the elementary teachers reported that the orientation and the scoring rubrics helped them to score the Speaking portion of the test reliably, this was not the case at the secondary level. Two secondary teachers disagreed with the CELLA scoring system and rubric, claiming that it overlooked students’ grammar mistakes and allowed students to receive full credit for a response if they knew the vocabulary. One explained:
I looked [at the scoring criteria] and we listened to the examples and we disagreed what the scoring should have been. The scoring guidelines allowed students to make errors that we thought, absolutely, would not receive full points. If the student performs that way in our classes, he/she would not get an ‘A.’
Another secondary teacher complained that the CELLA scoring criteria did not consider the sociocultural context of a response. For example, several items on the Speaking section of the test asked students to perform different functions with a friend (such as asking to borrow some money). But students who asked “You got a dollar I can borrow?” could not receive credit for their response, which needed to conform to a more standard question form, even though such an informal request is appropriate between friends.
Two teachers expressed concern over a lack of consistency in the actual scoring of the CELLA Speaking tests due to differences among test administrators, possibly affecting the inter-rater reliability of the test. They also mentioned variables that could affect intra-rater reliability, such as a teacher evaluating students differently out of empathy for or dislike of a particular student, or a teacher’s desire for students to show stronger learning gains at the end of the year. Although both supported teachers having input into the assessment process, they were concerned that teachers’ personal relationships with students or their professional desire to appear successful could prevent them from assessing their own students impartially.
One teacher noted that in administering the CELLA, an ESOL professional would understand what it means to “prompt” a student (prompting is allowed), but she warned, “someone who’s never had contact with non-native speakers might just talk louder or keep repeating.” This teacher explained that she would likely prompt her advanced students less actively than her beginners (implying that this was unfair), and that she would have predetermined ideas about these students’ proficiency levels. As a result, she might not be completely objective in this role, whereas impartial school personnel could administer the test according to CELLA guidelines. ELL students’ responses to the question of whether they had actually been prompted or helped by their teacher or test administrator were mixed.
In summary, there were substantial differences in the administration and scoring conditions for the CELLA in Fall 2006. It is not currently possible to estimate the size of this variability’s effect on the reliability of CELLA scores. However, as the stakes rise for successful performance on the CELLA, it will become increasingly important to acknowledge this variability and to prepare additional school personnel thoroughly to administer and score the test consistently.
Discussion
The research reported here was conducted in order to better understand the CELLA through the perspectives of ESOL educators. Overall, these teachers felt that the CELLA was a valid test, but they identified some potentially serious threats to its reliability. Along with their concerns, teachers offered recommendations based on their initial experiences administering the CELLA:
- evaluate the CELLA psychometrically for cultural and linguistic bias with Florida ELL students and revise any problematic items,
- shorten the Reading and Writing sections of the test, and/or recommend that they be administered on separate days,
- revise the test administration timetable to be more detailed and realistic,
- reformat the answer booklet by providing a one-page answer sheet for each test section and increasing the size of the answer blanks and bubbles for younger students,
- standardize the training, administration, and scoring procedures,
- allow districts to replace current ESOL placement tests with the CELLA, and
- allow districts to replace the FCAT Reading and Writing tests with the CELLA for low-proficiency ELLs.
In addition to educators’ concerns related to the test itself, their questions regarding possible uses for CELLA scores are worth mentioning here. ETS has promoted the CELLA as a source of information on individual students’ English proficiency that can serve as a basis for ESOL program entry, exit, and placement decisions and as a source of diagnostic information to inform classroom instruction. However, results from the Fall 2006 CELLA administration were not released to schools until January 2007, so CELLA scores could not be used for program entry, exit, or diagnostic purposes this year. Although the first year of administration for any new standardized assessment is complicated by the need to implement procedures and establish official cut scores, in future years ETS and the FDOE will need to ensure that CELLA test results are available to schools much more quickly. Indeed, the entire process of administration, scoring, and reporting will need to be streamlined in order to meet the real-time needs of schools and serve the programmatic, diagnostic, and instructional purposes for which the CELLA has been promoted.
Conclusion
Finally, as our teacher colleagues and their students struggle to meet the NCLB adequate yearly progress (AYP) targets for academic achievement on the Spring 2007 FCAT, we hope that similar high-stakes conditions will not be imposed on ELLs’ performance on the CELLA. In addition to its accountability mission, the CELLA has been marketed and sold as an assessment tool to measure, document, and facilitate the development of English language proficiency for ELLs. If the CELLA can be used for the multiple purposes of ESOL program entry, exit, and placement and of diagnosis for targeted instruction, and if CELLA scores can replace FCAT Reading and Writing scores for ELLs with very limited English proficiency, the CELLA will indeed make an important contribution to identifying and serving students’ English language development needs. Like the ESOL educators we interviewed, we are cautiously optimistic. Yet we worry that in the quest for accountability the CELLA may ultimately prove to be a Trojan Horse: a vehicle for introducing yet another layer of standardized testing, with unrealistic expectations and performance pressures for ELLs and their teachers, who are currently, in the words of one teacher, being “tested to death.”
References
Abedi, J. (2002). Assessment and accommodations of English language learners: Issues, concerns, and recommendations. Journal of School Improvement, 3(1), 83-89.
Abedi, J. (2004). The No Child Left Behind Act and English language learners: Assessment and accountability issues. Educational Researcher, 33, 4-14.
Brown, H. D. (2004). Language assessment: Principles and classroom practices. White Plains, NY: Pearson Education.
Crawford, J. (2004, September). No Child Left Behind: Misguided approach to school accountability for English language learners. Forum on Ideas to Improve the NCLB Accountability Provisions for Students with Disabilities and English Language Learners. Center on Education Policy.
Florida Department of Education. (2007). Florida-Comprehensive English Language Learning Assessment. Retrieved March 10, 2007, from http://www.firn.edu/doe/aala/cella.htm
Garcia, E. (1994). Understanding and meeting the challenge of student cultural diversity. Boston, MA: Houghton Mifflin.
Hughes, A. (1989). Testing for language teachers. Cambridge, UK: Cambridge University Press.
Kuhlman, N. (2005, March/April). The language assessment conundrum: What tests claim to assess and what teachers need to know. The ELL Outlook. Retrieved July 8, 2006, from http://www.coursecrafters.com/ELL-Outlook/2005/mar_apr/ELLOutlookITIArticle1.htm
Miles, M. B., & Huberman, A. M. (1984). Qualitative data analysis: A sourcebook of new methods. Beverly Hills, CA: Sage.
Wright, W. E. (2005). English language learners left behind in Arizona: The nullification of accommodations in the intersection of federal and state policies. Bilingual Research Journal, 29.
Author Bios