Methods and Criteria of Evaluation in CALL

1. Introduction

One of the main reasons for evaluating language learning materials is to consider the validity of their content and method, the extent to which the methodology generates reliable results, and the practicality of the method for a given context and set of learners. Similarly, evaluation in the field of CALL (Computer-Assisted Language Learning) is concerned with the effectiveness of CALL materials. Any CALL material can be considered representative of a pedagogy or methodology, and should therefore be held to the same standards as other pedagogies and methodologies.

This essay discusses evaluation in CALL. First, definitions of CALL evaluation are given. Then, the object of evaluation in CALL and the types of evaluators are described in sections 3 and 4. The fifth section is mainly concerned with the approaches used for CALL evaluation. After that, a judgemental evaluation of a CALL lesson designed by the researcher is carried out utilising Chapelle’s (2001) evaluation framework. Finally, the conclusion summarises the main points raised in the essay.

2. Definitions of Evaluation in CALL

There are several definitions of evaluation in CALL. For instance, Johnson (1992: 192) stated that “The purpose of an evaluation study is to assess the quality, effectiveness or general value of a program or other entity”. Krathwohl (1993: 524) maintained that evaluation is decision driven and argued that its main goal is to make a decision on “the worth of something”. Evaluation offers a method by which the evaluator can reach an informed, logical and convincing judgement or decision on the worth of a certain practice, a design feature or a particular approach.

Levy and Stockwell (2006: 42) discussed the characteristics of evaluation studies and concluded that they have a pragmatic outcome, and are primarily carried out to determine the worth of something and report the results to a defined audience.

In fact, evaluation is a process used by CALL developers to improve their materials or by users to assess the effectiveness of a CALL program or task. CALL software evaluations are usually carried out by journal reviewers, teachers, and institutions or even by a learner evaluating for possible use or purchase. When considering implementing CALL materials, the effectiveness of CALL is of critical importance.

3. The Object of Evaluation in CALL

The object of evaluation in CALL depends on what the evaluator wants to evaluate. The CALL materials usually under scrutiny include websites, online courses, CALL tutors, authoring systems and tools, computer-mediated communication tools, learning management systems, tasks, activities, CDs, search engines for databases and archives, corpus concordancing tools, and generic software such as word processors, presentation software and web browsers.

It is worth noting that some of the above-mentioned CALL objects (e.g. websites) may pose a problem for evaluators because they contain various activities which offer different types of interaction and include a wide range of tools and resources. Thus, when evaluating such materials, the evaluator has to be cautious and recognise that each CALL element has to be tested against the relevant evaluation criteria.

In addition, a CALL material can be either content-specific or content-free (Hubbard, 1987; Levy, 1997). If it is content-specific, the user cannot edit the linguistic material or the format of the activities which seek to teach that content. Multimedia software distributed on CDs is content-specific because it is impossible to edit it. However, if the CALL application is content-free, then the user can edit or add new information to the content which the software then uses as data for the pre-programmed activities. Accordingly, evaluation studies differ in terms of the CALL object being assessed. They might focus either on the language content or the qualities and dimensions of the interactions among learners.

Finally, CALL evaluation is also concerned with pedagogical and methodological issues as well as learner strategies and autonomous learning. It is worth mentioning that CALL evaluation studies may also investigate the effect of using CALL on other variables (and vice versa) such as attitudes and motivation, learning styles, and anxiety.

4. Types of Evaluators

There are two main types of evaluators: designer-evaluators and third-party evaluators (Levy and Stockwell, 2006). CALL materials are usually designed by language teachers, either alone or in collaboration with CALL experts such as multimedia designers or computer programmers. In addition to their contribution to the creation and development of CALL materials, language teachers are also expected to be involved in the evaluation of such materials. In this way, language teachers are both the designers and the evaluators of CALL materials. They are familiar with the material in question and know it thoroughly. They also know the idiosyncratic attributes of the learners and the context in which the material under evaluation will be implemented. Designer-evaluators know the objectives of their CALL materials, so when they initiate the process of evaluation, they know precisely what questions to ask. Thus, they can develop their own specific framework to answer the questions in which they are interested.

Third-party evaluations, on the other hand, are conducted by an evaluator who has made no contribution to the creation of the object of evaluation. This type of evaluation may be carried out by language teachers and learners assessing new CALL packages; journal reviews of CALL materials published by CALL experts are also classified as third-party evaluations. Third-party evaluators have to apply the appropriate evaluation criteria to be able to judge the applicability of CALL materials in different contexts and with different learners. The language teacher here has an advantage over any other third-party evaluator because he/she knows the idiosyncrasies of the context and of the learners who will use the materials in question. A software reviewer, for example, does not know the individual differences among the potential users of the CALL materials, and his/her evaluation is based mainly on previous experience. Indeed, because third-party evaluators are not involved in the creation and development of the CALL materials, their task is complicated, particularly when they try to determine the objectives and the potential of the materials, and the reviewer likewise does not know the characteristics of the intended learners. Nevertheless, third-party evaluations are very useful and helpful for language teachers when they need to compare available CALL packages or when they are expected to carry out their own CALL evaluations.

5. Approaches to Evaluation in CALL

CALL practitioners use a variety of methodologies to evaluate their CALL materials. These methodologies range from the simple checklist or survey to the more multifaceted approaches such as those developed by Hubbard (1987, 1988, 1992, and 1996) and Chapelle (2001). In this section I will look at these separately.

5.1 Checklists and Surveys

One of the most widely used instruments for CALL evaluation is the checklist. Susser (2001: 262) defined the checklist as consisting “of a series of questions or statements to be checked off ‘yes/no’ or marked 1-5 on a Likert scale, or has blanks to be filled in”. Susser (2001: 262) further explained that “A checklist may be in questionnaire format or accompanied by lengthy text explanations: a series of questions in a paragraph form also qualifies, as does a bare list of features to be looked at”.

The checklist is usually divided into sections, each containing a list of related questions or statements. Although the categories are clearly classified and the questions clearly expressed, there is usually no specific procedure for answering the questions or for telling the evaluator how to proceed when some answers are negative and others positive. In fact, no methodology accompanies the checklist to instruct the evaluator on how to reach an informed judgement when both strengths and weaknesses are identified. Checklists for evaluating CALL materials have also been criticised for inaccuracy, incompatibility and non-transferability, because they offer only a limited response set such as ‘yes/no’ questions or Likert scales (Decoo, 1984). Susser (2001) rejected this criticism, arguing that no principle prevents evaluators from providing extra space for commentary, and noting that several checklists contain both Likert scales and open-ended questions (e.g. Hubbard, 1987).
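The structure described above can be illustrated as a small data model. The following sketch is purely hypothetical: the section titles, item statements and the mean-rating helper are invented for illustration, not drawn from any published checklist, though the combination of a 1-5 scale with open commentary follows the pattern attributed to Hubbard (1987) and Susser (2001).

```python
# Hypothetical sketch of a CALL evaluation checklist as a data structure.
# Item wordings and the scoring helper are invented for illustration.
from dataclasses import dataclass, field
from typing import Optional, List


@dataclass
class ChecklistItem:
    statement: str
    scale_max: int = 5            # Likert scale runs 1..scale_max
    rating: Optional[int] = None  # filled in by the evaluator
    comment: str = ""             # open-ended commentary space


@dataclass
class ChecklistSection:
    title: str
    items: List[ChecklistItem] = field(default_factory=list)

    def mean_rating(self) -> float:
        """Average of the rated items; unanswered items are ignored."""
        rated = [i.rating for i in self.items if i.rating is not None]
        return sum(rated) / len(rated) if rated else 0.0


# Example use: one section with two illustrative statements.
approach = ChecklistSection("Approach fit", [
    ChecklistItem("Activities match the syllabus's teaching approach."),
    ChecklistItem("Feedback is consistent with the intended methodology."),
])
approach.items[0].rating = 4
approach.items[1].rating = 2
print(approach.mean_rating())  # 3.0
```

Note that the mean is only a convenience figure; as the discussion above stresses, a number alone does not tell the evaluator how to weigh strengths against weaknesses.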

Checklists have also been criticised for concentrating on technological aspects more than on teaching and learning issues, but as Susser (2001) argued, most CALL evaluation checklists deal with teaching and learning aspects as well. Susser (2001) also acknowledged that the reliability and validity of checklists require empirical evidence if their methodological deficiencies are to be overcome. Moreover, he observed that several CALL practitioners are biased towards a specific approach or method, and argued that such a commitment provides a consistent and principled basis for the checklist.

Finally, the researcher believes that checklists are very useful for language teachers as reminders of the various CALL aspects which require assessment. Although checklists assist teachers in the selection of appropriate CALL materials, Susser (2001) insisted that teachers should also monitor the use of CALL in the classroom.

The second most common form of evaluation in CALL is the survey. It is a useful instrument for gathering data about the reactions of learners and teachers to CALL materials, and may involve observing and questioning learners and interviewing teachers to reveal the strengths and limitations of a given CALL material. Surveys are used to collect data about new technologies and about the applicability and functionality of CALL packages. They are also employed to assess learners’ attitudes and perceptions towards CALL materials, to obtain feedback, or to investigate students’ views about certain features of distance learning. Checklists and surveys are often employed by language teachers to evaluate CALL materials they have created themselves, a situation in which they know the idiosyncrasies of their syllabus and approach exactly, as well as having an intimate knowledge of the individual differences and needs among their students.

Although surveys collect valuable data to assist in the improvement of CALL materials, CALL experts prefer a more multifaceted approach. Accordingly, Hémard and Cushion (2001: 20) concluded that surveys “could be appropriately used in conjunction with others to provide further data on students’ level of ICT competence, their views on the CALL interface and how they accessed it, if at all, within their own learning context”. Other data collection techniques might include peer evaluation and verbal or think-aloud protocols. These evaluation methods are “designed to focus the learnability and usability of a system” (Hémard and Cushion, 2001: 21).

5.2 The CALICO Journal Criteria

CALL evaluations are also found in reviews published in leading CALL journals, which usually prescribe their own template or framework for evaluating CALL materials; evaluators have to conform to a journal’s framework if they want their work to be published. For instance, Burston (2003) noted that the CALICO Journal evaluation framework examines the ‘critical systematic properties’ which CALL materials claim to achieve. Burston (2003: 35) argued that any CALL material must exhibit pedagogical validity, curriculum adaptability, efficiency, effectiveness, and pedagogical innovation. He maintained that four categories derived from Hubbard’s framework are utilised to generate the software evaluation template of the journal. These categories deal with the reliability, functionality, ease of use, and the nature and design of the CALL activities, as well as ‘teacher fit’ and ‘learner fit’. He considered ‘teacher fit’ the most critical element of software evaluation and the most difficult to evaluate. With regard to ‘learner fit’, Burston (2003: 39) argued that teachers should determine the degree to which ‘the program is appropriate for, or can be adapted to, the needs of their students’.

5.3 Hubbard’s CALL Evaluation Framework

In his early work, Hubbard (1987) recognised the value of checklists and highlighted the importance of developing evaluation checklists specifically prepared for use in second language learning. Hubbard (1987: 229-230) criticised the available checklists and emphasised that the evaluator has to clearly understand ‘the approach underlying the curriculum and syllabus’ and the fit of the software to the instructional approach. Hubbard (1987) developed an evaluation instrument for language teaching and learning accompanied by a procedure explaining how it should be used. His instrument was divided into three components: an approach checklist, a learner strategy checklist, and other pedagogical considerations. The two checklist sections took the form of statements requiring a response on a Likert-like scale, with extra space for open-ended comments. With regard to the procedure, Hubbard (1987: 249) stated that the evaluation process moves ‘from a cursory level to a very detailed review’. He also pointed out that the process might be terminated whenever the evaluator has collected enough conclusive data indicating that the software is not appropriate for his/her class or program.

Hubbard refined his framework in subsequent works (1988, 1992, and 1996), proposing a framework comprising three modules which cover development, evaluation and implementation. Hubbard (1992, 1996) summarised the principles upon which his framework was based. He maintained that the framework should be consistent with other frameworks for language teaching methodology, and that it should not be confined to a single conception of language, language teaching, or language learning. The framework, moreover, has to overtly connect development, evaluation, and implementation. Finally, it should identify the components of the language teaching/learning process and the various interrelationships among them.

Building on the earlier framework developed by Philips (1985) for CALL and on Richards and Rodgers’ (1986) account of a language teaching method, Hubbard (1996) developed his CALL evaluation framework. It includes three levels of analysis: ‘teacher fit’, ‘learner fit’, and ‘operational design’. Hubbard (1996: 28) argued that once the evaluator has finished the analysis of teacher and learner fit for a given CALL material, the data collected can be used to reach “appropriateness judgements” and to prepare “implementation schemes” (see Figure 1). The former assess the suitability of the CALL material for a particular learning context and its learners; the latter are concerned with determining when and how the material in question will be used with learners.

Figure 1: Hubbard’s (1988) framework of evaluation in CALL

5.4 Chapelle’s CALL Evaluation Framework

Chapelle (2001: 51-52) argued that evaluation criteria ought to integrate findings and theoretical assumptions about ideal conditions for SLA, and maintained that criteria should include explanations of how they are to be utilised. Moreover, Chapelle (2001: 51-52) argued that ‘both criteria and theory need to apply not only to software, but also to the task that the teacher plans and the learner carries out’. Taking these considerations into account, Chapelle (2001: 52) formulated five principles for evaluating CALL, summarised in Table 1 below.

Table 1: Summary of principles of evaluation in CALL

Chapelle (2001: 52) argued that ‘CALL task appropriateness needs to be evaluated on the basis of evidence and rationales pertaining to task use in a particular setting’. Chapelle (2001) also maintained that the analysis of CALL software is decontextualised, and recommended checklists for this purpose, whereas analysing how the teacher plans and organises his/her class, and collecting and analysing learners’ performance data, are context-specific.

Since evaluation is a complex issue, it involves not only researchers but everyone who uses the CALL software, such as learners and teachers (Chapelle, 2001: 53). Chapelle (2001: 53) proposed different levels of analysis for CALL evaluation. The first level concerns the evaluation of the CALL software used. At the second level, the teacher’s performance and involvement are taken into consideration: the teacher is very much involved in how CALL is used in class, as well as in how it is introduced and structured. As Jones (1986, cited in Chapelle, 2001) pointed out, “It’s not so much the program, as what you do with it”. The third level focuses on the learner’s performance, highlighting how empirical data reflect the way the learner uses CALL.

As Chapelle (2001: 54) noted, evaluation can be carried out judgementally at the stage of initial selection, based on how appropriate a program appears to be according to criteria drawn from research on SLA, and it can also be carried out empirically, based on data from actual learner use. Both methods provide complementary information appropriate for CALL task evaluation (Chapelle, 2001: 54). Based on theory and research on conditions for instructed SLA, Chapelle (2001: 55) proposed six criteria for evaluating CALL tasks. Table 2 presents these criteria.

Table 2: Chapelle’s (2001) criteria for CALL task evaluation

Language learning potential is the central criterion in CALL evaluation: it is the extent to which the task fosters a constructive focus on form. Learner fit considers linguistic and non-linguistic individual differences. Meaning focus requires that the design of the task make the learner concentrate primarily on meaning. Authenticity is concerned with the link between in-class and out-of-class activities in relation to learners’ interests and needs. Positive impact refers to the beneficial effects of the task on those who participate in it, beyond its language learning potential. Finally, practicality relates to the resources available, which may include hardware, software and technical support.
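A judgemental evaluation against these six criteria can be pictured as a simple rating record. In the sketch below, the criterion names come from Chapelle (2001), but the 1-5 rating scale, the example ratings and the `evaluate_task` helper are this sketch’s own invention, not part of the framework itself.

```python
# Illustrative sketch: recording judgemental ratings against Chapelle's
# (2001) six criteria for CALL task evaluation. The rating scale and the
# summary helper are invented for illustration only.

CRITERIA = (
    "language learning potential",
    "learner fit",
    "meaning focus",
    "authenticity",
    "positive impact",
    "practicality",
)


def evaluate_task(ratings: dict) -> dict:
    """Check that every criterion is rated 1-5; report the mean and the
    weakest criterion (the first one with the lowest rating)."""
    missing = [c for c in CRITERIA if c not in ratings]
    if missing:
        raise ValueError("unrated criteria: %s" % missing)
    if any(not 1 <= r <= 5 for r in ratings.values()):
        raise ValueError("ratings must be on a 1-5 scale")
    weakest = min(ratings, key=ratings.get)
    return {"mean": sum(ratings.values()) / len(ratings), "weakest": weakest}


# Example: purely hypothetical ratings for some CALL task.
result = evaluate_task({
    "language learning potential": 5,
    "learner fit": 4,
    "meaning focus": 3,
    "authenticity": 3,
    "positive impact": 4,
    "practicality": 5,
})
print(result["weakest"])  # meaning focus
```

Identifying the weakest criterion mirrors how a judgemental evaluation is used in practice: it points the teacher to the aspect of the task most in need of redesign before empirical evaluation with learners.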

Furthermore, Chapelle (2001: 59, 68) offers certain questions for the above criteria that can be used as a guide for the judgemental and empirical evaluations of CALL tasks. The researcher will utilise these questions to judgementally evaluate a CALL lesson in the next section.

6. Evaluation of a CALL Lesson Using Chapelle’s (2001) Questions

Chapelle’s (2001: 59) proposed questions for conducting a judgemental analysis of CALL materials are presented in Table 3 below.

Table 3: Chapelle’s (2001) proposed questions for conducting a judgemental analysis of CALL materials

The CALL material under evaluation is a stand-alone grammar lesson aimed at teaching the present simple tense, together with the functional language of daily routines, likes and dislikes, adverbs of frequency and other time expressions, to elementary and/or pre-intermediate learners of English. It also gives practice in listening and writing through videos (downloaded from YouTube) and designated homework. The lesson teaches grammar directly, utilising the PPP model (Presentation, Practice and Production). During the presentation stage, the grammatical rules for generating sentences in the present simple tense are introduced via both written text and video. During the practice stage, after learners have examined the available examples and watched their chosen videos, they practise their newly acquired grammatical knowledge through six exercises. After that, the students are required to do homework and submit it either using a form provided on the same page or by emailing it as an attachment to their teacher. It is worth mentioning that the CALL lesson also offers features such as searching the online Longman dictionary and text chat with the teacher from within the same page. In addition, the lesson offers a wide variety of links to other web pages dealing with the same grammatical point.

Regarding whether the task conditions provide adequate opportunity for a constructive concentration on form, it is clear that the lesson is concerned only with grammar and hence overtly focuses on form. The difficulty level of the targeted linguistic forms is appropriate for the learners and stretches their language ability. The task is also appropriate for the intended learners: the target learners of the lesson are elementary Libyan students of English whose characteristics are very similar, since they come from a homogeneous cultural background and are usually taught with the same approach (PPP). The lesson does in fact encourage learners to improve their linguistic ability by concentrating on a single aspect of grammar. Regarding focus on meaning, learners are made aware of the importance of communicating meaning, especially through the videos, examples and homework made available to them, as well as the links to other web pages provided on the lesson pages.

Concerning the relationship between the in-class CALL task and learners’ interests and needs outside the classroom, the lesson offers a wide variety of examples and explanations from which learners can see the connection, particularly through the homework assigned to them. After watching some videos, for instance, they will be able to figure out the link between the lesson and real-life tasks.

Through the use of the task, learners will learn and apply guessing strategies, online dictionary consultation, and the use of a word processor for writing, editing and revising their homework. Moreover, both instructors and learners will benefit from utilising the available technology through this CALL lesson. Furthermore, instructors will observe sound pedagogical practice by utilising the PPP approach.

Finally, the lesson does not require any special knowledge or hardware, and the resources available are sufficient for the CALL lesson to succeed, since completing it requires only a minimum knowledge of using the internet and email.

7. Conclusion

This essay has provided a picture of modern-day approaches to CALL evaluation, ranging from simple to sophisticated techniques. In addition, definitions of CALL evaluation and descriptions of the object of CALL evaluation and the types of evaluators were included. Finally, a judgemental assessment of a CALL lesson designed by the researcher was also reported.


References

Burston, J. (2003). ‘Software selection: A primer on sources and evaluation’. CALICO Journal, 21 (1), 29-40.

Chapelle, C. (2001). Computer Applications in Second Language Acquisition: Foundations for teaching, testing and research. Cambridge University Press.

Decoo, W. (1984). ‘An application of didactic criteria to courseware evaluation’. CALICO Journal, 2 (2), 42-46.

Hémard, D., and Cushion, S. (2001). ‘Evaluation of a web-based language learning environment: the importance of a user-centred design approach for CALL’. ReCALL, 13 (1), 15-31.

Hubbard, P. (1987). ‘Language teaching approaches, the evaluation of CALL software, and design indications’. In W. Flint Smith (Ed.), Modern media in foreign language education: Theory and implementation (pp. 227-254). Lincolnwood, IL: National Textbook.

Hubbard, P. (1988). ‘An integrated framework for CALL courseware evaluation’. CALICO Journal, 6 (2), 51-72.

Hubbard, P. (1992). ‘A methodological framework for CALL courseware development’. In M. C. Pennington and V. Stevens (Eds.), Computers in applied linguistics: An international perspective (pp. 39-65). Clevedon, UK: Multilingual Matters.

Hubbard, P. (1996). ‘Elements of CALL methodology: Development, evaluation and implementation’. In M. Pennington (Ed.), The Power of CALL (pp. 15-32). Houston: Athelstan.

Johnson, D. M. (1992). Approaches to research in second language learning. New York: Longman.

Krathwohl, D. R. (1993). Methods of educational and social science research: An integrated approach. New York: Longman.

Levy, M. (1997). Computer-assisted language learning: context and conceptualisation. Oxford, UK: Clarendon.

Levy, M., and Stockwell, G. (2006). CALL Dimensions: Options and Issues in Computer-Assisted Language Learning. Mahwah, NJ: Routledge.

Philips, M. (1985). ‘Logical possibilities and classroom scenarios for the development of CALL’. In C. Brumfit, M. Philips, and P. Skehan (Eds.), Computers and English Language Teaching: ELT Documents 122 (pp. 120-159). Oxford, UK: Pergamon.

Richards, J. C., and Rodgers, T. S. (1986). Approaches and methods in language teaching. Cambridge, UK: Cambridge University Press.

Susser, B. (2001). ‘The defence of checklists for courseware evaluation’. ReCALL, 13 (2), 261-276.