The theory under discussion suffers from an unfortunate confusion in its terminology, because it lacks a single label that is acceptable to everybody. It has been given many labels. Some call it Government & Binding (GB), others the Principles & Parameters Approach (P&P or PPA henceforth), and still others prefer rather different labels such as Minimalism or the Minimalist Program (MP). It must be noted that these labels are not synonymous: each refers to a quite distinct version of the theory, and there is no clear cut-off between the stages they define. Thus, Alec Marantz (1995) was right in referring to it as “this latest version of Chomsky’s Principles and Parameters Approach”. He implies by this that, at least in his mind, Minimalism is just a newer version of the PPA. The list below shows the different labels used to identify this theory; each of them is objectionable for one reason or another.
a. ‘The framework that is associated with Noam Chomsky and his students at the Massachusetts Institute of Technology.’
b. Standard Theory (ST henceforth).
c. Revised, Extended Standard Theory (REST).
d. Government & Binding (GB).
e. The Principles & Parameters Approach (P&P or PPA).
f. Minimalism/the Minimalist Program (MP).
We might assign the name ‘Chomskyan Theory’ to (a) above, but this would be unacceptable to many, because the theory is the result of cooperation among many researchers. In fact, it is not tied to a single individual or small group of individuals. While Chomsky was the guide and evaluator of the new developments, research in this program is freewheeling, and its proponents frequently disagree among themselves, Chomsky included. So what is the most common label used to identify this theory?
One label is ‘Standard Theory’, which offends many people because it implies that the “standard” is set by Chomsky and his followers, and that whatever deviates from it is non-standard. This attitude is reinforced by many “Standard Theoreticians” who talk as if the “Standard Theory” were the only theory available. Furthermore, the label “Standard Theory” is sometimes used for the entire history of syntactic theory built up by Chomsky and his students over several decades, a history that comprises several fundamental stages developed at different times and in different ways. This framework began in the mid-sixties with the publication of Chomsky’s 1965 book “Aspects of the Theory of Syntax”, and the label “Standard Theory” refers specifically to the theory presented there. It is also called the “Aspects Model”.
Over the fifteen years that followed, the framework was revised to the extent that its character changed fundamentally. By the early eighties, a rather different framework had developed out of the “Aspects Model”. The publication of Chomsky’s 1979 Pisa lectures under the title “Lectures on Government and Binding” presented this framework in an organised, coherent form for the first time.
Unfortunately, the title of the book was transferred to the framework itself (Government & Binding, or GB). The Pisa lectures and the book were appropriately titled, because in them Chomsky concentrated on two particular sub-theories, namely government and binding; but the framework as a whole consists of many such sub-theories, and “government” and “binding” are not even the most important among them. They are simply the ones Chomsky had the most to say about in 1979. Chomsky himself has expressed his regret at labelling the entire theory in this way, and his preference for the label “Revised, Extended Standard Theory” (REST). As Steven Schäufele (1999) pointed out in his Synthinar Lecturettes, this label can be used in referring to the Chomskyan Theory.
During the second half of the eighties, the label “Principles & Parameters Approach” (P&P or PPA) gained currency among the proponents of the framework. More recently, a new label has circulated among them, the result of work published in the early 1990s, namely the “Minimalist Program” (MP).
Finally, one has to know all these labels, because some proponents of the framework are sensitive about using one label rather than another. It is also useful, when writing research papers, to give all the labels, state explicitly what they denote, and then choose one and use it throughout the paper.
Claims of the Theory
The proponents of any theory must state the basic assumptions that establish its foundation, and must say unambiguously how it deals with the phenomena to be observed. In other words, the tenets of their theory must be stated clearly.
The original Chomskyan assumptions, which were first established in the Aspects Model (i.e. Standard Theory), are summarised in (1) below:
(1) Claims of REST:
a. All syntactic relations can be described in terms of “Constituent Structure”.
b. Constituents move from one part of a structure to another; consequently, one constituent structure can be transformed into another. Movement transformations define the sequence of constituent structures involved in the generation of a particular syntactic string. Each such sequence of constituent structures is called a “derivation”.
c. There is a set of well-formedness conditions, resident in the syntactic component of the grammar, to which constituent structures, transformations, and derivations must conform (Steven Schäufele, 1999).
The term “Constituent Structure” denotes a complex concept: it refers to the way the words and other constituents of a string are organised, involving a combination of two logical relations, namely “Dominance” and “Precedence”. If a constituent precedes another in linear order, the two are said to be in a precedence relation. “Dominance” involves the notion of one constituent being contained within another. “Tree diagrams” are used to represent dominance relations. By way of illustration, consider the tree diagram in (2).
The labels S, NP₁, VP, and NP₂ are called nodes. These nodes represent the constituents of the string described by the diagram. A node that is linked to a lower node by a line dominates that node. The S node in (2), for example, dominates all the other nodes in the tree. NP₁ in turn immediately dominates Det and N, and VP dominates the inflected verb killed. The nodes occupied by the words him, the, boy and killed do not dominate anything; they are called terminal nodes. We describe nodes that dominate a single node as “non-branching”, and nodes that dominate more than one node, like S and NP₁, as “branching” (Schäufele, 1999; Ouhalla, 1999).
For clarification purposes, kinship terminology is used in talking about nodes. If a node A immediately dominates another node B, then A is the mother of B and B is the daughter of A. If a set of nodes share the same mother, they are sisters. It is assumed that any node may have at most one mother. Dominance and Precedence are transitive relations: if A dominates (or precedes) B and B dominates (or precedes) C, then A dominates (or precedes) C. Immediate dominance, of course, is not transitive. Another observation about Dominance and Precedence is that they are mutually exclusive: two nodes can stand in a dominance relation or in a precedence relation, but not in both. At the same time, the two relations are maximised over their domains: any two nodes must stand in either a precedence or a dominance relation. Thus in (2), for example, since all the daughters of NP₁ (i.e. Det and N) precede the verb killed, the nodes NP₁ and killed are themselves in a precedence relation. This requirement implies that the branching lines linking mother nodes to daughter nodes may not cross. This is a typical assumption of REST. Constituent Structure, then, is describable in terms of Dominance and Precedence relations: if we can draw a tree diagram of a string, that tree diagram represents its constituent structure (ibid.).
There is also another relation called adjacency. Two nodes are adjacent if there is no third node intervening between them. As we shall see in chapter (3), the Adjacency relation is important for specific details of the theory.
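Since Constituent Structure is fully describable in terms of Dominance and Precedence, these relations can be modelled directly as operations on a tree data structure. The sketch below is a minimal illustration only (the class and helper names are invented for exposition, and the words are treated as terminal nodes directly, ignoring the Det/N/V labels); it implements transitive dominance, precedence, sisterhood, and linear adjacency for the string in (2).

```python
class Node:
    """A node in a constituent-structure tree; each node has at most one mother."""
    def __init__(self, label, daughters=None):
        self.label = label
        self.daughters = daughters or []
        self.mother = None
        for d in self.daughters:
            d.mother = self

    def dominates(self, other):
        """Transitive dominance: self contains other somewhere below it."""
        for d in self.daughters:
            if d is other or d.dominates(other):
                return True
        return False

    def terminals(self):
        """The terminal (word) nodes under self, in left-to-right order."""
        if not self.daughters:
            return [self]
        return [t for d in self.daughters for t in d.terminals()]

def precedes(a, b, root):
    """a precedes b iff every terminal of a comes before every terminal of b."""
    order = root.terminals()
    return max(order.index(t) for t in a.terminals()) < \
           min(order.index(t) for t in b.terminals())

def sisters(a, b):
    """Two distinct nodes are sisters iff they share the same mother."""
    return a is not b and a.mother is not None and a.mother is b.mother

def adjacent(a, b, root):
    """Two terminal nodes are adjacent iff no terminal intervenes between them."""
    order = root.terminals()
    return abs(order.index(a) - order.index(b)) == 1

# The tree in (2): [S [NP1 the boy] [VP killed him]]
the, boy = Node("the"), Node("boy")
killed, him = Node("killed"), Node("him")
np1 = Node("NP1", [the, boy])
vp = Node("VP", [killed, him])
s = Node("S", [np1, vp])
```

Dominance is computed recursively, so its transitivity comes for free; precedence falls out from the left-to-right order of terminals, which also makes the mutual exclusivity of the two relations automatic, since a node never precedes a node it dominates.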
In fact, the claim in (1a) is a fundamental assumption of the Standard Theory that is not shared by some other frameworks. What it means in the Standard Theory is that terms like “subject” and “object” are merely shorthand for longer but more precise descriptions. To give an example, a “direct object” in the Standard Theory is an NP immediately dominated by a VP node, while a “subject” is an NP immediately dominated by an S node.
What is critical now is the claim that the proper analysis of a syntactic string may involve several constituent structures, which share a common skeleton and the same lexical items. The set of all the constituent structures involved is called the derivation of that string. Note that a given item may occupy different positions in different constituent structures of the derivation.
In the Aspects Model, the derivation was a sequence of constituent structures, and a linear order could be imposed on these “levels”. The Grammar in the Aspects Model was divided into three components: the syntactic component, the semantic component, and the phonological component. The syntactic component itself comprises two sub-components: a base component and a transformational component. The function of the semantic component was to relate the deep structures generated by the base sub-component to the meanings of sentences. The phonological component, on the other hand, relates the surface structures generated by the transformational sub-component to the phonetic forms of sentences. Each level, therefore, is immediately preceded by at most one level. The Grammar according to the Aspects Model is then as shown in (3).
The situation is even more complicated in the more recent “Minimalist” work, because the well-formedness conditions are somewhat different: they relate more to the derivation as a whole and less to individual levels and transformations within it. For instance, the economy conditions ‘select among convergent derivations’ (Chomsky, 1995: 378). To explain the basic notion of economy, consider the following sentences, adapted from Neil Smith’s (1999) book, Chomsky: Ideas and Ideals (page 89).
(4) a. I think John saw a buffalo.
b. What do you think John saw?
c. Who do you think saw a buffalo?
d. Who do you think saw what?
(5) *What do you think who saw?
In questions that contain only one wh-word, that word can move to the front of the sentence, as shown in (4b, c); but when the sentence contains two wh-words, as in (4d), only one can move to the front while the other remains in place. The problem is why what in (5) cannot move to the front of the sentence. Why is (5) ungrammatical? The answer is that (4d) is more economical than (5): the wh-word who is nearer to Spec CP than what. Although either constituent could in principle move, the “Shortest Movement” condition permits only the constituent nearest to the specifier position of CP to move. This generalises to other constructions, showing that some principle of economy holds. Such constructions include inversion questions, in which the first auxiliary has to move to the front of the sentence, as shown in the following example.
(6) a. Mariam might have won.
b. Might Mariam have won?
c. *Have Mariam might won?
Here too, the same economy condition applies, permitting only the nearest (i.e. the first) auxiliary to move to the front of the clause.
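The economy reasoning above can be sketched as a simple selection procedure: among the constituents that could in principle move to a target position, only the structurally nearest one is licensed. The snippet below is an illustrative toy with invented names; the numbers stand in for structural distance to the target (e.g. Spec CP), not for any real metric of the theory.

```python
def shortest_move(candidates):
    """Economy ("Shortest Movement"): of all candidates that could move
    to the target position, license only the structurally nearest one."""
    return min(candidates, key=lambda pair: pair[1])[0]

# (4d) vs. (5): 'who' (matrix subject) is closer to Spec CP than
# 'what' (embedded object), so only 'who' may front.
fronted_wh = shortest_move([("who", 1), ("what", 2)])

# (6b) vs. (6c): only the first auxiliary may invert.
fronted_aux = shortest_move([("might", 1), ("have", 2)])
```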
Transformations vs. Constraints
What are the differences between REST and ST? During the post-Aspects development of the Standard Theory, the basic assumptions of the framework were fundamentally revised, particularly in the transformational component: attention shifted from movement transformations themselves to constraints on them. In the Aspects Model, attention was given to identifying and defining transformations. The syntactician’s job was to define what a specific transformation does and under what conditions it applies. Transformations were regarded as language-specific, which means that a child learning a language has to learn a number of transformations.
During the seventies, while the formal language of the Aspects Model continued to be used, the cutting edge of syntactic research, at least in the Chomskyan school, was not to define specific transformations but to identify constraints on transformations and on the excessive power of the transformational component. It had been realised that the formalism for defining transformations in the Aspects Model was not on the right track: there was a huge range of imaginable transformations that could be formally described but did not seem to be attested in any known human language. If the goal of grammatical theory is to explain how human language works, then the Aspects formalism was missing something.
By the end of the seventies, it became clear that the theory could operate with a single transformation known as “Move α”. This transformation was understood to mean “move any constituent anywhere”, provided that no constraints are violated in the operation. In the early stages of REST, it proved impractical to reduce every motivated transformation to an instance of “Move α”. As a result, a broader alternative known as “Affect α” was proposed. In addition to moving constituents, “Affect α” can rearrange constituent structure without moving anything; for example, it can insert the dummy do to carry the inflectional features of the verb in questions and negatives. Many proponents of REST, however, admit only transformations that are describable as instantiations of “Move α”, and regard the broader “Affect α” as evidence that the best analysis has not yet been discovered. The notion of “Affect α” is attractive, but any theory that can dispense with it is indeed a stronger one.
Up to the seventies at least, the main focus of grammatical theory was on “rules”, but in REST attention turned to general principles of the grammar. What happens is “Move α”, and grammatical theory is concerned with the general principles that delimit its scope of operation. If all transformations can be reduced to a single one, “Move α”, why can we not dispense with movement altogether? Indeed, certain frameworks of syntactic theory do dispense with movement transformations altogether, such as Generalised Phrase-Structure Grammar (henceforth GPSG) and Lexical-Functional Grammar (hereafter LFG). But in the Standard Theory the essential point is to capture certain generalisations that any syntactic theory of human language must represent somehow. Non-transformational frameworks use completely different means to capture such generalisations. For instance, in an agentless passive clause like “The door was opened”, the constituent “door” behaves in some respects like a “subject” and in others like an “object”. In the Standard Theory, this fact is explained by claiming that it is both the “subject” and the “direct object”, but at different levels connected by a movement transformation. Non-transformational frameworks like GPSG and LFG, on the other hand, represent grammatical relations such as the subjecthood of “door” in constituent-structure tree diagrams, while semantic relations are indicated in the verb’s “argument structure”; the link between the two is stated in the verb’s lexical entry.
“Movement” in REST is regarded as the major factor in generation: in this framework, movement drives all the processes of the derivation. In principle, movement may occur as long as it does not violate any constraints. At first it was supposed to operate freely, but during the late eighties Chomsky and others advocated a principle according to which movement occurs only when it is necessary. This principle is sometimes referred to as the “Least Effort” or “Laziness” Principle. It has been likened to a political principle, the “Orwellian Principle” (see Steven Schäufele, 1999), which says: “If not forbidden, then obligatory; if not obligatory, then forbidden”. The “Laziness Principle” now serves as a fundamental principle of the Minimalist Program. In syntactic theory, it essentially means “move if you have to in order to avoid ungrammaticality; otherwise stay put” (ibid.).
It is worth mentioning that in the eighties movement of phrases was motivated by the constraints of the grammar, whereas now so-called strong features on heads force phrases to move to local domains for checking.
Levels of Representation
In the Aspects Model, a derivation could consist of any number of levels, each differing from the preceding one by a movement transformation. It was believed that certain transformations might have to precede others in order to achieve the desired result. Thus, in the Aspects Model a sentence like (1) would be derived from the “deep structure” in (2) by a dozen transformations. For instance, for “Equi-Deletion” to erase the NP “Sam” in the lowest clause, the lowest clause would first have had to be passivised, so as to get “Sam” into the subject position from which it could be deleted.
Note that in earlier versions of the theory the sign “∆”, rather than e, was used to identify empty positions.
With the introduction of “Move α”, the motivation behind these assumptions fell by the wayside: it became clear that there was no need to impose an order on the application of transformations. As a consequence of the developments of the seventies, the “deep structure” in (3) replaced the one in (2) above as the structure underlying the sentence in (1).
In this case, independent constraints force the Passive transformation to operate in both the lowest and the highest clauses. It no longer matters whether Equi-Deletion occurs before or after the passivisation of the lowest clause: if the lowest clause is not passivised, the derivation crashes, and if it is passivised, then Equi-Deletion occurs automatically. There is no need to impose an order on transformations. Constituents move to satisfy general constraints, and only insofar as such constraints are not violated, so it is not important to talk about the order in which transformations operate. In REST, we do not have to stipulate that certain transformations precede others; rather, this happens automatically. The point is to simplify the grammar of a given language by reducing the number of specifications it has to make. To borrow an analogy from computer science, the hardware can be complicated, but what we need is very simple software (ibid.).
These advances gave the framework a different derivational organisation. While in the Aspects Model a derivation was a linear sequence comprising any number of levels, in REST it consists of precisely four levels.
The lexicon lists the lexical items, together with their properties, that make up the atomic units of the syntax. These properties include, for example, what sort of object a given verb requires, etc. “DS” means “deep structure”; “PF” and “LF” stand for “phonological form” and “logical form” respectively; “SS” stands for “S-Structure” (surface structure). SS is central and connected directly to all the other levels. PF and LF are called the interface levels. PF is the interface with the phonology, where phonological rules apply to give the string its phonological manifestation; it is comparable to the surface structure of a derivation in the Aspects Model. LF is the interface with the semantics, where meaning relationships of various kinds are explicitly represented. DS is the interface with the lexicon, where lexical items are combined. In REST, DS represents the base-generated form of a string. Transformational operations apply to the DS representation to satisfy certain constraints of the grammar, and the result is the SS representation. SS is not directly interpreted itself, but is converted into PF and LF. “Move α” operates between any two levels, and nothing important can be said about the order of its operations; the differences in its character are due to the fact that different constraints operate at different levels. For example, the “Theta Criterion” is relevant at DS, the “Case Filter” at SS and PF, and the “Empty Category Principle” primarily at LF.
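The organisation just described (DS feeding SS, and SS branching into PF and LF, with different well-formedness conditions checked at different levels) can be pictured as a small pipeline. The sketch below is purely illustrative: the operations and conditions are placeholder functions with invented names, not real grammatical rules, and the representations are plain strings.

```python
def base_generate(items):
    """DS: interface with the lexicon, where lexical items are combined."""
    return " ".join(items)

def move_alpha(rep):
    """Placeholder for Move α; a real grammar would transform the structure."""
    return rep

# Dummy well-formedness conditions: every derivation "converges" here.
def theta_criterion(rep): return True
def case_filter(rep): return True
def ecp(rep): return True

def derive(lexical_items):
    """A toy T-model: DS -> SS, then SS branches into PF and LF."""
    ds = base_generate(lexical_items)
    assert theta_criterion(ds)      # Theta Criterion relevant at DS
    ss = move_alpha(ds)
    assert case_filter(ss)          # Case Filter relevant at SS (and PF)
    pf = move_alpha(ss)             # branch to phonological form
    lf = move_alpha(ss)             # branch to logical form
    assert ecp(lf)                  # ECP checked primarily at LF
    return pf, lf

pf, lf = derive(["the", "ball", "was", "kicked"])
```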
The proponents of the Aspects Model claimed that the semantics of a language had to be encoded in the Deep Structure, the first level in the derivation of any sentence. They hypothesised that a speaker/writer generates a deep structure representing the intended meaning, then performs certain operations on that deep structure to produce the final surface sentence that he pronounces or writes. The listener/reader, on the other hand, receives the surface sentence and applies the reverse operations to decode the abstract deep structure and interpret it. During the sixties and seventies, linguists recognised that it was not plausible to relate all the semantics of a language to only one derivational level. The evidence came from ambiguous sentences whose single surface structure could have two distinct meanings. Some ambiguities were handled properly within the Aspects Model. For example, the sentence in (4) has two possible meanings, and each meaning can be derived from a different deep structure, as shown in (5).
(4) The dog saw a cat running in the farm.
(5) a. [ The dog [ saw [ a cat running in the farm ] ] ]
b. [ [ The dog [ running in the farm ] ] saw a cat ]
Therefore, the Aspects Model explained the ambiguity in (4) by assigning it two different deep structures. It did so because the ambiguity results from a confusion about the grammatical relations between the constituents “dog”, “cat”, and “running in the farm” (in particular, which one is the subject of the non-finite VP). Since the Standard Theory encodes grammatical relations in the Deep Structure, the ambiguity in meaning must be caused by a difference at DS (ibid.). A more complex ambiguity results from the relative scope of quantifiers. By way of illustration, consider the sentence in (6).
(6) Faraj loves everybody.
In orthodox REST, the meanings of sentences containing quantifiers such as “everybody” are processed at the LF level. So the example in (6) above has to be transformed into (7) at LF. This clarifies the distinction between LF and DS, because the deep structure of (6) is supposed to be as in (8).
Now what about a sentence like (9)? It has either of the meanings in (10).
(9) Everybody loves somebody.
(10) a. There exists some person y such that for every person x, x loves y.
b. For every person x, there exists some person y such that x loves y.
The interpretation in (10b) says that for every person there is some (possibly different) beloved person: Faraj loves Basma, Mohamed loves his wife, Qais loves Lyla, Ahmed loves Majda, Tariq loves Fatima, and Nabeel loves himself.
The statement in (10b) is true if we can find such a pair for every single human being. By contrast, the meaning in (10a) says that there is some particular human being (call her Ala) who is loved by everybody: Faraj loves Ala, Mohamed loves Ala, Basma loves Ala, etc.
In LF theory, the two meanings in (10a–b) are represented by the structures shown in (11a–b) respectively. For the meaning in (10b), “everybody” must occur higher in the tree than “somebody”: we say that “everybody” has scope over “somebody”. The reverse holds for the interpretation in (10a).
Note that the interpretations shown in (10) and diagrammed in (11) do not relate to grammatical relations or subcategorisation, and therefore cannot be represented at DS. It is clear from the representations in (7) and (11) that LF is generated by the application of Move α to SS, while DS is the original structure of the sentence, before any operation applies. Thus, there must be a distinction between LF and DS.
The importance of saying all this is that in REST two levels of representation are involved in the interpretation of sentences. At DS, thematic relations must be represented: for instance, the verification of what a verb (or any other constituent) subcategorises for ought to be done at DS. LF, however, represents scope relations, while DS is concerned only with lexical semantics.
The History of Derivations
Trace Theory was a very important advance of the early seventies. It is the hypothesis that, whenever anything moves in the course of a derivation, it leaves a “trace” in its original place. The trace left behind is an abstract copy of the moved constituent, but without any phonetic realisation. Traces are identified through co-indexation with the constituent whose movement created them: the trace is marked with the same index letter as the moved constituent. Trace Theory is mentioned here to lead into the discussion of a complex constraint on Move α, called the “Structure-Preservation Constraint” (SPC). Joseph Emonds (1976) had argued for a typology of transformations based on their field of operation, and for a constraint based on this typology. As shown in the list below, his typology classifies the various kinds of transformations into three types.
(1) Root Transformations.
These transformations apply only to “root” (independent) clauses.
(2) Structure-Preserving Transformations.
This type of transformation leaves the hierarchical organisation of constituent structure exactly as it finds it (i.e. it does not create, destroy, or rearrange constituent structure).
(3) Local Transformations.
This kind of transformation applies to precisely two adjacent constituents and is subject only to conditions within those two constituents. The constituents to which it applies need not be sisters, but merely adjacent in linear order. There must be a c-command relationship between the affected constituents, and at least one of them must not be a maximal projection.
The Structure-Preservation Constraint (SPC) requires that any instantiation of Move α fall into one of these three types. This means that unless a movement transformation is a Root or a Local Transformation, it may not change the basic constituent structure in any way. To illustrate, let us review the history of the Passive Transformation. In early Transformational Grammar, a passive sentence like (1) was considered to derive from a deep structure similar to that of its active counterpart in (2). An optional transformation derived the passive (1) by interchanging the subject and object NPs: it made the subject the complement of a PP, and supplied the passive morphology and the auxiliary for the verb.
(1) The ball was kicked by Ali.
(2) Ali kicked the ball.
The difference in the Aspects Model was that the PP already exists at Deep Structure, but without a complement. The Deep Structure of (1) would then be as in (3).
In the Aspects analysis, the PP in the passive sentence (1) is already there, not created by the Passive Transformation; in this sense, the Aspects Passive Transformation is said to be “structure-preserving”. Since any embedded clause can be passivised, the Passive Transformation cannot be a Root Transformation; and since it involves two NPs that are not adjacent, it cannot be a Local Transformation either. Thus, according to the SPC, it must be Structure-Preserving. Assuming that the subject NP “Ali” has moved into the PP in (3), the result will be a structure like that in (4): the moved NP “Ali” leaves a trace in the subject position, coindexed by the letter “i” with the moved NP. But since traces cannot be erased, it is impossible for the direct object NP “the ball” to move into this position. Licensing it as the new subject would require changes in the constituent structure, which would violate the SPC.
The importance of saying all this is that in REST the DS representation of a passive sentence like (1) is necessarily the one in (5): the agent “Ali” is base-generated as the complement of a “by”-phrase, and the subject position is empty. The direct object “the ball” can move into it without creating any new structure.
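The movement just described can be sketched as an operation on a flat list of positions: the moved constituent is replaced in situ by a coindexed trace, so the derivational history remains recoverable. The function name and the string encoding below are invented for illustration only.

```python
def move_with_trace(positions, source, target, index="i"):
    """Move the constituent at `source` into the empty `target` position,
    leaving behind a coindexed trace (an abstract, unpronounced copy)."""
    positions = list(positions)
    moved = positions[source]
    positions[target] = f"{moved}_{index}"   # moved constituent, coindexed
    positions[source] = f"t_{index}"         # trace left in the original place
    return positions

# REST passive of (1): 'the ball' moves from object position
# into the empty (e) subject position.
ds = ["e", "was kicked", "the ball", "by Ali"]
ss = move_with_trace(ds, source=2, target=0)
```

Because the trace records where “the ball” came from, the base DS representation can always be reconstructed from the SS representation.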
In REST, as will become clear in the next chapter, this analysis is supported by considerations of Theta and Case Theory. There is, in fact, one exception to the definition of a “Structure-Preserving” Transformation, namely “Adjunction”. Adjunction is a recursive process that integrates adjuncts (optional modifiers) into the syntactic structure. It targets an X′ projection, as illustrated by the box in (6a).
It then makes a copy of the target node directly above the original, as in (6b), and finally attaches the adjunct phrase as a daughter of the newly created node, as in (6c). Adjunction appears to create new structure, since it creates a new node, and so seems not to be “structure-preserving”. But because adjunction is a recursive process, and because the new V′ node immediately dominates the original V′ node, it is considered a “structure-preserving” transformation. This understanding is made explicit in some works published in the mid-eighties (see, for example, Chomsky, 1986b). But what about “Local Transformations”? Steven Schäufele (1999) argued that they occur only between SS and PF (e.g. Subject–AUX Inversion), as a consequence of the theoretical desirability of restraining the power of Move α.
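The three steps in (6a–c) can be sketched as a tree operation: copy the target node directly above itself, then attach the adjunct as a daughter of the new copy, so that the new node immediately dominates the original one. This is a minimal illustration with invented names, representing a node as a list whose first element is its label.

```python
def adjoin(target, adjunct, right=True):
    """Adjunction: copy the target node immediately above the original
    and attach the adjunct as a daughter of the new node."""
    original = list(target)   # the original node is kept intact
    if right:
        return [original[0], original, adjunct]
    return [original[0], adjunct, original]

# Adjoining a PP to V': [V' kicked the-ball] -> [V' [V' kicked the-ball] PP]
v_bar = ["V'", "kicked", "the ball"]
pp = ["PP", "in the park"]
result = adjoin(v_bar, pp)
```

Note that the original V′ node survives unchanged inside the output, which is why the operation can still count as structure-preserving in the sense discussed above.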
At the end of this section, it is worth mentioning that Trace Theory and the SPC together ensure that the history of a derivation can always be recovered. As will become clear in the next chapter, the framework has some intricate complications that can obscure the derivational history, but they are comparatively few. The result is that even at LF we can reconstruct the base DS representation of a string. This is crucial to the operation of the framework, and it is attractive to many theorists.
Government is the relation between a syntactic head and its dependents, while binding refers to the relationship between a pronoun or anaphor and its antecedent.
 There was a lot of talk in the sixties (and later) about specific transformations such as the Passive Transformation, There-Insertion, and Dative Shift.
α is understood as a variable that can stand for any syntactic constituent.
Affect α is interpreted as “Do anything to anything”.
 There is a good discussion of this principle in Steven Schäufele’s Synthinar Lecturettes.
 These constraints will be explained in the next chapter.
 A ‘trace’ is typically represented by a lower case ‘t’ for ‘trace’ or ‘e’ for ’empty’.
The letter ‘i’ is used for the index; if more than one index is needed, the letters after ‘i’ in the alphabet are used.
 The question of the passive auxiliary is left out because it is irrelevant to the discussion.