Wednesday, September 22, 2010

apparently you're wrong about what's grammatical in your language

From Liliane Haegeman's Introduction to Government and Binding Theory.
7e. Once that [that Bill had left] was clear, we gave up.

The sentence is odd for most native speakers: it is not acceptable. However, this sentence is formed according to the same principle that we posited to account for the formation of (7b)-(7d), i.e., that one sentence may become part of another sentence. Hence (7e) would be grammatical, though it is not acceptable.

Faced with intuitions such as that for (7e) the linguist might decide to modify the grammar he has formulated in such a way that sentence (7e) is considered to be ungrammatical. He may also decide, however, that (7e) is grammatical, and that the unacceptability of the sentence is due to independent reasons. For instance, (7e) may be argued to be unacceptable because the sentence is hard to process. In the latter case the unacceptability is not strictly due to linguistic factors but is due to the more general mechanisms used for processing information.

The native speaker who judges a sentence cannot decide whether it is grammatical. He only has intuitions about acceptability. It is for the linguist to determine whether the unacceptability of a sentence is due to grammatical principles or whether it may be due to other factors. It is the linguist's task to determine what makes (7e) unacceptable.
As a linguist, I'm going to tell you that your naïve intuition that this sentence is ungrammatical is just because you're not smart enough to process the grammatical rules that you know subconsciously -- rules that are in fact mostly encoded in your DNA. What?

Seriously: I'd posit that, if your theory about a language doesn't account for what actual native speakers count as a valid sentence, then your theory is wrong! Is Haegeman representing the general Chomskyan position correctly here?

Our goal as scientists is to account for what happens observably. In what way does a proposed grammar of a language count as a falsifiable scientific theory if you can just say "in reality, that sentence is grammatical -- there were just processing difficulties"?


Andy said...

True. I think it is interesting that linguists (or at least generative linguists) seem willing to allow grammars to violate the intuition of native speakers, or to fill in a spot with an unexpressed word when it seems like there needs to be a word there.

In the case of the null entries, there are times when these seem to match my intuition. For instance, in the introductory class I took, sentences needed a complementizer (that, which, etc.) to indicate a subordinate sentence, but often a null complementizer was there in place of "that". This matches my intuition about English. Other times, though, it seems like this is really being pressed on very sketchy evidence and, worse, is not stated in a way that would allow it to be invalidated. That's arguably just bad science!

Alex Rudnick said...

Hey Andy! Thanks for reading! :)

So, it's often demonstrably the case that people don't understand their own cognitive processes; so my real complaint is that Haegeman seems to be saying that we shouldn't trust people's judgments about what is and isn't a valid sentence in their native language. If we don't trust native speakers, where's the ground truth? That's problematic, because theorists have a huge incentive to push the boundaries of what's "grammatical" in order to fit a given theory.

I think over the course of the semester I'll compile a list of entities that are claimed to be real, but just "covert".

Mike said...

Does making the distinction between what is grammatical and what is acceptable ever lead to more useful theory?

Is there an example from linguistics that is like the example of the parallel postulate, where taking something that is basically a physical fact in a certain context to be false leads to whole new branches of math?

Alex Rudnick said...

Hey Mike, (which Mike? There are several! ...) Interesting questions, for sure!

Taking something that seems to be a hard constraint in language and then relaxing it might bring up some really interesting stuff.

Off the top of my head, if you relax the constraint on spoken language that it happens linearly through time (or that writing is a representation of that kind of process), you get calligraphy. Or maybe collages made of words.

You could imagine tweaking the facts about real-life languages ("parameters", to some people) to get interesting constructed languages that wouldn't have evolved naturally?

What else?

Rehj said...

1. Is it useful to make a distinction between what’s acceptable and what’s grammatical? I would say yes, particularly when dealing with text. To introspect for a moment, there are many times when I’m reading books that I come across a sentence that confuses me, and I have to actually puzzle the syntax out, but once I do so, I see the sentence as grammatical, though usually I still question the “acceptableness”. In other words, I know what the sentence means, but I would never produce it. It may be acceptable in another dialect, but it’s not acceptable in mine.

2. Linguists in general don't make a distinction between cognitive models and computational models. Generative grammar is a computational model, i.e. it was developed in the complete absence of cognitive data. So in the absence of cognitive constraints, it should obviously overgenerate like crazy.

So from that perspective, I would agree with Haegeman, but I wouldn’t put any sort of pejorative spin on naïve speakers judging acceptability instead of grammaticality. Instead, I’d say that the fact that the grammar overgenerates is a demonstrable problem with the theory that requires the addition of cognitively-plausible constraints.

Further, from this perspective, I would say the other problem with the theory is that in fact, it undergenerates, i.e. every time you come up with an example that seems to violate their universal principles, they say, “But that’s a different dialect,” as if that makes the problem go away. So they need to be simultaneously broadening the theory to really get everything, and beginning to develop the cognitive constraints that people are really using.
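To make the overgeneration point concrete, here is a minimal sketch in Python. Everything in it is a toy assumption rather than anyone's actual grammar: `embed` is a single hypothetical recursive rule that wraps a clause as "that ... was clear", and `acceptable` is a crude stand-in for a processing constraint that simply caps how many times "that" may occur.

```python
def embed(depth):
    """Apply a toy recursive rule `depth` times (hypothetical, for illustration)."""
    if depth == 0:
        return "Bill had left"
    # One sentence becomes part of another sentence.
    return "that " + embed(depth - 1) + " was clear"


def generate(max_depth):
    """Everything the unconstrained toy rule produces up to max_depth."""
    return [embed(d) for d in range(max_depth + 1)]


def acceptable(sentence, limit=1):
    """Crude proxy for a processing limit: at most `limit` uses of 'that'."""
    return sentence.split().count("that") <= limit


if __name__ == "__main__":
    for s in generate(3):
        label = "ok " if acceptable(s) else "odd"
        print(label, s)
```

The rule itself never stops, so the set of strings it licenses is unbounded; the filter is a separate component layered on top. Whether a real constraint looks anything like a simple count is exactly the open question.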

Mike said...

Has work been done on the exact nature of the confusion when trying to apply meaning to strange constructions? In what ways does the grammar-verifying and meaning-mapping machinery get confused? Is it usually a function of exposure to those constructions or are there certain things that are just by their nature confusing?

(P.S. I'm Mike from Hella Gems)

Alex Rudnick said...

@Mike: More good questions!

I think the answer is yes (lots of work has been done on processing difficulties), but I'm definitely not the right person to ask about it.

Rehj probably knows more, if she wants to explain or provide pointers!

In general, the fields that look into that sort of thing (being confused, etc) would be psycholinguistics or cognitive science...

Alex Rudnick said...

@Rehj: your point #2 is super-interesting.

When you say "in general", do you mean "not ever", "usually not", or "not always"?

If somebody were to present me with a theory, I'd want to know whether they intended it to be an algorithm for an AI (useful), a description of how they think humans operate (interesting), or just a description of the language (less useful, less interesting)...

This, I think, gets at the crux of what's been bothering me about the Syntax class (taught from a generative perspective) that I've been taking. It's almost certainly not plans for an AI, and if it's about people, there's been no discussion of empirical studies that involve people. I can (programmatically) come up with an infinite number of hypotheses that describe the same data; how can we go about picking the "best" one?

Rehj said...

I suppose I mean "almost always", leaving room for the rare exception (that I haven't met). It's just something that isn't part of the discipline of linguistics.

Most of the theories are built purely by analyzing data, so it's essentially an algorithm for producing the right answer. Of course, it's a very complicated algorithm, so it's still under development. But, aside from the fact that the problem is really, really complex, it's just an algorithm, so it's really no different than if we were, say, multiplying. We could multiply one digit at a time. We could double and halve. We could count. We could add. And we'd get the same answer regardless of how we did it. So how do we pick the best one? It depends on what we need: speed? ease of computation? In the case of a problem with a model that could be simplified more or less, greater accuracy?
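The multiplication analogy can be sketched directly. The four functions below are illustrative names, assume non-negative integers, and each follows one of the strategies mentioned above (the digit-at-a-time version leans on the built-in `*` for single digits and powers of ten).

```python
def multiply_by_counting(a, b):
    """Count one unit at a time across an a-by-b grid."""
    total = 0
    for _ in range(a):
        for _ in range(b):
            total += 1
    return total


def multiply_by_adding(a, b):
    """Add a to a running total, b times."""
    total = 0
    for _ in range(b):
        total += a
    return total


def multiply_by_doubling(a, b):
    """Double a and halve b, adding a whenever b is odd."""
    total = 0
    while b > 0:
        if b % 2 == 1:
            total += a
        a *= 2
        b //= 2
    return total


def multiply_by_digits(a, b):
    """Multiply a by each decimal digit of b and sum the shifted partial products."""
    total = 0
    for place, digit in enumerate(reversed(str(b))):
        total += a * int(digit) * 10 ** place
    return total


if __name__ == "__main__":
    methods = [multiply_by_counting, multiply_by_adding,
               multiply_by_doubling, multiply_by_digits]
    print([m(12, 34) for m in methods])
```

All four agree on every input, so the outputs alone cannot tell you which procedure was used. They differ only in cost: counting takes a*b steps, adding takes b, doubling takes roughly log2(b), and the digit method takes one step per digit of b.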

Time and time again, in both syntax and now phonology, I see a theory that's built purely on data like this, but then because it "seems" to work, it's presented as if it were describing something about the way we think. This is because linguists just don't realize there's a difference.

There have been some studies in CogSci, both MRI (spatially localized) and EEG (temporally localized) trying to start getting at what really is going on inside, but that's an area that's in its infancy.

Studies looking into "confusion", sure, people try to get at that by looking at processing time. There are a lot of factors that go into that, though, so as far as I know the different parts of it haven't yet been teased apart in a very useful way.

As far as other methods of applying constraints, the one I hear about most often [from Sandra :)] was the study that looked at how many levels of embedding existed throughout the Penn Treebank. It's still not looking at cognitive data, but at least it was acknowledging that no, we don't do infinite embedding even though generative grammar allows for it.
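For a rough sense of what measuring embedding in a treebank could involve, here is a small sketch. It is not the method of the study mentioned above, whose details aren't given here. It assumes well-formed, Penn-Treebank-style bracketed parses, and it assumes that counting nested `S` nodes is a reasonable proxy for clause embedding; `clause_depth` and the example trees are made up for illustration.

```python
def clause_depth(tree, clause_labels=("S",)):
    """Return the deepest nesting of clause-labelled nodes in a bracketed parse.

    A simple sentence scores 1; each clause inside a clause adds 1.
    Assumes the brackets in `tree` are balanced.
    """
    tokens = tree.replace("(", " ( ").replace(")", " ) ").split()
    open_nodes = []   # one flag per open bracket: is this node a clause?
    depth = 0
    deepest = 0
    for i, tok in enumerate(tokens):
        if tok == "(":
            label = tokens[i + 1] if i + 1 < len(tokens) else ""
            is_clause = label in clause_labels
            open_nodes.append(is_clause)
            if is_clause:
                depth += 1
                deepest = max(deepest, depth)
        elif tok == ")":
            if open_nodes and open_nodes.pop():
                depth -= 1
    return deepest


if __name__ == "__main__":
    simple = "(S (NP Bill) (VP left))"
    embedded = ("(S (SBAR (IN that) (S (NP Bill) (VP left)))"
                " (VP (VBD was) (ADJP clear)))")
    print(clause_depth(simple), clause_depth(embedded))
```

Running something like this over every tree in a corpus and tallying the results would give a distribution of observed embedding depths, which is corpus data rather than cognitive data, as noted above.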

Rehj said...

(And I suppose I should be clear that when I talk about "producing the right answer" in this context, I just mean "describing the data adequately"; obviously "adequately" is going to differ based on what you're trying to use your description for, which is one reason there are so many different syntactic theories.)