Computational linguistics talk – Mahowald
February 6, 2020 @ 4:00 pm - 5:00 pm
Cognitive and communicative pressures in language
Human language is a fruitful domain for understanding the basic computation and information-processing capabilities of the mind, as well as for exploring human social structures. First, whereas many cognitive tools are similar across cultures, there is wide diversity among human languages. Thus, world languages offer a natural experiment for exploring cognition and culture. Second, because of work in information theory, communication is well understood mathematically. This makes it possible to compare the busy, buzzing domain of human language with cold, idealized communication. Third, there has been major engineering progress, largely through machine learning, in building computational language systems. These successes can serve as starting points for reverse-engineering how linguistic computation is performed in the mind.
With that reasoning in mind, I use ideas from computer science about efficient communication and ideas from psycholinguistics about constraints on human language processing to generate hypotheses about language structure and explore how it informs our understanding of human cognition. In the first part of my talk, I will focus on the lexicon and explore why languages have the words they do instead of some other set of words. For instance, consistent with predictions from Shannon’s information theory, languages are optimized such that words that convey less information are a) shorter and b) easier to produce and understand. That is, word shortenings like chimpanzee -> chimp are more likely to occur when the context is predictive. And, across corpora derived from the Wikipedia text from almost 100 world languages, we see a robust correlation between string probability and token frequency. Next, applying domain-general ideas about cognitive efficiency to syntax, I show that, across 37 world languages, the distances between dependent words are minimized: evidence of functional cognitive pressure at play in large-scale language structure.
In the second part of my talk, I will discuss ongoing and future work: exploring, first, how a corpus of academic papers can be used to investigate how social dynamics and demographic differences within and across fields manifest themselves in academic writing and, second, how we can use the tools of AI (focused on artificial neural systems) to understand linguistic computations. I conclude with a brief discussion of my research program in statistical meta-science and best practices for research in psychology.