Consciousness: The Scene
Consciousness allowed our ancestors to spread out while maintaining cultural alignment. It operates in the same way in human infants and adults, providing for rapid cultural learning in noisy environments with significant poverties of linguistic stimuli.
Author's Note
I have written a paper that sketches out a theory of consciousness.
What I argue in this paper is that what we call human consciousness, and feel as individual, is the product of co-evolution with a strong sexual selection component. This individual consciousness is an internalization of what I call, following the ideas in (Frith, 2025), social consciousness. The internalization of social consciousness, along with social consciousness itself, made it possible for Homo sapiens to migrate out of Africa some 70,000 to 50,000 years ago (indeed, it possibly compelled this migration), and it marks a dividing line for the species, when cultural evolution begins to operate on ourselves. Consciousness allowed our ancestors to self-domesticate and spread out while maintaining cultural alignment. It operates in the same way in human infants and adults, providing for rapid cultural and language learning and affiliation in the face of noisy environments and enormous poverties of linguistic stimuli.
Preface
If a tree falls in a forest and no one is around to hear it, does it make a sound?
What I hope you experience, after reading this theoretical sketch of consciousness (in all its parts) and then returning to this ancient question, is that it is a question only consciousness could generate. Of course, on its surface, the question seems easy: just set up an audio recorder in the forest, and you'll be guaranteed to collect evidence that answers the question with 'yes.' In fact, no experiment is necessary. We know how 'falling' and 'sound' work (pre-scientifically, pre-historically), so of course it makes a sound. We're just not there to hear it. It is similarly straightforward to define 'sound' purely from the listener's perspective and answer the question with 'no.' We know, again, how 'sound' works on us, and the falling tree, whatever it produced, cannot be said to have produced 'sound' precisely because the 'sound', if it existed, was not observed by anyone.
Consciousness provides for the tree-sound's cultural and communicative existence, its presence and relationship to us. And that existence, within human culture, requires both the tree to 'speak' its falling sound, as it were, and for that experience to be (intentionally) transmitted to and shared with human listeners. Consciousness answers the riddle with a superposition of yes-no: the falling sound exists if it is a cultural and communicative signal, otherwise it doesn't. It is as though we are forced by consciousness to consider the question this way: why would a falling tree bother to make a noise if no listeners could hear it?
Note on Terminology
As I have done above, I will, throughout most of this sketch, be describing consciousness, both social and individual, as a duality of speaker and listener. I of course do not propose that these roles necessarily exist physically in the brain as speakers and listeners, but a homologous process in the brain can behave in this way. For readers unfamiliar with consciousness research, a widely accepted model of consciousness in the brain is global workspace theory (GWT), which, at least in summary, also alludes to a kind of speaker-role and listener-role (emphasis mine):
The theater metaphor is ancient and is associated with more than one theory of consciousness. In GWT focal consciousness acts as the bright spot on the stage, which is directed by the spotlight of attention. The bright spot is surrounded by a "fringe" of vaguely conscious events (Mangan 1993). The stage corresponds to "working memory," the immediate memory system in which we talk to ourselves, visualize places and people, and make plans. Information from the bright spot is globally distributed to two classes of unconscious processors: those in the shadowy audience [the listener], who primarily receive information from the bright spot, and unconscious contextual systems that shape events [the speaker] within the bright spot, who act "behind the scenes."
I. The Scene
Let us start with a story. The story of the Tower of Babel.
Now the whole world had one language and a common speech. As people moved eastward, they found a plain [Babylonia] and settled there. They said to each other, "Come let's make bricks and bake them thoroughly." They used brick instead of stone, and tar for mortar. Then they said, "Come let us build ourselves a city, with a tower that reaches to the heavens, so that we may make a name for ourselves; otherwise we will be scattered over the face of the whole earth." But the Lord came down to see the city and the tower the people were building. The Lord said, "If as one people speaking the same language they have begun to do this, then nothing they plan to do will be impossible for them. Come, let us go down and confuse their language so they will not understand each other." So the Lord scattered them from there over all the earth, and they stopped building the city. That is why it was called Babel—because there the Lord confused the language of the whole world. From there the Lord scattered them over the face of the whole earth.
This is a 'poetic' vision of what humans may have faced in the great migration east out of Africa roughly 100,000 years ago (and in waves before that). Non-linguistic or semi-linguistic tribes, brought together physically and socially by the common foe of climate change, simply had to cooperate and coordinate to survive. This pressure, along with internal pressures, caused humans to gradually develop cultures that each provide for common human coordination and quick adaptability, that is, learning. We did not wait around for evolution to solve the problem. We solved it. And we did this with four main 'tools': (1) joint attention, (2) language, (3) gender roles, and (4) the transmission model of communication.
Daniel Dor's theory of language development shows how this process could work for both cultural learning and language learning. Here, then, is Section 1 of the paper:
In his marvelous and, I presume, understudied work The Instruction of Imagination, Daniel Dor proposes a theory of language development which treats language as a built-from-scratch social communication technology: in essence, our species' very first internet. Dor's triadic relationship, which forms the centerpiece of his theory, models the technology of language as a shared, collectively assembled symbolic landscape, connected bidirectionally with each participant's private experience and with the shared linguistic sign of speech itself.
Dor stresses in his work that if indeed language was an invented social communication technology, it must, as all technologies must, have a functional envelope—an area defined by those things the technology can be 'easily' used for and those things it cannot. For Dor, language's purpose—the reason for the technology—is to 'instruct the imagination.' It is practically useless when people are together, sharing the same experience; its functional specificity is revealed by its ability to transmit experiences and share them among people who are not 'together' in time or space. Although I agree with Dor's suggested purpose for language as a technology, its very existence as this technology (with which I also agree) raises the question as to what impelled us to invent it (again and again).
An overlooked answer (in this 'compulsory' sense) is joint attention, or We-mode, an automatic behavior in the presence of others which generates a merging of private viewpoints: a We-representation, which is roughly the same for everyone. Upon the arrival of joint attention (with 'arrival' meaning a gradual, evolutionary one), our internal environments shifted such that, in the presence of others, we are compelled to perceive things (not absolutely) in a first-person-plural perspective. Language is a natural byproduct of and a solution to this problem of joint attention. We 'suddenly' infer alignment to a shared view with joint attention, and we work together over time to develop shared meanings of our new collective world, rather than mere aligned inferences about it. Since joint attention is coordinative, language use must be as well. This means the process of moving from non-linguistic inferential alignment (caused by joint attention) to full language involved negotiating norms for speakers and for listeners as well as shared meanings.
I take joint attention and Dor's theory together as satisfactorily explaining the process by which we humans could have created our cultures and languages. But this process, without some kind of coordination, is a logistical nightmare. We also needed a division of labor and some communicative norms (as Dor mentions above) to channel our cognitive work. This will bring us to gender roles and the transmission model next.
For more information on Dor's theory and on joint attention, I have written one or two pieces that I think are valuable. And of course, I can point you to the books I have read that are informing my thinking about this theory:
Dor's Theory
Joint Attention