I was just fantasizing: if it were somehow possible to strap every English speaker on earth with a mic and a box with infallible speech-to-text software (as well as a camera to catch anything the person reads) that would transmit everything uttered or read back as text to a central database for analysis, we'd have an amazingly cool and detailed picture of the language from a lexical perspective: how many exposures to a certain word are needed before a child can use it productively himself, exactly how much the language "changed" each year and which words died with the memory of the last living person who knew them, how long a gap between exposures to a word leads a person to claim he has never heard it before in his life, etc. We might even be able to predict some really crazy stuff, like a theoretical maximum vocabulary for an average individual if the 'society' in which his interactions took place consisted only of PhDs with a certain very high input rate of low-frequency lexical elements (from books, etc.), based on how many encounters with an average lexical element are required before it tends to be integrated into one's productive vocabulary. Totally rad, huh? (Obviously a lot of the numbers would just be 'ranges', but with a large enough sample they would have huge implications for the design of some extremely scientifically perfected language course.)
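Just to make the "theoretical maximum vocabulary" idea a bit more concrete, here is a toy back-of-the-envelope sketch. Everything in it is an assumption of mine and not real data: it supposes word frequencies follow Zipf's law (the word at rank r shows up in proportion to 1/r), that a word gets integrated once its expected number of encounters per year reaches some threshold, and the names `estimated_vocabulary`, `tokens_per_year`, `exposures_needed` and `lexicon_size` are made up for the illustration.

```python
def estimated_vocabulary(tokens_per_year, exposures_needed, lexicon_size=100_000):
    """Toy estimate: count the words a person would 'acquire' in a year.

    Assumptions (hypothetical, not measured):
    - word frequencies follow Zipf's law, P(rank r) proportional to 1/r
    - a word is acquired once its expected yearly exposures reach
      `exposures_needed`
    """
    # Normalizing constant so that the probabilities over the lexicon sum to 1.
    harmonic = sum(1.0 / rank for rank in range(1, lexicon_size + 1))

    acquired = 0
    for rank in range(1, lexicon_size + 1):
        expected_exposures = tokens_per_year / (rank * harmonic)
        if expected_exposures < exposures_needed:
            break  # every rarer word falls below the threshold too
        acquired += 1
    return acquired


# Compare an ordinary input rate with a hypothetical heavy reader's input rate.
for yearly_input in (1_000_000, 10_000_000):
    print(yearly_input, estimated_vocabulary(yearly_input, exposures_needed=10))
```

The real point of the giant database would be to replace these guessed numbers (the input rate, the threshold, the frequency curve itself) with measured ranges.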
I'm wondering about all of this in part because I learned a word the other day, one that I was sure I had never seen before in my entire life. When I used it on a friend to see if he knew it, he told me he was absolutely certain I had used it quite often while telling stories four years earlier. Frightening! I don't just mean cool or weird; I was very literally frightened. The fragility of memories makes it feel like we all have Alzheimer's or something.
So my idea with the illustrations below is that as a native speaker there is a huge list of lexical items which there is a 99.9% chance you have acquired, and which forms the basis for being a native speaker. This is the semi-permanent vocabulary, and it is refreshed so often that there is little chance of losing any of it except for the momentary memory "fart". Meanwhile there is another subset of dynamic words in each person's vocabulary, made up of words that have no basis for necessarily being permanent:
- slang (once you stop hearing anyone else use it, it stops coming out of your mouth too; it becomes too dated),
- words you didn't really know previously but which are the "pet words" of a particular author in a book you just read,
- words which very specifically describe a present circumstance but which will fade from usefulness (news media make everyone aware of the term, and when they stop using it, it is subsequently forgotten by a lot of people).
If we had the massive data described above, it would be possible to make really cool charts of at least two types for words as they pass through society almost like a virus: (1) words disseminated by news media, movies, or music, which enter the brains of a huge portion of the population very quickly and then pass away and are forgotten, almost like a gradient effect in the end; (2) words which people pick up in a book and think are cool enough to use once or twice before forgetting. For these there could be an animation of a "bug" jumping from brain to brain, sometimes living on for a long time in one, sometimes dying rather quickly or even before jumping.
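Since we don't have the data, the best I can do is fake it. Below is a toy simulation of the "bug" idea, purely as a sketch: the model (people pass the word to a random listener with some probability, forget it with some probability if they don't hear it again, and can optionally be hit by a media broadcast for the first few steps) and all the names and numbers (`simulate_word_spread`, `p_pass`, `p_forget`, `media_steps`, `media_reach`) are my own invention, not anything measured.

```python
import random


def simulate_word_spread(population=1000, steps=100, p_pass=0.3, p_forget=0.1,
                         media_steps=0, media_reach=0.0, seed=0):
    """Toy model of a word jumping from brain to brain.

    Returns a list with the number of people who know the word after each step.
    All probabilities are hypothetical illustration values.
    """
    rng = random.Random(seed)  # seeded so a run can be repeated
    knows = [False] * population
    knows[0] = True  # one person picks the word up, e.g. from a book

    history = []
    for step in range(steps):
        heard = set()

        # Type (1): a media broadcast reaches a share of everyone at once.
        if step < media_steps:
            for person in range(population):
                if rng.random() < media_reach:
                    heard.add(person)

        # Type (2): word of mouth, each knower may use it on a random listener.
        for person in range(population):
            if knows[person] and rng.random() < p_pass:
                heard.add(rng.randrange(population))

        # Hearing the word refreshes or teaches it; otherwise it may be forgotten.
        for person in range(population):
            if person in heard:
                knows[person] = True
            elif knows[person] and rng.random() < p_forget:
                knows[person] = False

        history.append(sum(knows))
    return history


# A media-driven word versus a word picked up by a single reader.
media_word = simulate_word_spread(media_steps=5, media_reach=0.2)
book_word = simulate_word_spread()
print(max(media_word), media_word[-1])
print(max(book_word), book_word[-1])
```

Plotting `history` over time for different settings would give a crude version of the two chart types described above.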
Two illustrations for you to check out:
My theory is that the only reason Linguistics is not overwhelmingly the coolest, most exciting science on earth is that, of all the cool and interesting 'experiments' one could conceive of, the vast majority would be impossible to conduct for very practical reasons, such as basic human rights (we cannot ethically take 1,000 babies, put them in a desolate region of the earth without teaching them any language, and observe a human language as it is created from scratch) and the sheer volume of labor involved in collecting the necessary data (all the really cool analysis computers can do is negated because the only data we have to feed them is mostly polished written text; the collection and conversion to text of real, living spoken language in mass quantities is not happening anywhere on earth).

