On Semantic Overloading of Terms
I had a discussion recently with a coworker about the etymology of a word and how it was redundant using two words with similar etymology (instruct and structure) in a prompt.
This led to some clarity on how (we believe) LLMs actually interpret semantic meanings of any given word. TL;DR, its subjective, and training-data-based.
The long history of a language means nothing compared to the training corpus that is processed. If a corpus contains slang, idioms, sub-dialect-ignorant meanings, and so forth, that is proportionally what is captured in the weights. The historical lineage of a romanic language and the latic conjugations mean nothing to an LLM in a corpus that exclude them.
Which leads to some interesting implications for modern prompts.
In storytime, I've been experimenting with sleep metaphors, and have found them of higher effectiveness than other fragmented concepts of similar veins. Why? The hypothesis is captured (semantic) meaning.
How does this play out? Lets take dream as a term. Ignoring the historical etymology, a dream references many specific meanings in cultural literature, historical connotations, daily routine meanings... but also in scientific studies and how-tos, marketing literature, and so forth.
In storytime, I'm using dream as a way to consolidate patterns that weren't surfaced in the token outputs but implied as nearest neighbors. This works as a form of background processing of context-as-memory, where long term formation is synthesized and enriched by complex historical connections from the timeline of work as recorded.
In shorter terms, dreams have meaning to the model, so we can tap that meaning into complex behaviors.
I liken this to how I subconsciously picked my online/professional handle: 1ps0.
On the surface, its first reading is leetspeak (1 = i, 0 = o). On the second reading, is "ipso", which is short for "ipso facto" which translates as "by the fact itself".
After that it gets a bit weird. A year or so later, I was searching my handle (as you do), and happened upon the 1PS0 GSCE Pyschology exams. Literally a psychology exam as a qualification that assesses understanding of psychological concepts, theories, and research methods. This is my life fascination in a nutshell.
This may be a bit off topic, but it implies how the consilience of linguistic terms tends to have a strange attractor around meaning, as much as any other complexity concept studied. I would love to study this more in the future. Until then...