Does a neural network need domain knowledge? Stephen Downes shows how important this question is, so I’ll hazard my own answer.
“if we require domain knowledge in order to learn, then we require memorization”
According to traditional learning theories, we construct knowledge upon existing knowledge, for example by linking new material to what is already known. When we see knowledge as representation, all the higher-order concepts grow neatly out of lower-level concepts. But at the very bottom there must be some special knowledge, a sort of “seed” knowledge, to enable this recursive mechanism, because otherwise we would run into an infinite regress.
Even if we use the less abstract idea of “recognition”, we might misunderstand it and ask: how can we recognize what we do not already know? The trick is that recognition needs only part of a pattern’s features to evoke the whole pattern.
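This “whole from a part” mechanism can be sketched in a few lines of code. The patterns and feature names below are invented for illustration; a real neuronal network would of course learn such associations rather than have them listed:

```python
# A minimal sketch of recognition by partial matching.
# Each "concept" is a set of features the recognizer has seen before;
# the concepts and features here are invented for illustration.
PATTERNS = {
    "cat":  {"fur", "whiskers", "meows", "four_legs"},
    "dog":  {"fur", "barks", "four_legs", "wags_tail"},
    "bird": {"feathers", "beak", "two_legs", "sings"},
}

def recognize(observed_features):
    """Return the stored pattern that overlaps most with the
    (possibly incomplete) set of observed features."""
    def overlap(name):
        return len(PATTERNS[name] & observed_features)
    return max(PATTERNS, key=overlap)

# A fragment of the pattern suffices to evoke the whole:
print(recognize({"whiskers", "four_legs"}))  # -> cat
print(recognize({"feathers"}))               # -> bird
```

The point is that nothing needs to match completely: a few observed features are enough to settle on the best-fitting known pattern, which is exactly why recognition does not require already knowing the whole.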
The artificial neuronal networks described in the post above apply such pattern recognition to learn solely from conversations. How far can this approach be extended? All the way to the bottom, eliminating the need for domain knowledge?
Let’s first look at human neuronal networks. The conversations begin long before the human can talk, when the baby, just a few weeks old, has their first gaze “conversations” with their mother. It is here that they begin to recognize the world around them, long before they learn propositions about “he, she, it, they”; even the “I” is learned only via the “thou” in these first bodily conversations.
(If it seems far-fetched to compare this kind of conversation with the sophisticated concepts of a given knowledge domain, consider how language spans a long scale of words. At one end are isolated concepts coded in specialized, domain-specific terminology; at the other are basic ideas that suggest a very bodily context and a full-senses, all-at-once recognition: the deictic notions of “I”, “now”, and “here”, other simple spatial or temporal descriptions that are gradually extended via synaesthetic metaphors, or the modal words that extend from “wanting” through deontic to epistemic use.)
Similarly, how do neuronal networks learn which responses to trust? For the little human neuronal networks, the seed trust is given before they need to start reasoning, and it enables them to add more trust criteria over time.
Artificial neuronal networks, by contrast, will always depend on some human who decides whether their response is correct, or at least whether their underlying algorithm is acceptable. So there are limits to what they can learn on their own. (This may be a consolation, but it is still unsettling how few humans might be controlling them.)
So I think that while artificial networks need some prerequisite input, human neuronal networks use recognition from the very beginning and require no indispensable prerequisites.
This sheds a different light on the idea of prerequisites, and it also makes the idea of a “course” very different from the traditional temporal, sequential arrangement. If I encounter prerequisites later, there is still time to cover them (much as a rhizome is not necessarily pulled out from the top down to the root, but often in the reverse direction), as long as the course does not build the material of later weeks upon that of previous weeks, but uses the weekly skeleton only as a schedule of when people can simultaneously discuss a certain topic (i.e., if it is organized as a cMOOC rather than an xMOOC), or even lets the community “be” the curriculum.
This is the first, obvious meaning of
“how education may be linear, but learning certainly isn’t.”
(which Doug Belshaw spends “five minutes explaining”). Recognition explains the deeper mechanism of learning as not linear/sequential (not via fixed, isolated representations) but laminar/all-at-once (multiple connected features of a pattern). Thanks so much, Stephen Downes, for the idea of recognition.
The image above shows the logical dependencies of the chapters of an important textbook from my studies (B. L. van der Waerden, Algebra I, 8th ed., 1971), which I encountered again as an illustration of linearity in the first German book on hypertext, by R. Kuhlen (1991).
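The difference between such a dependency diagram and a fixed linear sequence can be made concrete in code. The topics and prerequisite relations below are invented for illustration (loosely echoing an algebra curriculum), not taken from the actual diagram:

```python
# A sketch: treat "prerequisites" as a directed graph rather than a
# fixed sequence. The topic names are invented for illustration.
from itertools import permutations

# topic -> set of prerequisites
PREREQS = {
    "groups": set(),
    "rings": {"groups"},
    "fields": {"rings"},
    "vector_spaces": {"groups"},   # needs groups, but NOT rings or fields
}

def is_valid_order(order):
    """A learning path is valid if every topic comes after its prerequisites."""
    seen = set()
    for topic in order:
        if not PREREQS[topic] <= seen:
            return False
        seen.add(topic)
    return True

valid = [o for o in permutations(PREREQS) if is_valid_order(o)]
print(len(valid))  # -> 3: several valid paths, not just one linear sequence
```

Even this tiny graph admits three different valid learning paths. The prerequisites form only a partial order, and any linear “course” is just one arbitrary serialization of it, which is the point of the non-linearity of learning above.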