# Cortical Learning

### 1. Introduction

The brain is a fascinating computational device that no algorithm to date can imitate. Neuroscientists tend to focus on understanding the brain at the micro level and have uncovered much of its anatomy and physiology there, yet we do not understand how the high-level functionality of the mind emerges from the low-level systems of neurons and their interplay. What we need is a formal computational model of learning in the brain that is plausible given the constraints of findings in neuroscience.

I will begin by discussing such a model, the neuroidal model, introduced by Leslie Valiant. This model consists of a directed graph of neuroids, simplifications of neurons, that can only communicate with other neuroids in their direct vicinity. In this model, a concept is represented by a set of neuroids, which Valiant refers to as an item, and that concept being “thought” is the result of many or all of those neuroids firing simultaneously. Valiant proposes that the brain might implement operations on these items to JOIN or LINK them – similar to how we might infer that if something has bark and leaves then it’s probably a tree, or how we associate ice with being cold.

After discussing Valiant’s model and the implementation of the operations JOIN and LINK in this model, I will discuss Christos Papadimitriou’s contribution of a new operation, predictive join or PJOIN, which was motivated by the vital role prediction plays in cognition. An advantage of PJOIN over JOIN is that the items being joined, say “bark” and “leaves”, no longer have to be shown simultaneously for “tree” to be recognized. This is accomplished by a prediction of “leaves” when only “bark” has fired, making it so that if “leaves” is experienced a short time later, “tree” will fire.

After looking at Papadimitriou’s PJOIN, I will wrap up by proposing, at a high level, a novel computational model that is biologically plausible and capable of both prediction and hierarchical storage of concepts. In this model, cyclic activity plays a central role in the storage of concepts.

Let me note here that the neural networks commonly used in industry today are only loosely inspired by computation in the brain, and clever but biologically implausible techniques are often used to get them to solve interesting problems. For example, backpropagation seems to be implausible in the brain because it would require neurons to communicate error values backwards through the network, which is a highly nonlocal operation.

### 2. Valiant’s Neuroidal Model

Let’s begin by looking at Valiant’s neuroidal model, which computes via vicinal algorithms. The crucial aspect of a vicinal algorithm is that a neuroid can only communicate with its neighbors. Vicinal algorithms run on a random directed graph whose vertices are the neuroids, ${v_i}$, and whose edges are the synapses, ${e_{ji}}$.

The neuroids and synapses have states at time t as follows:

• Neuroid ${v_i}$ has state ${(T_i^t, f_i^t, q_i^t)}$
• ${T}$ is the threshold – some real value that is for the most part a constant.
• ${f}$ is 1 if the neuron fires or 0 if it does not.
• ${q}$ is some finite memory which can keep track of the state of the neuron.
• Synapse ${e_{ji}}$ (from ${v_j}$ to ${v_i}$) has state ${(w_{ji}^t, qq_{ji}^t)}$
• ${w}$ is the strength or weight of the synapse.
• ${qq}$ is another finite memory which can keep track of the state of the synapse.

The dynamics of the model at each time instant, ${t}$ evolve in the following way:

• First, the firing status must be updated.
• First, for each neuroid, a variable ${W_i}$ is introduced which sums the strengths of the synapses from all firing neuroids that feed into it. $\displaystyle W_i^t \leftarrow \sum_{j: f_j^t = 1} w_{ji}^t$
• Next, if this total strength reaches the threshold, the neuroid is set to fire in the next iteration (and otherwise not to fire). $\displaystyle \text{if } W_i^t \geq T_i^t \text{ then } f_i^{t+1} \leftarrow 1 \text{, else } f_i^{t+1} \leftarrow 0$
• After that, the remaining properties must be updated. Note that Valiant purposefully defines this evolution very loosely so as to accommodate a variety of possible evolution functions.
• The state of the neuroid along with its threshold are updated in the following manner: $\displaystyle (T_i^{t+1},q_i^{t+1}) \leftarrow \delta(T_i^t,q_i^t,f_i^{t+1},W_i^t)$
After a neuroid fires, it enters a refractory period during which it cannot fire again. The threshold will, for the most part, only change to reflect this refractory period. For example, if we want the refractory period to take ${R}$ steps and the threshold to return to normal somewhat smoothly, we could have ${R}$ memory states after a neuroid fires, where the threshold is initially increased by a factor of ${2^{R-1}}$ and then halved on each subsequent iteration.
• The state of the synapse along with its strength are updated in the following manner: $\displaystyle (w_{ji}^{t+1},qq_{ji}^{t+1}) \leftarrow \lambda(q_i^t,qq_{ji}^t,f_j^t, f_i^{t+1},W_i^t)$

Notice that first all neuroids determine whether they fire in the next step, and only after that are all other properties updated. This two-phase update keeps the dynamics synchronous: if you did both steps at once for each neuroid, the resulting state of the system would depend on the order in which you picked the neuroids to evolve.
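To make the two-phase update concrete, here is a minimal Python sketch of one synchronous step. The names (`Neuroid`, `step`) are mine, and the threshold/memory transition functions ${\delta}$ and ${\lambda}$ are omitted.

```python
# Minimal sketch of one synchronous step of the neuroidal dynamics.
# "Neuroid" and "step" are illustrative names; delta/lambda are omitted.
from dataclasses import dataclass

@dataclass
class Neuroid:
    threshold: float   # T_i
    firing: int = 0    # f_i

def step(neuroids, weights):
    """weights[(j, i)] = w_ji, the strength of the synapse from v_j to v_i."""
    # Phase 1: compute every W_i from the *current* firing pattern
    # and decide next-step firing for all neuroids at once.
    W = [sum(w for (j, i), w in weights.items()
             if i == target and neuroids[j].firing == 1)
         for target in range(len(neuroids))]
    next_firing = [1 if W[i] >= neuroids[i].threshold else 0
                   for i in range(len(neuroids))]
    # Phase 2: only now commit the new firing states, so the result
    # does not depend on the order in which neuroids are visited.
    for n, f in zip(neuroids, next_firing):
        n.firing = f

# Two inputs fire into a third neuroid: 0.6 + 0.6 >= 1.0, so it fires
# in the next step, while the inputs (receiving nothing) go silent.
ns = [Neuroid(1.0, firing=1), Neuroid(1.0, firing=1), Neuroid(1.0)]
step(ns, {(0, 2): 0.6, (1, 2): 0.6})
```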

In Valiant’s model, a concept is represented by some set of neuroids. A concept is referred to as an item. Many or all of the neuroids of an item firing simultaneously corresponds to “experiencing” that concept.

Further, Valiant implements operations on items; he proposes two: JOIN and LINK. The JOIN operation takes two items A and B and creates a third item, C = JOIN(A,B). The result is that when both A and B fire, C will fire; if A fires or B fires but not both, C will not fire. Imagine A is strawberry and B is ice cream. When both fire, C, strawberry ice cream, will fire; when only one but not the other fires, strawberry ice cream will not be experienced. The LINK operation LINK(A,B) makes it so that whenever A fires, B will fire as well. For example, A is banana and B is fruit: whenever banana fires, you will also experience fruit. Let’s now take a deeper look into how these operations can be implemented.

Given two items A and B, here is how C=JOIN(A,B) can be implemented:

• The neuroids of C will be selected from some ${r}$ neuroids that have at least ${k}$ synapses from A and at least ${k}$ synapses from B, for some ${r}$ and ${k}$. Neuroids with this property enter a state called ${CANDIDATE}$. This means their state is ${(T, 0, null)}$ – they are not firing and their memories are null. All incoming synapses to these neuroids have state ${(w = \frac{T}{k}, 1)}$ – their strength is ${\frac{T}{k}}$, so ${k}$ firing presynaptic neuroids suffice to make a candidate fire, and they are in some memory state.
• Now, say A fires. Any candidate neuroids that fired as a result (note that these neuroids must have had ${\geq k}$ synapses from A) enter a state called ${POISED}$: the synapses from A to each of these neuroids adjust their strength ${w}$ to ${\frac{T^2}{2kW_i}}$. This value was chosen for the following reason: we know the strength of each synapse to be ${\frac{T}{k}}$. Say ${x}$ synapses fired. Then ${W_i}$, the total strength of all the firing synapses from A, is ${x\frac{T}{k}}$, so the number of such synapses is ${\frac{kW_i}{T}}$. In the new poised state, we want these synapses to sum to ${\frac{T}{2}}$ (so that B will eventually fill in the other half). Thus, if ${y}$ is the new strength, we want ${y\frac{kW_i}{T} = \frac{T}{2}}$, and therefore ${y = \frac{T^2}{2kW_i}}$. One last thing: all neuroids that were candidates but did not fire enter a state called ${DISMISSED}$: the synapses from A to these neuroids adjust their strength to ${0}$ – they are unnecessary in the creation of C.
• Now, in the next iteration, say B fires. Any poised neuroid that fires enters a state called ${OPERATIONAL}$: the synapses from B to the neuroid adjust their strength to ${\frac{T^2}{2kW_i}}$ so that now if both A and B fire simultaneously, they will have enough strength to cause the operational neuroids (item C) to fire. All neuroids that were poised but did not fire due to B are ${DISMISSED}$.
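The POISED rescaling rule ${y = \frac{T^2}{2kW_i}}$ derived above can be sanity-checked with a small sketch. The function name and the concrete values of ${T}$ and ${k}$ are illustrative.

```python
# Toy check of the POISED rescaling rule for C = JOIN(A, B).
# T and k are illustrative values; "poise" is my name for the transition.
T, k = 1.0, 5

def poise(firing_weights_from_A):
    """A fired; given the weights of a candidate's firing synapses
    from A, return its new state and rescaled weights."""
    W_i = sum(firing_weights_from_A)
    if W_i < T:                       # did not reach threshold
        return "DISMISSED", []
    y = T**2 / (2 * k * W_i)          # the derived new strength
    return "POISED", [y] * len(firing_weights_from_A)

# A candidate with x = 6 firing synapses from A, each of strength T/k:
state, ws = poise([T / k] * 6)
# The rescaled synapses from A now total exactly T/2, leaving room
# for B to eventually contribute the other half.
```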

Let’s sum up the algorithm to create C = JOIN(A,B):

• Step 1: A ${\rightarrow}$ fire; neuroids with ${\geq k}$ synapses from each of A and B (excluding the neuroids of A and B themselves) ${\rightarrow}$ ${CANDIDATE}$;
• Step 2: ${CANDIDATE}$ and fired ${\rightarrow}$ ${POISED}$; ${CANDIDATE}$ and not fired ${\rightarrow}$ ${DISMISSED}$; B ${\rightarrow}$ fire;
• After: ${POISED}$ and fired ${\rightarrow}$ ${OPERATIONAL}$; ${POISED}$ and not fired ${\rightarrow}$ ${DISMISSED}$;

Next, let’s take a brief look at the LINK operation. LINK(A,B) works similarly to JOIN, but it makes it so that whenever A fires, B will fire next. The summed-up algorithm for the creation of LINK(A,B) is as follows:

• Step 1: A ${\rightarrow}$ fire; B ${\rightarrow}$ ${PREPARED}$;
• Step 2: Some relay neuroids A was connected to will now fire; These relay neuroids will cause the neuroids of B to fire;
• After: the neuroids that were ${PREPARED}$ and fired ${\rightarrow}$ ${OPERATIONAL}$;

### 3. Papadimitriou’s PJOIN

Papadimitriou noticed that in Valiant’s model, if two items A and B are not shown at precisely the same time, then C will not be activated. He also observed that prediction is a key process in cognition. Thus, he proposed a new operation, predictive join: C = PJOIN(A,B). PJOIN is an extension of JOIN, and in the same way, when A and B both fire simultaneously, C will fire. But additionally, if just A fires, C enters a state where it “predicts” B: C will then be ready to fire if only B fires.

Another interesting aspect of PJOIN is that if B happens to be a PJOIN of some other items, say D and E, and A has just fired, then as before, C will predict B. This will “mobilize” B to predict its “children,” D and E.

I will start by describing the algorithm that creates PJOIN:

• First, create a regular item C = JOIN(A,B).
• Each neuroid of C, with probability one half, joins a new “sub”-item ${C_p}$, where the p indicates this is the “predictive” part of C. The neuroids of C that are not in ${C_p}$ enter a state called ${OPERATIONAL}$ and function just as they would in JOIN.
• Suppose A and B are also PJOINs, implying they have predictive parts ${A_p}$ and ${B_p}$, respectively. Now, we perform LINK(${C_p,A_p}$) and LINK(${C_p,B_p}$), linking the predictive part of C to the predictive parts of A and B with “downstream” connections. After linking, the synapses from the relay neuroids to ${A_p}$ and ${B_p}$ will be in a total state called ${L-OPERATIONAL}$ with memory ${qq=PARENT}$, indicating that these synapses run “downstream.” These synapses will have a larger strength ${w}$ than the minimum required, say double, so that neuroids can identify firings coming from parent items.
• After this, ${C_p}$ enters the state ${P-OPERATIONAL}$, where the strengths of all synapses from A and B to ${C_p}$ (but not to the rest of C) are doubled. Since after the creation of JOIN the synapses from A to a neuroid of C totaled ${\geq \frac{T}{2}}$ in strength, the synapses from A to a neuroid of ${C_p}$ now total ${\geq T}$. As a result, ${C_p}$ will fire if just A fires, if just B fires, or when both fire. If only A or only B fires, this initiates a cascade of “downstream” firings of other PJOINs that searches for the missing items.
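The effect of the P-OPERATIONAL doubling can be checked with a little arithmetic. This is a toy sketch; the concrete weight totals are assumptions consistent with the bounds above (each side contributing ${\geq \frac{T}{2}}$ after JOIN).

```python
# Toy arithmetic for the P-OPERATIONAL doubling in C = PJOIN(A, B).
# The concrete totals are assumed to sit exactly at the T/2 bound.
T = 1.0
w_A_into_C = 0.5 * T          # synapses from A into the plain part of C
w_A_into_Cp = 2 * (0.5 * T)   # into C_p: doubled by P-OPERATIONAL

def fires(total_input, threshold=T):
    return total_input >= threshold

# Only A fires: the plain part of C stays silent, but C_p fires,
# which is what launches the downstream predictive cascade.
only_A = (fires(w_A_into_C), fires(w_A_into_Cp))
# A and B both fire: both parts of C fire, as in ordinary JOIN.
both = (fires(w_A_into_C + 0.5 * T), fires(w_A_into_Cp + 0.5 * T))
```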

Let’s sum up the algorithm to create C = PJOIN(A,B):

• Step 1 and 2: Create C = JOIN(A,B);
• After: ${C_p}$ is a subset of C containing around half of C’s neuroids. C and not ${C_p}$ ${\rightarrow}$ ${OPERATIONAL}$;
• Steps 3 and 4: (These are performed in parallel) create LINK(${C_p,A_p}$) and LINK(${C_p,B_p}$).
• After: ${A_p,B_p \rightarrow}$ ${L-OPERATIONAL}$ in memory state ${PARENT}$; ${C_p \rightarrow P-OPERATIONAL}$ with synapses from A and B having double strength;

Now, let’s discuss the operation of PJOIN(A,B):

• If A and B fire simultaneously, PJOIN operates in the same way as JOIN.
• Without loss of generality, suppose just A fires. This will cause just ${C_p}$ to fire, which causes ${B_p}$ to fire “downwards” in the next step. Since neuroids enter a refractory period after firing, ${C_p}$ will not cause ${A_p}$ to fire. Also, after A fires, the synapses from B to C double their strength. This is a total state called ${PREDICTING\ B}$; it allows C to fire if only the predicted B fires.
• In the next step, ${B_p}$ will fire (because ${C_p}$ fired). Then, ${C_p}$ will enter a state called ${PASSIVE}$ where it will ignore firing from its parents (for example, if some ${E_p}$ fires where E = PJOIN(C,D)) by setting the strength of those synapses to 0.
• If, later, the predicted item B fires, then C will fire and all the items of that PJOIN, namely A,B, and C, will revert to the state ${P-OPERATIONAL}$.
• Lastly, if C’s parents fire downwards from some ${E_p}$ where E = PJOIN(C,D), then the neuroids in ${C_p}$ will fire and a “search” will be propagated “downstream.”

### 4. Limitations of Valiant’s Model

Despite the successes of Valiant’s model in explaining how concepts can be linked and the impressive theoretical foundation Valiant has laid down for studying neural computation, the model comes with limitations. Although Valiant’s framework describes neuroids that are capable of local computation – which is necessary for any model of cognition, the biological plausibility of other properties of the model is questionable. I will highlight some concerns.

One main problem with the model is that synapses can abruptly change their strengths by as much as a factor of two, or even drop their strength to zero, in a single time step. In real neurons, synaptic strengths change gradually over long periods of time. This capability allows the designer of the operations to create functionality that is far too complex for the brain to plausibly implement.

### 5. High Level Description of a Novel Computational Model

Now that I have discussed the framework of Valiant’s neuroidal model, with items and operations on items, as well as Papadimitriou’s additional operation PJOIN that incorporates prediction, I will provide a high-level description of a novel computational model that has been materializing for a little over a year now. Let me begin by describing the setting of this model and some of its motivations.

The brain seems to be a large random graph that is somehow “terraformed” by the environment it senses. To work properly, the brain should create a model of the world including its objects and the relationships between the objects in space and time. It will use this model to predict and respond to familiar and new objects that it experiences.

The first thing we should look at is the structure of the environment. Notice that the world is structured in a loosely hierarchical fashion: many complex objects emerge from simple building blocks. For example, chairs, trees, firewood, and pencils all share the building block of being made from wood. A school bus, the sun, and a banana all share the building block of being yellow. The next thing to notice is that not all arrangements of these building blocks are possible. If there are ${n}$ building blocks, then there are ${\ll 2^n}$ possible arrangements of them that we will likely experience. If all possible arrangements were equally likely, then the “terraforming” of the brain would be very uniform – we would not be able to make predictions about the world because there wouldn’t be any patterns to anticipate. Imagine, for instance, that at every step of time a completely new and different arrangement appeared in the world, with every arrangement equally probable. In this chaotic universe, no life, or any order for that matter, would be able to sustain itself, since finding a stable state is necessary to ensure any reasonable expected lifespan.

The hierarchy of concepts will have atoms: the raw data/characters fed as input. Imagine these are the set of letters A through M. We will call these level 0 concepts. Level 1 concepts, then, are small subsets of the level 0 concepts. Here are some level 1 concepts: X = A,B,C; Y = D,E,B; Z = A,E,F,G. Level 2 concepts are subsets of level 1 concepts: for example, P = X,Y,Z. This is a very simplified version, but in general there can be multiple levels with a large number of concepts at each level. The environment should provide the concept P to the model by expanding P to its atoms. Therefore, P would look like this: A,B,C,D,E,B,A,E,F,G. In general, the environment might be generated by randomly picking concepts at the highest level and expanding them out one after the other. In this environment, you will see atoms most often, level 1 patterns second most often, level 2 patterns third most often, etc.
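The environment generator described above can be sketched as follows. The concept definitions match the running example; the helper names are mine.

```python
# Sketch of the environment generator: pick top-level concepts at
# random and expand each recursively down to its atoms.
import random

atoms = set("ABCDEFG")
concepts = {
    "X": ["A", "B", "C"],      # level 1
    "Y": ["D", "E", "B"],
    "Z": ["A", "E", "F", "G"],
    "P": ["X", "Y", "Z"],      # level 2
}

def expand(name):
    """Recursively expand a concept, in order, down to its atoms."""
    if name in atoms:
        return [name]
    return [a for part in concepts[name] for a in expand(part)]

def environment(steps, top=("P",)):
    """Present `steps` randomly chosen top-level concepts, expanded."""
    out = []
    for _ in range(steps):
        out.extend(expand(random.choice(top)))
    return out

# P expands to its atoms exactly as in the text: A,B,C,D,E,B,A,E,F,G
```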

Now, let’s take a look at how the brain might learn. A rule of thumb in neuroscience is that “neurons that fire together wire together.” This leads to the natural question: when would neurons fire together? Neurons fire together when you experience things at the same time. The more often you experience things at around the same time, the more strongly those concepts will link. As an example, you experience the concepts bark, leaves, and something that is tall at the same time rather often – every time you see a tree! The same principle applies to causal relationships. In the shower, whenever you see water, you also see steam; thus, it is natural to draw a link between the concept of water and that of steam. In fact, within this line of reasoning you can start to see the foundations of theory-building developing in the mind – from basic causality.
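A minimal sketch of such a Hebbian rule follows; the pairwise update form and the learning rate are my assumptions, meant only to illustrate “fire together, wire together.”

```python
# Minimal Hebbian sketch: strengthen a link whenever both endpoints
# are active together. The pairwise form and learning rate are
# illustrative assumptions.
def hebbian_step(weights, active, lr=0.1):
    for (j, i) in weights:
        if active.get(j) and active.get(i):
            weights[(j, i)] += lr
    return weights

# "water" and "steam" co-occur on every shower; "water" and "sand"
# never do, so the water->steam link ends up much stronger.
w = {("water", "steam"): 0.0, ("water", "sand"): 0.0}
for _ in range(10):
    hebbian_step(w, {"water": True, "steam": True, "sand": False})
```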

Naturally, we can conclude that the more often you experience a concept, the “stronger” the links between the neurons of that concept become. It follows, also, that the “stronger” the connections to a set of neurons, the “easier” it is for those neurons to be activated.

Next, I will claim something interesting: in general, after learning, simpler concepts will be easier to activate than more complex concepts. Complex concepts are composed of simpler concepts; thus, for every simple concept, there will be many complex concepts that “overlap” in that simple concept. Take as an example the following three complex concepts: a school bus, the sun, a banana. All three share the simple concept “yellow.” Whereas each of these three complex concepts was activated once, yellow was activated three times. Thus, the simpler concept was strengthened more than any of the more complex concepts, and it follows that the simpler concept will be easier to activate after learning. This “overlapping” seems to be a great candidate for how abstraction occurs in the mind. If the only concept you ever experienced were “banana,” the concept “yellow” might never “pop out” as a concept in and of itself. But as you are provided with a complex world containing many different yellow things, this concept begins to “pop out.” In my model, the mind will, through “overlap,” gradually develop hierarchies of concepts, storing a statistical model of the hierarchical world it experiences.
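The counting argument behind “overlap” can be illustrated directly. The concept decompositions below are illustrative.

```python
from collections import Counter

# Counting sketch of the "overlap" argument: present each complex
# concept once and tally how often each constituent is activated.
complex_concepts = {
    "school bus": ["yellow", "wheels", "large"],
    "sun":        ["yellow", "round", "bright"],
    "banana":     ["yellow", "curved", "fruit"],
}
activations = Counter()
for parts in complex_concepts.values():
    activations.update(parts)   # each simple concept fires with its parent

# "yellow" was strengthened three times while each other constituent
# was strengthened once, so it "pops out" as an abstraction.
```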

It is important now to mention how these hierarchies could be stored in my model. A key concept I will introduce here is that of cycles in the brain. In the brain, each neuron feeds into on the order of 10,000 other neurons. A relatively random graph with degrees this high has the property that each neuron is part of a large number of cycles of various lengths. In fact, EEG scans pick up oscillatory behavior in the brain at various frequencies. These cycles seem to be a perfect candidate to explain how a memory can stay activated in the brain for an extended period of time.

The following figure depicts how a concept, A, is represented in my model – as a set of neurons that stay activated due to cyclic activity. There are arrows going into the concept which represent some consistent feed of stimulus fueling it. Some of these signals loop inside the concept, keeping it activated or amplifying it to some degree. There is also signal constantly radiating outward to other neurons that could represent other concepts.

In the following figure, I represent the concepts A, B, and C, which are often experienced together. Therefore, the connections between them will be stronger than those between two usually unrelated concepts. Notice that I purposely made the connections between the concepts A, B, and C less bold than, say, the connections inside just A or just B or just C. This is simply because you will experience A alone more often than you will experience A with B. Thus, the intra-A connections will be stronger than those from A to other concepts. Also notice that the weaker connections between A, B, and C will themselves form cycles, albeit weaker ones.

In the following figure, I want to show that A, B, and C compose the concept X. This concept, just like our initial concept A, has signals feeding in, cyclic activity inside to keep it active, and connections radiating outward towards other concepts.

The next figure depicts the concept X more simply by reducing A, B, C to its cyclic activity. I did this to illustrate that concept X has a very similar structure to any of its constituent concepts. Thus, in my model, a very natural hierarchy of concepts could form.

There are many nice attributes of my model. Here I will describe two cognitive functions that emerge from it: the first is prediction, the second is association. Let’s look at the representation of X = ABC. Due to cyclic activity, a concept can stay “active” for some length of time before it “dissipates.” Therefore, A, B, and C do not have to be activated simultaneously in order for the full concept X to be activated. Further, if just A is being activated, there will be a “flow” of signal from A to other concepts it is often associated with. Therefore, this flow from A will “predict” B and C, making it more likely for them to be activated and thus causing the concept X to activate. This model also describes free association very naturally.
Imagine you see something that is yellow, has wheels, and there are children on it. These abstract concepts lighting up might cause the complex concept “school bus” to activate. This concept will “radiate signal” out towards other concepts associated with “school bus.” Naturally, the next thing you might think is that school is in session and perhaps even think of the test your cousin has in school that he was telling you about.
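The prediction-by-flow behavior can be sketched as a toy spreading-activation process. The decay rate, link strengths, and the 0.5 “active” threshold are all my assumptions, not part of the model’s specification.

```python
# Toy spreading-activation sketch of prediction via signal "flow":
# activity in A flows to frequently co-experienced concepts B and C.
links = {(a, b): 0.4 for a in "ABC" for b in "ABC" if a != b}

def spread(act, decay=0.9):
    """One step: keep a decayed share of each concept's activity and
    add the inflow from its currently active neighbors."""
    return {i: decay * act[i] +
               sum(w * act[j] for (j, k), w in links.items() if k == i)
            for i in act}

act = {"A": 1.0, "B": 0.0, "C": 0.0}   # only A is experienced
for _ in range(2):
    act = spread(act)
# Activity has flowed from A to B and C, "predicting" them and
# pushing the composite concept X = ABC toward full activation.
```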

This model is very attractive, and I am currently constructing simulations to see if I can make this kind of behavior emerge from a random graph with plausible learning rules and a specified hierarchy of concepts. Although I don’t think Valiant’s model is entirely biologically plausible, the theoretical approach is elegant, and it would be interesting to see how my model and the way it operates could be realized in a similar framework to Valiant’s.

### 6. References

• Christos Papadimitriou and Santosh Vempala. Cortical Learning via Prediction. JMLR: Workshop and Conference Proceedings, vol. 40, pp. 1-21, 2015.
• Leslie G. Valiant. Circuits of the Mind. Oxford University Press, 1994. ISBN 978-0-19-508926-4.

BY: Charles Shvartsman