Talking about **motivation and attention** is an excellent idea at 8.30 am on a Saturday.

Today’s talks started with a Westworld spoiler, which I’m not too happy about: I wanted to watch it.

The first talk I have notes for, by Grace Lindsay, opened with a neat definition of attention: an internal state change that leads to a change in sensory processing. In particular, it implies enhanced performance on certain tasks (the behavioural correlate of attention). Attention to preferred stimuli increases firing rates, while attention to null stimuli decreases them. The rest of the talk was about a model, the Stabilised Supralinear Network: a model of visual cortex with recurrently connected inhibitory and excitatory neurons. It reproduces many neural correlates of attention, but lacks the behavioural correlate. The speaker’s work combines such a network with a CNN: the hybrid network is trained on MNIST (the handwritten digits dataset). This model replicates the performance correlates of attention, being able to tune its attention to one digit or another and consequently tell them apart when superimposed.
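The core idea — attention as a multiplicative gain that boosts preferred units and suppresses null units, improving readout of superimposed stimuli — can be caricatured in a few lines of numpy. This is a toy sketch of gain modulation, not the actual SSN + CNN model from the talk; the responses and gain values are made up.

```python
import numpy as np

# Toy illustration of feature-based attention as gain modulation.
# Responses of 4 feature detectors to a superimposed stimulus:
# units 0-1 prefer digit A, units 2-3 prefer digit B (hypothetical values).
responses = np.array([1.0, 0.8, 0.9, 1.1])

def attend(responses, attended_units, gain_up=1.5, gain_down=0.7):
    """Scale attended units up and all other units down."""
    gains = np.full_like(responses, gain_down)
    gains[attended_units] = gain_up
    return responses * gains

attending_A = attend(responses, [0, 1])
attending_B = attend(responses, [2, 3])

# Readout: which digit's detectors respond more strongly?
print("attend A ->", "A" if attending_A[:2].sum() > attending_A[2:].sum() else "B")
print("attend B ->", "A" if attending_B[:2].sum() > attending_B[2:].sum() else "B")
```

With attention on A, the A-preferring units dominate the readout even though both digits are present in the input, and vice versa — the behavioural correlate the hybrid model captures.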

The **statistical approaches** session involved as many as three talks about high-dimensional recorded activity and dimensionality reduction. Can I conclude that it is a hot topic?

Jonathan Pillow started by criticising David Marr’s idea that we should think in terms of computational, algorithmic and mechanistic levels, with the computational level having primacy. The work he presented was on “targeted dimensionality reduction”: we work in the space of firing rates of a large number of neurons, but instead of doing PCA, we look for the dimensions that carry the most information about something outside this space, like the stimulus or the choice (of motion direction). These dimensions are found by solving a simple regression problem.
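A minimal sketch of the idea, on hypothetical simulated data: regress the population firing rates on the external task variables, and take each variable’s regression coefficients as a “targeted” axis in neural space. The data-generation details below are mine, not from the talk.

```python
import numpy as np

# Targeted dimensionality reduction sketch: find directions in
# firing-rate space that best predict external variables.
rng = np.random.default_rng(1)

n_trials, n_neurons = 500, 50
stimulus = rng.choice([-1.0, 1.0], size=n_trials)  # e.g. motion direction
choice = rng.choice([-1.0, 1.0], size=n_trials)    # animal's decision

# Simulated rates: each neuron mixes stimulus and choice, plus noise.
w_stim = rng.normal(size=n_neurons)
w_choice = rng.normal(size=n_neurons)
rates = (np.outer(stimulus, w_stim) + np.outer(choice, w_choice)
         + 0.5 * rng.normal(size=(n_trials, n_neurons)))

# Regress every neuron's rate on the task variables; each row of
# beta is then a "targeted" dimension in neural space.
X = np.column_stack([stimulus, choice])           # trials x variables
beta, *_ = np.linalg.lstsq(X, rates, rcond=None)  # variables x neurons

stim_axis = beta[0] / np.linalg.norm(beta[0])
choice_axis = beta[1] / np.linalg.norm(beta[1])

# Projecting activity onto stim_axis recovers the stimulus sign.
proj = rates @ stim_axis
print(np.mean(np.sign(proj) == stimulus))  # fraction of trials recovered
```

Unlike PCA, the axes found this way are defined by what they tell us about the stimulus and the choice, not by how much raw variance they capture.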

The next talk was by Alex Williams. We can now record activity at very different timescales in the same animal, with thousands of neurons over thousands of trials. PCA and similar methods don’t separate well the fast, within-trial dynamics from the slow, across-trial components. A common approach is trial-averaged PCA: the activity is N matrices for N trials, each matrix a neurons-by-time raster, and the basic technique is to do PCA on these matrices, thereby classifying the PSTHs. What they do instead is treat all these matrices as slices of a 3-tensor (trials x neurons x time) and use tensor decomposition methods. Dimensionality reduction is done by approximating this tensor with a sum of rank-1 tensors (the CP decomposition). Interestingly, this technique corresponds, mathematically, to building a linear network with gain modulation (i.e. gains that change across trials).
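The CP decomposition can be sketched in plain numpy with alternating least squares: each rank-1 component is an outer product of a trial factor (slow gain), a neuron factor, and a time factor. This is a minimal version on synthetic data, not the fitting procedure used in the talk.

```python
import numpy as np

# Minimal CP (canonical polyadic) decomposition by alternating least
# squares, on a synthetic trials x neurons x time tensor.
rng = np.random.default_rng(2)

def khatri_rao(U, V):
    """Column-wise Kronecker product."""
    return np.einsum('ir,jr->ijr', U, V).reshape(-1, U.shape[1])

def cp_als(T, rank, n_iter=100):
    """Approximate a 3-way tensor T as a sum of `rank` rank-1 tensors."""
    I, J, K = T.shape
    A = rng.normal(size=(I, rank))
    B = rng.normal(size=(J, rank))
    C = rng.normal(size=(K, rank))
    for _ in range(n_iter):
        A = T.reshape(I, -1) @ khatri_rao(B, C) @ np.linalg.pinv((B.T @ B) * (C.T @ C))
        B = np.moveaxis(T, 1, 0).reshape(J, -1) @ khatri_rao(A, C) @ np.linalg.pinv((A.T @ A) * (C.T @ C))
        C = np.moveaxis(T, 2, 0).reshape(K, -1) @ khatri_rao(A, B) @ np.linalg.pinv((A.T @ A) * (B.T @ B))
    return A, B, C

# Synthetic rank-2 data: trial gains x neuron loadings x time courses.
trials, neurons, time = 30, 40, 50
A0 = rng.uniform(size=(trials, 2))   # slow across-trial gains
B0 = rng.normal(size=(neurons, 2))   # neuron loadings
C0 = rng.normal(size=(time, 2))      # within-trial time courses
T = np.einsum('ir,jr,kr->ijk', A0, B0, C0)

A, B, C = cp_als(T, rank=2)
T_hat = np.einsum('ir,jr,kr->ijk', A, B, C)
print(np.linalg.norm(T - T_hat) / np.linalg.norm(T))  # relative error
```

The trial factors in `A` are exactly the per-trial gains of the gain-modulated linear network interpretation, while `B` and `C` capture the fast within-trial structure.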

The **circuits and behaviour** session: I have only a few sparse notes. One interesting experimental talk was by Matthew Lovett-Barron (I *think*), about alertness and the different neuromodulators at work in various brain areas. Alertness in rodents can be measured from behaviour and pupil dilation; in zebrafish, reaction times can be measured, and so can heart rate. During behaviour, zebrafish neurons can be recorded across the whole brain; however, we are interested in the ones that carry particular neuromodulators. These cell types are identified by staining after the recording is concluded and the tissue is fixed. Imaging during the experiment and after fixation are then matched and compared, allowing the recording of 22 neuromodulatory populations. We can then find the ones correlated with alertness. Remarkably, these are the same in zebrafish and mice. The results were verified by optogenetic stimulation.

Andrea Hasenstaub told “a cautionary tale”: she pointed out inconsistencies between manipulation experiments in which neurons are selectively activated or deactivated. While I was presenting my poster later in the day, someone told me how much he appreciated the idea of raising warning flags, stopping to consider whether the methods commonly used in a field actually make sense. We probably need more of this.

Finally, Daniel Wood: receptive fields are typically thought to be fixed with respect to the eye. However, a predictive remapping is known to happen during saccades. This also depends on saccadic strategy: some saccades are meant to explore an area, others aim to focus attention on a location, so the function of the saccade changes the way we perceive things.

On **Sunday** I was mostly interested in the **learning in networks** session. There was already quite a bit of talk about learning rules in the brain as opposed to artificial neural networks. Ali Alemi explained this well, if specifically through the lens of his own research. The objective: learn GLOBALLY, in a supervised way (input -> desired dynamics), but with LOCAL plasticity, in a brain-like way. The problem is of course the “credit assignment problem”: individual neurons don’t have access to the global error between the actual and the desired dynamics. This had been done for linear dynamics. In their model, there is a teacher with arbitrary dynamics and a student; learning is done by some sort of adaptation, and the student is then replaced by a spiking network. Quite a few technical details were presented, but they were hard to follow in a talk. They showed how a bistable attractor can be taught (200 neurons). But we can do something cooler: we can learn to walk in a simulated environment, where a 50-neuron network controls a 3D puppet. So this is learning *universal* computation in spiking networks with *local* learning rules. It’s efficient (as few spikes as possible) and consistent with experimentally observed spiking statistics and excitation/inhibition balance.

Guangyu Robert Yang began his talk like this: *prefrontal cortex is important*. Its interesting property is that it implements many different functions in a single network. How do you perform all these tasks with one network? He took this as motivation for research on recurrent neural networks trained on many different tasks. Two possibilities: you could have many (perhaps interacting) subnetworks, each specialised for a task; or you could have the same units working on all tasks. This is measured with “task variance”: the activity variance of each neuron within each task, then normalised across tasks. Result: some units are extremely specialised, varying in a single task only; others have similar activity patterns across all tasks. The comparison can also be done pairwise, comparing only two tasks with each other. There are then various ways in which the tasks may *interact* with each other.
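My reading of the task-variance measure can be sketched in a few lines: compute each unit’s activity variance within each task, then normalise so each unit’s profile sums to one. The simulated activity below is hypothetical, just to show the two extremes (a specialised unit vs. a generalist one).

```python
import numpy as np

# Sketch of the "task variance" measure: variance per unit per task,
# normalised across tasks for each unit.
rng = np.random.default_rng(3)

n_units, n_tasks, n_conditions = 5, 3, 100
# Simulated activity: units x tasks x task conditions (hypothetical).
activity = rng.normal(size=(n_units, n_tasks, n_conditions))
activity[0, 0] *= 10.0   # unit 0 is specialised: varies mostly in task 0
activity[0, 1:] *= 0.1

task_var = activity.var(axis=2)                  # units x tasks
task_var /= task_var.sum(axis=1, keepdims=True)  # normalise per unit

print(np.round(task_var, 2))
```

Unit 0’s normalised variance is concentrated in task 0 (a specialised unit), while the other units have roughly flat profiles, i.e. they are engaged similarly by all tasks.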

Last but not least, the day (and the main Cosyne meeting) ended with a great talk by **Yoshua Bengio**, on something that I find really important but that the average neuroscientist I talked to seems to consider irrelevant or somehow unrelated to brain research. I’ll cover it in a post specifically about that topic.