Yes or no, true or false?
“If you combine negative reinforcement with positive reinforcement, you poison the learning process.”
Do you agree?
Do you poison the cue, the environment, or yourself if you add a positive reinforcer after a negatively reinforced response (aka combined reinforcement)?
Some of the readers of this blog may say “yes”.
Some may say “no”.
Some may say “huh?”
After all, that question doesn’t make sense if you’re unfamiliar with those terms. Stay tuned, I’ll explain them in a minute.
I think the right answer is “it depends”.
Before we start untangling the potential pitfalls of combined reinforcement – why is this an important question?
Helping horses and their people.
This question is important because cross-over trainers (I’m thinking horse people here) may find it hard to transition from negative reinforcement to positive reinforcement without using combined reinforcement.
In the horse training world, the paradigm shift from negative reinforcement-based training to positive reinforcement training is a huge leap for both the traditionally schooled trainer – and the horse.
I’ve been told that many horse people try making that shift, fail and give up – reverting to the familiar, old-school, negative reinforcement-based training.
In my book, the failure to incorporate positive reinforcement in any training regime would lead to potential unnecessary suffering, and I can’t help thinking that a more gradual shift may perhaps be less difficult than an all-or-nothing approach in some cases. Maybe the difficulty in shifting from negative to positive reinforcement could be addressed by transitioning through combined reinforcement?
In my opinion, combined reinforcement has the potential of either increasing or decreasing suffering, depending on how it’s carried out – and what the alternatives are.
I may be completely off track here. After all, I’m not a horse trainer, and I haven’t lived through that transition personally. So, this blog post is hypothetical – but I’m counting on you, dear reader, to let me know whether you agree with me – or not (and yes, there’s been disagreement, keep reading!).
But first, some definitions – just to make sure we’re on the same page.
Negative reinforcement – escape or avoidance
Negative reinforcement is when the animal learns to escape or avoid something unpleasant by performing a specific behaviour.
Escape or avoidance are technical terms in this context.
Let’s take a doggie example, just so that the dog people feel that reading on is relevant – after all, the same learning principles apply to all animal species, not just horses.
Say you’re walking your dog. He pulls ahead, and the leash gets taut, which is uncomfortable. So he stops pulling, which reduces the pressure on the leash. He just performed an escape response. Not ‘escape’ as in running for his life but as in performing a behaviour that terminates an ongoing unpleasant stimulus.
In other words, the dog who successfully performs an escape response has control over the end of the unpleasant stimulation. He has figured out how to end it.
Later he may learn what serves as a warning signal announcing that he’s about to experience the tightening of the leash – whether it’s a certain behaviour from the handler, or something that happens in the environment, such as seeing another dog. So he returns closer to the handler when he perceives the warning signal, and has thus avoided experiencing the increased leash pressure.
‘Avoidance’ is a technical term here. It means that the aversive stimulus is not applied. A horse who’s standing still as you’re saddling him is showing avoidance behaviour if he’s learned that shying away is met with unpleasantness. So, even though he’s just standing still, technically he’s showing avoidance behaviour.
So, for the animal, successful avoidance means controlling the start of the unpleasant stimulation. He has figured out how to avoid its onset.
This concept of control is extremely important when it comes to empowering animals and maintaining high welfare. Empowered animals are resilient; they aren’t as emotionally affected by aversive events. And vice versa, the welfare of animals who lack control over aversive events is potentially challenged.
It may sound as though I’m promoting negative reinforcement, as if it’s unproblematic. “It gives the animal control; the animal gets empowered and resilient”.
It’s not that simple.
Negative reinforcement is often carried out badly, and since it involves aversive stimuli, it has the same potential unwanted, unhealthy and dangerous side effects as punishment.
What dangerous side effects? I listed 20 problems with punishment here.
So, negative reinforcement potentially carries some serious fall-out, and there are more effective ways of getting behaviour, such as using positive reinforcement.
Positive reinforcement – the preferred approach.
Positive reinforcement is when the animal learns to get access to some type of resource, or something pleasant, by performing a specific behaviour. For instance, the dog gets treats, praise, petting or eye contact when he stays close to the handler. He then spends more time close to the handler to get continued access to those resources.
This approach to teaching and training animals is a lot less problematic than negative reinforcement, and generally results in more enthusiastic learners and better performance.
Also, positive reinforcement is generally more forgiving if you mess up as a trainer. Not to say that positive reinforcement training is without potential pitfalls, though.
The difficulty of transitioning
So which choices are there?
Let’s say you’re a traditionally schooled horse trainer, skilled at using negative reinforcement… would you wake up one day and drop those highly developed skills, and switch to something you’ve never tried before – positive reinforcement?
As far as I’ve understood, and to their immense credit, many horse people actually do try this. They leave familiar ground and try something new, perhaps without an experienced coach mentoring them – and then their horse gets overexcited and starts mugging them, and many of those brave trainers quit.
Not being a horse person, I didn’t realize this problem existed until I started hearing it from my students, but it’s a story that keeps coming back.
What is the problem, then? Why is transitioning so difficult?
One part of the issue may be that the would-be cross-over trainer is changing too many things at once.
When teaching using positive reinforcement, we often shape behaviour. We don’t expect behaviour to change in leaps, but rather in baby steps. We change it by increments, making small adjustments to existing behaviour.
We avoid setting the animal up to fail. Success should be easy, at every step of the way.
We avoid frustration.
And yet – we expect cross-over horse trainers to make that huge leap, without shaping the transition from negative to positive reinforcement!
Setting them up to fail, as it were.
What I’m suggesting is that we help them start where they’re at, and consider “teaching-positive-reinforcement-to-a-transitioning-horse-trainer” as a shaping procedure. It has the added benefit of being a non-confrontative approach that might make the trainer who’s still on the fence about trying positive reinforcement more likely to listen to and implement suggestions for change.
A rough shaping plan may look like this.
- If the trainer is using negative reinforcement, we first help them do it better. We teach them to read the subtle changes in body language indicating the animal’s discomfort. We help them improve their timing and understand the importance of applying as little pressure as possible. We teach them the distinction between escape and avoidance.
- If the trainer is using negative reinforcement skillfully (with such mild aversives that there’s no fear reaction), we introduce them to using combined reinforcement, adding a positive reinforcer to a negatively reinforced behaviour. For instance, if the horse yields to slight pressure without showing signs of discomfort, he also gets a treat. It’s important that the treat-giving serves to reduce any slight discomfort through counterconditioning, and doesn’t induce conflict by following a strong aversive that induces fear (as this leads to the risk of poisoning, see below).
- If the cross-over trainer has gained some skills using positive reinforcement, the use of negative reinforcement can be faded as that new toolbox grows. Any poisoned cues may be retrained using new cues.
This rough shaping outline may sound easy to implement, but I realize it’s not. I’m not a horse trainer, so I wouldn’t know exactly how to go about doing it, either: which exercises would help teach the required skills and build momentum through this shaping process?
Beyond the difficulty in identifying the right practical exercises, I see three main reasons why it might fail.
- Some would-be cross-over trainers feel guilty about having used negative reinforcement and want to stop immediately. They see no value in improving their use of this technique, and don’t want to spend any time on step 1 – or 2. But going straight to step 3 is simply too difficult.
- They may be unprepared for the possibility that the previously well-mannered horse gets excited about food, becomes conflicted and aroused, or starts mugging. This makes some would-be cross-over horse trainers give up their attempts at using positive reinforcement altogether.
- They may have heard that combined reinforcement (negative and positive reinforcement) may poison the training process, and so they won’t try it. This might also make many would-be cross-over horse trainers give up, because step 2 in the shaping protocol is omitted, and going from step 1 to step 3 is, again, too difficult.
So, returning to the first question I asked: Is there some truth to the risk of poisoning?
Well. Sometimes yes, sometimes no.
Hence this blog post – because that “sometimes, no” would bring back step 2 into the shaping process, and ultimately help more horse people start using positive reinforcement.
The potential problem is that the cue asking the animal for the response may become poisoned (as may the environment, including the trainer). Once poisoned, the cue is both a warning that if the response isn’t shown, unpleasant things will happen, and a promise that once the correct response is shown, there’ll be goodies.
The poisoned cue is ambiguous, since what follows may sometimes, but not always, involve unpleasantness. And so the animal reacts emotionally (fearfully, showing avoidance) to the poisoned cue.
So how likely is a cue followed by combined reinforcement to become poisoned?
I’d say it depends on a couple of factors, and poisoning can perhaps be avoided by considering the following:
- How unpleasant is “unpleasant”? Does the animal show an emotional response to it indicating fear or discomfort? (ideally, no – the unpleasant stimulus should be very mild)
- Has the animal learned to escape and avoid the unpleasantness? (ideally, yes – he should have learned to control it)
- How good are the goodies? Does the animal show an emotional response to them? (ideally, yes – he should be enthusiastic about them).
I think cues get poisoned when:
- The unpleasant stimulus is painful and/or evokes a fear response.
- The animal hasn’t learned to escape or avoid the unpleasantness.
- The positive reinforcer isn’t potent enough to countercondition the unpleasantness.
One oft-discussed master’s thesis examined the effects of combined reinforcement in comparison to solely positive reinforcement and found that the cue (and the environment) did indeed get poisoned. In that study, the unpleasant stimulus that poisoned the cue involved pulling the dog into the desired position by tugging the leash.
In the study, the dog was wearing a harness and was pulled into position if he didn’t respond to the cue. Later, he learned to avoid pulling by walking to the desired position on cue. In other words, he learned an avoidance response, but he never learned an escape response. Also, the tugging was aversive enough to lead to emotional responses.
And so in that study, the cue became poisoned, and there was a big difference between behaviours trained with combined reinforcement compared to behaviours trained solely with positive reinforcement.
What would the outcome have been if the tugging hadn’t been as aversive? If the animal had learned to escape it, performing a behaviour to terminate it and thereby controlling the aversive?
Might we see counterconditioning rather than poisoning if the procedure were carried out differently? That’s question one.
Question two is – what’s the alternative? In that study, there was never a comparison with behaviours trained solely with negative reinforcement.
What are the options?
For many would-be cross-over horse trainers, the options aren’t to use combined reinforcement or positive reinforcement. Remember, the trainer who hasn’t transitioned isn’t skilled at using positive reinforcement – yet.
The options for many traditional horse trainers would perhaps be to make that transition using combined reinforcement – or revert to the old ways and use negative reinforcement.
So, the choice for the would-be cross-over horse trainer isn’t between combined and positive reinforcement; it’s between combined and negative reinforcement. Again, combined reinforcement has the potential of either increasing or decreasing suffering, depending on how it’s carried out – and what the alternatives are.
If the alternative is positive reinforcement, then combined reinforcement may be the less preferred option.
If the alternative is negative reinforcement, then combined reinforcement may be the best choice, serving two different purposes.
One, teaching the cross-over trainer to incorporate positive reinforcement into the training, helping make the transition.
Two, counterconditioning the training situation so that it’s less aversive to the horse. After all, the horse will learn that uncomfortable situations are invariably followed by nice things happening – those discomforts become predictors of good stuff and therefore less scary.
Negative reinforcement has a bad rap, and many cross-over horse trainers abandon this training technique as soon as they can when they’ve successfully learned to use positive reinforcement.
That’s great. Once you’ve built the skills of using positive reinforcement, negative reinforcement may be faded out of most training situations.
But if you haven’t yet made that transition, or you’re struggling? Combined reinforcement may be the intermediate step that will help you successfully learn to incorporate positive reinforcement in your training.
Combined reinforcement may reduce the risk that you’ll give up.
Theory clashing with practical perspectives… (revision!)
My perspective is mostly theoretical, but Maxine Easey’s (horse-charming.com) is practical, and she kindly provided some constructive critique to this blog post. She pointed out some reasons why my suggestion might be problematic, and the following is my interpretation of her main objections:
- The power of reinforcement history – with regard to the person. People will often continue doing what they’ve done for many years – they’ll often revert to restraint and correction, or less optimal ways of using negative reinforcement (such as using aversives that are too strong, or not releasing pressure in a timely manner) – it takes conscientious practice and self-awareness not to fall into that trap. In her opinion, it’s better to teach them something completely new (AKA positive reinforcement), and make sure they get it right from the start.
- Using light pressure might sound easy in theory, but it’s not in practice. In certain environments, such as where there are lots of competing motivators (distractions – think grass! – or aversives), the animal might not respond to light pressure. Many trainers will then resort to higher-level aversives. In other words, despite the best intentions, those light aversives tend to become warning signals telling the horse that unless he complies, strong aversives will follow – so there’s a risk of emotional responses and fear learning. Extinction bursts for the trainer tie into this: if we do something and it doesn’t work, we do more of what we were doing. In other words, if we use light pressure and it doesn’t work, we escalate the pressure – and that escalation is negatively reinforced if it does work. Max told me that by promoting negative reinforcement, we set people up to escalate.
- She suggests training some behaviours using positive reinforcement in an optimal environment, and improving the person’s skills using positive reinforcement in those types of scenarios (ideally starting with just counterconditioning so that the horse’s perception of people will change to the point of eating and being relaxed in the presence of humans, rather than being conflicted).
- The big risk when combining reinforcement is poisoning: the ambiguity of the situation frustrating the horse and generating conflict. Although in theory (as outlined in this blog post) you might successfully countercondition the situation, in Max’s experience combined reinforcement more often leads to poisoning – and that’s the reason would-be cross-over horse trainers quit trying to use positive reinforcement.
… do you agree, horse people? Is counterconditioning a real possibility, or is the risk of poisoning too great? Have you found another way of shaping the transition that might help the would-be cross-over horse trainer who’s reading this? Please share with us in the comments section!