Man, this is why you are consistently in the same tier as Eliezer and Scott in my mind. Clearly naming an important phenomenon while being compassionate and clear-headed about multiple perspectives on it in a way that makes me immediately want to tell all my friends.
At work, where we predict which ecommerce transactions are fraudulent and have financial incentive to be correct, we spent years being bitten by large fraud attacks where each individual transaction looked a bit suspicious but was not bad enough to block, and no assets tied all the transactions together in any obvious way. Eventually (finally) we began assessing the system as a whole and asking whether there was an elevated volume of just-under-the-threshold transactions in some segment of traffic. When there was, we lowered the threshold. For everyone. For a time.
And the reason we let fraudsters get away with it for so long is, in large part, that all of our systems were set up to assess One Single Transaction (plus other transactions concretely tied to it, but explicitly not in any way that could set up a dangerous feedback loop), so none of our systems were positioned to recognize the obvious-in-retrospect threshold attacks.
It's far simpler to narrow the scope of the problem to "assess this instance", but then your data model has no natural place for global information, and global information doesn't fit through any of your nice interfaces. And you miss gigantic attacks on the system through myopia.
All that is to say (1) this for sure happens in ... "non-social"? contexts too, and (2) it can happen even if the thresholder isn't actually trying to be a thresholder. Though obviously a thresholder with *knowledge* of the rules can be a lot more efficient about it.
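To make the "elevated volume of just-under-the-threshold transactions" check concrete, here's a minimal sketch of the shape of it. Every constant, band boundary, and number below is made up for illustration; our production system is far more involved:

```python
# Minimal sketch only -- every constant here is illustrative, not production.
# Assumes per-transaction risk scores in [0, 1] and a per-segment historical
# baseline rate of "near-threshold" scores.

BLOCK_THRESHOLD = 0.90     # scores at or above this are blocked outright
NEAR_BAND = (0.75, 0.90)   # "a bit suspicious but not bad enough to block"
ELEVATION_FACTOR = 3.0     # alarm if the near-band rate is 3x the baseline
LOWERED_THRESHOLD = 0.75   # temporary, segment-wide stricter threshold

def near_band_rate(scores):
    """Fraction of scores sitting just under the block threshold."""
    if not scores:
        return 0.0
    lo, hi = NEAR_BAND
    return sum(lo <= s < hi for s in scores) / len(scores)

def choose_threshold(segment_scores, baseline_rate):
    """Lower the threshold for a whole segment, for a time, when
    just-under-the-line traffic is elevated relative to its baseline."""
    if near_band_rate(segment_scores) > ELEVATION_FACTOR * baseline_rate:
        return LOWERED_THRESHOLD  # for everyone in the segment
    return BLOCK_THRESHOLD

# A segment where most recent traffic sits just under the line, against a
# 3% historical baseline, gets the stricter threshold:
recent = [0.80, 0.85, 0.88, 0.40, 0.86, 0.30, 0.82, 0.10, 0.89, 0.84]
print(choose_threshold(recent, baseline_rate=0.03))  # -> 0.75
```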
> explicitly not in any way that could set up a dangerous feedback loop
I want to emphasize this. Many "obvious" interventions like "if you see a 2.9, add 0.5 to all subsequent observations!" have a severe feedback loop problem. Observing 2.9, 2.4, 1.9, 1.4, 0.9, 0.4, 0.1, 0.1, 0.1 should definitely not result in sanctions. It's not hard to avoid doing this unless you *never* check to see if your "obvious" intervention does this.
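To make the failure mode concrete, here's a toy version of that "obvious" intervention. The thresholds are made up; the point is only that the bump compounds:

```python
# Toy demo: the naive rule "seeing a 2.9 bumps all later observations by
# 0.5" compounds, so a clearly de-escalating series still ends in sanctions.

SANCTION_AT = 3.0  # adjusted score at or above this draws sanctions
BUMP_AT = 2.9      # adjusted score at or above this increases the bump
BUMP = 0.5

def run(observations):
    bump = 0.0
    for raw in observations:
        adjusted = raw + bump
        if adjusted >= SANCTION_AT:
            return f"sanctioned at raw={raw}, adjusted={adjusted:.1f}"
        if adjusted >= BUMP_AT:
            bump += BUMP  # the feedback loop: each near-miss inflates the future
    return "no sanction"

# Someone backing off as fast as you could ask still gets sanctioned:
print(run([2.9, 2.4, 1.9, 1.4, 0.9, 0.4, 0.1, 0.1, 0.1]))
# -> sanctioned at raw=0.1, adjusted=3.1
```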
This reminds me of the LW post on "sum-threshold attacks" [1], where someone stays below the plausible deniability threshold on many fronts so the sum/total harm is high.
A common tactic I've seen used by predators is to decrease the "2.9 half-life" by operating in multiple mutually exclusive communities. Then, when they are finally outed for crossing the line, people are surprised by how many credible harassment accusations piggyback off it.
The way I've come to think about it is the more 2.9s someone pulls, the bigger my Bayes update that they're a sociopath. Past a certain point I do everything I can to ostracise them (like when Aella talks about treating frame controllers with conflict theory [2]).
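For concreteness, here's the back-of-envelope version of that update. Both the prior and the likelihood ratio are numbers I'm making up for illustration:

```python
# Made-up numbers throughout: a ~1% prior on bad faith, and an assumption
# that a given 2.9 is 5x as likely from a bad actor as from someone who
# is merely oblivious. Posterior odds scale as LR ** (number of 2.9s).

PRIOR_ODDS = 1 / 99       # ~1% prior probability of bad faith
LIKELIHOOD_RATIO = 5.0    # assumed evidential strength of each 2.9

def posterior_prob(n_near_violations):
    odds = PRIOR_ODDS * LIKELIHOOD_RATIO ** n_near_violations
    return odds / (1 + odds)

for n in range(1, 5):
    print(n, round(posterior_prob(n), 3))
# 1 0.048 / 2 0.202 / 3 0.558 / 4 0.863 -- it escalates fast
```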
I think it's very common to extend too many second chances to thresholders because of typical mind fallacy. Conditional on that much plausibly-deniable abuse, I claim they're not just a "confused" version of you; they have antisocial personality disorder. I don't know anybody else in my circles who agrees with me on this, but I also don't know anybody else who spotted that individuals X and Y were predators.
Another great example of this kind of conduct is crypto moguls who flit between jurisdictions to avoid the focused attention of the law. Patrick McKenzie has a brilliant explanation of how crypto firms managed to obviously violate securities and anti-money-laundering law for so long [3].
[1]: https://www.lesswrong.com/posts/R3eDrDoX8LisKgGZe/sum-threshold-attacks
[2]: https://aella.substack.com/p/frame-control
[3]: https://www.bitsaboutmoney.com/archive/bond-villain-compliance-strategy/
Very insightful post. I have noticed this phenomenon with increasing frequency as a lot of groups/organizations/collaborations/etc. have adopted "codes of conduct" in recent years. The bad actors quickly learn how to stay just below the punishable threshold, which weakens the code of conduct overall, since it becomes clear to everyone that there is rarely any enforcement against violations.
It seems like what we really need here is a continuous relaxation of a discrete system, but this is blocked by the large constant term in any judicial proceeding. "Treat the fourth 2.9 as a 6" might be the only practical approach to resolve this tension, but I wonder if we could come up with something better. (Maybe LLMs can reduce the constant term enough... an arbitration that only costs some number of FLOPs enables a much broader range of possibilities.)
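One hypothetical shape such a relaxation could take: let each act's effective severity grow with a decaying memory of prior near-violations, so "the fourth 2.9" smoothly reads as something much worse than a 2.9. Every constant below is invented for illustration:

```python
# Hypothetical sketch of a continuous relaxation: severity is scaled by a
# decaying memory of past near-violations. All constants are invented.

import math

HALF_LIFE_DAYS = 90.0  # how quickly old near-violations stop counting

def effective_severity(new_score, history, now):
    """history: (day, score) pairs for past near-violations. Each past
    near-violation adds weight that decays with a 90-day half-life."""
    decay = math.log(2) / HALF_LIFE_DAYS
    memory = sum(score * math.exp(-decay * (now - day))
                 for day, score in history)
    # the more recent near-misses on record, the worse the same act reads
    return new_score * (1 + memory / 10)

history = [(0, 2.9), (30, 2.9), (60, 2.9)]
print(round(effective_severity(2.9, history, now=90), 2))
# -> 4.52: the fourth 2.9 inside ~90 days reads as roughly a 4.5
```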
(1)
i thought this was great, and i have (already!) found it quite useful as a handle. i really appreciate this writeup.
(2)
i wrote a brief summary for myself, but i figured i might as well share it here. i don't think it's _super_ high fidelity, but probably sufficient that saul_one-year-from-now will retain the important bits:
<begin summary>
"thresholding" is a category of behavior where a malicious actor engages in behavior that's *juuust* under a punishable level, many times, over & over, adds confusion & ambiguity into the mix, and makes actually punishing their behavior much more difficult. this also has the result of a community losing faith/trust in the system/rules.
duncan proposes a few solutions/mitigations:
1) use the term. encourage others to use the term.
2) reduce the stigma of keeping track of near-violations. (better yet, consider that record-keeping virtuous.)
3) follow through with the consequences you've previously threatened. responding to threshold attacks can sometimes induce lots of (fairly reasonable) reactions of "wtf? that wasn't even that bad?!" — be prepared to respond to them.
4ish) you can add extra rules/consequences/subsystems/etc to your community's system to make it more robust to threshold attacks. e.g., "multiple near-violations will lower your threshold for 3 months" or "your 5th near-violation will be punished quite strongly."
<end summary>
(3)
i don't have a good sense of your orientation to posting on LW, but — i imagine LW would quite like this. how would you feel about cross-posting this to LW? if you're not interested, would you mind if i did (as a linkpost)?
I'm not crossposting to LW myself but you're definitely welcome to. =)
This essay seems uncharacteristically biased towards a certain perspective. There's a lot of talk about hostile actors and how to respond to thresholding as an authority figure trying to maintain order, rather than understanding the problem in a broader sense. Thresholding seems to be the natural consequence of forcing people to obey rules they don't agree with. It's only a problem from the perspective of people who think the rules are good and want them to be followed for the same reason as they were originally intended.
From the perspective of the thresholder, it's more "You're trying to paint me as the bad guy, even though I went out of my way to follow your stupid rules? I didn't have to do that, you know." And, you know, that is a completely valid way to look at certain rulesets. If you live in a country with oppressive laws, or even just a friend group with too many sticklers and pedants, it's a good thing to refuse to engage more than strictly necessary. And what if there isn't one authority who gets the final say, and there's disagreement over where the line is?
I understand that's not the point of the essay, of course. It's designed to be useful to a particular type of community organizer dealing with a particular type of annoyance. I just wanted to explore how it applies in other contexts too.
I disagree that that's a completely valid way to look at certain rulesets. I think you are missing something about ... enculturation? Assimilation? ... the example of a nation with oppressive laws is somewhat valid, but the essay is clearly focused on the damage that thresholders do to SUBcultures. The subculture was there first; the thresholder's perspective is fundamentally less valid in context; relativism is not appropriate.
I think that sometimes people enter a subculture adversarially, or decide once inside of it that it should be destroyed. But that's a different sort of question altogether—in most cases, you shouldn't undermine and ruin e.g. a martial arts academy by joining it and then gaming the rules, and someone who IS doing that (regardless of whether it's malice or obliviousness that is sufficiently advanced so as to be indistinguishable from malice) needs to go.
I definitely am missing something about enculturation or assimilation. I can't really understand what it is that matters here. The closest fit I have is, essentially, calling dibs. I doubt that's what you're talking about, though, so I think there's just something fundamentally missing in my map of the world.
Like, I guess if you have something so specifically fragile that a single individual can ruin the experience for everyone while still technically operating under the rules, you need to be careful about who you let in. You've talked about that in regard to greyspaces; simply having entry requirements could solve a whole host of issues with most subcultures. But, also, couldn't you just change the rules if something is that fragile? It might be hard to change rules on a societal level, but for a subculture you just get the boss to tweak the wording a bit, and then there's no more confusion after it's happened once.
The larger issue I have, I guess, is that it seems like every subculture I know of will already ban people for stepping too close to the line, and sometimes for no reason at all. Someone who is being rude to an authority figure can easily be kicked out of any institution, even if they don't have politeness as one of their rules. If you act suspiciously, people might ban you even if you never came close to any of the rules. When the people making the rules already have that power, it's weird to tell them to be even more careful about hostile actors who are pissing them off all the time.
I think having the thresholding concept *helps* with that, though ... like, the people who are overly sensitive or overly wary and liable to ban somebody for stepping too close to the line, if you can get common knowledge about thresholding they can often *chill out* and be like, ah, okay, yeah, so, there's this thing where we actually need you to avoid close calls in this particular way ...
(As opposed to preemptively writing off the person who was ambiguously either a thresholder or just autistic)
When there's ambiguity about soft and somewhat arbitrary borders, it's hard to communicate the actual desired behavior, which is "don't cross this line but also don't dance up close to it on the regular." Having the shared concept helps.
I think I understand you better, now. I still can't really grok the conceptual goals of subcultures, though. I think it's just some fundamental biological disconnect I have individually. I've never in my life actually been a part of any subculture, and I could only loosely be considered engaged with any culture at all. I understand there's a lot of valuable downstream effects of joining a group, such as being less bored when there's scheduled events and stuff, but it never seemed worth the effort to actually engage with the rules in the way the authorities wanted me to.
I don't know if this is relevant, but in case you need a conclusion to this discussion I'll try to explain. All the things you said about subculture cohesion and rules made me instinctively throw up shields. It was like being back at school, or family reunions. It feels like a situation where a cartoon villain says "Haha, once everyone joins this organization I have formed, they will all fall in line and assimilate into the one correct way of thinking about this subculture" and then the hero says something about freedom and adapting with the times and blows everything up while dramatic music swells. In more objective terms, I took an MTG color wheel test and scored basically 0 in White and Green, if that helps.