Stance detection Communiti Intelligence · v1.0 · Last run 14 June 2026

Measure what people want, not just how they sound.

Communiti separates position from tone so support, opposition, acceptance, rejection, mixed feedback, and conditional support can be counted with evidence rather than inferred from mood.

Read the launch story behind this benchmark: Measure what people want, not just how they sound

98.7%: stance accuracy on the frozen test split
97.2%: macro-F1 across stance classes
100%: accuracy on tone-divergent feedback where mood and position split
100%: stable across 115 meaning-preserving wording changes

Proof point № 1

Tone tells you how people feel. Stance tells you what they want done.

That distinction is a reporting risk, not a technical nicety. A resident can support a plan angrily. Another can reject it politely. If mood becomes position, the public record can flip support into opposition or opposition into support. This benchmark tests those cases directly: angry support, polite opposition, conditional acceptance, mixed responses, multilingual feedback, and wording changes that should not change the result.

Synthetic benchmark examples

Two comments. Two tone traps. Opposite report outcomes.

These are the kinds of responses that make sentiment dashboards unsafe for consultation reporting. Tone points one way; the actual position points the other.

Example 1: Angry support

Negative mood, supportive position

Resident comment: I am fed up with how dangerous this road has become, so yes, build the protected cycleway. Just get on with it before someone is hurt.

Tone shortcut: Negative tone -> Opposition. A tone shortcut moves this resident into the opposition count.

Communiti records: Actual position -> Support

Evidence: “yes, build the protected cycleway”

Example 2: Polite opposition

Positive mood, opposing position

Resident comment: Thanks for the clear proposal and the work behind it. I still do not support removing the parking bays outside the shops.

Tone shortcut: Positive tone -> Support. A tone shortcut moves this resident into the support count.

Communiti records: Actual position -> Opposition

Evidence: “I still do not support removing the parking bays”

The benchmark contains 28 tone-divergent responses like these. Communiti scored 100% on that subset; the tone shortcut scored 25%.
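To make the flip concrete, here is a minimal Python sketch of a tone shortcut applied to the two example comments. The mapping, the hand-assigned tone labels, and the function names are assumptions for illustration; they are not Communiti's code or the benchmark's scoring notebook.

```python
# Hypothetical mapping used by a sentiment-as-stance shortcut.
TONE_TO_STANCE = {"negative": "opposition", "positive": "support"}

# The two example comments as (name, tone, gold stance). Tone labels are
# assigned by hand here; a real baseline would take them from a sentiment model.
EXAMPLES = [
    ("angry support", "negative", "support"),
    ("polite opposition", "positive", "opposition"),
]


def tone_shortcut(tone: str) -> str:
    """Treat mood as position, which is the mistake the baseline models."""
    return TONE_TO_STANCE[tone]


def accuracy(predicted: list[str], gold: list[str]) -> float:
    """Share of predictions that match the gold stance."""
    if not gold or len(predicted) != len(gold):
        raise ValueError("predicted and gold must be non-empty and the same length")
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)


gold = [stance for _, _, stance in EXAMPLES]
shortcut_predictions = [tone_shortcut(tone) for _, tone, _ in EXAMPLES]
shortcut_accuracy = accuracy(shortcut_predictions, gold)

for (name, tone, stance), predicted in zip(EXAMPLES, shortcut_predictions):
    print(f"{name}: tone={tone}, shortcut={predicted}, gold={stance}")
print(f"shortcut accuracy on these two comments: {shortcut_accuracy:.0%}")
```

Both comments are flipped, so the shortcut scores 0 of 2 here. The 25% figure comes from the full 28-response subset, which this sketch does not reproduce.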

100% when tone and position split

28 responses built to catch angry support and polite opposition

Communiti stance detection 100%

Tone shortcut 25%

If tone becomes position, reports can flip support into opposition or opposition into support.

98.7% on held-out cases

79 responses scored after development, with audited gold labels

Communiti stance detection 98.7%

Benchmark pass line 85%

The best production-style run cleared the target with 98.7% accuracy and 97.2% macro-F1 across support, opposition, acceptance, rejection, mixed, conditional, and neutral cases.
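For reviewers who want to see how the two headline metrics relate, the sketch below computes accuracy and macro-F1 in plain Python on a four-response toy set. The toy labels are invented, and averaging over the union of gold and predicted labels is an assumption; the scoring notebook may average over the fixed stance taxonomy instead.

```python
from collections import Counter


def accuracy(gold: list[str], predicted: list[str]) -> float:
    """Share of responses whose predicted stance equals the gold stance."""
    return sum(g == p for g, p in zip(gold, predicted)) / len(gold)


def macro_f1(gold: list[str], predicted: list[str]) -> float:
    """Unweighted mean of per-class F1, so rare classes count as much as common ones."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, predicted):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted class gains a false positive
            fn[g] += 1  # gold class gains a false negative
    labels = sorted(set(gold) | set(predicted))
    scores = []
    for label in labels:
        denominator = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denominator if denominator else 0.0)
    return sum(scores) / len(scores)


# Invented toy data: one conditional response is misread as plain support.
toy_gold = ["support", "support", "opposition", "conditional"]
toy_predicted = ["support", "support", "opposition", "support"]

toy_accuracy = accuracy(toy_gold, toy_predicted)
toy_macro_f1 = macro_f1(toy_gold, toy_predicted)
print(f"accuracy={toy_accuracy:.2f} macro_f1={toy_macro_f1:.2f}")
```

A single miss on a rare class pulls macro-F1 down further than accuracy, which is why the two figures are reported together. As a rough check on the headline, 78 correct out of 79 rounds to 98.7%; that reading is an inference from the arithmetic, not a count stated on this page.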

100% condition grounding

27 audited condition quotes, scored for recall and evidence grounding

Condition recall 100%

Condition grounding 100%

Conditional support is stored with the resident's actual condition so reviewers can check the decision.
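One plausible way to score these two figures is sketched below. The definitions are assumptions: recall is taken as the share of audited conditions the system recovered, and grounding as the share of extracted conditions that appear in the resident's own words. The `ConditionCase` type, the matching rules, and the three comments are hypothetical and written only for this example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ConditionCase:
    response: str                       # full resident comment
    gold_condition: str                 # audited condition quote
    extracted_condition: Optional[str]  # system output, None if nothing was found


def normalise(text: str) -> str:
    """Lower-case and collapse whitespace so trivial differences do not matter."""
    return " ".join(text.lower().split())


def is_grounded(quote: str, response: str) -> bool:
    """A quote is grounded if it appears in the response after normalisation."""
    return normalise(quote) in normalise(response)


def matches_gold(extracted: str, gold: str) -> bool:
    """Assumed matching rule: one quote contains the other."""
    a, b = normalise(extracted), normalise(gold)
    return a in b or b in a


def condition_scores(cases: list[ConditionCase]) -> tuple[float, float]:
    extracted = [c for c in cases if c.extracted_condition]
    recalled = sum(matches_gold(c.extracted_condition, c.gold_condition) for c in extracted)
    grounded = sum(is_grounded(c.extracted_condition, c.response) for c in extracted)
    recall = recalled / len(cases) if cases else 0.0
    grounding = grounded / len(extracted) if extracted else 0.0
    return recall, grounding


# Invented cases: one correct, one paraphrased away from the text, one missed.
CASES = [
    ConditionCase(
        "I support the cycleway as long as the bus stop stays where it is.",
        "as long as the bus stop stays where it is",
        "as long as the bus stop stays where it is",
    ),
    ConditionCase(
        "Fine with the plan if deliveries can still reach the shops.",
        "if deliveries can still reach the shops",
        "if loading zones are kept",
    ),
    ConditionCase(
        "Yes, provided the works happen outside school hours.",
        "provided the works happen outside school hours",
        None,
    ),
]

recall, grounding = condition_scores(CASES)
print(f"condition recall={recall:.2f} grounding={grounding:.2f}")
```

The second case shows why grounding is scored separately: a paraphrased condition can sound right while no longer being something the resident wrote.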

Accuracy held as the cases got harder

Frozen benchmark tiers include clean feedback, harder wording, adversarial cases, and multilingual responses

Tier	Cases	Communiti accuracy	Pass line
Easy	Clean, direct stance statements	100%	85%
Medium	More context and mixed phrasing	100%	85%
Hard	Adversarial and ambiguous wording	98%	85%
Multilingual	Ten benchmarked community languages	100%	75%

The important finding is not that the easy cases worked. It is that the system stayed accurate on the hard, multilingual, and tone-divergent cases that usually distort consultation reporting.

Proof point № 2

Stance held across community languages

The benchmark includes ten community languages and mixed-language cases so non-English feedback is not treated as an afterthought. The best run scored 100% accuracy on the multilingual subset.

Mandarin 中文
Arabic العربية
Vietnamese Tiếng Việt
Cantonese 廣東話
Punjabi ਪੰਜਾਬੀ
Greek Ελληνικά
Italian Italiano
Hindi हिन्दी
Te Reo Māori
Samoan Gagana Sāmoa

These are the ten languages benchmarked in this run. Communiti supports more than 50 languages in production, with the same evidence-first review workflow.

At a glance

What changes when stance is measured directly

Comparison of common review shortcuts and Communiti stance detection on the cases that matter in consultation analysis.
Capability	Manual review (analyst + spreadsheet)	Sentiment shortcut (tone treated as position)	Communiti Intelligence (stance detection)
Angry support	Caught: a careful reader can separate frustration from the actual position.	Flipped: negative tone is treated as opposition.	Support: support is recorded, with the frustrated wording still available for review.
Polite opposition	Caught: a reviewer can see the rejection if they read closely.	Flipped: positive tone is treated as support.	Opposition: the position is separated from the politeness of the wording.
Conditional support	Slow: the condition has to be copied into a report or tracking sheet by hand.	Flattened: the response becomes positive or mixed, but the condition is not preserved.	Grounded: the stance and condition are both captured, with the resident's words attached.
Mixed feedback	Possible: accurate when reviewers have enough time and apply the same rules.	Collapsed: multiple positions get reduced to a single mood label.	Separated: mixed stance is preserved instead of being forced into support or opposition.
Small wording changes	Variable: different reviewers may read borderline wording differently.	Brittle: tone words can change the label even when the position stays the same.	Stable: 100% invariant across 115 meaning-preserving perturbation pairs.
Audit trail	By hand: review notes and quotes have to be maintained separately.	Thin: a label with little evidence for why the position was assigned.	Built in: labels, conditions, confidence, and evidence can be traced back to the response.

For your technical reviewers

The scores behind the headlines

Headline figures are rounded for readability. These are the underlying benchmark results and technical context behind the public claims on this page.

The full scorecard

Production-style run on a 292-entry synthetic consultation corpus with a 79-entry frozen test split.

Metric	Communiti	Pass line or baseline
Frozen test stance accuracy	98.7%	85.0% pass line
Macro-F1 across stance classes	97.2%	80.0% pass line
Tone-divergent stance accuracy (28 entries where tone and stance deliberately diverge)	100%	25.0% sentiment shortcut
Condition recall (27 audited condition quotes in the corpus)	100%	75.0% pass line
Condition grounding	100%	90.0% pass line
Perturbation invariance (115 wording changes that preserve the underlying stance)	100%	95.0% pass line
Temperature stability (identical labels across repeated t=0.0 and t=0.1 runs)	100%	98.0% pass line
Agreement auto-accept (agreement between two independent production-style arms)	99.6% accuracy at 97.3% coverage	99.0% accuracy at 85.0% coverage pass line
Selective prediction	98.0% accuracy at 96.2% coverage	Confidence threshold 0.85

The benchmark pack includes the synthetic corpus, gold decisions, raw outputs, scoring notebook, and cached verification path.
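The "accuracy at coverage" rows in the scorecard pair two numbers: how many responses were accepted without review, and how accurate the accepted ones were. The sketch below shows one way to compute both for a confidence threshold and for agreement between two arms. The `Row` fields, the inclusive `>=` comparison, and the toy data are assumptions, not the benchmark's implementation.

```python
from typing import Callable, NamedTuple


class Row(NamedTuple):
    gold: str          # audited gold stance
    arm_a: str         # label from the first arm (treated as the prediction)
    arm_b: str         # label from the second, independent arm
    confidence: float  # confidence attached to arm_a's label


def accuracy_at_coverage(rows: list[Row], accept: Callable[[Row], bool]) -> tuple[float, float]:
    """Return (accuracy over accepted rows, share of rows accepted)."""
    accepted = [r for r in rows if accept(r)]
    coverage = len(accepted) / len(rows) if rows else 0.0
    correct = sum(r.arm_a == r.gold for r in accepted)
    accuracy = correct / len(accepted) if accepted else 0.0
    return accuracy, coverage


# Invented toy rows, not benchmark data.
ROWS = [
    Row("support", "support", "support", 0.95),
    Row("opposition", "opposition", "opposition", 0.90),
    Row("mixed", "support", "mixed", 0.60),
    Row("conditional", "support", "conditional", 0.88),
]

THRESHOLD = 0.85  # matches the confidence threshold quoted in the scorecard

# Selective prediction: accept only labels at or above the threshold.
selective = accuracy_at_coverage(ROWS, lambda r: r.confidence >= THRESHOLD)

# Agreement auto-accept: accept only when both arms give the same label.
agreement = accuracy_at_coverage(ROWS, lambda r: r.arm_a == r.arm_b)

print(f"selective: accuracy={selective[0]:.2f} coverage={selective[1]:.2f}")
print(f"agreement: accuracy={agreement[0]:.2f} coverage={agreement[1]:.2f}")
```

In this toy set, agreement accepts fewer rows but every accepted row is correct, while the threshold accepts more rows and lets one confident error through. Rows that are not accepted would go to a human reviewer.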

Methodology

How we measured

Test corpus

Synthetic consultation feedback only - no resident data - spanning 292 entries, a 79-entry frozen test split, 27 audited condition quotes, 28 tone-divergent responses, 115 perturbation pairs, and ten community languages.

Frozen test split (79 entries): held-out responses scored after development, including easy, medium, hard, tone-divergent, and multilingual cases
Development split (213 entries): synthetic consultation responses used to develop and stress the stance taxonomy
Perturbation pairs (115): meaning-preserving wording changes used to check label stability
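The label-stability check on perturbation pairs can be sketched as follows: classify both wordings in each pair and count the pairs where the label is unchanged. `toy_classify` is a deliberately crude keyword stand-in invented for this example, not Communiti's model, and the three pairs are illustrative rewordings of the example comments on this page.

```python
from typing import Callable


def toy_classify(text: str) -> str:
    """Hypothetical keyword classifier, used only so the sketch runs end to end."""
    lowered = text.lower()
    if "do not support" in lowered or "don't support" in lowered:
        return "opposition"
    if "yes" in lowered or "support" in lowered:
        return "support"
    return "neutral"


# (original, meaning-preserving rewording)
PAIRS = [
    ("Yes, build the protected cycleway.", "Yes, please build the protected cycleway."),
    ("I do not support removing the parking bays.", "I don't support removing the parking bays."),
    ("I do not support removing the parking bays.", "I cannot back removing the parking bays."),
]


def invariance_rate(pairs: list[tuple[str, str]], classify: Callable[[str], str]) -> float:
    """Share of pairs whose label is the same before and after rewording."""
    if not pairs:
        raise ValueError("at least one perturbation pair is required")
    stable = sum(classify(original) == classify(reworded) for original, reworded in pairs)
    return stable / len(pairs)


rate = invariance_rate(PAIRS, toy_classify)
print(f"perturbation invariance: {rate:.2f}")
```

The keyword stand-in is stable on the first two pairs and breaks on the third, where "cannot back" carries the same position in different words. That is the kind of brittleness the 115-pair check is designed to surface.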

Processed in Australia

Analysis runs on AWS in Sydney and Melbourne using Australia-geographic infrastructure. Feedback is not processed offshore.

Never used to train AI

Your community's feedback is not used to train any AI model, and the model provider has no access to it - contractually guaranteed by AWS.

Evidence on request

The benchmark pack includes synthetic data, gold decisions, scoring code, raw outputs, charts, and methodology notes for technical review.

Every condition traceable

Conditional feedback is not only labelled. The condition is grounded in the original response so reviewers can check the evidence behind the result.

The fine print we think you should read

Test data. The benchmark uses synthetic consultation feedback written for testing. No resident data was used. The corpus contains 292 entries, including a 79-entry frozen test split and a 213-entry development split.
Gold decisions. Gold labels and condition decisions were audited before scoring. The benchmark pack includes the gold-decision notes used to resolve ambiguous cases.
Sentiment shortcut. The sentiment-as-stance baseline is included because it is the most common analytical mistake in this task: mapping positive tone to support and negative tone to opposition. It is not a product comparison.
Reproducibility. Every number on this page traces to the run summary, raw outputs, and scoring notebook. The cached verification path re-scores existing outputs without making live model calls.

Check our work

See your own consultation benchmarked this way

Bring one real, de-identified feedback export to a 30-minute walkthrough - or request the benchmark pack and have your technical team verify every number on this page.

