Count unique voices, not repeated templates.
Campaign review has two risks: missing coordinated submissions that have been reworded, and wrongly grouping genuine residents because they share the same concern. Communiti now tests both sides directly, because the public record needs to protect both integrity and individual voice.
Coordinated
Same campaign skeleton
Names, streets, and personal details change, but the ordered argument structure repeats across submissions.
Count once as a campaign voice.
Genuine residents
Same topic, different voice
Residents can independently raise the same concern without sharing a template, campaign script, or coordinated structure.
Keep separate as individual voices.
What shipped
Coordinated templates are found without silencing organic voices.
Too weak
Exact duplicates miss reworded campaigns
Mail-merge fields, paraphrases, and own-words variants break a hash or exact-match workflow.
Too blunt
Shared topics are not proof of coordination
Independent residents can raise height, traffic, housing, and safety concerns without sharing a template.
What changed
Unique voice counts become reviewable
Campaign clusters, protected near-misses, and the headline voice count can all be checked against source submissions.
Tested before release
Benchmarked on campaign variants and genuine near-misses.
The benchmark includes verbatim copies, mail-merge variants, paraphrased talking points, heavy own-words rephrases, and organic responses designed to look similar without being coordinated.
- 98.7%
- precision finding same-campaign relationships
- 96.8%
- recall across verbatim, mail-merge, paraphrase, and heavy rephrase campaigns
- 0
- organic resident pairs wrongly grouped
- 4.3%
- unique-voice count error: 45 predicted vs 47 audited gold voices
Watch the failure mode
The hard part is separating the skeleton from the topic.
Consultation teams need to know when a submission wave is coordinated, but the dangerous mistake is the false positive: grouping genuine residents because they used similar arguments. The public record needs both integrity and individual voice. This benchmark tests both sides at once: copied form letters, personalised templates, paraphrased talking points, heavy own-words rephrases, and near-miss organic responses that raise the same issues independently.
Synthetic benchmark examples
Exact duplicate tools are too weak. Topic buckets are too blunt. The benchmark asks whether repeated campaign structure can be found while residents with similar concerns remain separate.
Coordinated submissions
Same skeleton, personalised wording
sub_013 (Wardell Road): As a resident of Wardell Road for 14 years, I strongly object to the rezoning. Eight storeys is out of character, the traffic study ignores daily reality, and the affordable housing promise is not secured.
sub_020 (Station Lane): As a resident of Station Lane for 27 years, I strongly object to the rezoning. Eight storeys is completely out of character, the traffic study ignores what we experience, and the housing promise is not secured.
Communiti records
10 template variants counted as one campaign voice
The names, streets, and household details vary, but the ordered argument structure is the same.
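For technical readers, the gap between exact matching and these two examples can be reproduced in a few lines. This is a minimal sketch using only the Python standard library; the `fingerprint` helper and the character-level similarity check are illustrative assumptions, not Communiti's detection method.

```python
import hashlib
from difflib import SequenceMatcher

# The two synthetic submissions quoted above.
sub_013 = (
    "As a resident of Wardell Road for 14 years, I strongly object to the rezoning. "
    "Eight storeys is out of character, the traffic study ignores daily reality, "
    "and the affordable housing promise is not secured."
)
sub_020 = (
    "As a resident of Station Lane for 27 years, I strongly object to the rezoning. "
    "Eight storeys is completely out of character, the traffic study ignores what we "
    "experience, and the housing promise is not secured."
)


def fingerprint(text: str) -> str:
    """Hypothetical exact-match fingerprint: lowercase, collapse whitespace, hash."""
    normalised = " ".join(text.lower().split())
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()


# An exact-match workflow compares fingerprints, so any changed word breaks the match.
same_hash = fingerprint(sub_013) == fingerprint(sub_020)

# A rough character-level similarity, only to show how much text the two share.
# autojunk=False avoids difflib's frequent-character heuristic on longer strings.
similarity = SequenceMatcher(None, sub_013, sub_020, autojunk=False).ratio()

print(f"same fingerprint: {same_hash}")
print(f"character similarity: {similarity:.2f}")
```

The fingerprints differ even though most of the text is shared. Plain text similarity is not a substitute either: it does not by itself separate a shared skeleton from a shared topic.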
Genuine residents
Same topic, different voice
sub_041 (organic near-miss): 8 storeys towering over federation cottages is jarring, no way around it. And I do not believe the traffic modelling for a second - Station St is parked out by 5:30 every day.
sub_042 (organic near-miss): My worry is the affordable housing clause. A letter of offer is not worth the paper. If it were in a planning agreement I might feel differently, but as exhibited, no.
Communiti protects
0 same-theme near-miss pairs merged
These residents discuss height, traffic, and affordable housing too, but they do not share a campaign skeleton.
Detection quality
97.7%
Same-campaign pair F1 on the page-grade suite.
Organic protection
0 / 903
Organic resident pairs wrongly grouped.
Voice headline
45 / 47
Predicted voices versus audited gold voices.
The corpus contains 79 synthetic submissions: 43 organic responses (12 of them thematic near-misses) and 36 campaign instances spread across four campaign styles, from verbatim copies through heavy own-words rephrasing.
Results
The benchmark scores both detection and protection.
96.8% of campaign relationships found
154 audited same-campaign pairs across four campaign styles
Exact matching catches copied form letters, then misses mail-merge, paraphrase, and own-words campaigns that still came from the same organised skeleton.
100% of organic pairs kept separate
903 organic-organic pairs, including same-theme near-misses
Topic similarity is not proof of coordination. The shared-topic shortcut wrongly grouped 168 organic-organic pairs, because genuine residents can share concerns without sharing a template.
95.7% unique-voice count accuracy
45 voices predicted against 47 audited gold voices
The headline number stayed defensible: 45 predicted voices vs 47 gold on a corpus built to hide campaigns among genuine residents.
The duplicate finder collapses after exact copies
Pair recall by campaign evasiveness
The publication gate for the hardest campaign style, heavy rephrase, was 60% recall. Communiti cleared it at 66.7% while preserving organic voices.
Why it matters
The public trust risk is counting campaigns incorrectly in either direction.
For community members
Shared concern does not mean fake voice
Residents who independently raise the same issue are kept separate instead of being collapsed into a campaign.
For ELT and elected leaders
The headline voice count is defensible
Decision-makers can see coordinated activity without overstating or understating genuine community participation.
For governance teams
Exceptions can be reviewed
Clusters, near-misses, and unique-voice counts remain tied to submissions and benchmark evidence.
At a glance
What changes when campaigns are measured directly
| Capability | Manual review (analyst + spreadsheet) | Shortcuts (duplicates or shared topics) | Communiti Intelligence (campaign detection) |
|---|---|---|---|
| Exact form letters | Caught: a reviewer or spreadsheet can spot exact copies. | Caught: exact duplicate detection works on copied text. | Caught: exact campaigns are clustered and counted once. |
| Mail-merge templates | Slow: a reviewer has to notice the repeated structure across personalised submissions. | Missed: different streets and names break exact duplicate matching. | Clustered: 100% pair recall on the mail-merge campaign. |
| Paraphrased talking points | Variable: reviewers may disagree about whether wording is coordinated or merely similar. | Missed: exact matching has no signal once the text changes. | Clustered: 100% pair recall on the paraphrased campaign. |
| Organic near-misses | Possible: a careful reader can protect genuine voices, but it is labour-intensive. | Merged: topic grouping treats shared concerns as coordination. | Protected: 0 of 11 same-theme near-miss pairs were grouped. |
| Unique voice count | Manual: the headline count depends on reviewer reconciliation. | Skewed: duplicate matching over-counted voices as 68; topic grouping under-counted them as 5. | Defensible: 45 predicted voices against 47 audited gold voices. |
| Audit trail | By hand: clusters, notes, and exceptions have to be documented separately. | Thin: a duplicate or topic bucket does not explain whether coordination was present. | Built in: campaign clusters can be reviewed against source submissions and the benchmark evidence pack. |
For technical reviewers
The scores behind the release.
The full scorecard
Page-grade suite on a synthetic campaign-detection corpus: 79 submissions, 36 campaign instances, 43 organic responses, and 47 audited unique voices.
| Metric | Communiti | Baseline or pass line |
|---|---|---|
| Same-campaign pair precision | 98.7% | 95.0% pass line |
| Same-campaign pair recall | 96.8% | 90.0% pass line |
| Same-campaign pair F1 | 97.7% | 60.0% exact duplicate finder |
| Organic resident pairs wrongly grouped (the shortcut grouped responses by proposal topic, which is not evidence of coordination) | 0 of 903 | 168 shared-topic shortcut |
| Same-theme near-miss pairs wrongly grouped | 0 of 11 | 11 shared-topic shortcut |
| Unique-voice count (exact duplicate detection over-counted by treating reworded campaign variants as separate voices) | 45 predicted vs 47 gold | 68 exact duplicate finder |
| Unique-voice count error | 4.3% | 44.7% exact duplicate finder |
| Borderline-pair reliability | 93.4% on 61 pairs | 90.0% pass line |
| Heavy rephrase campaign recall | 66.7% | 0% exact duplicate finder |
Headline percentages are rounded for readability. The benchmark pack includes the synthetic corpus, gold labels, cached outputs, scoring code, charts, and dated evidence snapshot.
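Several scorecard rows can be re-derived from the others. The sketch below assumes that count error is the absolute difference divided by the gold count, and that the exact duplicate finder recovers only the pairs among the 12 verbatim copies with no false positives. Both are assumptions made for illustration, not taken from the scoring code, though they reproduce the published figures.

```python
from math import comb


def count_error(predicted: int, gold: int) -> float:
    """Assumed definition: absolute voice-count difference relative to gold."""
    return abs(predicted - gold) / gold


def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Unique-voice count error rows.
communiti_error = count_error(45, 47)   # 45 predicted vs 47 gold
baseline_error = count_error(68, 47)    # exact duplicate finder reports 68 voices

# Same-campaign pair F1 rows.
communiti_f1 = f1(0.987, 0.968)         # from the rounded precision and recall rows

# Assumption: the exact duplicate finder links only verbatim copies (12 submissions),
# so it finds comb(12, 2) of the 154 audited pairs and makes no false links.
baseline_recall = comb(12, 2) / 154
baseline_f1 = f1(1.0, baseline_recall)

for name, value in [
    ("Communiti count error", communiti_error),
    ("Exact duplicate count error", baseline_error),
    ("Communiti pair F1", communiti_f1),
    ("Exact duplicate pair F1", baseline_f1),
]:
    print(f"{name}: {value:.1%}")
```

Under these assumptions the results round to the 4.3%, 44.7%, 97.7%, and 60.0% shown in the scorecard.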
Methodology
How we measured
Test corpus
Synthetic consultation feedback only - no resident data - spanning 79 submissions, 36 campaign instances, 43 organic responses, 12 thematic near-misses, and 47 audited unique voices.
- Campaign instances
- 36
- Four campaign styles: 12 verbatim copies, 10 mail-merge variants, 8 paraphrases, and 6 heavy own-words rephrases
- Organic responses
- 43
- Independent submissions, including 12 thematic near-misses that raise the same issues as campaigns
- Gold unique voices
- 47
- Each organic response counts once; each campaign counts once regardless of submission count
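The corpus figures above determine the pair counts and voice counts quoted elsewhere on this page. A short sketch of that arithmetic follows; the variable names are illustrative, and the exact-duplicate voice count assumes that baseline merges only the verbatim copies.

```python
from math import comb

# Campaign sizes and organic count as listed in the test corpus description.
campaign_sizes = {"verbatim": 12, "mail_merge": 10, "paraphrase": 8, "heavy_rephrase": 6}
organic_responses = 43

total_submissions = organic_responses + sum(campaign_sizes.values())

# Every pair of submissions inside the same campaign is one same-campaign pair.
same_campaign_pairs = sum(comb(n, 2) for n in campaign_sizes.values())

# Every pair of organic responses is a pair that must stay separate.
organic_pairs = comb(organic_responses, 2)

# Gold rule: each organic response is one voice, each campaign is one voice.
gold_voices = organic_responses + len(campaign_sizes)

# Assumption: an exact duplicate finder collapses only the verbatim campaign,
# so every reworded variant is counted as its own voice.
reworded_variants = sum(n for style, n in campaign_sizes.items() if style != "verbatim")
exact_duplicate_voices = organic_responses + 1 + reworded_variants

print(total_submissions, same_campaign_pairs, organic_pairs, gold_voices, exact_duplicate_voices)
```

These values match the 79 submissions, 154 same-campaign pairs, 903 organic pairs, 47 gold voices, and the 68 voices reported for the exact duplicate finder.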
Processing controls documented
The benchmark pack records processing locations and deployment options so technical reviewers can verify the operating environment.
Never used to train AI
Your community's feedback is not used to train any AI model, and the model provider has no access to it - contractually guaranteed by AWS.
Evidence on request
The benchmark pack includes synthetic data, gold labels, scoring code, cached outputs, charts, and methodology notes for technical review.
Every cluster reviewable
Campaign labels are not loose summaries. Clusters can be checked against source submissions, near-miss examples, and the unique-voice count.
The fine print we think you should read
- Test data. The benchmark uses synthetic consultation feedback written for testing. No resident data was used.
- Pair scoring. Same-campaign precision and recall are scored over audited submission pairs. A pair is correct only when both submissions are instances of the same seeded campaign.
- Workflow baselines. The exact duplicate finder and shared-topic shortcut are included because they reflect common ways teams approximate campaign review today. They are not product comparisons.
- Unique voices. The gold unique-voice count treats each organic response as one voice and each seeded campaign as one coordinated voice.
- Reproducibility. Every number on this page traces to a dated evidence snapshot, gold labels, cached outputs, scoring code, charts, and a rerunnable verification path.
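Pair scoring as described in the fine print can be sketched as follows. This is a minimal illustration, not the scoring code from the benchmark pack: the function names, the label format (submission id mapped to a cluster id, or `None` for an unclustered submission), and the toy data are all assumptions.

```python
from itertools import combinations
from typing import Dict, Optional, Set, Tuple

Labels = Dict[str, Optional[str]]


def same_cluster_pairs(labels: Labels) -> Set[Tuple[str, str]]:
    """Return every pair of submissions that share a non-empty cluster label."""
    pairs = set()
    for a, b in combinations(sorted(labels), 2):
        if labels[a] is not None and labels[a] == labels[b]:
            pairs.add((a, b))
    return pairs


def pair_scores(predicted: Labels, gold: Labels) -> Tuple[float, float, float]:
    """Pair-level precision, recall, and F1 of predicted clusters against gold campaigns."""
    predicted_pairs = same_cluster_pairs(predicted)
    gold_pairs = same_cluster_pairs(gold)
    true_positives = len(predicted_pairs & gold_pairs)
    precision = true_positives / len(predicted_pairs) if predicted_pairs else 1.0
    recall = true_positives / len(gold_pairs) if gold_pairs else 1.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Toy data: one seeded campaign of three submissions plus two organic responses.
gold = {"s1": "campaign_a", "s2": "campaign_a", "s3": "campaign_a", "s4": None, "s5": None}
# A detector that links s1 and s2 but misses the reworded s3.
predicted = {"s1": "c1", "s2": "c1", "s3": None, "s4": None, "s5": None}

precision, recall, f1 = pair_scores(predicted, gold)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

Scoring over pairs rather than whole clusters means a partially recovered campaign still earns credit for the relationships it did find, while any link involving an organic response counts against precision.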
Available now in Communiti
See campaign detection on your own submissions.
Bring one real, de-identified feedback export to a 30-minute walkthrough, or ask for the benchmark pack and have your technical team check the scoring path.
