Product Updates 14 June 2026 2 min read

New · Campaign detection benchmark

Count unique voices, not repeated templates.

Campaign review has two risks: missing coordinated submissions that have been reworded, and wrongly grouping genuine residents because they share the same concern. Communiti now tests both sides directly, because the public record needs to protect both integrity and individual voice.

The public-trust failure mode

Coordinated

Same campaign skeleton

Names, streets, and personal details change, but the ordered argument structure repeats across submissions.

Count once as a campaign voice.

Genuine residents

Same topic, different voice

Residents can independently raise the same concern without sharing a template, campaign script, or coordinated structure.

Keep separate as individual voices.

What shipped

Coordinated templates are found without silencing organic voices.

Too weak

Exact duplicates miss reworded campaigns

Mail-merge fields, paraphrases, and own-words variants break a hash or exact-match workflow.

Too blunt

Shared topics are not proof of coordination

Independent residents can raise height, traffic, housing, and safety concerns without sharing a template.

What changed

Unique voice counts become reviewable

Campaign clusters, protected near-misses, and the headline voice count can all be checked against source submissions.

Tested before release

Benchmarked on campaign variants and genuine near-misses.

The benchmark includes verbatim copies, mail-merge variants, paraphrased talking points, heavy own-words rephrases, and organic responses designed to look similar without being coordinated. Last run: 14 June 2026.

98.7%: precision finding same-campaign relationships
96.8%: recall across verbatim, mail-merge, paraphrase, and heavy rephrase campaigns
0: organic resident pairs wrongly grouped
4.3%: unique-voice count error (45 predicted vs 47 audited gold voices)

Watch the failure mode

The hard part is separating the skeleton from the topic.

Consultation teams need to know when a submission wave is coordinated, but the dangerous mistake is the false positive: grouping genuine residents because they used similar arguments. The public record needs both integrity and individual voice. This benchmark tests both sides at once: copied form letters, personalised templates, paraphrased talking points, heavy own-words rephrases, and near-miss organic responses that raise the same issues independently.

Synthetic benchmark examples

Exact duplicate tools are too weak. Topic buckets are too blunt. The benchmark asks whether repeated campaign structure can be found while residents with similar concerns remain separate.

Coordinated submissions

Same skeleton, personalised wording

Clustered

sub_013 Wardell Road
As a resident of Wardell Road for 14 years, I strongly object to the rezoning. Eight storeys is out of character, the traffic study ignores daily reality, and the affordable housing promise is not secured.

sub_020 Station Lane
As a resident of Station Lane for 27 years, I strongly object to the rezoning. Eight storeys is completely out of character, the traffic study ignores what we experience, and the housing promise is not secured.

Communiti records

10 template variants counted as one campaign voice

The names, streets, and household details vary, but the ordered argument structure is the same.

Genuine residents

Same topic, different voice

Kept separate

sub_041 Organic near-miss
8 storeys towering over federation cottages is jarring, no way around it. And I do not believe the traffic modelling for a second - Station St is parked out by 5:30 every day.

sub_042 Organic near-miss
My worry is the affordable housing clause. A letter of offer is not worth the paper. If it were in a planning agreement I might feel differently, but as exhibited, no.

Communiti protects

0 same-theme near-miss pairs merged

These residents discuss height, traffic, and affordable housing too, but they do not share a campaign skeleton.

Detection quality

97.7%

Same-campaign pair F1 on the page-grade suite.

Organic protection

0 / 903

Organic resident pairs wrongly grouped.

Voice headline

45 / 47

Predicted voices versus audited gold voices.

The corpus contains 79 synthetic submissions: 43 organic responses (12 of them thematic near-misses) and 36 campaign instances across four campaign styles, from verbatim copies through heavy own-words rephrasing.

Results

The benchmark scores both detection and protection.

96.8% of campaign relationships found

154 audited same-campaign pairs across four campaign styles

Communiti campaign detection 96.8%

Exact duplicate finder 42.9%

Exact matching catches copied form letters, then misses mail-merge, paraphrase, and own-words campaigns that still came from the same organised skeleton.
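
To make the weakness concrete, here is a minimal sketch of an exact-duplicate baseline, assuming it hashes lightly normalised text. The `fingerprint` helper and its normalisation rules are illustrative assumptions, not the baseline code in the benchmark pack. The two submission texts are the synthetic mail-merge examples shown earlier on this page.

```python
import hashlib
import re


def fingerprint(text: str) -> str:
    """Hash of case- and whitespace-normalised text (hypothetical baseline)."""
    normalised = re.sub(r"\s+", " ", text.strip().lower())
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()


sub_013 = (
    "As a resident of Wardell Road for 14 years, I strongly object to the "
    "rezoning. Eight storeys is out of character, the traffic study ignores "
    "daily reality, and the affordable housing promise is not secured."
)
sub_020 = (
    "As a resident of Station Lane for 27 years, I strongly object to the "
    "rezoning. Eight storeys is completely out of character, the traffic "
    "study ignores what we experience, and the housing promise is not secured."
)

# A verbatim copy with stray whitespace still matches after normalisation.
verbatim_copy = "  " + sub_013 + "\n"
copy_matches = fingerprint(sub_013) == fingerprint(verbatim_copy)

# The mail-merge variant changes the street, years, and a few words,
# so its hash differs and the pair is never linked.
mail_merge_matches = fingerprint(sub_013) == fingerprint(sub_020)

print(copy_matches, mail_merge_matches)
```

Any change to a personalised slot produces a different hash, which is why this style of baseline has no signal once the text is reworded.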

100% of organic pairs kept separate

903 organic-organic pairs, including same-theme near-misses

Communiti campaign detection 100%

Shared-topic shortcut 81.4%

Topic similarity is not proof of coordination. The shortcut wrongly grouped 168 organic-organic pairs because genuine residents can share concerns without sharing a template.

95.7% unique-voice count accuracy

45 voices predicted against 47 audited gold voices

Communiti campaign detection 95.7%

Exact duplicate finder 55.3%

The headline number stayed defensible: 45 predicted voices vs 47 gold on a corpus built to hide campaigns among genuine residents.
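
The accuracy figures above can be reproduced with simple arithmetic. The sketch below assumes accuracy is defined as one minus the relative count error; that definition is an assumption that happens to match the published numbers, not a quotation from the scoring code.

```python
def voice_count_error(predicted: int, gold: int) -> float:
    """Relative error of a predicted unique-voice count against the gold count."""
    return abs(predicted - gold) / gold


def voice_count_accuracy(predicted: int, gold: int) -> float:
    """Assumed definition: accuracy = 1 - relative error."""
    return 1.0 - voice_count_error(predicted, gold)


GOLD_VOICES = 47  # audited gold voices from this page

# Communiti predicted 45 voices; the exact duplicate finder counted 68.
communiti_error = round(100 * voice_count_error(45, GOLD_VOICES), 1)
communiti_accuracy = round(100 * voice_count_accuracy(45, GOLD_VOICES), 1)
duplicate_error = round(100 * voice_count_error(68, GOLD_VOICES), 1)
duplicate_accuracy = round(100 * voice_count_accuracy(68, GOLD_VOICES), 1)

print(communiti_error, communiti_accuracy, duplicate_error, duplicate_accuracy)
```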

The duplicate finder collapses after exact copies

Pair recall by campaign evasiveness (Communiti campaign detection)

Verbatim (copied form letter): 100%
Mail-merge (personalised slots): 100%
Paraphrase (same talking points, reworded): 100%
Heavy rephrase (own-words campaign skeleton): 66.7%

The publication gate for the hardest heavy-rephrase campaign was 60% recall. Communiti cleared it while preserving organic voices.
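
The per-style recalls roll up into the overall 96.8% figure. The sketch below assumes each campaign style is a single seeded campaign whose size is listed in the Methodology section (12, 10, 8, and 6 instances), so pair counts follow from "n choose 2". The count of 10 heavy-rephrase pairs found is inferred from 66.7% of 15 and is not stated on this page.

```python
from math import comb

# Campaign sizes per style, taken from the Methodology section.
campaign_sizes = {"verbatim": 12, "mail_merge": 10, "paraphrase": 8, "heavy_rephrase": 6}

# Every two submissions in the same campaign form one same-campaign pair.
pairs_per_style = {style: comb(n, 2) for style, n in campaign_sizes.items()}
total_pairs = sum(pairs_per_style.values())

# Pairs found per style. The first three follow from 100% recall;
# the heavy-rephrase value is an inference (66.7% of 15 pairs).
found_pairs = {"verbatim": 66, "mail_merge": 45, "paraphrase": 28, "heavy_rephrase": 10}

recall_per_style = {s: found_pairs[s] / pairs_per_style[s] for s in campaign_sizes}
overall_recall = sum(found_pairs.values()) / total_pairs

# An exact-match baseline that links only the verbatim pairs.
exact_duplicate_recall = pairs_per_style["verbatim"] / total_pairs

print(total_pairs, round(100 * overall_recall, 1), round(100 * exact_duplicate_recall, 1))
```

Under these assumptions the totals line up with the page: 154 audited pairs, 96.8% overall recall, and 42.9% for the exact duplicate finder.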

Why it matters

The public trust risk is counting campaigns incorrectly in either direction.

For community members

Shared concern does not mean fake voice

Residents who independently raise the same issue are kept separate instead of being collapsed into a campaign.

For executive leadership teams (ELT) and elected leaders

The headline voice count is defensible

Decision-makers can see coordinated activity without overstating or understating genuine community participation.

For governance teams

Exceptions can be reviewed

Clusters, near-misses, and unique-voice counts remain tied to submissions and benchmark evidence.

At a glance

What changes when campaigns are measured directly

Comparison of common review shortcuts and Communiti campaign detection on the failure modes that matter for public consultation integrity.
Capability	Manual review (analyst + spreadsheet)	Shortcuts (duplicates or shared topics)	Communiti Intelligence (campaign detection)
Exact form letters	Caught: a reviewer or spreadsheet can spot exact copies.	Caught: exact duplicate detection works on copied text.	Caught: exact campaigns are clustered and counted once.
Mail-merge templates	Slow: a reviewer has to notice the repeated structure across personalised submissions.	Missed: different streets and names break exact duplicate matching.	Clustered: 100% pair recall on the mail-merge campaign.
Paraphrased talking points	Variable: reviewers may disagree about whether wording is coordinated or merely similar.	Missed: exact matching has no signal once the text changes.	Clustered: 100% pair recall on the paraphrased campaign.
Organic near-misses	Possible: a careful reader can protect genuine voices, but it is labour-intensive.	Merged: topic grouping treats shared concerns as coordination.	Protected: 0 of 11 same-theme near-miss pairs were grouped.
Unique voice count	Manual: the headline count depends on reviewer reconciliation.	Skewed: duplicate matching over-counted voices as 68; topic grouping under-counted them as 5.	Defensible: 45 predicted voices against 47 audited gold voices.
Audit trail	By hand: clusters, notes, and exceptions have to be documented separately.	Thin: a duplicate or topic bucket does not explain whether coordination was present.	Built in: campaign clusters can be reviewed against source submissions and the benchmark evidence pack.

For technical reviewers

The scores behind the release.

The full scorecard

Page-grade suite on a synthetic campaign-detection corpus: 79 submissions, 36 campaign instances, 43 organic responses, and 47 audited unique voices.

Metric	Communiti	Baseline or pass line
Same-campaign pair precision	98.7%	95.0% pass line
Same-campaign pair recall	96.8%	90.0% pass line
Same-campaign pair F1	97.7%	60.0% exact duplicate finder
Organic resident pairs wrongly grouped	0 of 903	168 shared-topic shortcut
Same-theme near-miss pairs wrongly grouped	0 of 11	11 shared-topic shortcut
Unique-voice count	45 predicted vs 47 gold	68 exact duplicate finder
Unique-voice count error	4.3%	44.7% exact duplicate finder
Borderline-pair reliability	93.4% on 61 pairs	90.0% pass line
Heavy rephrase campaign recall	66.7%	0% exact duplicate finder

Organic resident pairs wrongly grouped: the shortcut grouped responses by proposal topic, which is not evidence of coordination.
Unique-voice count: exact duplicate detection over-counted by treating reworded campaign variants as separate voices.

Headline percentages are rounded for readability. The benchmark pack includes the synthetic corpus, gold labels, cached outputs, scoring code, charts, and dated evidence snapshot.
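
As a quick consistency check on the scorecard, F1 is the harmonic mean of precision and recall. The sketch below recomputes both F1 figures from the rounded percentages on this page. The baseline's precision of 100% is an assumption (an exact-match tool should only link true copies) and is not stated on the page; its recall is taken as 66 verbatim pairs out of 154.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Communiti: rounded precision and recall from the scorecard.
communiti_f1 = round(100 * f1_score(0.987, 0.968), 1)

# Exact duplicate finder: precision assumed to be 1.0 (hypothetical),
# recall limited to the 66 verbatim pairs out of 154 audited pairs.
baseline_f1 = round(100 * f1_score(1.0, 66 / 154), 1)

print(communiti_f1, baseline_f1)
```

Both values agree with the scorecard row for same-campaign pair F1 once rounded to one decimal place.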

Methodology

How we measured

Test corpus

Synthetic consultation feedback only, with no resident data: 79 submissions made up of 36 campaign instances and 43 organic responses (12 of them thematic near-misses), resolving to 47 audited unique voices.

Campaign instances: 36, in four campaign styles (12 verbatim copies, 10 mail-merge variants, 8 paraphrases, and 6 heavy own-words rephrases).
Organic responses: 43 independent submissions, including 12 thematic near-misses that raise the same issues as campaigns.
Gold unique voices: 47. Each organic response counts once; each campaign counts once regardless of submission count.
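
The corpus totals quoted on this page follow from these counts. The sketch below is an arithmetic check only. It assumes one seeded campaign per style, and the duplicate-finder line assumes that baseline merges the verbatim copies into one voice while leaving every reworded instance separate, which is an inference from the description above.

```python
from math import comb

ORGANIC_RESPONSES = 43
campaign_sizes = {"verbatim": 12, "mail_merge": 10, "paraphrase": 8, "heavy_rephrase": 6}

campaign_instances = sum(campaign_sizes.values())
total_submissions = ORGANIC_RESPONSES + campaign_instances

# Gold rule: one voice per organic response, one voice per campaign.
gold_voices = ORGANIC_RESPONSES + len(campaign_sizes)

# Every pair of organic responses must stay ungrouped.
organic_pairs = comb(ORGANIC_RESPONSES, 2)

# Inferred exact-duplicate count: verbatim copies collapse to one voice,
# every reworded campaign instance is counted as its own voice.
reworded_instances = campaign_instances - campaign_sizes["verbatim"]
duplicate_finder_voices = ORGANIC_RESPONSES + 1 + reworded_instances

print(total_submissions, gold_voices, organic_pairs, duplicate_finder_voices)
```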

Processing controls documented

The benchmark pack records processing locations and deployment options so technical reviewers can verify the operating environment.

Never used to train AI

Your community's feedback is not used to train any AI model, and the model provider has no access to it - contractually guaranteed by AWS.

Evidence on request

The benchmark pack includes synthetic data, gold labels, scoring code, cached outputs, charts, and methodology notes for technical review.

Every cluster reviewable

Campaign labels are not loose summaries. Clusters can be checked against source submissions, near-miss examples, and the unique-voice count.

The fine print we think you should read

Test data. The benchmark uses synthetic consultation feedback written for testing. No resident data was used.
Pair scoring. Same-campaign precision and recall are scored over audited submission pairs. A pair is correct only when both submissions are instances of the same seeded campaign.
Workflow baselines. The exact duplicate finder and shared-topic shortcut are included because they reflect common ways teams approximate campaign review today. They are not product comparisons.
Unique voices. The gold unique-voice count treats each organic response as one voice and each seeded campaign as one coordinated voice.
Reproducibility. Every number on this page traces to a dated evidence snapshot, gold labels, cached outputs, scoring code, charts, and a rerunnable verification path.
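
The pair-scoring rule above can be sketched as follows. This is an illustrative implementation under stated assumptions, not the scoring code in the benchmark pack: labels are hypothetical dictionaries mapping a submission ID to a campaign ID, with `None` marking an organic response, and the five-submission example is invented for the demonstration.

```python
from itertools import combinations
from typing import Dict, Optional, Set, Tuple

Labels = Dict[str, Optional[str]]


def same_campaign_pairs(labels: Labels) -> Set[Tuple[str, str]]:
    """Pairs of submissions that share a campaign label (None means organic)."""
    pairs = set()
    for a, b in combinations(sorted(labels), 2):
        if labels[a] is not None and labels[a] == labels[b]:
            pairs.add((a, b))
    return pairs


def pair_precision_recall(gold: Labels, predicted: Labels) -> Tuple[float, float]:
    """Score predicted same-campaign pairs against gold same-campaign pairs."""
    gold_pairs = same_campaign_pairs(gold)
    predicted_pairs = same_campaign_pairs(predicted)
    true_positives = len(gold_pairs & predicted_pairs)
    precision = true_positives / len(predicted_pairs) if predicted_pairs else 1.0
    recall = true_positives / len(gold_pairs) if gold_pairs else 1.0
    return precision, recall


# Toy data: s1-s3 are one seeded campaign, s4 and s5 are organic.
gold = {"s1": "campaign_a", "s2": "campaign_a", "s3": "campaign_a", "s4": None, "s5": None}
# The prediction misses s3 and wrongly groups the two organic responses.
predicted = {"s1": "c1", "s2": "c1", "s3": None, "s4": "c2", "s5": "c2"}

precision, recall = pair_precision_recall(gold, predicted)
print(precision, recall)
```

In this toy case one of two predicted pairs is correct and one of three gold pairs is found, so a single wrongly merged organic pair halves precision even though the campaign itself was partly detected.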

Available now in Communiti

See campaign detection on your own submissions.

Bring one real, de-identified feedback export to a 30-minute walkthrough, or ask for the benchmark pack and have your technical team check the scoring path.
