Breidbart Index (BI) Definition
The Breidbart Index, named for its inventor, Seth Breidbart. The BI is a measure of how spammy a spammed news article is. It is the sum, over all copies of a spam article, of the square root of the number of groups each copy is posted to. So if you post 10 copies of an article, each cross-posted to 4 groups, the BI is 20. Other ways of reaching the BI=20 mark (a threshold used by some cancellers) are to post 20 copies, each to just one group; 4 copies to 25 groups each; or 8 articles to 6 groups each and one more to just one group (for BI=20.6).

BI weights multiposting more heavily than crossposting, for good reason. Crossposting is easily dealt with at the server and newsreader level. Most news servers can be configured to drop crossposts beyond a certain threshold, and many of the better newsreaders can use killfiles (or other more sophisticated filtering) to automatically hide articles cross-posted to too many groups. Any decent newsreader will also mark a crossposted article as 'read' in all the groups it is posted to after a single reading. Multiposting is different because there is no low-impact way to filter it out in transport or at the client level.

Multiposting was the original source of the term 'spam' for net-abuse, coined after a bozo lawyer posted 4000+ individual copies of an ad for immigration services, each to 2 groups. The result was reminiscent of the classic Monty Python 'spam' skit, in which every item on a restaurant menu included the glorious pink meat-like stuff (sometimes many times over, like the "Spam, Spam, Spam, Spam, Spam, Spam, Baked Beans & Spam"); every group one tried to read included a copy of the 'Green Card' spam article.
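The arithmetic in these examples can be checked with a minimal Python sketch. The function name `breidbart_index` and the list-of-group-counts input are assumptions made for illustration; they are not part of any canceller's actual tooling.

```python
from math import sqrt


def breidbart_index(groups_per_copy):
    """Return the BI: the sum of sqrt(n) over all copies,
    where n is the number of groups one copy was posted to."""
    return sum(sqrt(n) for n in groups_per_copy)


# Each list holds one entry per copy: the number of groups that copy reached.
examples = {
    "10 copies x 4 groups": [4] * 10,
    "20 copies x 1 group": [1] * 20,
    "4 copies x 25 groups": [25] * 4,
    "8 copies x 6 groups + 1 copy x 1 group": [6] * 8 + [1],
}

for label, copies in examples.items():
    print(f"{label}: BI = {breidbart_index(copies):.1f}")
```

The first three cases come out at 20.0 and the last at 20.6. Note that 40 group-appearances spread over 10 copies and 20 group-appearances spread over 20 copies score the same, which is the sense in which multiposting is weighted more heavily than crossposting.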
For those who wish to delve into the more "techie" aspects of the BI, here is the FAQ on the subject:

Archive-name: usenet/spam-faq
Posting-Frequency: weekly
Last-modified: 1997/03/25
URL: http://www.uiuc.edu/ph/www/tskirvin/faqs/spam.html
Maintainer: tskirvin@uiuc.edu (Tim Skirvin)
Original-author: clewis@ferret.ocunix.on.ca (Chris Lewis)

Current Spam thresholds and guidelines.

This article is intended to describe the current consensus spam thresholds and ensure that the definitions of these terms are available. It is believed that most, if not all, spam cancellers use these terms and definitions in their work; however, many other people use the terms inappropriately, which leads to confusion in discussions. This is an informal FAQ aimed at clarity and understanding, not anal-retentive correctness.

Excessive Multi-Posting (EMP) has the same meaning as the term "spam" usually carries, but it is more accurate and self-explanatory. EMP means, essentially, "too many separate copies of a substantively identical article." "Substantively identical" means that the material in each article is sufficiently similar to construe the same message. The signature is included in the determination. These are examples of substantively identical articles:
Cross-posting means that a single message appears in more than one group. Most newsreaders allow you to specify more than one group in a posting. Excessive Crossposting (ECP), also known as "Velveeta", refers to a "lot" of postings, each made to more than one group.

Some people think cross-posting is "bad". In and of itself, it's good behaviour - it allows you to reach more groups with less impact on the net, especially if you set the Followup-to: header to one group. It is "bad" when it's done to provoke flamewars (like cross-posting how to cook a cat between alt.tasteless and rec.pet.cats), but this is not the topic of this FAQ.

This author considers the term "spam" to mean excessive postings of the EMP and/or ECP variety. That is, "spam" is a generic term for several different things. The term was originally supposed to mean EMPs only, but most people use "spam" to mean "any excessive posting". The term "jello" means a large/combined EMP/ECP. This author doesn't believe this to be a useful term. Indeed, this author doesn't really believe any of these terms are useful - always call them "spam".

A spam, EMP, or ECP then refers to a posting that has been posted to many places. There is a consensus that there is a point at which it is abuse, and is subject to advisory cancellation.

A formula has been invented by Seth Breidbart which attempts to quantify the degree of "badness" of a spam (whether EMP or ECP) as a single number. The Breidbart Index (BI) is defined as the sum of the square roots of n (n is the number of newsgroups each copy was posted to). Example: If two copies of a posting are made, one to 9 groups, and one to 16, the BI is sqrt(9)+sqrt(16) = 3+4 = 7.

The BI2 (Breidbart Index, version 2) is an experimental metric, which may eventually replace the BI. It is calculated by computing the sum of the square roots of n, plus the sum of n, and dividing by two.
Eg: one posting to 9, and one to 16 is (sqrt(9)+sqrt(16)+9+16)/2 = (3+4+25)/2 = 16.

The BI2 is more "aggressive" than the BI, intended to cut off the "higher end". BI allows about 125 newsgroups maximum. BI2 allows a maximum of 35.

A slightly less aggressive index is the SBI (Skirvin-Breidbart Index); it is calculated much the same as the BI2, but sums the number of groups in the Followup-to: header (if available), rather than the newsgroups. Eg: one posting to 9 groups, and one to 16 with followups set to 4 is (sqrt(9)+sqrt(16)+9+4)/2 = (3+4+13)/2 = 10.
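The two examples in this passage can be worked through with a short Python sketch. The function names and the `(newsgroups, followups)` pair used to describe each copy are assumptions for illustration. The sketch also assumes one particular reading of the SBI: the square-root term still uses the Newsgroups count, and only the linear term switches to the Followup-to count when that header is set.

```python
from math import sqrt


def bi2(newsgroup_counts):
    """BI2: (sum of sqrt(n) + sum of n) / 2, n = groups per copy."""
    roots = sum(sqrt(n) for n in newsgroup_counts)
    return (roots + sum(newsgroup_counts)) / 2


def sbi(copies):
    """SBI under the assumed reading described in the lead-in.

    copies: list of (newsgroups, followups) pairs; followups is None
    when the copy carries no Followup-to: header.
    """
    roots = sum(sqrt(n) for n, _ in copies)
    # Linear term: Followup-to count if present, else the Newsgroups count.
    linear = sum(n if f is None else f for n, f in copies)
    return (roots + linear) / 2


print(bi2([9, 16]))               # 16.0
print(sbi([(9, None), (16, 4)]))  # 10.0
```

Setting followups on the 16-group copy drops its linear contribution from 16 to 4, which is why the SBI for the same pair of postings is lower than the BI2.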
Except in nl.*, the BI2 and SBI are not used to determine whether a spam is cancellable. The thresholds for spam cancels are based _only_ on one or more of the following measures:
A single posting cannot be cancellable - to reach a BI of 20, it would have to be cross-posted to 400 groups. This isn't possible due to limitations in Usenet software.

These thresholds are applied to all hierarchies, not only the big8, but alt, bitnet, bionet, biz and regional hierarchies etc. Many hierarchies have more restrictive rules which are decided upon and enforced by their users and administrators.

These cancels have nothing whatsoever to do with the contents of the message. It doesn't matter if it's an advertisement, it doesn't matter if it's abusive, it doesn't matter whether it's on-topic in the groups it was posted in, it doesn't matter whether the posting is for a "good cause" or not. Spam cancels are non-content based. They're not based on _what_ was said, they're based only on how many times it was said.

Administrators wishing to ignore spam cancels can "alias out" the site "cyberspam", and the cancels will not affect your system. This is normally done at your feed site, but patches are available for INN to allow you to reject spam cancels on your own system. Ask in news.admin.net-abuse.usenet if you need this patch.

Further literature on posting etiquette can be found in:
ftp://rtfm.mit.edu/pub/usenet-by-group/news.answers/usenet/what-is/part1
ftp://rtfm.mit.edu/pub/usenet-by-group/news.answers/usenet/what-is/part2
ftp://rtfm.mit.edu/pub/usenet-by-group/news.answers/usenet/advertising/how-to/part1
ftp://rtfm.mit.edu/pub/usenet-by-group/news.answers/usenet/primer/part1
ftp://rtfm.mit.edu/pub/usenet-by-group/news.answers/usenet/posting-rules/part1
ftp://rtfm.mit.edu/pub/usenet-by-group/news.answers/usenet/emily-postnews/part1
The above FAQs are also mirrored at various sites, including ftp.sunet.se, mirror.aol.com, ftp.uu.net, ftp.uni-paderborn.de, nctuccca.edu.tw, hwarang.postech.ac.kr, ftp.hk.super.net etc.

A mailing list has been set up to assist those wishing to post commercial advertisements on Usenet in a responsible fashion. Email your questions to commerce@acpub.duke.edu.

All comments within these pages are expressed as personal opinions only.