Article Title: Create A .htaccess File Without Referral Spam
Author: Danny Wirken
Word Count: 1215
Article URL: http://www.isnare.com/?aid=91977&ca=Internet
Format: 64cpl
Author's Email Address: wirken[at]gmail.com (replace [at] with
@)
Easy Publish Tool: http://www.isnare.com/html.php?aid=91977
Users and administrators alike face a growing nuisance that
burdens web servers and, more particularly, blogs: comment,
trackback and referrer spam. Various solutions have been
proposed, and some of them address two of these forms of spam
at once.
What is Referral Spam?
The Referer request-header field (the HTTP specification
spells the name with a single "r") allows the client to specify
the address (URI) of the resource from which the request-URI
was obtained. It is a way for an HTTP client to send, in its
headers, the URI of the page that sent it there. This is
especially handy for a site administrator, as it provides
insight into where the traffic on his web server is coming
from. The most popular web server log analyzers also depend on
it to provide statistics on the most common referrers.
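As a rough illustration of what a log analyzer does with this
header, the Python sketch below tallies referrers from access
log lines. It assumes Apache's "combined" log format, in which
the referrer is the second-to-last quoted field. The sample
lines, the hosts and the function name top_referrers are made
up for the example.

```python
import re
from collections import Counter

# Assumes Apache "combined" log lines, which end with:
#   "request" status bytes "referrer" "user agent"
LOG_TAIL = re.compile(r'"[^"]*" \d{3} \S+ "([^"]*)" "[^"]*"$')

def top_referrers(log_lines, limit=10):
    """Count referrers, skipping requests that sent none ("-")."""
    counts = Counter()
    for line in log_lines:
        match = LOG_TAIL.search(line.strip())
        if match and match.group(1) not in ("", "-"):
            counts[match.group(1)] += 1
    return counts.most_common(limit)

# Hypothetical sample lines built from one template.
LINE = ('{ip} - - [10/Oct/2007:13:55:36 -0700] '
        '"GET / HTTP/1.1" 200 2326 "{ref}" "Mozilla/4.0"')
sample = [
    LINE.format(ip="192.0.2.1", ref="http://friend.example/links.html"),
    LINE.format(ip="192.0.2.2", ref="http://friend.example/links.html"),
    LINE.format(ip="203.0.113.9", ref="http://spammer.example/"),
    LINE.format(ip="192.0.2.3", ref="-"),
]

for referrer, hits in top_referrers(sample):
    print(hits, referrer)
```

A published "top referrers" page is essentially this list
rendered as links, which is what makes it attractive to
spammers.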
The HTTP Referer: header is very useful but it is also
completely arbitrary. Any web browser or HTTP client is free to
send a forged Referer: header with any request to a web server.
Spammers have taken advantage of the fact that HTTP makes no
provision for authenticating this header and have used the
existing openness to specially craft requests with their
website in the Referer: header.
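To show how little stands behind the header, here is a minimal
Python sketch that builds (but does not send) a request with an
arbitrary Referer value using the standard urllib.request
module. Both URLs are hypothetical placeholders.

```python
import urllib.request

# Build (but do not send) a request carrying an arbitrary Referer.
# Both URLs are hypothetical placeholders.
request = urllib.request.Request(
    "http://blog.example/post.html",
    headers={"Referer": "http://spammer.example/"},
)

# A server would log this value exactly as supplied; nothing in
# HTTP ties it to a page the client actually visited.
claimed_referrer = request.get_header("Referer")
print(claimed_referrer)
```

The value is whatever string the client chooses, which is why
a referrer in a log proves nothing about where a visitor came
from.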
Most people find it difficult to understand why someone would
bother spamming something that only the site administrator
will see in the logs. One probable motivation is to boost
search engine ranking. Another is simply to show up in any
stats published by the site. If a spammed site runs web server
log analysis software and publishes the results, the spammer's
URL readily earns a place in the top referrers section.
A serious consequence of referrer spam is that the request is
often an HTTP "GET" or "POST", which retrieves the entire body
of the document being spammed. A 30k document, for example,
will have all 30k transferred across one's Internet pipe. This
results in a considerable amount of traffic on the web server,
which can be very costly since bandwidth is not cheap.
Referrer spam wastes CPU and disk space and can be a source of
endless annoyance to server operators. Search engine
developers are actively fighting it, so its initial
effectiveness in boosting a site's ranking has been
considerably lessened. However, the problem persists and much
remains to be done to conquer it.
Some recommended practices in countering the threat of referral
spam include the non-publication of referrers by bloggers,
inclusion of the page in robots.txt when referrers have to be
published, use of the rel="nofollow" attribute and gathering a
cleaner list of referrers using JavaScript and beacon images.
Some bloggers have begun fighting referrer spammers at the
.htaccess level. Others have even taken steps to automate this.
Blocking Users by Referrer Notes
A very useful feature of .htaccess is the ability to block
users or sites that originate from a particular domain. When
there are tons of referrals from a particular site that shows
no visible link to one's own site, the referral probably isn't
a legitimate one. The other site is most likely hot linking to
certain files such as images, CSS files or other files.
Blocking access by referrer in .htaccess requires the help of
the Apache module mod_rewrite, which has to examine the
referrer first. There is a fear that spam would still come in
even as the .htaccess file continues to grow.
Blacklisting certain referrers in .htaccess is another option,
but its effectiveness has been greatly diminished by the ease
with which spammers are able to register thousands of domains
and rotate them as quickly as they are blacklisted.
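The article does not show the rules themselves. The Python
sketch below generates one commonly used mod_rewrite pattern
for a referrer blacklist: a RewriteCond per domain testing
HTTP_REFERER, followed by a RewriteRule with the [F] flag,
which answers with 403 Forbidden. The function name and the
domains are hypothetical, and the exact rules should be checked
against the server's Apache version before use.

```python
def referrer_block_rules(domains):
    """Build mod_rewrite lines that refuse requests by Referer domain."""
    if not domains:
        # A RewriteRule with no conditions would block every visitor.
        return ""
    lines = ["RewriteEngine On"]
    for index, domain in enumerate(domains):
        escaped = domain.replace(".", r"\.")  # dots are regex wildcards
        is_last = index == len(domains) - 1
        # [NC] ignores case; [OR] chains this condition to the next one.
        flags = "[NC]" if is_last else "[NC,OR]"
        lines.append(
            "RewriteCond %{HTTP_REFERER} "
            f"^https?://([^/]+\\.)?{escaped}([/:]|$) {flags}"
        )
    lines.append("RewriteRule .* - [F]")
    return "\n".join(lines)

# Hypothetical blacklist.
rules = referrer_block_rules(["spammer.example", "pills.example"])
print(rules)
```

Every newly registered spam domain needs another RewriteCond
line, which is exactly why a blacklist of this kind keeps
growing.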
An .htaccess generator can be used to prevent people from
certain IP addresses, domains or even countries from gaining
access to a site or to specific folders. To block a specific
IP, type the full IP address. To block a range of IPs, type a
partial IP address. To block a particular domain, type the
domain without the www. To block a country, type its tail
extension. There is no limit to the number of entries, which
are added one at a time. The "add" should be checked after each
entry, and the generated code is then copied and pasted into a
plain text file. This file is then named .htaccess. Note the
"." before the file name as well as the absence of any tail
extension.
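The generator's actual output is not reproduced in this
article. As a sketch of what such a tool might emit, the Python
function below builds Apache 2.2-style access rules (Order,
Allow, Deny) from a list of entries. The function name and all
entries are hypothetical. On Apache 2.4 these directives are
only available through mod_access_compat, with Require being
the current mechanism, and blocking by domain depends on
reverse DNS lookups of the client address.

```python
def access_block_rules(entries):
    """Build Apache 2.2-style lines denying each listed client."""
    # Allow everyone first, then deny the listed entries.
    lines = ["Order Allow,Deny", "Allow from all"]
    lines += [f"Deny from {entry}" for entry in entries]
    return "\n".join(lines) + "\n"

# Hypothetical entries, one of each kind described above.
entries = [
    "192.0.2.15",       # a full IP address blocks one client
    "198.51.100",       # a partial IP address blocks a range
    "spammer.example",  # a domain, typed without the www
    ".zz",              # a tail extension (made-up country code)
]
htaccess_text = access_block_rules(entries)
print(htaccess_text)
```

The resulting text would then be saved as a plain text file
named .htaccess.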
If there is already an .htaccess file in the root of the docs
directory or the folder where it is to be applied, the
generated code should be added to the end of the current
.htaccess file, taking extra care not to disturb the existing
code. The file is then uploaded in ASCII mode.
The rel="nofollow" Solution
A coalition of blogging and search engine companies has joined
together to support an HTML attribute designed primarily to
combat comment spam but with high potential as well for
effective use against referral spam. This attribute, known as
rel="nofollow", is being praised by many bloggers as the
ultimate solution for the prevailing problem. The idea is
simple enough, the hardest part being the matter of convincing
major players such as Google, Yahoo! and MSN to agree on it.
Tagging a link with the rel="nofollow" attribute prevents the
link from contributing to the linked site's PageRank. This
means that comment and referral spammers will not be rewarded
for their illegitimate activities on websites that implement
the attribute. This solves the problem partially but cannot
end it.
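For a site that does publish referrers or comment links,
applying the attribute can be as simple as the following Python
sketch. The helper name nofollow_link and the sample URL are
hypothetical, and html.escape is used so that a hostile URL
cannot break out of the markup.

```python
from html import escape

def nofollow_link(url, text):
    """Render an anchor that asks search engines not to credit the link."""
    return (
        f'<a href="{escape(url, quote=True)}" rel="nofollow">'
        f"{escape(text)}</a>"
    )

# Hypothetical referrer entry for a stats page.
link = nofollow_link("http://spammer.example/?a=1&b=2", "Cheap pills")
print(link)
```

The link still works for human visitors, so the attribute only
removes the search ranking reward and does nothing to stop the
requests themselves.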
The reason is that 100% adoption is impossible to reach, so
there will always be an incentive to spam. Spammers
essentially do not care whether their techniques work against
any specific site as long as they are generally effective.
They need no particular reason to hit any site and will do so
because their main target is the blogosphere as a whole. It is
also quite unfortunate that the resources required to fight
spam, particularly referral spam, are far greater than the
resources needed to create it.
Referral spam is just an HTTP request. The client doesn't even
need to acknowledge the response. All it needs to send is a
simple packet of formatted text.
Spammers take pains to make a request look legitimate. The
User-Agent string will look very much like that of MSIE. It
used to be that spam came from a single IP but things have
definitely gotten more complex since then.
Filtering referrer IPs against spam blacklists can also be
done. If the IP is blacklisted, the referring URL should not be
listed in any section of a site's web stats. Once a given host
name is identified as a referral spam host, do not pursue the
query any further.
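A minimal Python sketch of this filtering step follows. It
assumes each hit is available as a (client IP, referring URL)
pair and that the blacklists are plain sets of IP addresses and
host names. All names and sample values are hypothetical.

```python
from urllib.parse import urlparse

def is_spam_hit(client_ip, referrer, bad_ips, bad_hosts):
    """True if the client IP or the referring host is blacklisted."""
    if client_ip in bad_ips:
        return True
    host = (urlparse(referrer).hostname or "").lower()
    # Match the listed host itself and any of its subdomains.
    return any(host == bad or host.endswith("." + bad) for bad in bad_hosts)

def publishable_referrers(hits, bad_ips, bad_hosts):
    """Keep only referrers that are safe to list in public stats."""
    return [
        referrer
        for client_ip, referrer in hits
        if not is_spam_hit(client_ip, referrer, bad_ips, bad_hosts)
    ]

# Hypothetical hits and blacklists.
hits = [
    ("192.0.2.1", "http://friend.example/links.html"),
    ("203.0.113.9", "http://www.spammer.example/"),
    ("198.51.100.7", "http://innocent-looking.example/"),
]
clean = publishable_referrers(
    hits, bad_ips={"198.51.100.7"}, bad_hosts={"spammer.example"}
)
print(clean)
```

Only the first referrer survives: the second is dropped for
its host name and the third for the blacklisted client IP.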
About The Author: http://www.theinternetone.net
For more free-reprint articles by Danny Wirken please visit:
http://www.isnare.com/?s=author&a=Danny+Wirken