Who is behind WebDataCentreBot?
It does not pay to leave ranges known to be occupied by popular hosting companies unblocked, unless you want to have fun with misbehaving or fake bots. Making the acquaintance of WebDataCentreBot was rather accidental. I had been lazy about blocklisting the SoftLayer ranges, which would have left them unable to do anything but send mail to me or receive mail from me.
Sitting on 67.228.177.87 and announcing itself as:
Mozilla/5.0 (compatible; WebDataCentreBot/1.0; +http://WebDataCentre.com/)
Not only did it jump right in and start indexing without bothering in the slightest about robots.txt, it also happily fetched content that was explicitly excluded in robots.txt. But then again, how should it know without reading it in the first place? I thought perhaps they would want to learn about the broken behaviour of their bot and fix it, but looking at their site webdatacentre.com, all I could find was:
Web Data Centre
Web Data Centre is an internet research project driven by a small team of researchers from different parts of the world. Its aim is to get a better understanding of the link structure of the web. More info is coming shortly.
(front page as of June 29th 2008)
And that was it. There is no point of contact whatsoever, and the registration data looks pretty spammy as well:
Domain Name: WEBDATACENTRE.COM
Registrant [1435225]:
Moniker Privacy Services
20 SW 27th Ave.
Suite 201
Pompano Beach
FL
33069
US
Administrative Contact [1435225]:
Moniker Privacy Services WEBDATACENTRE.COM @ domainservice.com
Moniker Privacy Services
20 SW 27th Ave.
Suite 201
Pompano Beach
FL
33069
US
Phone: +1.9549848445
Fax: +1.9549699155
Billing Contact [1435225]:
Moniker Privacy Services WEBDATACENTRE.COM @ domainservice.com
Moniker Privacy Services
20 SW 27th Ave.
Suite 201
Pompano Beach
FL
33069
US
Phone: +1.9549848445
Fax: +1.9549699155
Technical Contact [1435225]:
Moniker Privacy Services WEBDATACENTRE.COM @ domainservice.com
Moniker Privacy Services
20 SW 27th Ave.
Suite 201
Pompano Beach
FL
33069
US
Phone: +1.9549848445
Fax: +1.9549699155
Domain servers in listed order:
NS1.DOMAINSERVICE.COM 67.99.176.12
NS2.DOMAINSERVICE.COM 67.97.247.209
NS3.DOMAINSERVICE.COM 64.49.213.231
NS4.DOMAINSERVICE.COM 67.97.247.210
Record created on: 2008-06-27 05:46:23.0
Database last updated on: 2008-06-27 05:46:39.373
Domain Expires on: 2009-06-27 05:46:41.0
The domain was registered a mere two days earlier and hides behind an anonymous privacy service. Why would a business want to remain anonymous unless it has something to conceal? One might also expect a search engine to demonstrate its legitimacy with a meaningful rDNS name that reflects the bot’s name, but there is not much to find here either:
olliver@bunkiten:~$ host 67.228.177.87
87.177.228.67.in-addr.arpa domain name pointer midphase.com.
midphase.com is the generic PTR record of a SoftLayer reseller:
%rwhois V-1.5:003fff:00 rwhois.softlayer.com (by Network Solutions, Inc. V-1.5.9.5)
network:Class-Name:network
network:ID:NETBLK-SOFTLAYER.67.228.160.0/19
network:Auth-Area:67.228.160.0/19
network:Network-Name:SOFTLAYER-67.228.160.0
network:IP-Network:67.228.177.0/24
network:IP-Network-Block:67.228.177.0-67.228.177.255
network:Organization;I:Hosting Services Inc.
network:Street-Address:223 West Jackson Blvd STE# 1014
network:City:Chicago
network:State:IL
network:Postal-Code:60606
network:Country-Code:US
network:Tech-Contact;I:sysadmins @ softlayer.com
network:Abuse-Contact;I:abuse @ midphase.com
network:Admin-Contact;I:IPADM258-ARIN
network:Created:20080128
network:Updated:20080324
network:Updated-By:ipadmin @ softlayer.com
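A bot that wants to be recognised can be verified with forward-confirmed reverse DNS. The PTR record of the crawling address should point to a host name under the operator’s own domain, and that name should resolve back to the same address. Below is a minimal Python sketch of this check. The function name `forward_confirmed_rdns` is hypothetical. The resolvers are passed in as parameters, defaulting to the standard `socket.gethostbyaddr` and `socket.gethostbyname_ex`, so the logic can be exercised without network access.

```python
import socket
from typing import Callable, Iterable, List, Tuple

# Both resolvers return (hostname, aliases, addresses), like the socket functions.
Resolver = Callable[[str], Tuple[str, List[str], List[str]]]


def forward_confirmed_rdns(
    ip: str,
    allowed_suffixes: Iterable[str],
    reverse: Resolver = socket.gethostbyaddr,
    forward: Resolver = socket.gethostbyname_ex,
) -> bool:
    """True if ip's PTR name lies under an allowed domain and resolves back to ip."""
    try:
        hostname, _, _ = reverse(ip)
    except OSError:  # socket.herror: no PTR record at all
        return False
    hostname = hostname.rstrip(".").lower()
    suffixes = [s.strip(".").lower() for s in allowed_suffixes]
    # Accept the domain itself or any host below it, never a mere substring match.
    if not any(hostname == s or hostname.endswith("." + s) for s in suffixes):
        return False
    try:
        _, _, addresses = forward(hostname)
    except OSError:  # socket.gaierror: the PTR name does not resolve
        return False
    # The forward lookup must lead back to the very address we started from.
    return ip in addresses
```

Given the lookup above, `forward_confirmed_rdns("67.228.177.87", ["webdatacentre.com"])` would have returned False, because the PTR record is the reseller’s generic midphase.com.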
A range of consecutive IP addresses registered to the outfit building the bot would seem more practical, not least for directing complaints to the appropriate people. However, there is no information about how many IP addresses this anonymous entity uses, which effectively helps Midphase’s publicity-shy customers remain anonymous. Putting it all together, it seems more likely that they are spammers harvesting content, email addresses or web forms, building a list for themselves or for sale to other spammers, than an actual search engine. Even if I am completely mistaken, I am still not particularly keen on bots that ignore established standards like robots.txt. In the absence of any communication channel, one has to conclude that there is no ordinary way to opt out of their crawling.
Therefore, firewalling this particular range seems an appropriate solution to me:
iptables -A INPUT -s 67.228.177.0/24 -i eth0 -p tcp -m tcp ! --dport 25 --syn -j REJECT
This rule rejects all new incoming TCP connections except those to port 25 (SMTP), as there may be legitimate sites in that range we would like to receive mail from or send mail to. We have to specify that only incoming SYN packets are rejected. Otherwise the replies to our own outgoing SMTP connections, which arrive with a destination port other than 25, would be rejected too, and outgoing mail to this address range would remain stuck in our queue and never get delivered. If this potential need for communication is not a concern, one can still apply the BOFH method and drop the range altogether:
iptables -A INPUT -s 67.228.177.0/24 -i eth0 -j DROP
Apache servers may also benefit from an additional SetEnvIf rule, preferably in httpd.conf/apache2.conf, or in .htaccess if the former is not an option on a shared hosting account:
SetEnvIfNoCase User-Agent "WebDataCentre(Bot|\.com)" block
Deny from env=block
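The same pattern can be run over the access log to see which addresses the bot has been using. Below is a minimal Python sketch. It assumes Apache’s "combined" log format and request fields without escaped quotes. `bot_sources` is a hypothetical helper name, and `re.IGNORECASE` stands in for the "NoCase" part of SetEnvIfNoCase.

```python
import re
from collections import Counter
from typing import Iterable

# Same regular expression as the SetEnvIfNoCase rule, matched case-insensitively.
BOT_UA = re.compile(r"WebDataCentre(Bot|\.com)", re.IGNORECASE)

# Combined format: host ident user [time] "request" status bytes "referer" "user-agent"
COMBINED = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[[^\]]*\] "[^"]*" \d{3} (?:\d+|-) "[^"]*" "(?P<agent>[^"]*)"'
)


def bot_sources(lines: Iterable[str]) -> Counter:
    """Count requests per client address whose User-Agent matches BOT_UA."""
    hits = Counter()
    for line in lines:
        m = COMBINED.match(line)
        if m and BOT_UA.search(m.group("agent")):
            hits[m.group("host")] += 1
    return hits
```

An open log file can be passed directly, since iterating over it yields lines. `bot_sources(f).most_common()` then lists the busiest addresses first, and these are candidates for the iptables rules above.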
Update July 1st, 2008:
The bot has been spotted with another IP address, 66.150.224.245, this time without any rDNS record at all:
olliver@bunkiten:~$ host 66.150.224.245
Host 245.224.150.66.in-addr.arpa. not found: 3(NXDOMAIN)
A familiar setup: within a /24 of a presumed Internap reseller, and still without any details about the company or project.
CustName: Networld Internet Services
Address: P.O box 551
City: Skippack
StateProv: PA
PostalCode: 19474
Country: US
RegDate: 2007-01-16
Updated: 2007-01-16
NetRange: 66.150.224.0 - 66.150.224.255
CIDR: 66.150.224.0/24
NetName: INAP-PHI-NETWORLDINT-12098
NetHandle: NET-66-150-224-0-1
Parent: NET-66-150-0-0-1
NetType: Reassigned
Comment:
RegDate: 2007-01-16
Updated: 2007-01-16
RTechHandle: INO3-ARIN
RTechName: InterNap Network Operations Center
RTechPhone: +1-877-843-4662
RTechEmail: noc @ internap.com
OrgAbuseHandle: IAC3-ARIN
OrgAbuseName: Internap Abuse Contact
OrgAbusePhone: +1-206-256-9500
OrgAbuseEmail: abuse @ internap.com
OrgTechHandle: INO3-ARIN
OrgTechName: InterNap Network Operations Center
OrgTechPhone: +1-877-843-4662
OrgTechEmail: noc @ internap.com
In case you want to add another iptables rule based on the sample further above, simply replace 67.228.177.0/24 with 66.150.224.0/24 and you should be set.
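The commands can also be generated from a list of ranges instead of editing the rule by hand for every new sighting. Below is a minimal Python sketch. `reject_rules` is a hypothetical helper that only builds the iptables command lines; it does not call iptables itself. `ipaddress.ip_network(..., strict=True)` raises ValueError for entries with host bits set, such as a single address pasted with a /24 suffix.

```python
import ipaddress
from typing import Iterable, List


def reject_rules(cidrs: Iterable[str], interface: str = "eth0", keep_port: int = 25) -> List[str]:
    """Build one SYN-only REJECT rule per range, leaving keep_port (SMTP) open."""
    rules = []
    for cidr in cidrs:
        # Validates the range; "67.228.177.87/24" raises ValueError (host bits set).
        net = ipaddress.ip_network(cidr, strict=True)
        argv = [
            "iptables", "-A", "INPUT", "-s", str(net), "-i", interface,
            "-p", "tcp", "-m", "tcp", "!", "--dport", str(keep_port),
            "--syn", "-j", "REJECT",
        ]
        rules.append(" ".join(argv))
    return rules
```

`reject_rules(["67.228.177.0/24", "66.150.224.0/24"])` yields the rule from above plus its counterpart for the Internap range. The output can be reviewed before it is run as root.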
Update July 4th, 2008:
Another sighting, this time crawling from Sweden using the IP address 77.110.52.67:
olliver@bunkiten:~$ host 77.110.52.67
67.52.110.77.in-addr.arpa is an alias for 77-110-52-67.univation.riksnet.nu.
77-110-52-67.univation.riksnet.nu domain name pointer ip67.univation.riksnet.nu.
So the pattern of using generic rDNS records obviously persists, as does their disregard for robots.txt.
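For comparison, a well-behaved crawler runs a check like the following before each fetch. This is a minimal sketch using Python’s standard `urllib.robotparser`. The robots.txt content, the helper name `allowed` and the example.org URLs are made up for illustration. CPython’s `RobotFileParser` compares only the part of the agent string before the first "/", so the product token WebDataCentreBot is passed rather than the full Mozilla/5.0 User-Agent header.

```python
from urllib.robotparser import RobotFileParser

# Illustrative robots.txt: shut WebDataCentreBot out entirely,
# keep every other crawler out of /private/ only.
ROBOTS_TXT = """\
User-agent: WebDataCentreBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def allowed(robots_txt: str, product_token: str, url: str) -> bool:
    """Return True if robots_txt permits product_token to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(product_token, url)
```

With these rules, `allowed(ROBOTS_TXT, "WebDataCentreBot", "http://example.org/")` is False. Any other crawler may fetch everything outside /private/.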
Whois:
inetnum: 77.110.52.64 - 77.110.52.79
netname: SE-RIKSNET-UNIVATION2
descr: Stockholm Univation AB site2
country: SE
admin-c: BEER3-RIPE
tech-c: BEER3-RIPE
status: ASSIGNED PA
mnt-by: MNT-RIKSNET
mnt-lower: MNT-RIKSNET
mnt-routes: MNT-RIKSNET
source: RIPE # Filtered

person: Bengt Erik Sandstrom
address: Graddvagen 7
address: S-906 20 Umea
address: Sweden
phone: +46 768 272022
nic-hdl: BEER3-RIPE
source: RIPE # Filtered
That range translates to 77.110.52.64/28, a rather small block this time, and this is also the value to use for blocking them via iptables or other means.
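The conversion is simple arithmetic. 77.110.52.64 through 77.110.52.79 is 79 - 64 + 1 = 16 = 2^4 addresses, and the block starts on a multiple of 16, so the prefix is 32 - 4 = 28. Python’s standard `ipaddress` module does the same for arbitrary whois ranges and splits non-aligned ones into several blocks. `range_to_cidrs` is a hypothetical helper name.

```python
import ipaddress
from typing import List


def range_to_cidrs(first: str, last: str) -> List[str]:
    """Collapse an inclusive whois range (inetnum/NetRange) into CIDR blocks."""
    start = ipaddress.IPv4Address(first)
    end = ipaddress.IPv4Address(last)
    # summarize_address_range yields the minimal list of networks covering the range.
    return [str(net) for net in ipaddress.summarize_address_range(start, end)]
```

`range_to_cidrs("77.110.52.64", "77.110.52.79")` returns `['77.110.52.64/28']`, and the Internap NetRange 66.150.224.0 - 66.150.224.255 comes out as `['66.150.224.0/24']`.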
Eight weeks a day without life
I’ve been waiting for a guide to come and take me by the hand
Could these sensations make me feel the pleasures of a normal man?
These sensations barely interest me for another day
I’ve got the spirit, lose the feeling, take the shock away
Ian Curtis - Disorder
Perhaps a summary of the past two months:
Some people may appear helpful or supportive, but taken at their word they turn out to be just more professional poseurs with decades of practice in socially compliant obedience. Should you ever get too close to the edge of nowhere, they will quickly let you know about their priorities. Falling in line with society means transposing the laws of capitalism onto everyday life, which of course is nothing but a politically correct form of Darwinism. You are not welcome as a human, but as a human resource to deploy, as a commodity, institution, object or vessel for silly prejudices and hatred. Being right is not a matter of the better argument, but merely a matter of dependence and of abusing it for one’s own ends.
There is not really a good reason to carry on as if nothing ever happened just because it has always been like this. Perhaps one was lucky to be spared and not to end up in someone else’s crosshairs, but that alone does not make the world a better place. Nor does it mean that people will regard one as a useful member of society. You are judged by what you own, not by what you have achieved, and by people who have no authority to judge. There is a place, confirmed and assigned, but no matter how hard you try, you do not get to change the rules others will apply to you. Should you ever become careless enough to forget about it, someone will graciously remind you.
Giorgos Stefanou - Travelling in Space-Time

Giorgos Stefanou’s Travelling in Space-Time has been released on Petcord. Although it is described as an imaginary journey to a future form of civilisation, the hope for success appears to be rather limited. What is the driving force behind this journey? One might relate it to the religious notion of salvation, the eventual reward after a troubled life, but that does not seem to fit the scenery and its lack of euphoria. Instead there is solitude and isolation, being thrown into a rather hostile environment with lifeless machinery as the only means of communication. It is a journey that seems to meet its (lack of) expectations like a disillusioned look into the mirror, with no one and no circumstances to blame. On the other hand, even a pointless occupation is a way to keep oneself busy, at least until an alternative appears on the horizon.
The intensity of the nihilism that Travelling in Space-Time seems to imply turns it into an electro-acoustic masterpiece. It deliberately avoids significant climaxes and creates a cavernous sound similar to Martin Hannett’s production of Joy Division’s Unknown Pleasures, and this approach proves an effective means to a Kafkaesque end. Form follows function follows spectromorphology, but does not follow mainstream conventions.