electro acoustic expressionism
nodepet
June 29th, 2008

Who is behind WebDataCentreBot?

Filed under: Web — olliver @ 23:52 h

It does not pay to skip preemptively blocking ranges known to be occupied by popular hosting companies, unless you want to have fun with misbehaving or fake bots. My pleasure of meeting WebDataCentreBot was rather accidental: I had been too lazy to blocklist SoftLayer’s ranges in a way that would leave them unable to do anything but send mail to me or receive mail from me.

Sitting on 67.228.177.87 and announcing itself as:

Mozilla/5.0 (compatible; WebDataCentreBot/1.0; +http://WebDataCentre.com/)

Not only did it jump right in and start indexing without bothering in the slightest about robots.txt, it also happily fetched content that was explicitly excluded in robots.txt. But then again, how would it know without reading the file in the first place? I thought perhaps its operators would want to learn about their bot’s broken behaviour and fix it, but on their site, webdatacentre.com, all I could find was:
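For comparison, honouring robots.txt takes very little effort. The following is a minimal sketch of the check a well-behaved crawler would run before fetching a page, using Python’s standard urllib.robotparser module. The robots.txt content, the /private/ path and the example.org URLs are hypothetical stand-ins, not taken from my site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that excludes a private area for every user agent.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

BOT_UA = "Mozilla/5.0 (compatible; WebDataCentreBot/1.0; +http://WebDataCentre.com/)"


def allowed(url: str, user_agent: str = BOT_UA) -> bool:
    """Return True if the robots.txt above permits user_agent to fetch url."""
    parser = RobotFileParser()
    # A real crawler would call set_url() and read() to download the file;
    # parse() takes already fetched lines instead.
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in ("http://example.org/index.html", "http://example.org/private/notes.html"):
        print(url, "->", "fetch" if allowed(url) else "skip")
```

A bot doing this once per host, before its first request, would never have touched the excluded content.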

Web Data Centre

Web Data Centre is an internet research project driven by a small team of researchers from different parts of the world. Its aim is to get a better understanding of the link structure of the web. More info is coming shortly.

(front page as of June 29th 2008)

And that was it. There is no point of contact whatsoever, and the registration data looks pretty spammy as well:

Domain Name: WEBDATACENTRE.COM

Registrant [1435225]:
        Moniker Privacy Services
        20 SW 27th Ave.
        Suite 201
        Pompano Beach
        FL
        33069
        US

Administrative Contact [1435225]:
        Moniker Privacy Services WEBDATACENTRE.COM @ domainservice.com
        Moniker Privacy Services
        20 SW 27th Ave.
        Suite 201
        Pompano Beach
        FL
        33069
        US
        Phone: +1.9549848445
        Fax:   +1.9549699155

Billing Contact [1435225]:
        Moniker Privacy Services WEBDATACENTRE.COM @ domainservice.com
        Moniker Privacy Services
        20 SW 27th Ave.
        Suite 201
        Pompano Beach
        FL
        33069
        US
        Phone: +1.9549848445
        Fax:   +1.9549699155

Technical Contact [1435225]:
        Moniker Privacy Services WEBDATACENTRE.COM @ domainservice.com
        Moniker Privacy Services
        20 SW 27th Ave.
        Suite 201
        Pompano Beach
        FL
        33069
        US
        Phone: +1.9549848445
        Fax:   +1.9549699155

Domain servers in listed order:

        NS1.DOMAINSERVICE.COM         67.99.176.12
        NS2.DOMAINSERVICE.COM         67.97.247.209
        NS3.DOMAINSERVICE.COM         64.49.213.231
        NS4.DOMAINSERVICE.COM         67.97.247.210

        Record created on:        2008-06-27 05:46:23.0
        Database last updated on: 2008-06-27 05:46:39.373
        Domain Expires on:        2009-06-27 05:46:41.0

The domain was registered a mere two days earlier and hides behind an anonymous privacy service. Why would a business want to remain anonymous unless it has something to conceal? One might also expect a search engine to establish its legitimacy with a meaningful rDNS name that reflects the bot’s name, but there is not much to find here either:

olliver@bunkiten:~$ host 67.228.177.87
87.177.228.67.in-addr.arpa domain name pointer midphase.com.
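The legitimacy check alluded to here is usually done as forward-confirmed reverse DNS: look up the PTR name of the visiting address, check that it belongs to a domain the crawler operator has announced, then resolve that name forward and make sure it leads back to the same address. Below is a minimal sketch assuming Python’s standard socket resolver functions; the reverse/forward parameters exist only so the logic can be exercised with stub data, and any suffix you pass in is your own assumption, since WebDataCentre announces none.

```python
import socket
from typing import Callable, Iterable, List, Tuple

Resolver = Callable[[str], Tuple[str, List[str], List[str]]]


def fcrdns_ok(
    ip: str,
    allowed_suffixes: Iterable[str],
    reverse: Resolver = socket.gethostbyaddr,
    forward: Resolver = socket.gethostbyname_ex,  # IPv4 only
) -> bool:
    """Return True if ip passes a forward-confirmed reverse DNS check."""
    suffixes = [s.lower().strip(".") for s in allowed_suffixes]
    try:
        hostname, _aliases, _addrs = reverse(ip)  # PTR lookup
    except (socket.herror, socket.gaierror):
        return False  # no rDNS record at all
    hostname = hostname.rstrip(".").lower()
    # The PTR name must be an announced domain or a subdomain of one.
    if not any(hostname == s or hostname.endswith("." + s) for s in suffixes):
        return False  # generic or unrelated PTR name
    try:
        _name, _aliases, addresses = forward(hostname)  # forward lookup of the PTR name
    except (socket.herror, socket.gaierror):
        return False
    # The forward lookup must lead back to the visiting address.
    return ip in addresses
```

With the PTR record shown above, fcrdns_ok("67.228.177.87", ["webdatacentre.com"]) fails already at the suffix test, because the name is the generic midphase.com.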

midphase.com is the generic PTR name used by Midphase, a SoftLayer reseller:

%rwhois V-1.5:003fff:00 rwhois.softlayer.com (by Network Solutions, Inc. V-1.5.9.5)
network:Class-Name:network
network:ID:NETBLK-SOFTLAYER.67.228.160.0/19
network:Auth-Area:67.228.160.0/19
network:Network-Name:SOFTLAYER-67.228.160.0
network:IP-Network:67.228.177.0/24
network:IP-Network-Block:67.228.177.0-67.228.177.255
network:Organization;I:Hosting Services Inc.
network:Street-Address:223 West Jackson Blvd STE# 1014
network:City:Chicago
network:State:IL
network:Postal-Code:60606
network:Country-Code:US
network:Tech-Contact;I:sysadmins @ softlayer.com
network:Abuse-Contact;I:abuse @ midphase.com
network:Admin-Contact;I:IPADM258-ARIN
network:Created:20080128
network:Updated:20080324
network:Updated-By:ipadmin @ softlayer.com

A contiguous range of IP addresses registered to the outfit running the bot would be more practical, especially for directing complaints to the right people. Instead, there is no information about how many IP addresses this anonymous entity uses, which effectively helps Midphase’s publicity-shy customers stay anonymous. Putting it all together, it seems more likely that they are spammers harvesting content, email addresses or web forms to build a list for themselves or to sell to other spammers than an actual search engine. Even if I am completely mistaken, I am still not particularly keen on bots that ignore established standards like robots.txt. With no communication channel available, one has to conclude that there is no ordinary way to opt out of their crawling.

Therefore, firewalling this particular range seems an appropriate solution to me:

iptables -A INPUT -s 67.228.177.0/24 -i eth0 -p tcp -m tcp ! --dport 25 --syn -j REJECT

This rule rejects all new incoming TCP connections except SMTP, as there may be legitimate sites in this range we want to receive mail from or send mail to. The rule has to match only incoming SYN packets: replies to our own outgoing SMTP connections arrive from the remote port 25 on an arbitrary local port, so without --syn they would be rejected as well, and outgoing mail to this address range would remain stuck in our queue and never get delivered. If this potential need for communication is not a concern, one can still apply the BOFH method and drop the range altogether:

iptables -A INPUT -s 67.228.177.0/24 -i eth0 -j DROP

Apache servers may also benefit from another SetEnvIfNoCase rule, preferably in httpd.conf/apache2.conf, or in .htaccess if the former is not an option on a shared hosting account:

SetEnvIfNoCase User-Agent "WebDataCentre(Bot|\.com)" block

Deny from env=block
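To see which User-Agent strings the pattern above catches, the same regular expression can be tried case-insensitively in Python. This is only an approximation of what Apache does (it evaluates the pattern with its own regex engine), but for a simple pattern like this one Python’s re module behaves the same. The second and third sample strings are hypothetical.

```python
import re

# Same pattern as in the SetEnvIfNoCase line; "NoCase" corresponds to re.IGNORECASE.
BLOCK_UA = re.compile(r"WebDataCentre(Bot|\.com)", re.IGNORECASE)


def is_blocked(user_agent: str) -> bool:
    """Return True if the block variable would be set for this User-Agent."""
    # The pattern may match anywhere in the header value, hence search() rather than match().
    return BLOCK_UA.search(user_agent) is not None


if __name__ == "__main__":
    samples = [
        "Mozilla/5.0 (compatible; WebDataCentreBot/1.0; +http://WebDataCentre.com/)",
        "webdatacentre.com crawler",  # hypothetical variant
        "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9) Gecko/2008061015 Firefox/3.0",
    ]
    for ua in samples:
        print(is_blocked(ua), ua)
```

The alternation also catches a User-Agent that only carries the domain name, in case the bot ever drops the "Bot" suffix.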

Update July 1st, 2008:

The bot has been spotted with another IP address, 66.150.224.245, this time without any rDNS record at all:

olliver@bunkiten:~$ host 66.150.224.245
Host 245.224.150.66.in-addr.arpa. not found: 3(NXDOMAIN)

A familiar setup: a /24 of a presumed Internap reseller, and still no details about the company or project.

CustName:   Networld Internet Services
Address:    P.O box 551
City:       Skippack
StateProv:  PA
PostalCode: 19474
Country:    US
RegDate:    2007-01-16
Updated:    2007-01-16

NetRange:   66.150.224.0 - 66.150.224.255
CIDR:       66.150.224.0/24
NetName:    INAP-PHI-NETWORLDINT-12098
NetHandle:  NET-66-150-224-0-1
Parent:     NET-66-150-0-0-1
NetType:    Reassigned
Comment:
RegDate:    2007-01-16
Updated:    2007-01-16

RTechHandle: INO3-ARIN
RTechName:   InterNap Network Operations Center
RTechPhone:  +1-877-843-4662
RTechEmail:  noc @ internap.com 

OrgAbuseHandle: IAC3-ARIN
OrgAbuseName:   Internap Abuse Contact
OrgAbusePhone:  +1-206-256-9500
OrgAbuseEmail:  abuse @ internap.com

OrgTechHandle: INO3-ARIN
OrgTechName:   InterNap Network Operations Center
OrgTechPhone:  +1-877-843-4662
OrgTechEmail:  noc @ internap.com

In case you want to add another iptables rule based on the sample further above, simply replace 67.228.177.0/24 with 66.150.224.0/24 and you should be set.
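With a growing list of ranges, generating the rules saves some typing. Below is a small sketch that prints one REJECT rule per range, in the same form as the rule above. The list holds the two ranges seen so far, and the interface name eth0 is carried over from that rule and may differ on your system. The ipaddress module validates each entry first, so a typo does not end up in the firewall.

```python
import ipaddress

# Ranges observed so far; extend the list as new sightings come in.
RANGES = ["67.228.177.0/24", "66.150.224.0/24"]


def reject_rule(cidr: str, iface: str = "eth0") -> str:
    """Build the SMTP-sparing REJECT rule for one network in CIDR notation."""
    # Raises ValueError on malformed input or when host bits are set (e.g. .1/24).
    net = ipaddress.ip_network(cidr)
    return (
        f"iptables -A INPUT -s {net} -i {iface} "
        "-p tcp -m tcp ! --dport 25 --syn -j REJECT"
    )


if __name__ == "__main__":
    for cidr in RANGES:
        print(reject_rule(cidr))
```

The printed lines can be pasted into a firewall script or piped to a shell as root.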

Update July 4th, 2008:

Another sighting, this time crawling from Sweden with the IP address 77.110.52.67:

olliver@bunkiten:~$ host 77.110.52.67
67.52.110.77.in-addr.arpa is an alias for 77-110-52-67.univation.riksnet.nu.
77-110-52-67.univation.riksnet.nu domain name pointer ip67.univation.riksnet.nu.

So the pattern of generic rDNS records persists, as does their disregard for robots.txt.

Whois:

inetnum:        77.110.52.64 - 77.110.52.79
netname:        SE-RIKSNET-UNIVATION2
descr:	        Stockholm Univation AB site2
country:        SE
admin-c:        BEER3-RIPE
tech-c:         BEER3-RIPE
status:         ASSIGNED PA
mnt-by:         MNT-RIKSNET
mnt-lower:      MNT-RIKSNET
mnt-routes:     MNT-RIKSNET
source:         RIPE # Filtered

person:         Bengt Erik Sandstrom
address:        Graddvagen 7
address:        S-906 20 Umea
address:        Sweden
phone:          +46 768 272022
nic-hdl:        BEER3-RIPE
source:         RIPE # Filtered

That range translates to 77.110.52.64/28, a rather small block this time, and this is also the value to use when blocking them via iptables or other means.
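The conversion from the inetnum start and end addresses to CIDR notation can be checked with Python’s ipaddress module. Its summarize_address_range() returns the smallest list of networks covering an inclusive address range. The same call turns the 67.228.177.0-67.228.177.255 block from the SoftLayer rwhois record into 67.228.177.0/24.

```python
import ipaddress


def range_to_cidrs(first: str, last: str) -> list:
    """Convert an inclusive whois-style address range into CIDR networks."""
    start = ipaddress.ip_address(first)
    end = ipaddress.ip_address(last)
    return [str(net) for net in ipaddress.summarize_address_range(start, end)]


if __name__ == "__main__":
    print(range_to_cidrs("77.110.52.64", "77.110.52.79"))
    print(range_to_cidrs("67.228.177.0", "67.228.177.255"))
```

A range that does not start and end on a block boundary comes back as several networks, each of which needs its own firewall rule.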

Comments (7)

June 23rd, 2008

Eight weeks a day without life

Filed under: Life — olliver @ 23:29 h

I’ve been waiting for a guide to come and take me by the hand
Could these sensations make me feel the pleasures of a normal man?
These sensations barely interest me for another day
I’ve got the spirit, lose the feeling, take the shock away

Ian Curtis - Disorder

Perhaps a summary of the past two months:
Some people may appear helpful or well-disposed towards you, but taken at their word they turn out to be nothing more than professional poseurs with decades of practice in society-compliant obedience. Should you ever happen to get too close to the edge of nowhere, they will quickly let you know about their priorities. Falling in line with society means transposing the laws of capitalism to everyday life, which of course is nothing but a politically correct form of Darwinism. You are not welcome as a human, but as a human resource to deploy, as a commodity, institution, object or vessel for silly prejudices and hatred. Right is not a matter of the better argument, but merely a matter of dependence and of abusing it for one’s own ends.

There is not really a good reason to live on as if nothing ever happened, because it has always been like this. Perhaps one was lucky to be spared and not end up in someone else’s crosshairs, but that alone does not make the world a better place. Nor does it mean that people will think of one as a useful member of society. You are judged by what you own, not by what you have achieved, and by people who have no authority to judge. There is a place, confirmed and assigned, but no matter how hard you try, you do not get to change the rules others will apply to you. Should you ever become careless enough to forget about it, someone will graciously remind you.

Comments (0)

Giorgos Stefanou - Travelling in Space-Time

Filed under: Music — olliver @ 15:09 h

Giorgos Stefanou’s Travelling in Space-Time has been released on Petcord. Described as an imaginary journey to a future form of civilisation, it holds out only a rather limited hope of success. What is the driving factor behind this journey? One may conclude it is related to the religious notion of salvation, the eventual reward after a troubled life; however, that does not seem to fit the scenery and its lack of euphoria. Instead there is solitude and isolation, being thrown into a rather hostile environment with lifeless machinery as the only offer of communication. The journey seems to meet its (lack of) expectations like a disillusioned look into the mirror, with no one and no circumstances to blame. On the other hand, even a pointless occupation serves as a way to keep oneself busy, at least until an alternative appears on the horizon.

The intensity of the nihilism that Travelling in Space-Time seems to imply turns it into an electro-acoustic masterpiece. Deliberately avoiding significant climaxes and creating a cavernous sound similar to Martin Hannett’s production of Joy Division’s Unknown Pleasures, this approach proves an effective means to a Kafkaesque end. Form follows function follows spectromorphology, but does not follow mainstream conventions.

Comments (0)