electro acoustic expressionism
nodepet
September 30th, 2008

dotbot – yet another useless robot…

Filed under: Web — olliver @ 10:36 h

Allow me to start with a question: What is the purpose of a legitimate robot? One would think it is to fetch content at a reasonable pace whilst respecting the host’s restrictions in robots.txt. But when a bot bothers to fetch robots.txt before crawling, does that mean it will also obey its rules? Not necessarily, it seems. When DotBot visited me two days ago, it did not seem to be interested in my content, only in collecting redirect responses without following them:

208.115.111.245 - - [28/Sep/2008:08:53:50 +0200] "GET /robots.txt HTTP/1.1" 200 77 "-" "Mozilla/5.0 (compatible; DotBot/1.1; http://www.dotnetdotcom.org/, crawler@dotnetdotcom.org)"
208.115.111.245 - - [28/Sep/2008:08:58:00 +0200] "GET /category/life HTTP/1.1" 301 - "-" "Mozilla/5.0 (compatible; DotBot/1.1; http://www.dotnetdotcom.org/, crawler@dotnetdotcom.org)"
208.115.111.245 - - [28/Sep/2008:08:58:04 +0200] "GET /category/music HTTP/1.1" 301 - "-" "Mozilla/5.0 (compatible; DotBot/1.1; http://www.dotnetdotcom.org/, crawler@dotnetdotcom.org)"
208.115.111.245 - - [28/Sep/2008:08:58:08 +0200] "GET /category/photo HTTP/1.1" 301 - "-" "Mozilla/5.0 (compatible; DotBot/1.1; http://www.dotnetdotcom.org/, crawler@dotnetdotcom.org)"
208.115.111.245 - - [28/Sep/2008:08:58:13 +0200] "GET /category/spam HTTP/1.1" 301 - "-" "Mozilla/5.0 (compatible; DotBot/1.1; http://www.dotnetdotcom.org/, crawler@dotnetdotcom.org)"
208.115.111.245 - - [28/Sep/2008:08:58:18 +0200] "GET /category/web HTTP/1.1" 301 - "-" "Mozilla/5.0 (compatible; DotBot/1.1; http://www.dotnetdotcom.org/, crawler@dotnetdotcom.org)"

This is just a small but representative sample: for reasons unknown to me, DotBot omits the trailing slash of the URI, which results in a 301 redirect (because there is no file of that name). If only the spider had followed the redirect, it could have fetched something meaningful. To cut a long story short, apart from robots.txt, this bot did not take home a single article, because it evidently does not know how to handle redirects. Quite a silly waste of resources in my opinion, but then again, what do I know about the bot’s purpose?
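What I would expect from a crawler is simple: when a request for /category/music comes back as a 301, read the Location header and request that URL instead. Below is a minimal sketch in Python, assuming a hypothetical fetch(url) callable that returns (status, headers, body); a real crawler would plug its HTTP client in there, and a polite one would also check each redirect target against robots.txt before requesting it. The fake server and the example.org URLs are made up for illustration.

```python
from urllib.parse import urljoin

# Status codes whose Location header points at the real resource
REDIRECT_CODES = {301, 302, 303, 307, 308}


def fetch_following_redirects(fetch, url, max_hops=5):
    """Fetch url via the hypothetical fetch(url) -> (status, headers, body),
    following at most max_hops redirects. Returns (final_url, status, body)."""
    for _ in range(max_hops + 1):
        status, headers, body = fetch(url)
        if status not in REDIRECT_CODES or "Location" not in headers:
            return url, status, body
        # Location may be relative, so resolve it against the current URL
        url = urljoin(url, headers["Location"])
    raise RuntimeError("too many redirects, last URL: " + url)


# Stand-in for a web server that, like this blog, redirects slashless URIs
def fake_fetch(url):
    if url == "http://example.org/category/music":
        return 301, {"Location": "/category/music/"}, b""
    if url == "http://example.org/category/music/":
        return 200, {}, b"<html>music</html>"
    return 404, {}, b""


final_url, status, body = fetch_following_redirects(
    fake_fetch, "http://example.org/category/music")
print(final_url, status)  # http://example.org/category/music/ 200
```

Capping the number of hops keeps a misconfigured server from trapping the crawler in a redirect loop.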

On the DotNetDotCom website, presumably the crawler’s home, we find the following statement:

Hi! Thanks for letting us crawl you!

We are just a few Seattle based guys trying to figure out how to make internet data as open as possible. You should be able to find everything you are looking for below. If not feel free to contact us. Happy Surfing!

The “we are just …” statement does not inspire much confidence in me. This impression is reinforced by the next paragraph, which contains instructions on how to get rid of the bot:

1. First and foremost, curse our name. Trust us, it will feel good. Now breath gently…
2. Create a simple text file named robots.txt and place it in your server’s root directory. (http://www.yoursite.com/ <-- Right There!)
3. Add the following code to your robots.txt file:
User-agent: dotbot
Disallow: /
4. Reflect on how easy that was.
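The opt-out rules quoted above can be checked with Python’s standard urllib.robotparser. Note that this particular parser compares a User-agent line against the part of the name before the first "/", case-insensitively, so the sketch passes "DotBot/1.1" rather than the full Mozilla/5.0 string from the logs; other crawlers may match names differently. The host is the placeholder from the quoted instructions.

```python
from urllib.robotparser import RobotFileParser

# The opt-out rules from step 3 above
optout_rules = """\
User-agent: dotbot
Disallow: /
""".splitlines()

optout_rp = RobotFileParser()
optout_rp.parse(optout_rules)

url = "http://www.yoursite.com/impressum"
print(optout_rp.can_fetch("DotBot/1.1", url))  # False: every path is disallowed
print(optout_rp.can_fetch("Googlebot", url))   # True: no rules apply to it
```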

To me this does not sound like a responsible operation, because it suggests that instead of fixing their bot, they urge “flamers” to opt out of being crawled. Regulars will know I am one of these flamers ;-) and of course this is not the only reason for my scepticism:

208.115.111.245 - - [28/Sep/2008:11:13:52 +0200] "GET /robots.txt HTTP/1.1" 200 77 "-" "Mozilla/5.0 (compatible; DotBot/1.1; http://www.dotnetdotcom.org/, crawler@dotnetdotcom.org)"
208.115.111.245 - - [28/Sep/2008:11:19:32 +0200] "GET /impressum HTTP/1.1" 301 241 "-" "Mozilla/5.0 (compatible; DotBot/1.1; http://www.dotnetdotcom.org/, crawler@dotnetdotcom.org)"

The Impressum (legal notice) is explicitly excluded from crawling in robots.txt, because it contains sensitive information about me that German law requires me to publish. Yet, despite reading robots.txt first, DotBot went straight for it. Fortunately, it again failed to add the trailing slash to its request and to handle the resulting 301 redirect properly. Ignoring robots.txt is usually a knock-out criterion for a bot, and since experience has shown time and again that bad bots tend to morph, I prefer to firewall them right away.
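The actual 77-byte robots.txt of this site is not reproduced here, so the rule below is an assumption for illustration: a blanket "Disallow: /impressum" for all agents. Because robots.txt rules are matched as path prefixes, such a rule covers both the slashless URL DotBot requested and the /impressum/ target of the redirect. A sketch with Python’s urllib.robotparser, using a placeholder host:

```python
from urllib.robotparser import RobotFileParser

# ASSUMED rules: the site's real robots.txt is not shown in this post
site_rules = """\
User-agent: *
Disallow: /impressum
""".splitlines()

site_rp = RobotFileParser()
site_rp.parse(site_rules)

base = "http://example.org"  # hypothetical host name
results = {}
for path in ("/impressum", "/impressum/", "/category/music/"):
    # Disallow rules are prefix matches, so both impressum variants are covered
    results[path] = site_rp.can_fetch("DotBot/1.1", base + path)
    print(path, results[path])
```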

Whois opines the following about their address space:

OrgName:    dotnetdotcom.org
OrgID:      DOTNE
Address:    93 S. Jackson Street #10070
City:       Seattle
StateProv:  WA
PostalCode: 98104-2818
Country:    US

NetRange:   208.115.111.240 - 208.115.111.255
CIDR:       208.115.111.240/28
OriginAS:   AS23033
NetName:    208-115-111-240-SLASH28
NetHandle:  NET-208-115-111-240-1
Parent:     NET-208-115-96-0-1
NetType:    Reassigned
Comment:
RegDate:    2008-07-21
Updated:    2008-07-21

I am not suggesting the DotNetDotCom owners are blackhats. But I have better things to do with my life than debug other people’s bot operations. If DotBot fails even at elementary things like obeying robots.txt and following redirects, I see no reason to allow it to visit my sites. Blocking 208.115.111.240/28 should take care of the problem.
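Before adding a /28 to a firewall, it is worth confirming that it covers exactly the whois NetRange and the address from the logs. A quick check with Python’s standard ipaddress module; the in_dotbot_net helper is a made-up name, and the actual firewall rule depends on the firewall in use and is not shown here:

```python
import ipaddress

# CIDR block from the whois record above
dotbot_net = ipaddress.ip_network("208.115.111.240/28")

# A /28 leaves 32 - 28 = 4 host bits, i.e. 2**4 = 16 addresses
print(dotbot_net.num_addresses)  # 16
print(dotbot_net.network_address, dotbot_net.broadcast_address)
# 208.115.111.240 208.115.111.255  (matches the whois NetRange)


def in_dotbot_net(addr):
    """Hypothetical helper: True if addr falls inside the blocked range."""
    return ipaddress.ip_address(addr) in dotbot_net


print(in_dotbot_net("208.115.111.245"))  # True: the address in the logs
print(in_dotbot_net("208.115.111.239"))  # False: just below the range
```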


September 29th, 2008

New Nodepet release: Frontal Grid

Filed under: Music — olliver @ 23:46 h

Here comes a new release by me on Petcord, called Frontal Grid. Every now and then I wrote about my progress and about what the finished release was supposed to sound like. For a change, I largely met my criteria and improved both the dynamics and the degree of abstraction. As a side effect, the music is more consistent in how it treats themes and develops them further. Contrary to my previous plans, however, Frontal Grid again consists of four movements, which are more closely related to each other than those of Decay.

What will follow Frontal Grid? More work, of course :-). There are some ideas I would like to investigate more thoroughly, such as ways of incorporating “natural instruments” into the computer-generated mess I produce. I guess this can only work by treating that source like any other: not playing parts from a score, but reconstructing passages from unrelated snippets, perhaps even deliberately creating anomalies that a live player could not reproduce. The first movement of Frontal Grid is one such example, in which I recycled several piano snippets.

We shall see…


September 2nd, 2008

Systrum is dead – here comes Leftob audio cast

Filed under: Music — olliver @ 23:48 h

The Petcord Netlabel team felt there was a need for a platform that introduces the work and research of experimental netlabel artists to an audience that is not necessarily familiar with the netlabel scene, and that perhaps even thinks this kind of music can only be bought in shops or illegally downloaded from shady corners of the Internet. So here it comes, Ladies and Gentlemen, boys and girls, hippies and squares:

The Leftob Audio Cast, with a 160 kbit/s stream and room for 150 visitors.

The Petcord team is not really interested in IDM and rhythm-oriented music and therefore specialises in beatless ambient sounds that reach into electroacoustic, dark ambient and even noisy territory. But there is more: the playlist not only shows the last 20 tunes, but also links to each tune’s original release page and, where available, to the artist’s own page. This way, any interested listener only needs to visit the Leftob page to find the original release for download. In summary, this is a project that both musicians and labels can benefit from, which is, in principle, a good thing [tm].
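The playlist feature described above amounts to a bounded history: keep the 20 most recent tunes, each with a link to its release page and an optional link to the artist. How Leftob actually implements this is not described here; the following is only a hypothetical Python sketch, and the Track and Playlist names and example.org URLs are made up for illustration.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional


@dataclass
class Track:
    title: str
    release_url: str
    artist_url: Optional[str] = None  # only linked when the artist has a page


class Playlist:
    """Hypothetical rolling playlist that keeps only the most recent tracks."""

    def __init__(self, size=20):
        # A deque with maxlen silently discards the oldest entry when full
        self.history = deque(maxlen=size)

    def played(self, track):
        self.history.appendleft(track)  # newest first

    def links(self):
        """Return (title, release link, artist link or None), newest first."""
        return [(t.title, t.release_url, t.artist_url) for t in self.history]


playlist = Playlist()
for i in range(25):
    playlist.played(Track(f"Tune {i}", f"http://example.org/release/{i}"))

entries = playlist.links()
print(len(entries), entries[0][0], entries[-1][0])  # 20 Tune 24 Tune 5
```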
