Mad Spiders


DotNetWebs
5th January 2006, 22:18
Are there any experts on search engine spiders here? If so can you please have a look at this for me:

http://www.horshamforum.com/WhosOn1-0.aspx

For the last 36 hours or so this forum site I run has been crawled by:

inetnum: 217.212.224.128 - 217.212.224.255
netname: SE-PICSEARCH

This site tends to get crawled several times a day by MSN, Google, Yahoo etc. Each of these has a burst of activity lasting 10 mins or so and is then finished. The PICSEARCH one seems to have got stuck in an endless loop crawling the site calendar. If it carries on like this it could crawl 100 years or so into the future! This could start to eat into my bandwidth. :x

Judging from a quick Google I am not the only one with these concerns:

http://www.sitepoint.com/forums/showthread.php?t=330512

Any thoughts? Should I block the spider?

Regards

Dotty

Dread
6th January 2006, 00:36
Yeah, I'd block it. Let's face it, if it's not a Google/MSN/Yahoo spider then you shouldn't have any concerns about blocking it.

mattk
6th January 2006, 07:12
Do a search on Google for Psbot. There's tonnes of info about it.

Do you have a robots.txt file that stops your forums from being indexed?

DotNetWebs
6th January 2006, 07:31
Dread and mattk

Thanks for your replies. I think it is time to take action. I have just had a look at my server log: 100MB IN THE LAST 8 HOURS!! :shock: Almost all of it is due to this spider. It's still trawling the calendar. Up till now I hadn't created a robots.txt, as the forum is new and I wish to get it well indexed. All the other spiders seem to behave themselves but this one is truly mad. :evil:

Time to go and sort that robots.txt file

Regards

Dotty
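
For anyone following along, here is a minimal sketch of what such a robots.txt might look like and how to check it before uploading, using Python's standard urllib.robotparser. The rules and the /calendar.aspx and /default.aspx paths are assumptions for illustration; the thread does not show the actual file or the real calendar URL. Bear in mind that robots.txt is only advisory, so it only helps with crawlers that choose to honour it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: shut psbot out entirely and keep every other
# crawler away from the calendar. The /calendar.aspx path is an assumption;
# the thread does not give the real calendar URL.
ROBOTS_TXT = """\
User-agent: psbot
Disallow: /

User-agent: *
Disallow: /calendar.aspx
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text directly, without fetching it over HTTP."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)
BASE = "http://www.horshamforum.com"

# psbot is refused everywhere.
print(parser.can_fetch("psbot", BASE + "/default.aspx"))
# Other crawlers may index ordinary pages...
print(parser.can_fetch("Googlebot", BASE + "/default.aspx"))
# ...but are asked to stay out of the calendar.
print(parser.can_fetch("Googlebot", BASE + "/calendar.aspx?month=1"))
```

Keeping the well-behaved spiders out of the calendar as well is a design choice: the calendar pages are effectively endless, so there is little to gain from having them indexed.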

DotNetWebs
6th January 2006, 08:12
I have just stamped on the spider and squashed it :lol:

You're right mattk, there is tonnes of info about it. It seems it has a reputation as a bandwidth hog. Funny, because on their About page (http://www.picsearch.com/menu.cgi?item=Psbot) they say: "Psbot should not put any noticeable strain on a webserver" :x

mattk
6th January 2006, 08:26
Quality - glad you got it sorted!

Richard Conyard
6th January 2006, 08:53
Grumble, ruddy spiders. There is one particular site we do where spiders have become the bane of our lives.

It's a mobile phones site with thousands of different products and offerings, and it seems that all 3 of the major spiders try to go through it at the same time. Mind you, if you go to:
http://www.google.co.uk/search?hs=Z7S&hl=en&client=firefox-a&rls=org.mozilla%3Aen-GB%3Aofficial&q=cheap+mobile+phones&btnG=Search&meta=

We're number 3 in the free listings so we must be doing something right with their SEO ;-)

DotNetWebs
6th January 2006, 09:48
Who needs load testing when you have Psbot and friends to do it for you :wink:

DotNetWebs
9th January 2006, 09:17
Just a quick follow up to this:

THEY CAME BACK. :evil:

Despite the robots.txt AND the relevant meta tags (belt and braces approach) my server started getting overloaded when the spider returned and got locked into another calendar loop.

I had to resort to blocking their IP range on the server.

This morning Picsearch have e-mailed me acknowledging the problem and have removed my site from their index (fair play to them).
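
The IP range block mentioned above can be sketched in Python with the standard ipaddress module. This is not the server configuration actually used (the thread does not show it); it only illustrates turning the inetnum range from the first post into CIDR form and testing a client address against it. The is_blocked helper and the sample addresses are hypothetical.

```python
import ipaddress

# Range quoted in the first post: 217.212.224.128 - 217.212.224.255
FIRST = ipaddress.ip_address("217.212.224.128")
LAST = ipaddress.ip_address("217.212.224.255")

# Collapse the start/end pair into CIDR blocks, which is the form most
# firewalls and web servers expect.
BLOCKED_NETWORKS = list(ipaddress.summarize_address_range(FIRST, LAST))


def is_blocked(client_ip: str) -> bool:
    """Return True if client_ip falls inside any blocked network."""
    address = ipaddress.ip_address(client_ip)
    return any(address in network for network in BLOCKED_NETWORKS)


print([str(network) for network in BLOCKED_NETWORKS])
print(is_blocked("217.212.224.200"))  # inside the range
print(is_blocked("217.212.224.100"))  # just below the range
```

Unlike robots.txt, a block like this does not depend on the spider's cooperation, which is why it worked when the other measures did not.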