Robots.txt crawl errors?

posie

Free Member
Jun 9, 2010
32
0
Hi, I recently changed provider for my website and site traffic has doubled. However, Google's Webmaster Tools says:

"there are severe health issues with your site - some important pages are being blocked in robots.txt"

Under crawl errors it says there are 3,306 pages restricted by robots.txt

I assumed this was just where it had duplicate listings for each page as I only have around 180 pages.

Is this message something I should just ignore? The robots.txt file hasn't been altered at all from when it was first set up.

I am using Magento, the file is as follows:

User-agent: *
Disallow: /index.php/
Disallow: /*?
Disallow: /*.js$
Disallow: /*.css$
Disallow: /checkout/
Disallow: /tag/
Disallow: /app/
Disallow: /downloader/
Disallow: /js/
Disallow: /lib/
Disallow: /*.php$
Disallow: /pkginfo/
Disallow: /report/
Disallow: /skin/
Disallow: /var/
Disallow: /catalog/
Disallow: /customer/
Disallow: /sendfriend/
Disallow: /review/
Disallow: /*SID=

Thanks for your help,
Sarah
 

posie

Free Member
Jun 9, 2010
32
0
As I mentioned, I haven't altered the file from its default since it was set up a month ago. Do I need to remove that part?

All my pages have been indexed by Google, and traffic from Google has increased since the new site went up. Looking at the pages it is restricting, they just seem to be the longer URLs, which have been replaced with the shorter text URLs anyway.

Regards,
Sarah
 

posie

Free Member
Jun 9, 2010
32
0
I had a look in the admin of my site on Magento and it is set to "INDEX, FOLLOW", and then below that is the text I put in the first message.

Looking at the 3000 pages that had problems crawling, each one had, for example, "?SID=41ovhfbhg4o6jr9o5bjci70t5". Could someone confirm that it is right that all 3000 pages with that ending aren't crawled?

The other thing I noticed is that it says 50 pages have duplicate meta descriptions. They seem to be the latest products I have added: pages are duplicated, for instance listed as both brides/ivorybouquet and /ivory bouquet, when all the others are under just the one. I haven't changed any settings, but it seems to affect only the last few products I've put on. Is there anything I can check in the settings of Magento?

Regards,
Sarah
 

trev.pope

Free Member
Dec 5, 2011
29
4
East Sussex
Hi Sarah,

This line, 'Disallow: /*SID=', in your robots.txt is blocking the pages that WERE indexed with "?SID=41ovhfbhg4o6jr9o5bjci70t5". Google is notifying you that these pages are no longer accessible to Googlebot.
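
To see what that rule does in practice, here is a minimal Python sketch of the wildcard matching Google documents for robots.txt, where `*` matches any run of characters and a trailing `$` anchors the end of the URL. It is an illustration only: it ignores Allow rules and rule precedence, the helper names are made up, and the example path is hypothetical (only the SID value comes from this thread). Python's built-in urllib.robotparser is not used because, as far as I know, it does not handle these wildcards.

```python
import re

def rule_to_regex(pattern):
    """Translate one Disallow pattern into a regex (Google-style wildcards)."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape the pattern, then turn each escaped '*' into "match anything".
    body = re.escape(pattern).replace(r"\*", ".*")
    return re.compile("^" + body + ("$" if anchored else ""))

def is_blocked(path, disallow_rules):
    """True if any Disallow rule matches the path from its first character."""
    return any(rule_to_regex(rule).match(path) for rule in disallow_rules)

# A subset of the Disallow lines from the robots.txt in the first post.
RULES = ["/*?", "/*.js$", "/checkout/", "/catalog/", "/*SID="]

# Hypothetical page path; the SID value is the one quoted in the thread.
print(is_blocked("/example-page.html?SID=41ovhfbhg4o6jr9o5bjci70t5", RULES))  # True
print(is_blocked("/example-page.html", RULES))                                # False
```

Note that `Disallow: /*?` already matches every URL containing a query string, so under this reading the SID URLs would be restricted even without the `/*SID=` line.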

I think you possibly had a 'canonical' issue with the previous website setup that you no longer have, since you do not want Google indexing pages with the ?SID parameter at the end.

What do you have as your 'Base URL' setting in System > Configuration?
 

trev.pope

Free Member
Dec 5, 2011
29
4
East Sussex
Whatever you set as the 'Base URL' in Magento should also be set as your preferred domain in Webmaster Tools, whether it be:

yourdomain.co.uk
www.yourdomain.co.uk

The ?SID should only appear in a Magento URL if the website is accessed using the domain that does not match the 'Base URL' set in Magento.

So, I think your 'Base URL' changed from your old set up to your new one.

You should also set up a URL Redirect to your preferred domain.
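
As a rough illustration of what such a redirect does, here is a minimal Python sketch, assuming the www. version is the preferred domain. The function name is hypothetical and yourdomain.co.uk is the placeholder used above; a real redirect would normally be configured in the shop admin or on the web server rather than written by hand.

```python
from urllib.parse import urlsplit, urlunsplit

PREFERRED_HOST = "www.yourdomain.co.uk"  # assumption: www. is the preferred domain

def redirect_target(url, preferred_host=PREFERRED_HOST):
    """Return the URL a 301 should point to, or None if the host already matches."""
    parts = urlsplit(url)
    if parts.netloc.lower() == preferred_host:
        return None
    # Keep scheme, path, query and fragment; swap only the host.
    return urlunsplit((parts.scheme, preferred_host, parts.path, parts.query, parts.fragment))

print(redirect_target("http://yourdomain.co.uk/page.html"))      # http://www.yourdomain.co.uk/page.html
print(redirect_target("http://www.yourdomain.co.uk/page.html"))  # None
```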

Hope that helps.
 

posie

Free Member
Jun 9, 2010
32
0
Hi,

I had a look in the configuration but couldn't find what you were referring to about the Base URL. I'm using Magento Go.

I have it set up so that it sees my site as the www. version, and have it set up so Google sees it as www. too. In the admin panel there was an option to set up a 301 redirect, which I have selected. Have I done all this right?

Do you mean that the ?SID pages (there are 3000 of them under crawl errors) shouldn't be there at all, or that they occur naturally but Google filters them out?

The only thing I can think of when you say that the Base URL may not be set the same is that when first setting up a site with Magento Go, before you select your own domain, it has for example a "flower.gostorego.com" domain. Would this be relevant to any of the problems I have mentioned?

Regards,
Sarah
 

trev.pope

Free Member
Dec 5, 2011
29
4
East Sussex
How long did you have your Magento Go site on the previous setup?

How long have you had your Webmaster Tools setup for this site?

Are you positive that these 'blocked pages' have only appeared since moving to the new setup?

Could you show me the full URLs of some of the 'blocked pages' please?
 

posie

Free Member
Jun 9, 2010
32
0

I changed from ePages to Magento Go around January this year and have been using Webmaster Tools since I started my site 2 years ago. The 3000 pages were found on March 3 according to Webmaster Tools.

So do all the ?SID URLs indicate there is a problem, or is it OK to leave them?

There are around 3 different ones for each page and for each product, for example:

posie. co. uk/buttonholes.html?SID=4rf39tft19aca75tlkd5hmfv65
posie. co. uk/buttonholes.html?SID=3sf9ohuv27h1qgssja94ksldt2
posie. co. uk/buttonholes.html?SID=3upb47d47c0o283jjr4vmnf5e4
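
A minimal Python sketch of why these count as one page: once the SID query parameter is dropped, the three URLs above collapse to the same path. The function name is hypothetical, and only the path and query parts of the URLs are used here.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

def strip_sid(url):
    """Return the URL with any SID query parameter removed."""
    parts = urlsplit(url)
    kept = [(key, value)
            for key, value in parse_qsl(parts.query, keep_blank_values=True)
            if key != "SID"]
    return urlunsplit((parts.scheme, parts.netloc, parts.path,
                       urlencode(kept), parts.fragment))

urls = [
    "/buttonholes.html?SID=4rf39tft19aca75tlkd5hmfv65",
    "/buttonholes.html?SID=3sf9ohuv27h1qgssja94ksldt2",
    "/buttonholes.html?SID=3upb47d47c0o283jjr4vmnf5e4",
]

# All three session-ID variants reduce to a single page.
print({strip_sid(u) for u in urls})  # {'/buttonholes.html'}
```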

There are also many of this type:

/catalog/product_compare/add/product/206/uenc/aHR0cHM6Ly9wb3NpZS5nb3N0b3JlZ28uY29tL3RpYXJhcy5odG1s/?SID=utmtg45ud6jt2b46nhd2pr8927
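
The long segment after /uenc/ in that URL looks like base64. Here is a minimal Python sketch for decoding it, assuming (this is not confirmed anywhere in the thread) that Magento stores a base64-encoded URL there with '+', '/' and '=' swapped for '-', '_' and ','. The function name is hypothetical.

```python
import base64

# The uenc segment copied from the URL above.
UENC = "aHR0cHM6Ly9wb3NpZS5nb3N0b3JlZ28uY29tL3RpYXJhcy5odG1s"

def decode_uenc(value):
    """Decode a uenc segment back into the URL it carries."""
    # Assumption: undo the character swap before standard base64 decoding.
    standard = value.replace("-", "+").replace("_", "/").replace(",", "=")
    return base64.b64decode(standard).decode("utf-8")

print(decode_uenc(UENC))  # https://posie.gostorego.com/tiaras.html
```

The decoded address is on the gostorego.com domain rather than the shop's own domain, which fits the Base URL mismatch discussed earlier in the thread.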


Thanks for your help.
Sarah
 

RevaxMedia

Sarah, open Notepad and copy:

User-agent: *
Disallow: /*?
Disallow: /*.js$
Disallow: /*.css$
Disallow: /checkout/
Disallow: /tag/
Disallow: /app/
Disallow: /downloader/
Disallow: /js/
Disallow: /lib/
Disallow: /*.php$
Disallow: /pkginfo/
Disallow: /report/
Disallow: /skin/
Disallow: /var/
Disallow: /catalog/
Disallow: /customer/
Disallow: /sendfriend/
Disallow: /review/
Disallow: /*SID=


Then upload the file via FTP to the folder where the robots.txt file is, delete your old one and rename this one to robots.txt.
 

posie

Free Member
Jun 9, 2010
32
0
Thanks, I have now uploaded the file as you said.

Also, I have fixed the problem trev.pope mentioned about the links referring to a different domain.

When looking in Magento Go, there is a "URL management" section which has all the product pages in, for example:

ID path - /product/398
Request Path - /white-glitter-brides-bouquet
Target path - catalog/product/view/id/398

Around half have 301 permanent redirects set up next to them and half don't. Should they all have 301s? I.e. is this why Google is finding so many duplicates, as I mentioned earlier?

Regards,

Sarah
 

posie

Free Member
Jun 9, 2010
32
0
Actually, reading the help guide seems to indicate none of them should be 301 redirected. Am I right? Should I remove the redirect on those that are?

"Redirect - Yes | No. If you select Yes, the URL will switch to the Target Path when the Request Path is entered in the address bar. If you select No, the URL will remain in the format of the Request Path."
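
Read literally, that help text describes a simple rule, sketched below in Python. This models only the quoted sentence, not Magento's actual implementation; the class and function names are hypothetical, and the example paths are the ones from the URL management entry quoted earlier in the thread.

```python
from dataclasses import dataclass

@dataclass
class UrlRewrite:
    id_path: str
    request_path: str
    target_path: str
    redirect: bool  # the "Redirect - Yes | No" setting

def url_shown_in_address_bar(rule):
    """Redirect = Yes switches to the target path; No keeps the request path."""
    return rule.target_path if rule.redirect else rule.request_path

rule = UrlRewrite(
    id_path="/product/398",
    request_path="/white-glitter-brides-bouquet",
    target_path="catalog/product/view/id/398",
    redirect=False,
)
print(url_shown_in_address_bar(rule))  # /white-glitter-brides-bouquet

rule.redirect = True
print(url_shown_in_address_bar(rule))  # catalog/product/view/id/398
```

Under this reading, a product page that should keep its short text URL would have Redirect set to No.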

Thanks for your help

Sarah
 

trev.pope

Free Member
Dec 5, 2011
29
4
East Sussex
Hi Sarah,

Google isn't finding duplicates. You mentioned in your first post that the pages were listed under the 'Restricted by robots.txt' tab in Webmaster Tools. Google is listing them because it did have all of those pages indexed, and your current robots.txt now blocks them: it states that search engines should not crawl a URL with ?SID in it. This is good and IS what you want.

Who set up the 301 redirects?
 

posie

Free Member
Jun 9, 2010
32
0
I'm not sure why some have redirects set up and others don't. The only thing I've done involving 301 redirects is tick the 301 box on the base URL, for www. versus non-www.

I logged in to webmaster tools today and it still came up with the message "there are severe health problems with your site" - "some important pages are being blocked". But in theory there shouldn't be anything wrong with the site now?

Regards,
Sarah
 

posie

Free Member
Jun 9, 2010
32
0
I think I've found the difference between the ones that have redirects and the ones that don't, for instance:

ivory-diamante-rose-brides-bouquet-335.html target path - ivory-diamante-rose-brides-bouquet.html which it is redirected to.

So I think it is just where it has taken the number part out, so all should be OK.

I've now read up about the 3000 ?SID= paths, so I understand what you mean about them being created when the base URL is mismatched. Could this have anything to do with www. and non-www? I had many problems with this initially, so that's why I thought it may be the cause.

Regards,
Sarah
 

trev.pope

Free Member
Dec 5, 2011
29
4
East Sussex
Don't worry about Webmaster Tools stating severe health issues. That is because of the large number of restricted indexed pages, and the restricted pages are to be expected with the robots.txt blocking the ?SIDs.

I think your 301 redirects are fine. I'll try and have a look at that product later, but I think you are all good now.

The severe health warning in Webmaster Tools will go in time; I would expect within a week.
 

posie

Free Member
Jun 9, 2010
32
0
Thanks for all your help. I'll keep an eye on Webmaster Tools. I wanted to get the basics sorted out so I can try and work on improving my site. If anyone else notices anything wrong or that could be improved, please let me know.

Regards,
Sarah
 