Screen Scraping - help please

Hi,

I read a while back about Ryanair and screen scraping, where a travel company, for want of a better word, hacks into the booking system and sells Ryanair's flights.

I want to create a site with similar functions to mysupermarket, where I can compare prices from all the supermarkets. Is it A) possible to screen-scrape this and B) legal?

As you can see, I have absolutely no real knowledge of the process, but any help or pointers in the right direction would be appreciated.

Thanks
Fred
 

fisicx

Moderator
Sep 12, 2006
46,865
8
15,479
Aldershot
www.aerin.co.uk
If you have no real knowledge of the process then this is not a project for you. The software to run such an operation could easily cost hundreds of thousands of pounds, and your marketing budget is likely to be thousands of pounds per week. Of course it can be done, but I'm sure the supermarkets will take offence and make it very difficult for you to extract the data. They could even feed you false data.
 

MichaelG

Free Member
Sep 1, 2005
461
16
Berkshire
Hi Fred,

This is not a hard thing to do. If the information is on a public website, you can scrape it, especially on e-commerce websites or any website that loops through a result set. To keep it going, your developer(s) will need to keep checking and updating the scraping code as things change on the website you are scraping, or as they block your IP address, etc.
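To give a rough idea of what scraping a page that loops through a result set looks like, here is a minimal sketch using only Python's standard library. The markup and the class names `product`, `name` and `price` are made up for illustration; a real supermarket page will look different and would need its own rules.

```python
from html.parser import HTMLParser

# Hypothetical markup: the class names "product", "name" and "price" are
# assumptions for illustration, not any real retailer's template.
SAMPLE_HTML = """
<ul>
  <li class="product"><span class="name">Milk 2L</span><span class="price">£1.25</span></li>
  <li class="product"><span class="name">Bread 800g</span><span class="price">£0.95</span></li>
</ul>
"""


class ProductParser(HTMLParser):
    """Collects (name, price) pairs from repeated product rows."""

    def __init__(self):
        super().__init__()
        self.products = []   # finished rows
        self._field = None   # field currently being read, if any
        self._row = {}       # fields gathered for the current row

    def handle_starttag(self, tag, attrs):
        css_class = dict(attrs).get("class")
        if tag == "li" and css_class == "product":
            self._row = {}
        elif tag == "span" and css_class in ("name", "price"):
            self._field = css_class

    def handle_data(self, data):
        # Only keep text that sits inside a name or price span.
        if self._field:
            self._row[self._field] = data.strip()

    def handle_endtag(self, tag):
        if tag == "span":
            self._field = None
        elif tag == "li" and self._row:
            price = float(self._row["price"].lstrip("£"))
            self.products.append((self._row["name"], price))
            self._row = {}


def parse_products(html):
    """Return a list of (name, price) tuples found in the page."""
    parser = ProductParser()
    parser.feed(html)
    return parser.products
```

Calling `parse_products(SAMPLE_HTML)` gives one tuple per product row. The parser is tied to the exact tags and class names above, which is why the code needs maintaining whenever the target site changes.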

Is it legal? I am not a lawyer, so I can't help you here, but if you are only interested in getting product prices, I don't think there should be any issue. That said, it might be classed as hacking if you are scraping information through some software. Best to speak to a lawyer.

I once wrote an application to take a daily screenshot of the homepage of all the popular news websites - I wanted to capture and build a database of history. I stopped it because I was running out of hard disk space, but it worked as a proof of concept.
 

Thanks for the replies and views.

So in theory, is it possible to scrape all the data from, say, Asda, Tesco, Ocado etc. and put it into one site? Would this be raw data, or could it be transferred into a template similar to what the supermarkets already have set up?

Thanks for your input
 

MichaelG

Free Member
Sep 1, 2005
461
16
Berkshire
It all depends on what data you want and what you want to do with it.

You might consider doing a dynamic scrape, where a user searches for a product on your website and your script visits all the 3rd party websites to search for the product and get the price matching the product name.
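A dynamic scrape of this kind could be sketched roughly as follows. The two `search_site_*` functions are hypothetical stand-ins for real per-site scrapers and just return invented data; in practice each one would fetch and parse a retailer's search results page.

```python
from concurrent.futures import ThreadPoolExecutor


# Stand-ins for real per-site scrapers. Each takes a search term and returns
# (product name, price in pounds) pairs. The data is invented for illustration.
def search_site_a(term):
    catalogue = [("Semi-skimmed milk 2L", 1.25), ("Whole milk 1L", 0.85)]
    return [item for item in catalogue if term.lower() in item[0].lower()]


def search_site_b(term):
    catalogue = [("Milk semi-skimmed 2L", 1.19), ("White bread 800g", 0.95)]
    return [item for item in catalogue if term.lower() in item[0].lower()]


SITES = {"site_a": search_site_a, "site_b": search_site_b}


def dynamic_search(term, sites=SITES):
    """Query every site in parallel and return results sorted by price."""
    results = []
    with ThreadPoolExecutor(max_workers=len(sites)) as pool:
        futures = {name: pool.submit(fn, term) for name, fn in sites.items()}
        for name, future in futures.items():
            try:
                for product, price in future.result(timeout=10):
                    results.append((price, name, product))
            except Exception:
                # One failing or slow site should not break the whole search.
                continue
    return sorted(results)
```

Here `dynamic_search("milk")` returns every matching product from both sites, cheapest first. The trade-off is that each visitor search triggers live requests to every retailer, so response time depends on the slowest site.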

Any developer working on this project is likely to charge you per hour for the work, because getting it right for each 3rd party website will not be easy.

You might try writing to the websites for a product feed.
 

Astaroth

Free Member
Aug 24, 2005
3,985
278
London
The scraping process simply creates a dataset, and it is up to you what you then do with the data and how you present it to your visitors.
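As an illustration of turning the raw dataset into something presentable, the sketch below pivots scraped rows into a per-product comparison. The rows are invented, and it assumes the product names have already been normalised so that the same item matches across stores, which in practice is a sizeable job of its own.

```python
from collections import defaultdict

# Invented sample rows in the shape a scraper might produce:
# (store, normalised product name, price in pounds).
ROWS = [
    ("store_a", "milk 2l", 1.25),
    ("store_b", "milk 2l", 1.19),
    ("store_a", "bread 800g", 0.95),
    ("store_c", "bread 800g", 1.05),
]


def build_comparison(rows):
    """Group prices by product: {product: {store: price}}."""
    table = defaultdict(dict)
    for store, product, price in rows:
        table[product][store] = price
    return dict(table)


def cheapest(table):
    """For each product, name the store with the lowest price."""
    return {
        product: min(prices, key=prices.get)
        for product, prices in table.items()
    }
```

The resulting dictionary can be fed into whatever page template you like, so the presentation is independent of how each retailer lays out its own site.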

It is unlikely to be "illegal"; however, you do have to ensure that you are not creating so much traffic to the retailer's site that it verges on a potential denial of service attack. That said, you would need to read the terms and conditions of each site to check whether there are any usage limitations, and I guess there is a chance of copyright issues, but as has already been said, check with your solicitor for formal legal advice.
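One way to keep the traffic reasonable is to honour the site's robots.txt and leave a gap between requests. The sketch below uses Python's standard `urllib.robotparser`; the robots.txt content, the `price-compare-bot` user agent name and the `Throttle` helper are all assumptions for illustration, and following robots.txt is not a substitute for reading the site's terms and conditions.

```python
import time
from urllib.robotparser import RobotFileParser

# Example robots.txt content; a real scraper would download the retailer's
# own file rather than use this invented one.
ROBOTS_TXT = """\
User-agent: *
Disallow: /checkout/
Crawl-delay: 5
"""

USER_AGENT = "price-compare-bot"  # hypothetical user agent name


def load_rules(robots_text):
    """Parse robots.txt text into a RobotFileParser."""
    rules = RobotFileParser()
    rules.parse(robots_text.splitlines())
    return rules


class Throttle:
    """Enforces a minimum gap (in seconds) between successive requests."""

    def __init__(self, min_interval):
        self.min_interval = min_interval
        self._last = None  # time of the previous request, if any

    def wait(self, clock=time.monotonic, sleep=time.sleep):
        now = clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                sleep(remaining)
        self._last = clock()


rules = load_rules(ROBOTS_TXT)
```

Before each request you would check `rules.can_fetch(USER_AGENT, url)` and call `wait()` on a `Throttle` built from `rules.crawl_delay(USER_AGENT)`, falling back to a delay of your own choosing when the site does not specify one.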

http://www.mysupermarket.co.uk appears to be offering a similar service, so it clearly is doable (I don't know if the regional variations in in-store prices are replicated on the supermarkets' websites).

You are presumably going to be contacting the supermarkets to get affiliate links to their sites and so monetise your venture. Why not ask for their permission then, and also ask whether there are web services or XML feeds you can connect to, to save scraping?

As has been pointed out, scraping works on the principle of finding identifiable points in the page to locate prices etc. Therefore, if a site you are scraping amends its design or templates, your scraping may fail until the process is updated by your developer. Again, in an ideal world you would want a heads-up from the website owners about planned changes, so that you don't wake up one morning to your site being down and have to urgently get your developers back in to amend their code.
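A small sketch of that principle, and of failing loudly when the identifiable point disappears, might look like this. The `<span class="price">` marker and the `LayoutChangedError` name are assumptions for illustration only.

```python
import re

# Assumed marker: a span with class "price" holding a pound amount.
# Real sites will use different markup.
PRICE_PATTERN = re.compile(r'<span class="price">\s*£(\d+\.\d{2})\s*</span>')


class LayoutChangedError(Exception):
    """Raised when the expected marker is no longer present in the page."""


def extract_price(html):
    """Return the price found at the marker, or raise if the marker is gone."""
    match = PRICE_PATTERN.search(html)
    if match is None:
        raise LayoutChangedError(
            "price marker not found; the site template may have changed"
        )
    return float(match.group(1))
```

Raising an explicit error, rather than silently returning nothing, means a template change shows up as an alert for the developer instead of as missing or stale prices on your site.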
 

Velrajan

Free Member
Aug 23, 2009
32
0
Apologies for not mentioning anything about me. I am a software developer working for a big finance firm in Bournemouth. I have been doing a lot of freelance projects for my customers through the freelance website RentACoder, and I am very interested in working on scraping projects.

Please have a look at the testimonials (from my signature) I have received for the work I have done for my customers.

You keep saying this, but won't answer any questions about it when people probe you on it.

If you are going to work with people, you need to be upfront about who you are and what you are doing.
 

edmondscommerce

Free Member
Nov 11, 2008
3,653
628
UK
Yes screen scraping is possible, widely used and legal. What do you think Google does?

That said there are ways and ways of doing it and it is certainly possible to create a polite or an impolite scraper.

As for the cost, fisicx is way off the mark.

For a robust and reliable automated system with a nice web front end for the customers, a general site design, etc., I would quote in the region of £10k.
 

mattk

Free Member
Dec 5, 2005
2,579
974
50
Swindon
Screen scraping is a very old-fashioned technology. Not only would you need different rules for every site you wanted to scrape, but as soon as one of those sites changed its look and feel you'd have to rewrite your scraping rules.

Astaroth is right. By far the best way to achieve this is to get the supermarkets to supply you with data via XML or a web service. This makes it very easy for you and is relatively straightforward for them.
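To show why a feed is so much easier to consume than a scraped page, here is a minimal sketch that reads a product feed with Python's standard `xml.etree.ElementTree`. The feed layout (a `products` root with `product`, `name` and `price` elements and a `sku` attribute) is invented for illustration; each retailer would define its own format.

```python
import xml.etree.ElementTree as ET

# Invented feed layout for illustration; a real feed will have its own schema.
FEED = """<products>
  <product sku="1001"><name>Milk 2L</name><price currency="GBP">1.25</price></product>
  <product sku="1002"><name>Bread 800g</name><price currency="GBP">0.95</price></product>
</products>"""


def parse_feed(xml_text):
    """Return one dict per product in the feed."""
    root = ET.fromstring(xml_text)
    items = []
    for product in root.findall("product"):
        items.append({
            "sku": product.get("sku"),
            "name": product.findtext("name"),
            "price": float(product.findtext("price")),
        })
    return items
```

Because the fields are named in the data itself, this code keeps working when the retailer redesigns its web pages, as long as the feed format stays the same.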
 

googol

Free Member
May 11, 2009
77
7
I know Tesco have a SOAP API (currently in CTP, with a beta coming soon) and are aiming to have a REST API soon as well.
This allows you to grab data from their grocery service, with responses returned as XML, and they have some sample code in C# (I converted it to an ASP.NET app from a client app pretty easily).
At the moment you're not allowed to use the API for price comparison; however, Nick Lansley (Head of R&D at Tesco.com) mentioned that they were likely to remove this restriction once the beta API is available on their new grocery platform (code-named project Martini).
I believe this is due to limitations in the current API and grocery platform, where the prices aren't live prices from the mainframe for a particular store.

Information on the API can be found at https://www.lansleytech.com/tescoapiweb/reference.htm and http://www.techfortesco.com/forum/
There is an affiliate scheme open to all (£5 per new Tesco.com customer) and one only available to people who attended the launch event at the Microsoft building in London (10p per basket submitted through the API) which could be opened up should the initial £10,000 trial be successful.
 
The vast majority of travel providers do not use screen scraping any more and haven't for many years. One of the reasons was that each session took up a user login when it logged on, and it became expensive to provide enough ports to satisfy demand. It also went out of date too quickly. They all use XML now, and most content tends to come via a few data aggregators who can also provide booking services. Ryanair have a policy that all bookings should be made via their website only, so they do not provide their data and they get very, very picky if anyone tries to scrape it.
 

edmondscommerce

Free Member
Nov 11, 2008
3,653
628
UK
In my core field, there are a lot of ecommerce businesses whose suppliers might be partly ecommerce savvy (they have a website) but are not prepared, or even informed enough, to consider supplying their products as a feed.

In these circumstances it is perfectly reasonable to set up a scraper to grab the content.

Chances are the site will not change very regularly in any structural sense, so the usual problems with scrapers are not an issue.

The only other option in this kind of scenario generally involves a lot of copying and pasting, messing about with PDF pricelists and generally struggling to get the required data.

That said there is a new approach for product data coming via an aggregated XML source. The best example is the ICEcat project.

This is a combined open source or paid system that provides really good quality information on a huge range of products. So far though there is a definite technology products bias so it is only useful for certain businesses.

Eventually though yes this kind of thing will become the norm, but I don't imagine that will come any time too soon. In the meantime screen scraping will continue to serve a purpose and provide a competitive advantage to those retailers who are savvy to the technique.
 

techsmart

Free Member
Nov 3, 2009
2
0
All the replies above are on the right track. Screen scraping is legal and can be used to extract data from the web. You can do it through programming or with a software tool. I do web data extraction with software, simply because the tool I use is pretty good at it. I use Automation Anywhere. It's nice.

You can try it by downloading the trial; search Google for "Screen Scraping by Automation Anywhere".

This will probably help you.

Enjoy!
 