Screen Scraping - help please


fred1222
3rd May 2009, 19:30
Hi,

I read a while back about Ryanair and screen scraping, where a travel company, for want of a better word, hacks into the booking system and sells their flights.

I want to create a site with similar functions to mysupermarket, where I can compare the prices from all the supermarkets. Is it A) possible to screen scrape this and B) legal?

As you can see, I have absolutely no real knowledge of the process, but any help or pointers in the right direction would be appreciated.

Thanks
Fred

fisicx
3rd May 2009, 20:13
If you have no real knowledge of the process then this is not a project for you. The software to run such an operation could easily cost hundreds of thousands of pounds and your marketing budget is likely to be thousands of pounds per week. Of course it can be done but I'm sure that the supermarkets will take offence and make it very difficult for you to extract the data, in fact they could even feed you false data.

MichaelG
4th May 2009, 09:01
Hi Fred,

This is not a hard thing to do. If the information is on a public website, you can scrape it - especially on e-commerce websites or any website that loops through a result set. To do this and keep it going, you will need your developer(s) to keep checking and updating the scraping code as things change on the website you are scraping, or as they block your IP address etc.

Is it legal? I am not a lawyer, so I can't help you here, but if you are only interested in getting product prices, I don't think there should be any issue. That said, it might be classed as hacking if you are scraping information through some software. Best to speak to a lawyer.

I wrote an application once to take a daily screenshot of the homepage of all the popular news websites - I wanted to capture and build a database of history. I stopped it because I was running out of hard disk space - it worked as a proof of concept.

fred1222
4th May 2009, 09:37
Thanks for the replies and views.

So in theory, is it possible to scrape all the data from, say, Asda, Tesco, Ocado etc. and put it into one site? Would this be raw data, or could it be transferred into a template similar to what the supermarkets already have set up?

Thanks for your input

MichaelG
4th May 2009, 09:58
It all depends on what data you want and what you want to do with it.

You might consider doing a dynamic scrape - where users search for a product on your website, and your script visits all the 3rd party websites to search for the product and get the price matching the product name.
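
A rough sketch of that dynamic approach in Python, purely for illustration. The retailer names, the `search_shop_a` / `search_shop_b` functions and the prices are all made up; in a real system each search function would query one retailer's site and parse the result, which is where most of the work lies.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory catalogues standing in for live retailer sites.
SHOP_A = {"milk": 1.25, "bread": 0.99}
SHOP_B = {"milk": 1.15}

def search_shop_a(query):
    """Stand-in for a per-site scraper; returns a price or None."""
    return SHOP_A.get(query)

def search_shop_b(query):
    """Stand-in for a second per-site scraper."""
    return SHOP_B.get(query)

# One search function per retailer (names are assumptions).
SEARCHERS = {"ShopA": search_shop_a, "ShopB": search_shop_b}

def compare_prices(query, searchers):
    """Query every retailer concurrently and return (retailer, price), cheapest first."""
    with ThreadPoolExecutor(max_workers=len(searchers)) as pool:
        futures = {name: pool.submit(fn, query) for name, fn in searchers.items()}
        results = [(name, future.result()) for name, future in futures.items()]
    # Drop retailers that had no match for this product.
    found = [(name, price) for name, price in results if price is not None]
    return sorted(found, key=lambda item: item[1])

print(compare_prices("milk", SEARCHERS))
```

Running the searches in parallel matters here because the user is waiting on the slowest retailer site, not on the sum of all of them.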

Any developer working on this project is likely to charge you per hour for the work, because getting it right for each 3rd party website will not be easy.

You might try writing to the websites for a product feed.

Astaroth
5th May 2009, 13:41
The scraping process simply creates a dataset and it is up to you what you then do with the data/ how you present it to your visitors.

It is unlikely to be "illegal"; however, you do have to ensure that you are not creating so much traffic to the retailer's site that it verges on a potential denial of service attack. That said, you would need to read the terms and conditions of each site to check whether there are any usage limitations, and I guess there is a chance of copyright issues, but as has already been said - check with your solicitor for formal legal advice.

http://www.mysupermarket.co.uk appears to be offering a similar service and so it clearly is doable (I don't know if the regional variations in prices in store are replicated on the supermarkets' websites).

You are presumably going to be contacting the supermarkets to get affiliate links to their sites and therefore monetise your venture, so why not ask them for their permission then, and also whether there are web services or XML feeds you can connect to, to save scraping?

As has been pointed out, scraping works on the principle of finding identifiable points in the page to locate prices etc., and therefore if a site you are scraping amends its design/templates, your scraping may fail until the process is updated by your developer. Again, in an ideal world you would want a heads-up from the website owners about planned changes, so that you don't wake up one morning to find your site down and then have to urgently get your developers back in to amend their code.
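
To make the "identifiable points" idea concrete, here is a minimal Python sketch using only the standard library. The HTML fragment and the `name` / `price` class names are invented for the example; real supermarket pages will be marked up differently and are likely to need more robust handling.

```python
from html.parser import HTMLParser

# Hypothetical product listing; real retailer markup will differ.
SAMPLE_HTML = """
<ul>
  <li class="product"><span class="name">Semi-skimmed milk 2L</span><span class="price">£1.25</span></li>
  <li class="product"><span class="name">Wholemeal bread 800g</span><span class="price">£0.99</span></li>
</ul>
"""

class PriceScraper(HTMLParser):
    """Collects (name, price) pairs from span elements with known class names."""

    def __init__(self):
        super().__init__()
        self.products = []   # finished (name, price_in_pounds) tuples
        self._field = None   # which field the current text belongs to
        self._name = None    # most recently seen product name

    def handle_starttag(self, tag, attrs):
        cls = dict(attrs).get("class")
        if tag == "span" and cls in ("name", "price"):
            self._field = cls

    def handle_data(self, data):
        if self._field == "name":
            self._name = data.strip()
        elif self._field == "price":
            price = float(data.strip().lstrip("£"))
            self.products.append((self._name, price))

    def handle_endtag(self, tag):
        if tag == "span":
            self._field = None

def scrape_prices(html):
    """Return every (name, price) pair found in the given HTML."""
    parser = PriceScraper()
    parser.feed(html)
    return parser.products

print(scrape_prices(SAMPLE_HTML))
# A template change (here, a renamed class) silently yields no prices at all.
print(scrape_prices(SAMPLE_HTML.replace('class="price"', 'class="cost"')))
```

The second call shows the fragility: nothing raises an error when the markup changes, the scraper simply stops finding prices, so some kind of monitoring for empty results is worth having.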

Velrajan
23rd August 2009, 15:38
Hi,

I can help you with data scraping. I can scrape any website.

Many Thanks,
Velrajan T.

Place of design
23rd August 2009, 19:22
You keep saying this, but won't answer any questions about it when people probe you on it.

If you are going to work with people, you need to be upfront about who you are and what you are doing.

Velrajan
23rd August 2009, 20:57
Apologies for not mentioning anything about myself. I am a software developer working for a big finance firm in Bournemouth. I have been doing a lot of freelance projects for my customers through the freelance website RentACoder, and I am very interested in working on scraping projects.

Please have a look at the testimonials (in my signature) I have received for the work I have done for my customers.


edmondscommerce
24th August 2009, 11:29
Yes, screen scraping is possible, widely used and legal. What do you think Google does?

That said there are ways and ways of doing it and it is certainly possible to create a polite or an impolite scraper.
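
As a rough idea of what "polite" can mean in practice, the Python sketch below checks a site's robots.txt rules and pauses between requests. The bot name, the robots.txt content and the URLs are invented for the example, and `fetch` is a placeholder for whatever actually downloads a page.

```python
import time
from urllib.robotparser import RobotFileParser

USER_AGENT = "price-compare-bot"  # hypothetical bot name

# Hypothetical robots.txt content; a real scraper would download it from the site.
ROBOTS_TXT = """
User-agent: *
Disallow: /checkout/
Crawl-delay: 2
""".splitlines()

def build_rules(lines):
    """Parse robots.txt lines into a rule set."""
    rules = RobotFileParser()
    rules.parse(lines)
    return rules

def crawl(rules, urls, fetch, sleep=time.sleep, default_delay=1.0):
    """Fetch only the URLs robots.txt allows, pausing between requests."""
    allowed = [url for url in urls if rules.can_fetch(USER_AGENT, url)]
    # Use the site's requested crawl delay if it states one.
    delay = rules.crawl_delay(USER_AGENT) or default_delay
    pages = {}
    for index, url in enumerate(allowed):
        if index:
            sleep(delay)  # wait before every request after the first
        pages[url] = fetch(url)
    return pages
```

An impolite scraper skips both steps: it ignores the disallowed paths and requests pages as fast as it can, which is what tends to get an IP address blocked.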

As for the cost, fisicx is way off the mark.

For a robust and reliable automated system with a nice web front end for the customers and a general site design etc I would quote in the region of £10k

mattk
24th August 2009, 12:44
Screen scraping is a very old-fashioned technology. Not only would you need different rules for every site you wanted to scrape, but as soon as one of those sites changes its look and feel you'd have to rewrite your scraping rules.

Astaroth is right. By far the best way to achieve this is to get the supermarkets to supply you with data via XML or a web service. This makes it very easy for you and is relatively straightforward for them.
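
For comparison, reading a feed takes only a few lines of Python. The feed layout below (element and attribute names included) is made up for illustration; each retailer would define its own format.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal

# Hypothetical product feed; a real retailer's schema will differ.
FEED_XML = """
<products retailer="ExampleMart">
  <product id="1001"><name>Semi-skimmed milk 2L</name><price currency="GBP">1.25</price></product>
  <product id="1002"><name>Wholemeal bread 800g</name><price currency="GBP">0.99</price></product>
</products>
"""

def parse_feed(xml_text):
    """Return (product_id, name, price) tuples from the feed."""
    root = ET.fromstring(xml_text.strip())
    items = []
    for product in root.findall("product"):
        items.append((
            product.get("id"),
            product.findtext("name"),
            Decimal(product.findtext("price")),  # Decimal avoids float rounding on money
        ))
    return items

print(parse_feed(FEED_XML))
```

Because the feed carries named fields rather than page layout, a redesign of the retailer's website should not affect this code.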

edmondscommerce
24th August 2009, 14:28
True enough, but I tend to find that the extra content you can get from screen scraping generally makes it worthwhile.

Feeds are good though, and definitely worth trying as a first port of call.

googol
24th August 2009, 17:32
I know Tesco have a SOAP API (currently in CTP, a beta is coming soon) and are aiming to have a REST API soon as well.
This allows you to grab data from their grocery service with XML feedback, and they have some sample code in C# (I converted it to an ASP.NET app from a client app pretty easily).
At the moment, however, you're not allowed to use the API for price comparison. Nick Lansley (Head of R&D at Tesco.com) mentioned that they were likely to remove this restriction once the beta API is available on their new grocery platform (code-named project Martini).
I believe this is due to limitations in the current API and grocery platform where the prices aren't live prices from the mainframe for a particular store.

Information on the API can be found at https://www.lansleytech.com/tescoapiweb/reference.htm and http://www.techfortesco.com/forum/
There is an affiliate scheme open to all (£5 per new Tesco.com customer) and one only available to people who attended the launch event at the Microsoft building in London (10p per basket submitted through the API) which could be opened up should the initial £10,000 trial be successful.

Rienne
24th August 2009, 17:42
The vast majority of travel providers do not use screen scraping any more and haven't done for many years. One of the reasons was that each time a session logged on it would take up a user login, and it became expensive to provide enough ports to satisfy the demand. The data also went out of date too quickly. They all use XML now, and most content tends to come via a few data aggregators who can also provide booking services. Ryanair have a policy that they want all bookings done via their website only, so they do not provide their data, and they get very - very - picky if anyone tries to scrape it.

edmondscommerce
24th August 2009, 18:45
In my core field there are a lot of ecommerce businesses whose suppliers might be partly ecommerce savvy (they have a website) but are not prepared or even informed enough to consider supplying their products as a feed.

In these circumstances it is perfectly reasonable to set up a scraper to grab the content.

Chances are the site will not change very regularly at all in any structural sense, so the usual problems with scrapers are not an issue.

The only other option in this kind of scenario generally involves a lot of copying and pasting, messing about with PDF pricelists and generally struggling to get the required data.

That said there is a new approach for product data coming via an aggregated XML source. The best example is the ICEcat project.

This is a combined open source or paid system that provides really good quality information on a huge range of products. So far though there is a definite technology products bias so it is only useful for certain businesses.

Eventually though yes this kind of thing will become the norm, but I don't imagine that will come any time too soon. In the meantime screen scraping will continue to serve a purpose and provide a competitive advantage to those retailers who are savvy to the technique.

techsmart
3rd November 2009, 04:57
All the replies above are on the right track. Screen scraping is legal and can be used to extract data from the web. You can do it through programming or with a software tool. I do web data extraction with a tool because it does the job pretty well. I use Automation Anywhere. It's nice.

You can try it by downloading the trial; google the name "Screen Scraping by Automation Anywhere".

Probably this will help you.

Enjoy!