Web site scraping... Are you doing it?

movietub

Free Member
Nov 6, 2008
4,858
1,106
Over the last few years I have been playing with various methods of site scraping. We have been trying to perfect a system of scraping competitor prices, then comparing them to our own.

Now the system is almost fully automated, right up to suggesting changes (for approval before going live!!) and can even display competitor prices on our product pages if they meet certain criteria. We don't just use it to undercut - more often than not it highlights products that we sell way too cheap, far cheaper than we need to in order to stay competitive.
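
A minimal sketch in Python of the "suggest a change for approval" step, assuming the scraped competitor prices are already sitting in a plain dict. The function name, the `undercut` margin, the pricing rule and the sample products are all hypothetical illustrations, not the system described in this post.

```python
def suggest_price(our_price, competitor_prices, undercut=0.01):
    """Return a suggested price, or None if no change is suggested.

    Assumed rule (illustrative only): aim just below the cheapest
    competitor, whether that means moving our price up or down.
    """
    if not competitor_prices:
        return None  # nothing scraped for this product, so nothing to suggest
    target = round(min(competitor_prices) - undercut, 2)
    if target == round(our_price, 2):
        return None
    return target


# Hypothetical data: our price and the scraped competitor prices per product code.
our_prices = {"KETTLE-01": 19.99, "TOASTER-02": 24.99}
scraped = {"KETTLE-01": [27.50, 29.99, 26.00], "TOASTER-02": [22.00, 25.00]}

# Suggestions only; a human approves them before anything goes live.
suggestions = {
    code: suggest_price(price, scraped.get(code, []))
    for code, price in our_prices.items()
}
```

Note that the first product comes back with a higher suggested price, which is the "we sell it too cheap" case rather than the undercutting case.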

I have also found the technology very useful for bulk adding large new product ranges - it's amazing how many of our suppliers can't give us a CD with all the image names and codes for their products, but they do have the data on their sites. Normally it's because their back end is a mash-up of different systems and they simply can't extract their own data in any clean way.

I just wondered if others make use of site scraping? I can't see how any ecommerce seller can keep on top of their pricing once they have 1000+ products to monitor and 10 competitors to keep tabs on. But most others I speak to in the industry have 1 or 2 staff paid decent money just to do that.

Before, we used to frequently find that all our competitors had raised prices and we had been selling a product cheaper than need be for a long time before we got round to checking that particular product.

Those who do it - I would love to hear from you. Those that don't, just think of the potential!
 

movietub

If I was a developer of one of those sites I think I would be tempted to feed you false information. ;)

Regards

Dotty

That's almost impossible, as the scrape views live data on the public site... We only gather the information they want their visitors to see, as they want them to see it. Of course they could relatively simply change the way the data is delivered, requiring us to modify our scrape algorithm...

But that's the thing - no one ever has done! I'm forced to assume that they simply don't have a clue how these things work.

By the way, I said the system was *nearly* fully automated. We keep certain steps under human eyes just in case we do get funny results. It does happen of course.
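
One way to decide which results go under human eyes, sketched in Python. This is an assumed rule rather than the checks described in this post: any scraped price that is missing, zero, or has moved by more than a chosen fraction since the last scrape is held back for review. The 30% threshold and the product codes are hypothetical.

```python
def needs_review(old_price, new_price, max_change=0.30):
    """True if the scraped price looks odd enough to be checked by hand."""
    if old_price <= 0 or new_price <= 0:
        return True  # missing or nonsensical price: always check manually
    return abs(new_price - old_price) / old_price > max_change


# Hypothetical (previous, latest) scraped prices for three products.
scrapes = {"A100": (10.00, 10.50), "B200": (40.00, 4.00), "C300": (15.00, 0.0)}

flagged = sorted(
    code for code, (old, new) in scrapes.items() if needs_review(old, new)
)
```

Here the 5% move on A100 passes straight through, while the 90% drop on B200 and the zero price on C300 are held back.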
 

movietub

Not very easy to do, all you have to go on is the IP of the request so the easiest option if spotted is to block the IP.

EDIT: Actually it's very easy to do, good idea :D

That would be possible, and I agree the easiest method. But I would say the value to us of knowing where we are in the market place would certainly make it affordable to use a whole host of IPs.

But as I said before, people don't do that. I don't think they know how in most cases, or even understand how scraping works and how to make it more difficult.

In reality we tend more and more to monitor third party services such as Google Base (which incidentally makes scraping a lot easier). Of course, the person submitting to these services has no way of blocking access.

Of course all online retailers site scrape, it's just that most do it the mushy organic way and do it a lot more slowly!
 
Upvote 0
My first thought was it would be difficult but actually it's really simple

Replace the usual:

PHP:
<?php echo $price; ?>

with

PHP:
<?php
// $movietubs_ip is a placeholder: set it to the scraper's IP address from your logs.
if ($_SERVER['REMOTE_ADDR'] === $movietubs_ip) {
    echo $price * 0.9; // False price for you
} else {
    echo $price;
}
?>

They would have to spot what was happening in the log files though to do this. A few random checks by yourself can monitor for this.

If you are scraping Google Shopping results then, like you say, it's fairly foolproof.
 

movietub

It doesn't matter if it's 'data on the public site'.

If you could spot a pattern you could respond with false data to that client only.

Regards

Dotty

Yes but only if you can identify the client, under whatever IP the request actually comes from. And only if they are viewing your site directly, and not a feed you have set up to a third party service.

Besides, it's not as if I would care if we got scraped back by one of our competitors. It's in all of our interests to be relatively close price-wise, and cheaper on certain lines when we can afford to be for whatever reason.

It's like a local butcher knowing what price his competitor sells sausages at - of course they know! Online, all competitors are effectively local, so we need a 21st century solution to achieve the same awareness.
 

movietub

My first thought was it would be difficult but actually it's really simple

Replace the usual:

PHP:
<?php echo $price; ?>
with

PHP:
<?php
// $movietubs_ip is a placeholder: set it to the scraper's IP address from your logs.
if ($_SERVER['REMOTE_ADDR'] === $movietubs_ip) {
    echo $price * 0.9; // False price for you
} else {
    echo $price;
}
?>
They would have to spot what was happening in the log files though to do this. A few random checks by yourself can monitor for this.

If you are scraping Google Shopping results then, like you say, it's fairly foolproof.

As I said:

1) Most either don't know how to add a conditional statement, or their hosted solution does not make it possible.

2) Long before it became a deterrent we would simply approach the site indirectly or from multiple sources. That's the thing about automation, it's possible to replicate for very little cost/effort.

3) Most of what we scrape isn't on the user's own domain/server these days.

But yes, you're basically right. And it's the first thing I would do as it would often work.

EDIT: Of course blocking the IP altogether would be no less effective... But I admit it would be more fun to try and mess up our game ;)
 

movietub

First thing I do in the morning is go through my log stats! (sad I know but it often reveals 'lost bandwidth' that is easily recovered with a bit of tweaking).

Regards

Dotty

You can set alarms through most analytics programmes to alert you to repeat requests from a single IP. You could automate and have your morning back!

As you have probably observed though, the vast majority of such requests made are either harmless, short term or actually beneficial - directories etc.
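
For anyone without an analytics package, the same kind of alarm can be roughed out in a few lines of Python. This sketch assumes the common access-log layout where the client IP is the first field on each line; the function name, the threshold and the sample log lines (using documentation-range IP addresses) are hypothetical.

```python
from collections import Counter


def busy_ips(log_lines, threshold):
    """Return {ip: hits} for every IP with at least `threshold` requests."""
    # Assumes the client IP is the first whitespace-separated field.
    hits = Counter(line.split()[0] for line in log_lines if line.strip())
    return {ip: count for ip, count in hits.items() if count >= threshold}


# Hypothetical log excerpt.
sample_log = [
    '203.0.113.7 - - [10/Feb/2011:06:00:01 +0000] "GET /product/1 HTTP/1.1" 200 512',
    '203.0.113.7 - - [10/Feb/2011:06:00:02 +0000] "GET /product/2 HTTP/1.1" 200 498',
    '198.51.100.4 - - [10/Feb/2011:06:03:10 +0000] "GET / HTTP/1.1" 200 1024',
    '203.0.113.7 - - [10/Feb/2011:06:00:03 +0000] "GET /product/3 HTTP/1.1" 200 530',
]

alerts = busy_ips(sample_log, threshold=3)
```

As noted above, most of what this turns up will be harmless, so the output is better treated as a list to glance over than as a blocklist.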
 
Upvote 0
...You could automate and have your morning back!...

True but I personally think a GOOD understanding of your stats is invaluable.

And you are right, most automated requests are harmless (even what you are doing!) but every now and again you get a REAL problem.

E.g. I recently had some major problems with 'referer spam', the effects of which were verging on a DoS attack. This was only countered by using a filter based on the HTTP Referer header.
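
The filter itself isn't shown in this post, so the following is only a guess at the general shape of such a check, written in Python: compare the host in the Referer header against a blocklist. The domain names and the function name are hypothetical.

```python
from urllib.parse import urlparse

# Hypothetical blocklist of spamming referrer domains.
BLOCKED_REFERRER_DOMAINS = {"spam.example", "junk.example"}


def is_referer_spam(referer_header):
    """True if the Referer host is a blocked domain or a subdomain of one."""
    if not referer_header:
        return False  # plenty of genuine requests send no Referer at all
    host = (urlparse(referer_header).hostname or "").lower()
    return any(
        host == domain or host.endswith("." + domain)
        for domain in BLOCKED_REFERRER_DOMAINS
    )
```

In practice a check like this would sit in the web server configuration or at the very top of the request handling, so that spam requests are rejected before any expensive work is done.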

Regards

Dotty
 

I Love Spreadsheets

Isn't there a copyright issue here as well?

They may be making their prices available to the public to read, but it doesn't mean it gives you the right to use their data from their website commercially for your own site.

If I went to say Argos's website and started downloading their product images to use on my website, wouldn't I be infringing on their copyright? Doesn't the same thing apply to their pricing and their pricing data?

It's one thing to take a look and compare your prices, but isn't it a different thing to use it on your own site?

I'm interested because I have personally turned down work to do this kind of thing because I can't square it in my own head.
 

movietub

Isn't there a copyright issue here as well?

They may be making their prices available to the public to read, but it doesn't mean it gives you the right to use their data from their website commercially for your own site.

If I went to say Argos's website and started downloading their product images to use on my website, wouldn't I be infringing on their copyright? Doesn't the same thing apply to their pricing and their pricing data?

It's one thing to take a look and compare your prices, but isn't it a different thing to use it on your own site?

I'm interested because I have personally turned down work to do this kind of thing because I can't square it in my own head.

Ever seen the TESCO adverts comparing prices to those of ASDA? Waitrose currently shows the main 3 supermarkets' prices on their shelf labels for many products they have recently price matched. You can't tell people not to look at the competition's prices, that would make filling out any business plan impossible for one!

Images are different of course. I would never scrape images or content from another site without asking first.

But prices, that's very different. A price is not something that can be copyrighted (imagine how expensive everything would be by now if it could be...). A price is in the public domain, it is a statement of the value of your products. So is 'cheapest price promise' and so on. It's very much in the public interest that prices be compared and challenged. And that's key here really, public interest and public domain. That's why we have laws against monopolisation that force competition to exist and survive.

Google it if you wish!
 

I Love Spreadsheets

I'm not saying don't look at competitors' sites and do your research for your business plan etc, but what we are looking at here is taking the information and republishing for commercial gain. Tesco's and Asda's don't normally publish each other's prices, they say things like "We are cheaper than Asda's on 10,000 items" or "Your average shop will be 10% less than Tesco's" - I can't recall seeing any adverts with side by side prices but my memory is not what it used to be LOL. I can recall seeing it on the shelf pricing though.

I know it's impossible to copyright the price £4.99 for instance, but isn't the whole web page covered by a copyright and the £4.99 element is just a part of that? As you can probably guess copyright is not my strong point.

BTW the jobs I refused were to scrape prices, specifications and descriptions all produced by competitors. If it was just the prices I would have probably accepted the work but the descriptions etc were all content put together by the competitor.
 

movietub

I'm not saying don't look at competitors' sites and do your research for your business plan etc, but what we are looking at here is taking the information and republishing for commercial gain. Tesco's and Asda's don't normally publish each other's prices, they say things like "We are cheaper than Asda's on 10,000 items" or "Your average shop will be 10% less than Tesco's" - I can't recall seeing any adverts with side by side prices but my memory is not what it used to be LOL. I can recall seeing it on the shelf pricing though.

They definitely do show each other's prices on TV/web on occasion. Mostly it's a bit more subtle, but I imagine this is because being specific is a minefield when the competitor could change their prices a week before the advert airs. Also, it's often not seen as positive to blatantly list your competitor's current prices.

I know it's impossible to copyright the price £4.99 for instance, but isn't the whole web page covered by a copyright and the £4.99 element is just a part of that? As you can probably guess copyright is not my strong point.

You can't copy the entire page, or any particular chunk of it. But the individual elements of that page are only covered by copyright if the company owns the copyright for that element. So if they used their own photos, they have copyright by default. But not for their prices.
Their own text = copyright. If you create something, you own copyright on it. A price is not created, it is chosen.

BTW the jobs I refused were to scrape prices, specifications and descriptions all produced by competitors. If it was just the prices I would have probably accepted the work but the descriptions etc were all content put together by the competitor.

It depends what they want it for. We have scraped competitors' descriptions for internal use, which I believe is fine. For example, we want to have the best and most unique product descriptions, so we might scrape the manufacturer's site and three resellers' sites, and ask our merchandisers to view all the collected info and re-write it as one to make sure we have the most comprehensive and useful description on the web for a particular product. Nothing wrong with that.

However if we scraped 2000 product descriptions and uploaded them as our own, that would be really taking the p*ss!!!

Regarding you being asked to do the work, I would say you are completely safe so long as you provide the technical solution and ability, but don't implement it yourself. Or at least not fully to the point of uploading copied data as if it were your client's.

The process of scraping a site is simply another way of loading it up and reading it through a browser - whatever you scrape is not illegal until it is mis-used. So long as it was intended for public viewing by the original creator that is.
 

competitor_monitor

Free Member
Feb 13, 2011
14
0
Hi guys,

Noticed this thread and thought I'd tell you about my site, Competitor Monitor .com

I guess from the thread that some of you are interested in this service. We have been doing it for some years now and offer the most competitive pricing on the whole of the Internet. Really is a great service.

Happy to answer any questions.

Haven't joined just to promote, just thought I'd let you guys know. Interesting topic.
 

DesignerNick

Free Member
Apr 22, 2009
3,442
609
Coventry, UK
My first thought was it would be difficult but actually it's really simple

Replace the usual:

PHP:
<?php echo $price; ?>
with

PHP:
<?php
// $movietubs_ip is a placeholder: set it to the scraper's IP address from your logs.
if ($_SERVER['REMOTE_ADDR'] === $movietubs_ip) {
    echo $price * 0.9; // False price for you
} else {
    echo $price;
}
?>
They would have to spot what was happening in the log files though to do this. A few random checks by yourself can monitor for this.

If you are scraping Google Shopping results then, like you say, it's fairly foolproof.

All well and good, but anybody who knows what they are doing would be using proxies to scrape and at random times so there would be no pattern.
 

movietub

All well and good, but anybody who knows what they are doing would be using proxies to scrape and at random times so there would be no pattern.

Quite. Anyone who expects such a crude measure to have any effect should pause to consider that site scraping is not straightforward, and that anyone doing it is fairly techie themselves and unlikely to use a single IP.

Although I suspect most scraping these days is done via third party sites, which often originally scraped the information themselves. Or Google Shopping of course. These places are a much easier way to scrape as the information is laid out in a neater fashion than on the sites it originated on, and it's mostly impossible for the competitor to monitor your page requests, let alone do anything to stop them.
 

movietub

I guess from the thread that some of you are interested in this service. We have been doing it for some years now and offer the most competitive pricing on the whole of the Internet. Really is a great service.

How do we know yours is the cheapest? Via your scraping tools?? :rolleyes:

Just out of interest, is it a profitable sort of business? There aren't that many companies offering such services. Probably not least because most e-retailers don't even know it's possible to scrape prices.
 

movietub

Is that not price fixing...?

No, that's not what is commonly termed price fixing at all. Price fixing mostly refers to an attempt made by a source supplier (i.e. a manufacturer) to ensure no-one sells below a certain set level. Typically this level is set a little below what any high street shop is likely to want to sell for, so it becomes an 'internet RRP' of sorts. Usually with the threat of stopping supply if anyone sells online for less.

That's totally illegal of course, and actually a very short sighted practice by the suppliers. But it's unpoliceable and happens more than most would believe.

Price fixing causes us a significant problem, as it always ensures all our competitors are around the same price level at times. We use scraping to monitor such situations so we can react quickly when competitors eventually raise the price in line with new supplier prices, to ensure we do the same.

Competition laws cover sellers and suppliers at every level of course, but for retailers price matching is of course OK, because then it's a public price, and public interest laws take over from competition laws.

Simple! - But it's always good to ask if you're not sure ;)
 

12ish

Free Member
Dec 30, 2010
103
21
United Kingdom
In which case I can really see the potential in making a system that does what you propose.
Selling it would be cake also, as any etailers found with prices way above/below everyone else could be targeted with a friendly email letting them know that they could be charging more/less on certain lines.
Give them the details of one product line free.
Charge them to register to see all the information.
Of course scraping all the prices on the net would be fairly processor intensive.
Is this the sort of thing you were alluding to?
 

movietub

In which case I can really see the potential in making a system that does what you propose.
Selling it would be cake also, as any etailers found with prices way above/below everyone else could be targeted with a friendly email letting them know that they could be charging more/less on certain lines.
Give them the details of one product line free.
Charge them to register to see all the information.
Of course scraping all the prices on the net would be fairly processor intensive.
Is this the sort of thing you were alluding to?

That is already the system we have up and running, hence the thread!

We only scrape our competitors' prices, and I'm not looking to market our system - in any case it is built on bits of software that we didn't develop. But there are several companies which do sell the 'service' of price scraping. They don't scrape all internet prices, but I have no doubt some of them will pre-scrape certain high price flux sectors and demonstrate the use of their system to people selling in those sectors.

The truth is that anyone with an IQ above 100 and a genuine desire to keep pace with ecommerce related developments could set up their own system and tailor it to their specific needs. In fact, I would say that any online SME serious about growing needs to get serious about using their computer to do a lot of the leg work that is currently passed on to their staff to do. Price scraping is not high tech at all, it's simply the process of getting a computer to view a page 'as a human' and note the results. Anyone not learning to make full use of the data available to them (in bulk) over the next few years is likely to struggle against those that do.

Data scraping is everywhere online, even though only a fraction of a percent of online businesses do it. Not just prices, everything. All search engine robots are data scraping, billions of pages a day.
 

12ish

Data scraping is everywhere online, even though only a fraction of a percent of online businesses do it. Not just prices, everything. All search engine robots are data scraping, billions of pages a day.

I need to learn this for a project I'm working on. I don't think I'm allowed to scrape stuff with my current host provider though so I'll have to set up a server. Yay more stuff to learn.

I sell SEO and online marketing, so while it is not useful to me really it may well be of use to clients.

I'm currently learning PHP and jQuery. Is there anything else you would suggest might be useful to know that can be applied to this sort of thing? (I know jQuery will not help with scraping.) Someone recommended I learn Perl.
 

movietub

I need to learn this for a project I'm working on. I don't think I'm allowed to scrape stuff with my current host provider though so I'll have to set up a server. Yay more stuff to learn.

I sell SEO and online marketing, so while it is not useful to me really it may well be of use to clients.

I'm currently learning PHP and jQuery. Is there anything else you would suggest might be useful to know that can be applied to this sort of thing? (I know jQuery will not help with scraping.) Someone recommended I learn Perl.

The more you know about how the web works, the better you will be at scraping. For no other reason than it's easier to capture data if you know how it is generated as the page loads.

You don't need a server, all you need is a scraping program. If you can view a webpage, you can scrape a webpage/website/500 sites.

Google 'OutWit Hub' and read about its features. If you understand what you read, you could be scraping before you know it - albeit very inefficiently for the first few attempts no doubt.

As for what languages you should learn... For scraping, a working knowledge of HTML alone would do for 99% of cases. But thinking of the future of the web in general, I'd say learn all the languages you can! Speaking the inside language of the web brings enormous benefits for any online company, or in fact just anyone doing anything online that they want to do better than most others.

imo.
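
To show why a working knowledge of HTML goes such a long way, here is a small Python sketch that pulls prices out of a page using only the standard library. It assumes the target page marks its prices with a `price` class; that class name, the `PriceParser` and `extract_prices` names and the sample markup are hypothetical, and a real site will need its own rule worked out by reading its HTML.

```python
from html.parser import HTMLParser


class PriceParser(HTMLParser):
    """Collect the first piece of text inside each element with class 'price'."""

    def __init__(self):
        super().__init__()
        self._capture = False  # True right after a matching start tag
        self.prices = []

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if "price" in classes:
            self._capture = True

    def handle_data(self, data):
        if self._capture and data.strip():
            self.prices.append(data.strip())
            self._capture = False


def extract_prices(html):
    parser = PriceParser()
    parser.feed(html)
    parser.close()
    return parser.prices


# Hypothetical product markup.
page = '<div class="product"><h2>Kettle</h2><span class="price">£19.99</span></div>'
prices = extract_prices(page)
```

Fetching the page in the first place is a separate step; the point here is only that once you can read the markup, picking out the one value you want is not much code.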
 
Upvote 0
How interesting. I'd heard of the scraping technique before but of course, I prefer to stay on the whiter than white side.

However, you're not actually doing anything wrong. You're just saving yourself the time of viewing all of the competitor sites by making your website scrape and then I presume collate the information via a MySQL database? Or perhaps even a simple email print out?

Do you think that once the programme is up and running, it would be possible to create a print-out that is easily readable at weekly sales forecast meetings with clients? For instance, showing their website's top sellers matched against the competitors on price?
 

movietub

How interesting. I'd heard of the scraping technique before but of course, I prefer to stay on the whiter than white side.

However, you're not actually doing anything wrong. You're just saving yourself the time of viewing all of the competitor sites by making your website scrape and then I presume collate the information via a MySQL database? Or perhaps even a simple email print out?

Do you think that once the programme is up and running, it would be possible to create a print-out that is easily readable at weekly sales forecast meetings with clients? For instance, showing their website's top sellers matched against the competitors on price?

Yes, in fact most scraped data is probably saved as CSV for viewing in Excel so that it may be used/compared/restructured for any number of purposes.

Printing the data for discussion is as simple as printing any other Excel doc.

More likely you would set up an Excel template to import the fresh data into each time. Excel would sort and tidy the data, and make any useful comparisons for you.

This is all about using computers to save time and handle bulk data after all.
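
As a rough illustration of the CSV step, here is a Python sketch that turns a list of scraped results into CSV text that a spreadsheet can open. The column names and sample rows are hypothetical; only the standard `csv` module is used.

```python
import csv
import io


def to_csv(rows, fieldnames):
    """Serialise a list of dicts to CSV text, header row first."""
    buffer = io.StringIO(newline="")
    writer = csv.DictWriter(buffer, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical comparison rows: our price next to one competitor's.
rows = [
    {"product": "Kettle", "our_price": 19.99, "competitor": "Shop A", "their_price": 21.50},
    {"product": "Toaster", "our_price": 24.99, "competitor": "Shop B", "their_price": 22.00},
]

csv_text = to_csv(rows, ["product", "our_price", "competitor", "their_price"])
```

Writing `csv_text` to a `.csv` file gives something the Excel template can import on each run.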
 

competitor_monitor

How do we know yours is the cheapest? Via your scraping tools?? :rolleyes:

Just out of interest, is it a profitable sort of business? There aren't that many companies offering such services. Probably not least because most e-retailers don't even know it's possible to scrape prices.

Hi there

Indeed there aren't that many companies out there doing it as yet, but of all those that focus specifically on monitoring prices and changes to product ranges, we offer the best pricing. Most companies don't even give away their prices, but we wanted to show some transparency.

Sure it's profitable - what works so great is that almost all the people that sign up to the free trial realise the benefits right away, and keep it on each month.

That's the problem, most e-tailers are in the dark about it, and it's hard trying to explain things sometimes. The key is trying to get the message in front of them as opposed to hoping they stumble across us themselves, as there aren't many searches for our services.

I noticed from your posts that you've built your own monitoring programme... good for you!! Not tempted to roll it out?

James
 

competitor_monitor

In which case I can really see the potential in making a system that does what you propose.
Selling it would be cake also, as any etailers found with prices way above/below everyone else could be targeted with a friendly email letting them know that they could be charging more/less on certain lines.
Give them the details of one product line free.
Charge them to register to see all the information.
Of course scraping all the prices on the net would be fairly processor intensive.
Is this the sort of thing you were alluding to?

We spotted that exact potential some years ago!

Good idea on the selling technique, however we have never actually done this - people seem happy enough to test out our free trial once they understand what the service is. This means they get to see the full potential of our service as opposed to just a snapshot.
 

competitor_monitor

How interesting. I'd heard of the scraping technique before but of course, I prefer to stay on the whiter than white side.

However, you're not actually doing anything wrong. You're just saving yourself the time of viewing all of the competitor sites by making your website scrape and then I presume collate the information via a MySQL database? Or perhaps even a simple email print out?

Do you think that once the programme is up and running, it would be possible to create a print-out that is easily readable at weekly sales forecast meetings with clients? For instance, showing their website's top sellers matched against the competitors on price?

All perfectly legal. With our system (competitormonitor .com) we give you access to an online dashboard whereby you can access the latest results, as well as deliver a weekly report to you informing you what prices and products have changed, in a very easy to read format.

What you ask at the end is definitely possible. We offer a 30 day free trial, no payment details required, so feel free to sign up!

James
 

movietub

Sure it's profitable - what works so great is that almost all the people that sign up to the free trial realise the benefits right away, and keep it on each month.

That's the problem, most e-tailers are in the dark about it, and it's hard trying to explain things sometimes. The key is trying to get the message in front of them as opposed to hoping they stumble across us themselves, as there aren't many searches for our services.

I noticed from your posts that you've built your own monitoring programme... good for you!! Not tempted to roll it out?

James

Our system is a botch of others' software, although it's a very well tuned botch these days! To be honest I never intended to become so involved. Checking prices regularly was a pain for us, so after a little research I found the bones of a solution and it seemed straightforward enough to set up. Once I proved the power of the data I invested more time, so we could not only collect more data, but make use of it much more quickly.

I swear, half the benefit of this is not the data itself at all, but the ability to react to mass data moments after it is collected and change your own catalogue prices to stay relevant. No human can keep up with that sort of speed, especially not if you are repeating the exercise every single day.
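
The "react to it the same day" part mostly comes down to comparing today's scrape with yesterday's. A minimal Python sketch, assuming each daily scrape is stored as a dict of product code to price; the function name and sample data are hypothetical.

```python
def price_changes(yesterday, today):
    """Return {code: (old, new)} for products whose price differs between scrapes."""
    return {
        code: (yesterday[code], new_price)
        for code, new_price in today.items()
        if code in yesterday and yesterday[code] != new_price
    }


# Hypothetical snapshots of one competitor's prices on consecutive days.
yesterday = {"KETTLE-01": 21.50, "TOASTER-02": 22.00, "IRON-03": 30.00}
today = {"KETTLE-01": 23.00, "TOASTER-02": 22.00, "MIXER-04": 55.00}

changes = price_changes(yesterday, today)
```

Products that appear or disappear between scrapes are deliberately left out of this diff; they would be worth reporting separately.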

We could resell it as a service it's true, and that's why I idly asked if it was an earner!!! But I'm in the position right now of having many things I have developed over the last few years that could be a business in their own right. And quite simply, I haven't got the time to add anything else to the list and do it justice!!

I have no doubt the whole industry will blossom as etailers get more tech savvy and start to see the value we do in data and automation. So best of luck to you as you push forwards!
 

competitor_monitor

Our system is a botch of others' software, although it's a very well tuned botch these days! To be honest I never intended to become so involved. Checking prices regularly was a pain for us, so after a little research I found the bones of a solution and it seemed straightforward enough to set up. Once I proved the power of the data I invested more time, so we could not only collect more data, but make use of it much more quickly.

I swear, half the benefit of this is not the data itself at all, but the ability to react to mass data moments after it is collected and change your own catalogue prices to stay relevant. No human can keep up with that sort of speed, especially not if you are repeating the exercise every single day.

We could resell it as a service it's true, and that's why I idly asked if it was an earner!!! But I'm in the position right now of having many things I have developed over the last few years that could be a business in their own right. And quite simply, I haven't got the time to add anything else to the list and do it justice!!

I have no doubt the whole industry will blossom as etailers get more tech savvy and start to see the value we do in data and automation. So best of luck to you as you push forwards!

Thanks :) Good luck with things too
 