Web scraping match results

Trukerz

Free Member
Apr 1, 2019
2
0
Hello everyone,

I am writing about a personal project idea I have for learning how to program a web scraper. A web scraper is a script or program that reads web pages automatically, much as a visitor's browser would. Google's crawlers are an example of web scrapers: they visit each page, "read" its content and index it.

In my case, I would like to scrape the match results (team 1, team 2, goals, date, league) from a sports betting website and put them into an Excel file so that I can compute some statistics and present them on my personal blog.
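The extraction step described above could be sketched roughly as follows. This is only an illustration under stated assumptions: the page layout (one table row per match, six cells in the order date, league, team 1, team 2, goals 1, goals 2), the `FIELDS` names and the `SAMPLE_HTML` snippet are all hypothetical, since the real site's markup is not known. It uses only the Python standard library and produces CSV text, which Excel can open.

```python
import csv
import io
from html.parser import HTMLParser

# Hypothetical column order; the real site may lay its table out differently.
FIELDS = ["date", "league", "team1", "team2", "goals1", "goals2"]


class ResultTableParser(HTMLParser):
    """Collects the text of every <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            # Keep only rows with the expected number of cells
            # (this skips header rows made of <th> cells).
            if len(self._row) == len(FIELDS):
                self.rows.append(dict(zip(FIELDS, self._row)))
            self._row = None
            self._cell = None


def parse_results(html):
    """Return a list of dicts, one per match row found in the HTML."""
    parser = ResultTableParser()
    parser.feed(html)
    parser.close()
    return parser.rows


def to_csv(rows):
    """Serialise the match rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Made-up sample page used only to demonstrate the functions.
SAMPLE_HTML = """
<table>
  <tr><th>Date</th><th>League</th><th>Home</th><th>Away</th><th>HG</th><th>AG</th></tr>
  <tr><td>2019-03-30</td><td>League A</td><td>Team 1</td><td>Team 2</td><td>2</td><td>1</td></tr>
</table>
"""

if __name__ == "__main__":
    print(to_csv(parse_results(SAMPLE_HTML)))
```

Writing CSV rather than a native Excel workbook keeps the sketch dependency-free; the resulting file can be opened in Excel directly.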

My program would browse the pages at a non-disruptive rate, such as 1 page every 5 seconds, to avoid "damaging" the server. The collected data won't be used for business purposes. The source code of the web scraper would be open source, as would the charts and statistics it produces.
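A fixed request rate like the one mentioned above (1 page every 5 seconds) could be enforced with a small helper along these lines. This is a minimal sketch, not a definitive implementation: the names `Throttle` and `fetch_all` are made up for illustration, and the `fetch` argument stands for whatever function actually downloads a page.

```python
import time


class Throttle:
    """Enforces a minimum delay between consecutive calls to wait()."""

    def __init__(self, min_interval=5.0, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock    # injectable so the class can be tested without waiting
        self._sleep = sleep
        self._last = None      # time of the previous request, None before the first

    def wait(self):
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


def fetch_all(urls, fetch, throttle):
    """Fetch each URL in turn, pausing so requests are at least min_interval apart."""
    pages = []
    for url in urls:
        throttle.wait()
        pages.append(fetch(url))
    return pages
```

In real use, `fetch` could be something like `lambda url: urllib.request.urlopen(url).read()` from the standard library, with `Throttle(5.0)` giving the 5-second spacing.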

Moreover, I suppose that sports match results are "public" data (although I don't know where else to get them from), and I could not find a visitor/user agreement on the website. The only legal notices I could find were the forum's "terms of service", written in German, and they don't seem to explicitly mention ownership of this data (match results). The company is based in the UK (in London).

Therefore, do you think my personal hobby project is completely legal?

Thank you in advance for your time. Have a nice day.

Best regards,
 
E

Engage Legal

My initial thoughts are that you would probably get away with it because it is for personal use. Alternatively, or in addition, if there is an issue with that, you could credit the source of the data (a small box at the bottom saying "data sourced from X" would probably suffice).

Happy to be corrected if any IP experts are on here.
 

Alan

Free Member
  • Aug 16, 2011
    7,089
    1,974
    That depends on your definition of illegal. If you mean criminal law, then probably not; if you mean civil law, then probably yes.

    You need to read the terms carefully (even if they are in German).

    For example, here is an extract from the terms of use of a betting site, and I expect most of them contain similar wording:
    "Screen scraping, web scraping or any other automated or manual collection of xyz Data, for commercial or personal use, by any person is expressly prohibited."
     

    Inva

    Free Member
    Aug 10, 2018
    370
    62
    "Is prohibited" doesn't mean squat. It's only there as a scare tactic. Game scores are public knowledge, and plain facts like these generally can't be copyrighted. Even these companies buy them from somewhere else.

    They can't do much except try to block you, but tbh unless you are scraping historical data, your scraping's impact on the website is that of a fly on a dinosaur.
     

    Clinton

    Free Member
  • Business Listing
    Jan 17, 2010
    5,748
    1
    3,068
    ukbusinessbrokers.com
    "Screen scraping, web scraping or any other automated or manual collection of xyz Data, for commercial or personal use, by any person is expressly prohibited."
    Tell that to my bot and see if it cares! :)

    @Trukerz, there are a lot of easy scraping tools out there, and for the simple scraping you're looking to do they'll probably be fine. You'll also have the advantage of their IP rotation.

    But if you're doing this primarily to learn the programming involved, that's great too. Your scraping is completely legal, go ahead.

    You don't even need to worry about setting a slow crawl speed. It's not your job to ensure their website keeps working. Their webmaster should be controlling crawler requests, security, everything. If they've got someone not sufficiently clued up running their website ... that's not your problem.
     

    Inva

    Free Member
    Aug 10, 2018
    370
    62
    Throttling or limiting is sometimes necessary, not to avoid overloading the website (not gonna happen / who cares) but because some of these websites have mechanisms in place to detect crawlers and will try to block you.

    In OP's case he will probably crawl fewer than 10 pages a day, which I bet is less than a normal visitor averages.
     
    Jun 26, 2017
    2,713
    1,012
    I wrote a program a few years ago that scrapes a whole host of data, including football results, goals, cards, even shots on target, and also prevailing odds. It then scrapes upcoming fixtures and applies a number of criteria to try to identify certain "opportunities"...
    I did all this using VBA in Excel. It's very easy to teach yourself.
     

    Inva

    Free Member
    Aug 10, 2018
    370
    62
    I'm doing exactly the same thing now for a client, only not in Excel; the whole point for him is to get rid of spreadsheets and automate the thing. How did it work out for you? Found any deviations? ;)
     
    Jun 26, 2017
    2,713
    1,012
    It has worked out extremely well for me over the years. I have always thought about writing a bot to just automate the whole thing, but all in all it takes me like 5 minutes a week to run the thing and put on the various selections across the top 5 European leagues. Not restrictive in any way.
     

    Trukerz

    Free Member
    Apr 1, 2019
    2
    0
    Thank you all for your answers! They help me a lot!

    I just checked the terms of use: they don't actually seem to set rules for the portal where the results are displayed; they are more about the use of the forum. Nothing is explicitly written against web scraping/crawling (with those words or equivalent ones, at least).

    PS: I actually need historical data, which is why I'm not using an API (probably not free for such use), and I will be careful with the page loading rate.
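For historical data, one way to keep the number of page loads low is to request each day's page once and keep a local copy. The sketch below assumes, purely for illustration, that results are reachable through one URL per day; the URL template, `daily_urls` and `fetch_cached` are hypothetical names, and `fetch` stands for whatever function downloads a page as text.

```python
from datetime import date, timedelta
from pathlib import Path


def daily_urls(start, end, template):
    """Yield (day, url) pairs for every day from start to end inclusive."""
    day = start
    while day <= end:
        yield day, template.format(day=day.isoformat())
        day += timedelta(days=1)


def fetch_cached(day, url, fetch, cache_dir):
    """Return the page for this day, downloading it only if no local copy exists."""
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    path = cache_dir / f"{day.isoformat()}.html"
    if path.exists():
        return path.read_text(encoding="utf-8")
    html = fetch(url)
    path.write_text(html, encoding="utf-8")
    return html
```

With a cache like this, re-running the scraper after a crash or a parsing fix does not hit the website again for days that were already downloaded.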
     