piin.it framing content?


UKSBD
13th May 2009, 13:54
Does anyone know anything about this site?
http://piin.it/

They appear to be framing other sites and changing the URLs, thus creating
duplicate content.

FireFleur
13th May 2009, 14:04
Framing doesn't create duplicate content; the URLs are the same.

What people could object to is passing off or copyright infringement, but that normally only comes up if you are in the same sector or if someone really objects.

A frameset is:

URL0:

<frameset>
  <frame src="URL1">
  <frame src="URL2">
  <noframes>
    This is attributed to URL0 and then not particularly well
  </noframes>
</frameset>

The problem with frames is that search engines tend to index by URL, not by the page as we see it. And if the src points off-domain from the container, there is even less chance the content will be directly associated with the holding frame.

It is actually more similar to being linked to, but of course the content is shown, which is why people object.
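
To make the URL point concrete, here is a minimal Python sketch that lists what a container page frames and whether each src lives on another domain. The host names and the `frame_sources` helper are made up for illustration; it only reads src attributes and says nothing about how any search engine actually treats them.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class FrameSourceParser(HTMLParser):
    """Collect the src attribute of every <frame> and <iframe> tag."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag in ("frame", "iframe"):
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def frame_sources(container_url, html):
    """Return (absolute_url, is_off_domain) for each framed document."""
    parser = FrameSourceParser()
    parser.feed(html)
    container_host = urlparse(container_url).netloc
    results = []
    for src in parser.sources:
        # Resolve relative src values against the container (URL0).
        absolute = urljoin(container_url, src)
        results.append((absolute, urlparse(absolute).netloc != container_host))
    return results


# Hypothetical container page framing one local and one off-domain document.
page = """
<frameset>
  <frame src="/menu.html">
  <frame src="http://news.example.org/story.html">
</frameset>
"""
for url, off_domain in frame_sources("http://container.example.com/", page):
    print(url, "off-domain" if off_domain else "same domain")
```

Each framed document keeps its own URL, which is why the framed content is not credited to URL0.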

UKSBD
13th May 2009, 14:38
Try a search for:

http://www.google.co.uk/search?sourceid=navclient&aq=0h&oq=si&ie=UTF-8&rlz=1T4GGIC_en-GBGB266GB266&q=site%3ahttp%3a%2f%2fpiin.it

FireFleur
13th May 2009, 14:45
They are using an iframe, so:

url0:

<title>url0 index</title>
<body>
  url0 index
  <iframe src="url1"></iframe>
  url0 index
</body>

The bone of contention is probably going to be the title in this instance.

I am on one whose title says CNN, and that title is being indexed, but the content at url1 is not being attributed to url0.

So it is not duplicate, but the way they are doing it could be seen as passing off.

There was a landmark case in the UK with a couple of newspapers, one of which was framing the content of the other.

For it to be duplicate content, they would have to be proxying and embedding the actual content.

Edit:

Hmm, there is something else going on: I cannot open the iframe by itself.

It could be proxying, using the iframe as a bit of a ruse. The iframe has no src; the content is injected using JavaScript. No frame was needed, a div could have been used.

Now Google is beginning to index JavaScript-generated pages, and if the information is embedded then it could create duplicate content.

A frame that uses src doesn't create duplicate content; this could.
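
A rough way to spot this pattern in a page's source is to count iframes that carry no usable src. This is only a sketch: the `count_scripted_iframes` name and the sample markup are invented, and an empty src is merely a hint that a script fills the frame, not proof of proxying.

```python
from html.parser import HTMLParser


class EmptyIframeFinder(HTMLParser):
    """Count <iframe> tags and those that carry no usable src attribute."""

    def __init__(self):
        super().__init__()
        self.total = 0
        self.without_src = 0

    def handle_starttag(self, tag, attrs):
        if tag != "iframe":
            return
        self.total += 1
        # A missing, empty or about:blank src means the frame loads nothing itself.
        src = (dict(attrs).get("src") or "").strip()
        if src in ("", "about:blank"):
            self.without_src += 1


def count_scripted_iframes(html):
    """Return (total iframes, iframes with no usable src)."""
    finder = EmptyIframeFinder()
    finder.feed(html)
    return finder.total, finder.without_src


# Hypothetical markup: one iframe filled by script, one using src normally.
page = """
<body>
  <iframe id="injected"></iframe>
  <iframe src="http://news.example.org/story.html"></iframe>
  <script>/* hypothetical script that writes content into #injected */</script>
</body>
"""
print(count_scripted_iframes(page))
```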

FireFleur
13th May 2009, 14:59
If you do a search using site: as you have said, but add some of the text that is not in the title but is close to the top of the page, for example:

site:example.com flames shoot

it doesn't find any results, but that is not to say it won't in the future.
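
For anyone wanting to repeat that check across several pages, the query can be built in a few lines of Python. This is a sketch only: `site_query_url` is a made-up helper, and the single `q` parameter is the only thing it borrows from the search URLs posted in this thread.

```python
from urllib.parse import urlencode


def site_query_url(domain, text, exact=False):
    """Build a Google search URL restricted to one domain.

    With exact=True the text is wrapped in quotes so only that
    exact sequence of words should match.
    """
    phrase = f'"{text}"' if exact else text
    query = f"site:{domain} {phrase}"
    return "https://www.google.com/search?" + urlencode({"q": query})


print(site_query_url("example.com", "flames shoot"))
print(site_query_url("example.com", "flames shoot", exact=True))
```

The quoted form is the stricter test, since a loose match on common words proves little.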

The way the content is injected would have to be looked at to find out what is going on, but let me run a tcpdump and see if I actually communicate with the sites whose content appears.

It doesn't look like it, so I would guess at embedding, and they will probably draw the ire of a few people. At the moment it is not creating duplicate content in search engines except on the title, but I think it may be contrary to copyright.

FireFleur
13th May 2009, 15:06
Ok, what they have done is either this:

proxy or datastore -> javascript -> display

or

site -> javascript -> display

The former could create duplicate content pointed at the datastore, which would be a URL that we could see, and so could search engines.

The second is fine as far as duplicate content goes, because the original URL will be referenced somewhere.

It is odd they don't use the frame src though. It looks like they are into JavaScript and are using the iframe for semantic reasons, but it is hard to tell exactly what is going on because of that. So the intention may be good, but they are making it hard to open the iframe in its own page.

The best way to describe this: if they are making a copy and then referencing that copy as not belonging to the original owner, it can create duplicate content issues. If they are referencing the URL of the original content, in real time on the client side (that's important), it is more similar to just linking.
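
The distinction can be written down as a small check on the URL the client ends up fetching. This is a deliberately crude sketch: `classify_delivery` and the URLs are hypothetical, and comparing host names alone would misjudge an original site that serves its own pages from a different host.

```python
from urllib.parse import urlparse


def classify_delivery(original_url, fetched_url):
    """Compare the host the client fetches from with the original's host.

    Same host: the content comes live from the owner, closer to linking.
    Different host: it comes from a proxy or datastore, so it is a copy.
    """
    original_host = urlparse(original_url).hostname
    fetched_host = urlparse(fetched_url).hostname
    if fetched_host == original_host:
        return "live reference"
    return "copy"


original = "http://news.example.org/story.html"
# Case 1: datastore on the framing site delivers the content.
print(classify_delivery(original, "http://container.example.com/store?id=42"))
# Case 2: the client fetches the original page directly.
print(classify_delivery(original, "http://news.example.org/story.html"))
```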

FireFleur
13th May 2009, 15:39
I have just run another tcpdump on a clear cache, and I cannot see any connection to CNN, so if I had to bet, and the bet was small, I would bet on a proxy and store being used to deliver the content.

Which is odd though, because there is no need; the big sites will pick up on this soon enough if they are doing it. I suppose framing the content does alert sites to this being done, but it is not as detrimental as creating a scenario of duplicate content, and you will alert them anyhow if you climb high enough on the content.

I imagine they will just ensure it does just frame, and allow the frame to be removed by the browser and in their interface strip; that tends to keep people happy. But if they grow they will run into problems similar to Google's, where the content people will want a slice of the action.

UKSBD
13th May 2009, 16:28
There is more to what they are doing than initially meets the eye.
Pages are getting indexed with different URLs. Try looking at the cache of
some of the pages quickly, before they redirect to an error page.

Try some searches on google.it (pages from Italy).

Here is an example:
http://www.google.it/search?hl=it&q=Visit+the+BASCAP+homepage&btnG=Cerca+con+Google&meta=cr%3DcountryIT

Look at the cache, stop the page before it redirects and look at the source.

If they start doing this to weak sites they could effectively hijack them.

FireFleur
13th May 2009, 16:44
What they have done, either by design or circumstance, is create an obfuscated embed.

The final URL which is accessed by the client determines whether the content is being copied or transmitted live.

Framing doesn't create duplicate content; duplicate content creates duplicate content. You can have a frame pointing to duplicate content or to the original.

Some browser tools may just report the first step, but a search engine will be interested in the last step and the chain of it all.

Personally I think the title copying is enough to annoy people; it could be in the meta tags as well. But as soon as the content at someone else's URL is copied verbatim (or in substantial part) and published under another URL, that is copyright infringement unless the license allows for it.

So they normally go for a while and then get hit with a DMCA notice, at which point they have to do it right. And framing itself, using src pointing to the original source (I have to be more accurate in this instance), is already problematic, because it can interfere with brand.

With that said, if you write a plugin or a browser, that would be acceptable; it is about the client side wanting it. So if you make it very clear that you are not obfuscating the source, and the surrounding content is not defamatory or competitive, you can get away with it. But a well-protected brand would be on it.

UKSBD
13th May 2009, 16:49
The bigger, more powerful sites probably won't get affected because their
pages will outrank the other version. Will the same happen for smaller,
weaker sites though?
Deliberate hijacking?

FireFleur
13th May 2009, 17:05
Well, I wouldn't be so sure the bigger sites won't see this.

There is also the alerts system, isn't there? If someone is using that for each of their stories and titles, the copy will get flagged as Google picks up the title.

And people do go deep in searches anyhow when checking their own stuff out.

Every now and again Google blips, and sites which are right down in the SERPs have a little moment of glory. It is fleeting, but it is enough to get known (I am not going to argue the toss on this; if people want to believe that doesn't happen then fine :) ).

So there are two possible ways of finding out, and there are more, such as very directed searches using quoted strings to force only an exact sequence to match.