Question for the Blackhats on encoding

UKSBD

Moderator
  • Dec 30, 2005
    13,034
    1
    2,833
    Any blackhats here who want to share their views on encoding in articles, copied content, etc.?

    I see a lot of people do it on stop words, but does it really do any good?

    What benefits (if any) do they actually get from things like this:
    www . tailormadehawaii .com / helicopter-tour-kauai-hawaii
     

    gpietersz

    Free Member
  • Business Listing
    Sep 10, 2019
    2,765
    2
    733
    Northwhich, Cheshire
    pietersz.net
    Not a black hat, but this got me interested.

    The only related technique I can find involves substituting letters (e.g. replacing a Latin letter with a Cyrillic one that looks very similar). This means search engines do not realise the text is duplicate content, but people can still read it.

    However, it also means search engines think the page is gibberish.

    In the case of this site, it looks more to me as though someone has messed up, as it's unreadable.

    I might be wrong and there may be some technique I have never heard about.
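A rough sketch of how the letter substitution described above could be spotted: flag any word whose letters come from more than one script. The function names are made up for illustration, and taking the first word of the Unicode character name as the "script" is a simplifying assumption, not a proper script lookup.

```python
import unicodedata

def scripts_in_word(word):
    """Return the set of script labels for the letters in a word."""
    scripts = set()
    for ch in word:
        if ch.isalpha():
            # Unicode names start with the script, e.g. "LATIN SMALL LETTER H"
            # or "CYRILLIC SMALL LETTER IE"; we keep only that first word.
            scripts.add(unicodedata.name(ch, "UNKNOWN").split()[0])
    return scripts

def mixed_script_words(text):
    """List the words whose letters come from more than one script."""
    return [w for w in text.split() if len(scripts_in_word(w)) > 1]

# Cyrillic ie, er, u and o are mixed into otherwise Latin words.
sample = "We can h\u0435l\u0440 \u0443\u043eu"
print(mixed_script_words(sample))
```

Plain "We can help you" produces an empty list, while the sample above flags the last two words.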
     

    Clinton

    Free Member
  • Business Listing
    Jan 17, 2010
    5,748
    1
    3,068
    ukbusinessbrokers.com
    There are SEO techniques that work and SEO techniques that don't work, nothing to do with hats.

    Focus on what works, not what Google tells you to do. They are two very different things!

    And most so-called SEOs don't know the difference. On their websites these SEOs often drone on and on about how they only use whitehat techniques. The message they want to convey, I suppose, is that they are good guys.

    What they are actually saying is that they don't really have a clue. They've read one or two pages on Google's site, watched a YouTube video ...and now consider themselves experts in SEO.

    The best SEOs test stuff, work out what's getting results, and implement that. They're not going by some stupid rule book put together by the Gorg to manipulate the naive.

    This encoding you talk about is used to make text look different, i.e. you copy text from somewhere and modify or "spin" it to post it elsewhere, so that Google's bot mistakenly recognises it as unique content. There used to be lots of online tools that helped with spinning of articles.

    And there are a lot of these gibberish articles out there in the wild today. You have no doubt come across one or two of them. (Sometimes I read posts in UKBF and they make so little sense that my first suspicions are that it's spun content!)

    Spinning / encoding used to work for SEO. But that ship has sailed. It sailed almost a decade ago!

    But there is a different meaning to encoding - the declaration in the head (or .htaccess) of the character set you're using. For example, UTF-8.

    <added> Stop words?! Stop worrying about stuff like that.
     

    gpietersz

    "Focus on what works, not what Google tells you to do. They are two very different things!

    And most so-called SEOs don't know the difference"

    I agree, but there is a difference between doing things Google does not like, and outright spammy things like spun content.


    "The best SEOs test stuff, work out what's getting results, and implement that."

    Something that is reflected in their charges! Unfortunately some snake oil salesmen charge the same.

    Also, what works short term vs what works long term.


    "Sometimes I read posts in UKBF and they make so little sense that my first suspicions are that it's spun content!"

    I think most of it is spammy people.
     

    Clinton

    "I agree, but there is a difference between doing things Google does not like, and outright spammy things like spun content."
    That bit I don't quite agree with.

    There's what works and what doesn't work. As simple as that.

    If spun content works, use it. When it worked people used it on burners (disposable sites) and buffers / satellite sites designed to channel and direct "PR" to the money site or to attract visitors from Google to whom they served different content to what they were serving the bot (cloaking).

    It's simply a matter of knowing how to use the technique.

    Heck, I know sites that were competing with mine back in the day used tactics like this one on money sites and, stupidly, I thought Google would do something about it, but they didn't.

    Google don't like to take manual action. If people are using a certain exploit, Google needs to find an algorithmic way to deal with it. And, guess what, sometimes it took them years!

    So, way, way back in time if you remember the 301 issues (subsequent to which the world and his dog learnt about canonicals), Google took 2+ years to fix the exploit of buying high PR sites and doing a 301.

    Don't worry about the BS Google pumps out regularly. Don't worry about building "quality sites". Just do what frigging works :) . If I were still in that game - running various sites reliant on Google traffic - that's what I'd do.
     

    UKSBD

    Do you know how to spot which encoding is being used?

    For instance, take hеlр уоu: how would you convert that back if you don't know what it was encoded in?

    Edit to add: I'm not just talking about making words look different, but actual encoding.
     

    gpietersz

    You might have to do something like download it as raw bytes and then try different encodings.

    There is a Unix command line utility called file that can identify encodings, but it's not reliable at the best of times, and this stuff is mixed encoding (the HTML markup is UTF-8), so I doubt it will be useful.

    You could download it, open it in a text editor that lets you set the encoding, and try changing the encoding.
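The "raw bytes, then try different encodings" idea might look something like this minimal sketch. The candidate list and the function name are my own assumptions; in practice you would pick candidates based on where the text came from.

```python
def try_decodings(raw, candidates=("utf-8", "cp1252", "latin-1", "koi8-r")):
    """Map each candidate encoding to the decoded text, or None if decoding fails."""
    results = {}
    for name in candidates:
        try:
            results[name] = raw.decode(name)
        except UnicodeDecodeError:
            results[name] = None
    return results

# Stand-in for bytes downloaded from a page: the look-alike text as UTF-8.
raw = "h\u0435l\u0440 \u0443\u043eu".encode("utf-8")
for name, text in try_decodings(raw).items():
    print(f"{name:8} {text!r}")
```

Note that a decode which succeeds is not proof of the right encoding: latin-1 accepts any byte sequence, so you still have to eyeball which result reads sensibly.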
     

    Clinton

    Encodings are lossy, i.e. data is lost in the encoding, so an encoded string doesn't lend itself to a reversing process. That is complicated by the fact that mapping isn't a straight exchange. Further, if there isn't a direct replacement character the encoding process uses a substitute.
     

    gpietersz

    "Encodings are lossy, i.e. data is lost in the encoding, so an encoded string doesn't lend itself to a reversing process."

    Just for clarity, normal text encoding is not lossy. The process used for article spinning looks like it is, but that is more because of unpredictable mapping. If you could identify the character encoding you could get the mapped version back.
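A small illustration of both points, as a sketch rather than anything definitive: a round trip through UTF-8 gives back exactly what went in, whereas forcing text into a character set that lacks the characters is where substitution, and so data loss, comes from.

```python
text = "h\u0435l\u0440 \u0443\u043eu"  # Latin letters mixed with Cyrillic look-alikes

# UTF-8 can represent every Unicode character, so encode then decode is lossless.
assert text.encode("utf-8").decode("utf-8") == text

# Latin-1 has no Cyrillic letters. Asking the codec to substitute instead of
# failing replaces each of them with "?", and that information is gone for good.
lossy = text.encode("latin-1", errors="replace").decode("latin-1")
print(lossy)
```

Without errors="replace" the second conversion raises UnicodeEncodeError rather than silently substituting.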
     

    UKSBD

    It's all a bit technical and beyond me, but I edited a listing in my directory and the database converted it somehow.

    I tried it using the normal edit forms I use and it just added it to the page as it was.

    I then tried running it as a query in MySQL Workbench and it converted it.
     
    Upvote 0
    This is quite a helpful and complete answer:
    https://unicodelookup.com/#We can hеlр уоu/1

    It has URL encoded the Unicode characters and should give you the full answer as to what the text is viz:

    Char  Description               Octal  Decimal  Hex    HTML entity
    W     latin capital letter w    0127   87       0x57   &#87;
    e     latin small letter e      0145   101      0x65   &#101;
    ' '   space                     040    32       0x20   &#32;
    c     latin small letter c      0143   99       0x63   &#99;
    a     latin small letter a      0141   97       0x61   &#97;
    n     latin small letter n      0156   110      0x6E   &#110;
    h     latin small letter h      0150   104      0x68   &#104;
    е     cyrillic small letter ie  02065  1077     0x435  &#1077;
    l     latin small letter l      0154   108      0x6C   &#108;
    р     cyrillic small letter er  02100  1088     0x440  &#1088;
    у     cyrillic small letter u   02103  1091     0x443  &#1091;
    о     cyrillic small letter o   02076  1086     0x43E  &#1086;
    u     latin small letter u      0165   117      0x75   &#117;
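For anyone who would rather not rely on a lookup site, the same breakdown can be produced locally. This is a minimal sketch using Python's standard unicodedata module; the describe function and its column order are my own choices, made to mirror the listing above.

```python
import unicodedata

def describe(text):
    """Return (char, name, octal, decimal, hex, html_entity) for each distinct character."""
    rows = []
    seen = set()
    for ch in text:
        if ch in seen:
            continue  # list each character once, in order of first appearance
        seen.add(ch)
        cp = ord(ch)
        name = unicodedata.name(ch, "UNKNOWN").lower()
        rows.append((ch, name, f"0{cp:o}", cp, f"0x{cp:X}", f"&#{cp};"))
    return rows

for row in describe("We can h\u0435l\u0440 \u0443\u043eu"):
    print(*row)
```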


    Not sure how this helps for SEO though.
     

    UKSBD

    Thanks, but I've seen all these Unicode lookup and conversion tools.

    It's identifying the encoding that seems more tricky.

    I can see that hеlÑ€ уоu converts to hеlр уоu but no idea what hеlÑ€ уоu is in the 1st place and also what hеlр уоu is (it's not the same as help you)

    I'm intrigued as to why people do it. I know it's related to article spinning, but I can't see how they benefit from it nowadays.
     
    Upvote 0
    "It's identifying the encoding that seems more tricky"
    I am not sure why the encoding should matter.

    I am using Chrome. F12 gives developer tools. Network gives all the network transactions (http or https normally). The two key headers are:

    Content-Encoding: gzip
    Content-Type: text/html; charset=UTF-8

    That tells us that the page is first encoded as UTF-8, which is a good encoding to use for Unicode text, and then compressed using gzip (which saves transmission bandwidth).

    My browser says it will accept gzip; the server could probably send the page uncompressed otherwise.

    The encoding matters in that if the page is not using UTF-8 (or another Unicode encoding) then you won't get the full Unicode character set. However, beyond that I am unclear what the issue is.
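To make the order of the two steps concrete, here is a sketch that simulates the exchange without touching the network. The header value is the one shown above; the helper name is hypothetical, and parsing the header with the standard email.message module is just one convenient way to pull out the charset.

```python
import gzip
from email.message import Message

def charset_from_content_type(value):
    """Extract the charset parameter from a Content-Type value (lowercased), or None."""
    msg = Message()
    msg["Content-Type"] = value
    return msg.get_content_charset()

# What the server does: encode the text as UTF-8, then gzip the resulting bytes.
body = "h\u0435l\u0440 \u0443\u043eu"
wire = gzip.compress(body.encode("utf-8"))

# What the client does: undo the steps in reverse order.
charset = charset_from_content_type("text/html; charset=UTF-8")
decoded = gzip.decompress(wire).decode(charset)
print(charset, decoded == body)
```

Content-Encoding only describes the compression wrapper; it is the charset in Content-Type that decides how the bytes turn into characters.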
     

    gpietersz

    I think I know what is happening. They are trying to spin this, but messing it up.

    They are substituting characters as described, getting UTF-8 encoded output.

    They are then storing it in a latin1_swedish_ci MySQL table.

    However, they then serve this as UTF-8 without actually converting it. So the text is not readable, as the browser tries to render it as UTF-8.

    If you look at the other pages of the site, they have the same problem throughout. On pages that are not spun it's just the odd character that is wrong. That is because both latin1 and UTF-8 are supersets of ASCII, so only non-ASCII characters go wrong (it might actually be ISO 8859-1 rather than latin1, but that is a superset of ASCII anyway).

    It might be complicated a bit by the fact that UTF-8 can have more than one byte per character, whereas the others are strictly one byte per character, but that is essentially what is happening.

    On the spun page it's mostly non-ASCII (or non-ISO 8859-1) characters, so they are mostly wrong, with the odd bit readable. Some non-ASCII characters may look like the start of UTF-8 multibyte sequences, throwing things out further.

    When you copy and paste into your system it's being stored as latin1_swedish_ci, so DB queries display it as such, and your website converts it to UTF-8, which is then readable. The test page on your website is correctly doing what their system should do.
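The general mismatch can be reproduced in a few lines. This is only a sketch of one plausible reading of the above: it assumes Python's cp1252 codec is a reasonable stand-in for MySQL's latin1, and it cannot confirm what that site's pipeline actually does.

```python
original = "h\u0435l\u0440 \u0443\u043eu"  # Latin h, l, u plus Cyrillic ie, er, u, o

# Step 1: the spun text is written out as UTF-8 bytes.
utf8_bytes = original.encode("utf-8")

# Step 2: somewhere along the way those bytes are treated as single-byte
# latin1/cp1252 characters, so each two-byte Cyrillic letter becomes two
# unrelated characters.
mojibake = utf8_bytes.decode("cp1252")
print(mojibake)

# Step 3: reversing the wrong step recovers the text, provided nothing was
# dropped or substituted in between.
repaired = mojibake.encode("cp1252").decode("utf-8")
print(repaired == original)
```

The ASCII letters survive untouched at every step, which matches the observation that only the non-ASCII characters go wrong.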
     

    UKSBD

    "I think I know what is happening. They are trying to spin this, but messing it up."

    Yes, I assume they are using some article spinning software to do it, don't know which one though.

    It looks like it changes certain letters rather than words though.

    Even if they had done it competently I am not sure how it helps.

    It probably goes back to the article writing days and was a way of churning out 100s (or even 1,000s) of versions of the same article without having to manually write them.
     

    gpietersz

    @UKSBD, yes, definitely automated. If you search Black Hat World there are mentions of software that does that.

    However, I do not see how it helps the way article spinning by changing words might. If the search engines can parse it, they will recognise the pages as duplicates. If they cannot, they will see it as gibberish, which cannot be much good for SEO either. It might make a site look bigger and give it a different internal link structure, but it will also make it look like it has a lot of junk (at best, lots of text in an unidentified language) on it.
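To show why the first case is easy for a parser, here is a sketch of folding look-alike letters back to Latin before comparing texts. The mapping is a deliberately tiny, hypothetical table covering a handful of Cyrillic letters; it is not what any search engine is known to use, and a real table would be far larger.

```python
# Hypothetical look-alike table: Cyrillic letter -> Latin letter it resembles.
LOOKALIKES = {
    "\u0430": "a",  # Cyrillic a
    "\u0435": "e",  # Cyrillic ie
    "\u043e": "o",  # Cyrillic o
    "\u0440": "p",  # Cyrillic er
    "\u0441": "c",  # Cyrillic es
    "\u0443": "y",  # Cyrillic u
}
FOLD = str.maketrans(LOOKALIKES)

def fold(text):
    """Replace known look-alike letters with their Latin counterparts."""
    return text.translate(FOLD)

spun = "We can h\u0435l\u0440 \u0443\u043eu"
print(spun == "We can help you")        # the raw strings differ
print(fold(spun) == "We can help you")  # after folding they match
```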
     

    StevePoster

    Free Member
  • Nov 29, 2013
    1,354
    149
    Philippines
    Google understood context (and synonyms) long before BERT.

    Is that your only idea on what BERT does?
    That much has been a given for a long time. What I need to know is your take on BERT.

    BERT builds contextual representations of words using a deep neural network.
    That's how BERT works. It's more about perceiving the context of words in search queries.
     