Search engine optimisation (also known as SEO) is a complex subject, yet at the same time it is very easy to optimise your website once you understand what Google is trying to accomplish: provide relevant results, which ensures that people keep searching with Google.
How Google came to dominate the search market is quite easy to understand: invest, and keep investing, in the infrastructure and in the best minds on the planet, so that the algorithm which runs each time you search Google is simply the best. Consider what happens when you search Google.
As soon as you start typing in the search box, you get instant suggestions. Have you ever considered the complexity of that event? Your computer connects to Google's servers, where Google keeps a copy of the information (the resources behind each URL) that is publicly available on the internet. This means that when you search Google, you aren't searching the internet per se, but rather Google's cached version of the internet. Currently Google has indexed over 70 trillion web documents (and the number grows each second). Can you imagine the resources needed for such a task? Have you ever seen Google search unavailable? But why do all this? To make humanity smarter through search?
Google is a multinational business (global in every sense of the word) and, just like other multinationals, it is all about profits. Its main profits come from the AdWords PPC model, so Google is highly intelligent about making sure that the cash cow keeps producing milk.
Think about this for a minute. Some optimistic computer science student sets out to prepare a thesis on search engine optimisation. She spends an entire year researching the technology and the mathematical equations that major search engines like Google and Bing implement in their ranking algorithms. Then this imaginary student puts a detailed document about all her findings on some university server for the world to see. But if you go looking for that detailed information, which the student spent sleepless nights preparing, you will not find it on Google's first page when you search for “detailed research for search engine optimisation”. If you disagree with this example, simply search Google and see what happens.
Here's what you get: first, the websites that pay Google for AdWords, and then you may also see some SEO spammer show up on that first page. But our detailed research on search engine optimisation is nowhere to be found. Right? So once again, Google is giving people what they want: quick answers. Therefore, learn to answer questions on your web pages.
How to Avoid Duplicate Content Issues
Google is extremely efficient at crawling new websites and web pages as they are published. Since it has trillions of previous search queries at its disposal, and since it has a map of the internet (70+ trillion web documents indexed), it can run calculations on this data set and draw whatever conclusions it wants for future updates, including algorithm updates.
Each time Google finds a new web document and places it in its index, it attaches a timestamp for more efficient future caching. If a CMS like WordPress runs your website, and you have different URLs with the same (or nearly the same) content, then Google sees that as duplicate content because there is nothing unique per URI (per landing page). Storing and managing a data set of 70+ trillion web documents is expensive, so to save resources, why not tell webmasters that to rank high in Google “you should avoid duplicate content”?
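To make the idea of "different URLs, same content" concrete, here is a minimal Python sketch that groups pages by a fingerprint of their text. This is only an illustration and not how Google detects duplicates: the function names and the sample pages are hypothetical, and it only catches content that is identical after lowercasing and collapsing whitespace, so near-duplicates would need a fuzzier comparison.

```python
import hashlib
import re
from collections import defaultdict


def content_fingerprint(text: str) -> str:
    """Hash the text after lowercasing it and collapsing runs of whitespace."""
    normalized = re.sub(r"\s+", " ", text).strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def find_duplicate_groups(pages: dict[str, str]) -> list[list[str]]:
    """Return groups of URLs whose text is identical after normalization."""
    groups = defaultdict(list)
    for url, text in pages.items():
        groups[content_fingerprint(text)].append(url)
    # Only fingerprints shared by more than one URL indicate duplication.
    return [sorted(urls) for urls in groups.values() if len(urls) > 1]


# Hypothetical sample: two URLs of one site serving the same post.
pages = {
    "https://example.com/post/": "Hello  World",
    "https://example.com/?p=1": "hello world",
    "https://example.com/about/": "About us",
}
duplicates = find_duplicate_groups(pages)
```

Running something like this over an export of your own pages can show which URLs compete with each other for the same content.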
Avoiding duplicate content is a smart business decision that works well for everyone, including Google. Here is how to do it:
- First, make sure that you present original content (text, images, videos) per URI (which basically means per landing page).
- Use rel canonical, <link rel="canonical" href="https://www.rankya.com/seo-insights-you-didnt-know-about/" />, whenever you can (this tells Google “hey Google, this is the link to the original resource”).
- All your internal links should use consistent hyperlinks (one version of the URI pointing back to the same content). For example, if I want to link internally to my home page, and the canonical URL is https://www.rankya.com, then I should make sure the CMS I use points to that URI and not to variations such as https://www.rankya.com/ or https://www.rankya.com/index.php.
- Your backlinks from external sources should also consistently point to only one version of each landing page. You can't fully control this, because others will share whatever version of a URL from your website they want. However, whenever you do have control over external backlinks (perhaps through your own social shares and the share buttons on your landing pages), make sure that the link points back to the canonical URL of the landing page.
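The list above can be sketched in Python as a small helper that collapses URL variations into one canonical form, builds the matching rel canonical tag, and reports when a requested URL differs from its canonical version (and so could be redirected). This is a sketch under stated assumptions: the function names are hypothetical, the list of index file names is a guess about the site, and the rule that the home page has no trailing slash simply follows the example in the list.

```python
from html import escape
from typing import Optional
from urllib.parse import urlsplit, urlunsplit

# Assumption: these file names serve the same content as their directory.
INDEX_FILES = ("index.php", "index.html")


def canonical_url(url: str) -> str:
    """Collapse common variations of a URL into one canonical form."""
    parts = urlsplit(url)
    host = parts.netloc.lower()  # host names are case-insensitive
    path = parts.path
    for name in INDEX_FILES:
        if path.endswith("/" + name):
            path = path[: -len(name)]  # /index.php -> /
            break
    if path == "/":
        path = ""  # home page: use the form without a trailing slash
    # Keep the query string, drop the fragment (it never reaches the server).
    return urlunsplit((parts.scheme, host, path, parts.query, ""))


def canonical_link_tag(url: str) -> str:
    """Build the rel canonical tag for the page served at `url`."""
    href = escape(canonical_url(url), quote=True)
    return f'<link rel="canonical" href="{href}" />'


def redirect_target(requested_url: str) -> Optional[str]:
    """Return the canonical URL to redirect to, or None if already canonical."""
    target = canonical_url(requested_url)
    return None if target == requested_url else target


home = canonical_url("https://www.rankya.com/index.php")
tag = canonical_link_tag("https://www.rankya.com/seo-insights-you-didnt-know-about/")
```

Using one function for the tag, for internal links, and for share buttons keeps every link you control pointing at the same version of each page.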
So these are some of the ways you can avoid duplicate content issues. Keep in mind that you also have the 301 redirect available to clean up duplicate URLs at the server level. I thank you for learning with me and I thank you for sharing this post.