Duplicated content and SEO
What is duplicated content on the web?
Duplicated content is, basically, two or more identical or near-identical pieces of content that search engine spiders can reach through one or more URLs, either within a single domain name or across domain names. There is no universally accepted definition of the term, but this is the best definition in relation to SEO that I could come up with.
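Because this definition covers semi-identical content as well as exact copies, it helps to have a way to measure how close two pages are. One common technique is to split each text into overlapping k-word "shingles" and compare the two sets with the Jaccard similarity. Search engines may not use this method. The following is a minimal Python sketch that assumes the plain page text has already been extracted. The function names and the default k = 3 are illustrative choices.

```python
import re


def shingles(text, k=3):
    """Return the set of overlapping k-word shingles of a text.

    Case and punctuation are ignored so that trivial edits do not count.
    """
    words = re.findall(r"[a-z0-9]+", text.lower())
    if len(words) < k:
        # Too short for a full shingle: treat the whole text as one shingle.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity |A & B| / |A | B|; 1.0 means identical sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def similarity(doc_a, doc_b, k=3):
    """Score two page texts from 0.0 (nothing shared) to 1.0 (same shingles)."""
    return jaccard(shingles(doc_a, k), shingles(doc_b, k))


print(similarity("The quick brown fox jumps", "The quick brown fox sleeps"))  # prints 0.5
```

Pages that differ by only a word or two out of hundreds will score close to 1.0. Where you draw the line for "semi-duplicated" is a judgment call.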
How can content be duplicated?
There are many ways content can be duplicated. In most cases it results from using a CMS or from creating a copy of a page, for example a print version. Content can also be replicated by scraping tools, from web feeds, and through other web services.
With every update, Google introduces tighter controls on duplication in order to give users reliable content, which shuts the door on content that is generated dynamically across domain names. For example, a website full of Amazon affiliate listings with no original content is a recipe for disaster from an SEO point of view.
SEO duplicated content penalty
Most search engines have found ways to deal with genuine duplication, but they don't always produce the result the webmaster intends. For example, aliases and canonical names can cause duplication: /url/contentname and url?name=contentname can both serve the same page. In this case Google is likely to recognize the duplication as genuine, because this pattern is very common on websites.
Not all content management systems handle URLs correctly for SEO. For example, enabling clean URLs in Drupal can leave the same page reachable at both the clean URL and the unclean URL. WordPress, on the other hand, I found somewhat restrictive in how it categorizes content, which makes it less flexible as a CMS. WordPress can still produce duplicated content, but I must admit that it is much better for SEO out of the box.
Although Google has announced that no penalty will be imposed for genuine content duplication, duplication still carries a big cost in SEO. If Google indexes two copies of the same content, it chooses one as the main version, while the second competes with the first and leaks link juice, even though the second copy won't be shown in the SERPs or, for that matter, add any weight to the site.
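One way to deal with aliases like these is to map every known unclean form of a URL onto a single clean form. That clean form can then be used for redirects or canonical tags. The following Python sketch is illustrative. Its two rewrite rules are assumptions: a Drupal-style `?q=node/1` and the `?name=contentname` pattern from the example above. They are not a complete description of how any CMS builds its URLs.

```python
from urllib.parse import parse_qs, urlencode, urlsplit, urlunsplit


def clean_path(url):
    """Map known 'unclean' aliases of a page onto one clean URL.

    Assumed rewrite rules (illustrative only):
      /?q=node/1 or /index.php?q=node/1  ->  /node/1
      /url?name=contentname              ->  /url/contentname
    Any other query parameters are kept; the #fragment is dropped.
    """
    parts = urlsplit(url)
    query = parse_qs(parts.query)
    path = parts.path
    if path in ("", "/", "/index.php") and "q" in query:
        # Drupal-style unclean URL: the real path lives in the q parameter.
        path = "/" + query.pop("q")[0].strip("/")
    elif "name" in query:
        # Hypothetical ?name= alias: move the value into the path.
        path = path.rstrip("/") + "/" + query.pop("name")[0]
    return urlunsplit((parts.scheme, parts.netloc, path, urlencode(query, doseq=True), ""))


print(clean_path("http://example.com/?q=node/1"))  # http://example.com/node/1
```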
How to avoid it
My next article will be about Drupal and SEO, and in it I will explain the issues I found in Drupal from an SEO point of view. How to avoid creating duplicated content is a very difficult question to answer. I would suggest keeping an eye on webmaster tools or similar tools provided by the search engines to find duplicated content. You can also use trusted websites that offer online checking tools. But this is not all: you can also use cloning and link-tracking tools.
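You can also run a rough check over your own pages, alongside external checking tools. To do this, normalize each page's text, hash it, and group the URLs that produce the same hash. This Python sketch assumes the page texts have already been fetched into a dictionary keyed by URL. It only catches exact duplicates after whitespace and case are collapsed. It will not catch semi-duplicated content.

```python
import hashlib
import re
from collections import defaultdict


def fingerprint(text):
    """Hash page text after collapsing whitespace and case, so trivial
    formatting differences don't hide an exact duplicate."""
    normalized = re.sub(r"\s+", " ", text).strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def find_duplicates(pages):
    """Given {url: page_text}, return sorted groups of URLs whose text matches."""
    groups = defaultdict(list)
    for url, text in pages.items():
        groups[fingerprint(text)].append(url)
    return [sorted(urls) for urls in groups.values() if len(urls) > 1]


pages = {
    "/node/1": "Hello   World",
    "/?q=node/1": "hello world",
    "/about": "About us",
}
print(find_duplicates(pages))  # [['/?q=node/1', '/node/1']]
```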
Webmasters have to understand how their software works in order to avoid creating identical pieces of content, as far as SEO is concerned. To do that, a webmaster needs to understand the following areas:
1- How URLs work and the difference between a dynamically generated page and a static page
2- When to avoid a CMS: if users don't need to interact with the site, don't use one. Static pages are the best for SEO
3- Understanding www and non-www addresses
4- Understanding cPanel and virtual hosting with cPanel
5- For dynamic URLs, understanding parameters and how they affect the generation of pages is essential. For example, sort and order parameters can create duplicate content in many ways.
6- Comments and comment replies can also generate lots of semi-duplicated content
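Points 3, 5 and 6 come down to the same problem: many URLs for one page. The following is a sketch of a canonicalization function in Python. It makes two assumptions. First, www.example.com is the preferred host. Second, the sort, order and replytocom parameters only change presentation, never content (replytocom is the parameter WordPress comment-reply links use). Both assumptions are placeholders to adapt to your own site.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumptions for illustration only: adjust to your own site.
PREFERRED_HOST = "www.example.com"
HOST_ALIASES = {"example.com", "www.example.com"}
# Parameters assumed to change only presentation (ordering, reply links).
IGNORED_PARAMS = {"sort", "order", "replytocom"}


def canonical_url(url):
    """Collapse www/non-www, presentation-only parameters and parameter
    order so that every variant of a page maps to one URL."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host in HOST_ALIASES:
        host = PREFERRED_HOST
    # Drop ignored parameters and sort the rest so ?b=2&a=1 == ?a=1&b=2.
    params = sorted(
        (k, v)
        for k, v in parse_qsl(parts.query, keep_blank_values=True)
        if k not in IGNORED_PARAMS
    )
    return urlunsplit((parts.scheme.lower(), host, parts.path or "/", urlencode(params), ""))


print(canonical_url("http://example.com/shoes?order=asc&sort=price&page=2"))
# http://www.example.com/shoes?page=2
```

Sorting the remaining parameters is a design choice. It treats `?a=1&b=2` and `?b=2&a=1` as the same page, which holds for most sites but is worth checking on yours.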
The best way to avoid duplicate content is to create static web pages, either hand-coded or built with Symfony. However, this is not always possible, especially if the project requires constant user input, in which case a CMS or a framework is necessary. If a webmaster is going to use a CMS, then a combination of the following actions is required:
1- Explicitly define a canonical name for all the pages that share the same content. In Drupal, for example, this can be achieved with the Global Redirect module,
or manually with <meta name="robots" content="noindex, nofollow"> (which keeps the duplicate out of the index) or <link rel="canonical"> (which points search engines to the preferred URL). However, the manual approach gets complicated when you have to work with a CMS's template engine.
2- Use 301 redirects, as advised by Google. These are also handled by the Global Redirect module mentioned above.
3- Optionally, specify the URL parameters in the search engine's webmaster tools.
4- Watch out when using taxonomies, because they can potentially create duplication. Taxonomies are essential in most cases, but always apply points 1 and 2 to get proper redirection. For example, suppose you have a category called "cool" and the home page is set to display the latest cool content. The cool category page then displays the same content as the home page, which is duplicated work. In this case, try to use a custom front page that doesn't display the latest content.
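Points 1 and 2 can work together. When a visitor or spider arrives on an alias, serve a 301 redirect to the canonical URL. On the canonical page itself, emit a canonical link. The following Python sketch is generic and does not reproduce how the Global Redirect module works internally. The `ALIASES` table and the `respond` function are hypothetical stand-ins for a CMS's path-alias storage and request handler.

```python
from html import escape

# Hypothetical alias table: requested URL -> canonical URL.
# In a real CMS this would come from its own path-alias storage.
ALIASES = {
    "http://www.example.com/?q=node/1": "http://www.example.com/node/1",
    "http://example.com/node/1": "http://www.example.com/node/1",
}


def canonical_for(url):
    """Return the canonical URL for a request (the URL itself if no alias)."""
    return ALIASES.get(url, url)


def canonical_link_tag(url):
    """Build the <link rel="canonical"> element for the page's <head>."""
    return f'<link rel="canonical" href="{escape(canonical_for(url), quote=True)}">'


def respond(url):
    """Return (status, headers, body): a 301 for aliases, else the page."""
    target = canonical_for(url)
    if target != url:
        return 301, {"Location": target}, ""
    body = f"<html><head>{canonical_link_tag(url)}</head><body>...</body></html>"
    return 200, {"Content-Type": "text/html; charset=utf-8"}, body


print(respond("http://example.com/node/1"))
# (301, {'Location': 'http://www.example.com/node/1'}, '')
```

The redirect sends both visitors and link juice to a single URL. The canonical tag covers the cases a redirect cannot, such as a category page that has to stay reachable.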