As a new starter at OneHydra, I am learning the basics of Search Engine Optimisation (SEO). SEO is often described as complicated, so each week I will share my understanding of this beast of a subject to help anyone else who is new to search and digital marketing. In the first part of the SEO 101 series (or SEO101Hydra, as we like to call it!), I look at how search engines crawl, index and rank content.
The 7-stage process
The phrase “Google it”, used when someone doesn’t know the answer to a question or wants to research something, has become commonplace in the English language. In fact, it’s hard to remember life before we had access to unlimited, free information! But how do search engines (such as Google) choose which website ranks where in the search results we see? This post explains how search engines crawl and index the web using a 7-stage process, and how this affects what we, the users, see:
- Crawling – Spiders or bots following links on pages.
- Caching – Capturing an image or snapshot of the page.
- Indexing – Storing the important information on the page.
- Searching – The user typing a search query into the search engine.
- Retrieving – Finding the best matches from the search engine’s database.
- Ranking – Using a complicated algorithm and filters to determine a rank for each match.
- Results – The search engine results page (SERP) as we see it.
Crawling
The first part of any search engine’s process is to crawl the web. Imagine a robotic spider or bot that simply follows every link it can find on a page, building up a ‘web’ like the one in the diagram below:
All search engines use bots or spiders to navigate the web, and rely on internal links and sitemaps to help with this process. Google uses a bot aptly named “Googlebot” (Googlebot-Mobile for mobile sites). With around 68% of the global market share, Google is by far the biggest player in search. However, Yandex is the biggest search engine in Russia (its bot is named YandexBot), Baidu is the largest in China (Baiduspider), and Bing also has a considerable chunk of the US market, with around 19% (Bingbot).
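To make the idea of following links a little more concrete, here is a minimal sketch in Python. It is not how Googlebot or any real bot works: the miniature web in TOY_WEB, the LinkExtractor class and the crawl function are all made up for illustration, and a real crawler would fetch pages over HTTP rather than read them from a dictionary.

```python
from collections import deque
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


# Hypothetical miniature "web": URL -> HTML of that page.
TOY_WEB = {
    "/home": '<a href="/about">About</a> <a href="/blog">Blog</a>',
    "/about": '<a href="/home">Home</a>',
    "/blog": '<a href="/blog/seo-101">SEO 101</a> <a href="/home">Home</a>',
    "/blog/seo-101": '<a href="/blog">Back to blog</a>',
    "/orphan": "No page links here, so the crawler never finds it.",
}


def crawl(start_url, web):
    """Breadth-first crawl: visit a page, then queue every link not yet seen."""
    seen = {start_url}
    queue = deque([start_url])
    visited = []
    while queue:
        url = queue.popleft()
        visited.append(url)
        parser = LinkExtractor()
        parser.feed(web.get(url, ""))
        for link in parser.links:
            if link in web and link not in seen:
                seen.add(link)
                queue.append(link)
    return visited


print(crawl("/home", TOY_WEB))
```

Notice that "/orphan" never appears in the result: nothing links to it, so a bot that only follows links has no way of discovering it. That is one reason internal links and sitemaps matter.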
Caching
Once these bots have found all of the links on a page, the search engine moves on to caching. Caching is where search engines take a copy of a page; in essence, an image or snapshot in time.
TIP: You can check when any page was last cached by typing the website into Google, right-clicking on the green arrow next to the URL in the search results and clicking “Cached”.
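A snapshot in time can be sketched as a copy of the page stored alongside the moment it was taken. The Snapshot and PageCache names below are hypothetical, the date is arbitrary, and everything lives in memory; it only illustrates the idea, not how any search engine stores its cache.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Snapshot:
    url: str
    html: str
    cached_at: datetime


class PageCache:
    """Keeps the most recent snapshot of each URL (illustrative, in-memory)."""

    def __init__(self):
        self._snapshots = {}

    def store(self, url, html, when=None):
        # Default to "now" if no capture time is supplied.
        when = when or datetime.now(timezone.utc)
        self._snapshots[url] = Snapshot(url, html, when)
        return self._snapshots[url]

    def get(self, url):
        return self._snapshots.get(url)


cache = PageCache()
# Arbitrary example date for the capture.
cache.store("/blog/seo-101", "<h1>SEO 101</h1>", datetime(2014, 1, 6, tzinfo=timezone.utc))

# The live page changes later, but the cached copy still holds the old version.
live_html = "<h1>SEO 101, updated</h1>"
snapshot = cache.get("/blog/seo-101")
print(snapshot.cached_at.date(), snapshot.html != live_html)
```

The cached copy only changes when the bot comes back and stores a new snapshot, which is why the cached date tells you when the page was last captured.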
Indexing
Still with me? Okay, so the next step is indexing. The important information on the page is understood and stored by the search engine. It builds a database of everything on that page; imagine an Excel spreadsheet counting the words on the page, excluding less important words such as “is” and “there”. There are two indexes: the main index and the supplemental index. Important pages are stored in the main index, while lower-ranked pages (such as duplicates) are stored in the supplemental index.
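The spreadsheet analogy translates quite naturally into code. The sketch below counts the words on each page, skips a tiny stop-word list that I made up for the example, and records which page each word appears on. The page texts, STOP_WORDS, tokenize and build_index are all assumptions for illustration; a real index stores far more than word counts.

```python
import re
from collections import Counter, defaultdict

# A tiny, illustrative stop-word list; real lists are far longer.
STOP_WORDS = {"is", "there", "a", "the", "and", "of", "to", "in", "on"}


def tokenize(text):
    """Lowercase the text, split it into words, and drop stop words."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return [w for w in words if w not in STOP_WORDS]


def build_index(pages):
    """Return {word: {url: number of times the word appears on that page}}."""
    index = defaultdict(dict)
    for url, text in pages.items():
        for word, count in Counter(tokenize(text)).items():
            index[word][url] = count
    return index


# Hypothetical page texts.
PAGES = {
    "/blog/seo-101": "There is a spider and the spider follows links",
    "/about": "The team is there to help with search",
}

index = build_index(PAGES)
print(index["spider"])   # which pages mention "spider", and how often
print("is" in index)     # stop words never make it into the index
```

Storing the data as word first, page second is what makes the later retrieval step quick: the search engine looks up a word and immediately gets the pages that contain it.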
Searching & Retrieving
So now comes the part we all know and love – searching! The search engine looks up everything it holds in its index about our ‘search query’ and retrieves the best matches from its database.
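Here is a rough sketch of retrieval, assuming an index shaped like a lookup table from words to pages. The INDEX contents and the retrieve function are hypothetical, and requiring every query word to appear on the page is just one simple way to define a match.

```python
# Hypothetical pre-built index: word -> {url: occurrences on that page}.
INDEX = {
    "seo": {"/blog/seo-101": 4, "/services": 2},
    "basics": {"/blog/seo-101": 1},
    "spider": {"/blog/seo-101": 2, "/blog/crawling": 5},
}


def retrieve(query, index):
    """Return the set of URLs that contain every word of the query."""
    words = query.lower().split()
    if not words:
        return set()
    # Start with the pages for the first word, then narrow down word by word.
    matches = set(index.get(words[0], {}))
    for word in words[1:]:
        matches &= set(index.get(word, {}))
    return matches


print(retrieve("SEO basics", INDEX))
print(sorted(retrieve("seo", INDEX)))
```

At this point the matches are unordered. Deciding which one comes first is the job of the ranking stage.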
Ranking
So how does a search engine decide where to rank a website? With all of that information in its index, why does a certain page get the privilege of ranking first? It’s fair to say it’s complicated. Google uses a long and complex algorithm with over 200 elements, which can be seen below:
As well as Google’s algorithm, there are also filters which help to determine ranking. QDF (query deserves freshness) is one such filter, which favours fresh content. QDD (query deserves diversity) is another, which ensures there is a diversity of results on a SERP (type “Hummingbird” into Google to see the range of results available).
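Google’s real algorithm is not public, so the sketch below is purely a toy to show how a freshness filter like QDF could sit on top of a base score. The Match class, the relevance numbers, the 30-day window and the size of the boost are all invented for the example.

```python
from dataclasses import dataclass


@dataclass
class Match:
    url: str
    relevance: float  # made-up base score standing in for the main algorithm
    age_days: int     # days since the page was published or updated


def rank(matches, deserves_freshness=False, boost=0.5, window_days=30):
    """Sort matches by score; optionally boost pages newer than window_days."""

    def score(m):
        is_fresh = m.age_days <= window_days
        fresh_bonus = boost if deserves_freshness and is_fresh else 0.0
        return m.relevance + fresh_bonus

    return [m.url for m in sorted(matches, key=score, reverse=True)]


# Hypothetical matches for one search query.
MATCHES = [
    Match("/guide-2012", relevance=0.9, age_days=700),
    Match("/news-this-week", relevance=0.6, age_days=3),
    Match("/old-forum-thread", relevance=0.4, age_days=1500),
]

print(rank(MATCHES))                           # plain relevance order
print(rank(MATCHES, deserves_freshness=True))  # the fresh page jumps to first
```

With the freshness flag switched on, the three-day-old page overtakes the older guide even though its base score is lower, which is the behaviour the QDF filter is described as having.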
Results
The final stage is for the search engine to show the user a search engine results page (SERP), where we can see the ranked results for our search query, along with paid advertisements, images, shopping results, videos and so on. To put this into perspective, the whole process is completed in less than a second. Clever, eh?