How do search engines work?
Search engines work through the following stages: Discovery ➜ Crawling ➜ Rendering ➜ Indexing ➜ Ranking.
In this short post, we’ll use plain English to spell out each step, along with some tips you can use to maximize your SEO results.
1. Discovering URLs
Some pages are known because Googlebot has already visited them. Other pages are discovered when Googlebot follows a link from a known page to a new page, or when you submit a sitemap.
TIP: Internal linking is important to help your pages, especially new pages, get found easily.
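Besides internal links, you can hand Google a list of URLs directly. A minimal sitemap might look like this (the domain and dates are illustrative):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://www.example.com/new-page</loc>
    <lastmod>2024-01-15</lastmod>
  </url>
</urlset>
```

Submit it in Google Search Console or reference it in your robots.txt so Googlebot can discover those URLs without having to find a link first.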
2. Visiting or “crawling” the page to find out what’s on it
After discovering a URL, Googlebot queues it for crawling. Googlebot crawls a page by requesting its HTML file, but it may not crawl every discovered URL: some may be disallowed for crawling, in which case Google skips them. It then parses the HTML response for other URLs and adds any links it finds to the crawl queue. In short, crawling is the process of collecting web pages and noting all the links on them.
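The loop described above — fetch a page, skip disallowed URLs, parse out links, queue the new ones — can be sketched in a few lines of Python. This is only an illustration: it uses an in-memory “site” instead of real HTTP requests, and all URLs and pages are made up.

```python
from collections import deque
from html.parser import HTMLParser

class LinkExtractor(HTMLParser):
    """Collects href targets from <a> tags, like a crawler parsing a page."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def crawl(pages, start, disallowed=()):
    """Breadth-first crawl over an in-memory site: fetch the HTML,
    skip disallowed URLs, parse links, and queue newly discovered ones."""
    queue = deque([start])
    seen = {start}
    crawled = []
    while queue:
        url = queue.popleft()
        if url in disallowed or url not in pages:
            continue  # disallowed or unknown URL: skip without fetching
        crawled.append(url)
        parser = LinkExtractor()
        parser.feed(pages[url])        # "request" the page's HTML
        for link in parser.links:      # queue discovered URLs
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return crawled

# A tiny made-up site: URL -> HTML
site = {
    "/": '<a href="/about">About</a> <a href="/cart">Cart</a>',
    "/about": '<a href="/">Home</a>',
    "/cart": '<a href="/checkout">Checkout</a>',
}
print(crawl(site, "/", disallowed={"/cart"}))  # → ['/', '/about']
```

Note that /cart is still discovered (its URL is seen via a link) but never fetched, which mirrors how a robots.txt disallow works: it blocks crawling, not discovery.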
Can I see how a Googlebot crawler sees my pages?
Yes, you can. The cached version of your page reflects a snapshot of the last time Googlebot crawled it.
TIP: Use your robots.txt file to direct Googlebot away from crawling certain pages and sections of your site. Pages you may want to disallow in robots.txt include /login, /thankyou, /cart, /account, etc.
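For example, a robots.txt that keeps crawlers away from the kinds of pages mentioned above might look like this (the paths are illustrative):

```
User-agent: *
Disallow: /login
Disallow: /thankyou
Disallow: /cart
Disallow: /account
```

Keep in mind that robots.txt blocks crawling, not indexing: a disallowed URL can still end up in the index if other pages link to it.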
3. Rendering to generate how the page will appear to users
Rendering is an important process that enables Google to see the actual page content that JavaScript (JS) generates. Without rendering, a website that relies on JS to bring content to the page may see its ability to rank suffer, because Google might not see that content. If Google can’t see the content, it can’t be indexed in Google Search.
In the case of a JS-based website, Googlebot queues pages for rendering unless a robots meta tag or header tells Google not to index the page. Since running JS is resource-heavy and complex, Google only renders the page and runs its JS once resources allow, so this step can be delayed. Rendering takes the HTML, JavaScript, and CSS and generates how the page will appear to desktop and/or mobile users. Once again, Googlebot parses the rendered page for links and queues the URLs it finds for crawling.
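The robots meta tag and header mentioned above look like this:

```html
<!-- In the page's <head>: tells crawlers not to index this page -->
<meta name="robots" content="noindex">
```

The equivalent HTTP response header, useful for non-HTML resources, is `X-Robots-Tag: noindex`.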
Can I see how Googlebot renders my JS content?
Yes, you can. With Google’s URL Inspection Tool or Mobile-Friendly Test, you can see the rendered code (the DOM), which represents the state of your page after rendering.
TIP: It’s best to have as much content as possible in the initial HTML, because the page renders more quickly that way — a point Google’s Martin Splitt has also made.
4. Indexing or not (analyzing content)
Once a page has been crawled and rendered, Google processes and analyzes it to determine whether it will be stored in the index. This analysis covers the textual content and key content tags and attributes, such as title elements and alt attributes, as well as images, videos, and more.
Can I see if Google has indexed my content?
Yes, you can. Run a site: search on Google (for example, site:yourdomain.com), or check a specific URL with the URL Inspection Tool in Google Search Console.
TIP: Indexing isn’t guaranteed – not every page that Google processes will be indexed. For example, if your content doesn’t seem valuable to Google (duplicate content, say), it may not get indexed. Check the Page Indexing report in Google Search Console.
5. Ranking
When a user enters a query, Google searches its index for matching pages and returns the results it believes are the highest quality and most relevant to the user’s query.
TIP: Congrats, you’ve made it to the last step, but your ranking game is only just beginning. Check out this recent ranking article to push your articles up in Google Search.
FAQs about how search engines work
1. Can Google still index your page if it isn’t fully rendered?
Yes, it can. Google can index just the initial HTML, which doesn’t contain dynamically injected content. But for a JS-heavy website, Google generally can’t index that content until the page is fully rendered.
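To make the difference concrete, here is a hypothetical page where the initial HTML is nearly empty and the content only appears after JS runs:

```html
<!-- What Googlebot sees before rendering: an empty container -->
<div id="products"></div>
<script>
  // Injected at runtime; visible to Google only after rendering
  document.getElementById("products").innerHTML = "<p>Blue widget – $9</p>";
</script>
```

Before rendering, Google can index only the empty shell; the product text becomes indexable once the page has been rendered.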
2. Does Google skip rendering JS?
Yes, it may. As Martin Splitt has confirmed, Google might decide that a page doesn’t change much after rendering (i.e., after running JS), so it won’t render that page in the future.
Also, Google’s renderer has timeouts. If your JS takes too long to render, Google may skip it. But before assuming it’s a Google problem, check whether you’ve blocked your JS files from Googlebot.
3. What happens if Google can’t render the content of your website?
If Google can’t see your content, it may choose another website whose content it can index. Things get even worse when Google cannot see the links to new pages: that hurts Google’s ability to discover new content and may leave many of your pages unindexed.