12 Steps to a Successful SEO Audit

Knowing how to do an SEO audit is the difference between a great SEO analysis and a poor one.
The importance of SEO in companies' Marketing and Sales strategies has been steadily increasing.
The growing number of openings for SEO Managers and Specialists on LinkedIn and job boards, as well as the growing demand for freelance SEO consultants and professionals, is a reflection of this.
When you start working on a company's organic channel, there are certain analyses that need to be done so that the SEO professional can get to know the site and, based on that knowledge, define a prioritized to-do list and build the company's SEO strategy.
Bear in mind that some SEO tools are also needed.
In this article, I will present a simple but complete way to do an SEO audit. Let's get to it!
Search Engine Crawl
For those who do not know how search engines work, crawling is their most basic and important function.
Google discovers websites and pages on the internet by following the links it finds, and the most basic (and most important) job of SEO professionals is to ensure that Google can access all the pages of the site that we want indexed.
Given that Google reliably follows only links in standard HTML format, <a href="url">anchor</a>, we must ensure that all links use that format and not one that Google does not value, such as links generated only with JavaScript.
How to check if Google can crawl the entire site?
A simple way to analyze whether or not Google can crawl the entire site is:
- Crawl the site with ScreamingFrog, assuming that the URLs discovered by this tool are the same ones Googlebot can discover. Filter the "Canonical URL" column so that only the canonical version of each URL remains.
- Go to Google Analytics and export landing-page traffic from all channels: Behavior >> Site Content >> Landing Pages.
- In Excel, use a VLOOKUP formula to identify pages that receive traffic in Analytics but do not appear in the ScreamingFrog crawl.
With this analysis we can identify orphaned pages, that is, pages that receive traffic (for example, from paid campaigns) but have no internal links pointing to them.
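If the exports get large, the same cross-reference can be scripted. Below is a minimal sketch in Python with pandas, assuming a ScreamingFrog export named screamingfrog_internal_html.csv (with an "Address" column) and a Google Analytics landing-page export named ga_landing_pages.csv (with "Landing Page" and "Sessions" columns); the file and column names are illustrative, so adjust them to your own exports.

```python
# Sketch: cross-reference a ScreamingFrog export with a GA landing-page export
# to flag orphan pages. File and column names are assumptions; adjust them to
# match your own exports.
import pandas as pd

# "Address" is the URL column in a ScreamingFrog internal_html export
crawl = pd.read_csv("screamingfrog_internal_html.csv", usecols=["Address"])

# "Landing Page" is the URL column in a Google Analytics landing-page export
analytics = pd.read_csv("ga_landing_pages.csv", usecols=["Landing Page", "Sessions"])

# Pages with traffic in Analytics that never appeared in the crawl = orphan candidates
orphans = analytics[~analytics["Landing Page"].isin(crawl["Address"])]
orphans.to_csv("orphan_pages.csv", index=False)
print(f"{len(orphans)} potential orphan pages found")
```

Note that Google Analytics usually exports landing pages as paths rather than full URLs, so you may need to prepend the domain before comparing the two lists.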
Search Engine Indexing
After crawling and identifying all the pages of the site that are reachable through internal links, Google indexes them and shows them in the search results whenever it considers them relevant to the query.
An important task in this analysis is to make a comparison between the number of existing pages (pages discovered in the ScreamingFrog crawl) and the number of indexed pages.
If we find that the site has a high number of pages, but only a small percentage of them are being indexed by Google, we should identify the causes and make suggestions for improvement.
How to identify the number of pages indexed in Google?
To identify the number of pages that the site has indexed in Google we must:
- Open Google Search Console.
- Click Pages under Indexing to open the Page indexing report.
There we can see the number of pages currently indexed, along with a graph showing how that number has evolved over the last 90 days.
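As a quick sanity check, you can put the two numbers side by side. The sketch below is just illustrative arithmetic; the example figures and the 80% threshold are assumptions, not an official Google benchmark.

```python
# Sketch: a quick indexation-rate check. Both inputs are assumptions:
# crawlable_pages comes from the ScreamingFrog crawl (canonical, indexable URLs)
# and indexed_pages is the figure shown in the Search Console Page indexing report.
crawlable_pages = 12_480   # example value from your crawl
indexed_pages = 9_350      # example value from Search Console

indexation_rate = indexed_pages / crawlable_pages
print(f"Indexation rate: {indexation_rate:.1%}")
if indexation_rate < 0.8:  # threshold is a judgment call, not a Google rule
    print("A large share of pages is not indexed - investigate the causes.")
```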
Sitemap
The sitemap.xml is a file that serves to tell Google all the pages of our site that we want to be indexed.
Given this definition, it is important to realize that "the pages that we want to be indexed" means, above all, every page that we are not explicitly signaling Google not to index.
There are several ways to signal Google not to index a page: use the noindex directive, block the page in the robots.txt file, 301-redirect it to another page, or set its canonical to another URL.
You should be careful not to include these pages in the sitemap so as not to show an inconsistent message to Google.
After preparing the sitemap.xml file with all the pages we want to be indexed, we should test and submit the file in Google Search Console and wait for Google to crawl the site and decide on the pages it will index.
It should be noted that a sitemap file has a limit of 50,000 URLs. If the site exceeds this limit, the ideal is to create several sitemap files per page type, for example, one file for product pages, another for blog articles, another for service pages, and so on, and then group them all in a sitemap index file that references the others.
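If you need to build these files yourself, here is a minimal sketch in Python that writes per-type sitemaps and a sitemap index file referencing them. The URLs and file names are placeholders; in practice the URL lists would come from your CMS or crawl.

```python
# Sketch: generate per-type sitemaps plus a sitemap index file, assuming you
# already have the URL lists split by page type. URLs and file names are
# illustrative only.
from xml.sax.saxutils import escape

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def write_sitemap(filename, urls):
    with open(filename, "w", encoding="utf-8") as f:
        f.write('<?xml version="1.0" encoding="UTF-8"?>\n')
        f.write(f'<urlset xmlns="{SITEMAP_NS}">\n')
        for url in urls[:50000]:  # stay under the 50,000-URL limit per file
            f.write(f"  <url><loc>{escape(url)}</loc></url>\n")
        f.write("</urlset>\n")

def write_sitemap_index(filename, sitemap_urls):
    with open(filename, "w", encoding="utf-8") as f:
        f.write('<?xml version="1.0" encoding="UTF-8"?>\n')
        f.write(f'<sitemapindex xmlns="{SITEMAP_NS}">\n')
        for url in sitemap_urls:
            f.write(f"  <sitemap><loc>{escape(url)}</loc></sitemap>\n")
        f.write("</sitemapindex>\n")

write_sitemap("sitemap-products.xml", ["https://www.example.com/product-1"])
write_sitemap("sitemap-blog.xml", ["https://www.example.com/blog/article-1"])
write_sitemap_index("sitemap.xml", [
    "https://www.example.com/sitemap-products.xml",
    "https://www.example.com/sitemap-blog.xml",
])
```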
Robots.txt
The robots.txt file lets us tell Google, and other search engines, how to crawl the pages of our site.
A robots.txt file has the following basic format:
User-agent: *
Disallow: /folder-or-page-to-block/
Sitemap: http://www.example.com/sitemap.xml
Once created, the robots.txt file should be tested and submitted to Google Search Console. In this Google guide, we have more detailed information on how to create the file.
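You can also verify the rules programmatically. The sketch below uses Python's standard urllib.robotparser module to check whether Googlebot is allowed to fetch a few URLs; the example domain and URL list are placeholders.

```python
# Sketch: check which URLs are blocked by robots.txt, using only the
# Python standard library. The domain and URL list are assumptions.
from urllib.robotparser import RobotFileParser

rp = RobotFileParser("https://www.example.com/robots.txt")
rp.read()

urls_to_check = [
    "https://www.example.com/",
    "https://www.example.com/checkout/",   # often intentionally blocked
]
for url in urls_to_check:
    allowed = rp.can_fetch("Googlebot", url)
    print(f"{'ALLOWED' if allowed else 'BLOCKED'}  {url}")
```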
HTTP Status Code
The crawl made in ScreamingFrog in the first step serves to identify all the URLs of the site.
Each of these URLs has an associated HTTP status code. Technically, a status code is the message the server sends in response to the browser's request, indicating whether or not the request could be fulfilled.
When everything is OK with the page, the status code sent to the browser is 200; when the page is not found, a 404 is sent; when there is a server error, a 500 is sent; and when the URL redirects to another page, the server sends a 301.
This Wikipedia article gives a detailed explanation of each of these HTTP status codes.
What SEO professionals should keep in mind is that we want a large percentage of our URLs to return a 200 code.
If we find 404s or 500s, we must identify the causes and propose solutions. If the site has many pages with 301 redirects, we should see what we can do to minimize them, so that the site makes better use of its crawl budget.
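To re-check status codes outside of ScreamingFrog, a short script with the requests library works well. The sketch below assumes a plain text file of URLs (urls_to_check.txt), one per line; that file name is just an example.

```python
# Sketch: re-check the status codes of a list of URLs (for example, URLs that
# ScreamingFrog flagged as 404/500) with the requests library.
import requests

with open("urls_to_check.txt") as f:
    urls = [line.strip() for line in f if line.strip()]

for url in urls:
    try:
        # Don't follow redirects, so 301/302 are reported as such
        resp = requests.head(url, allow_redirects=False, timeout=10)
        print(resp.status_code, url)
    except requests.RequestException as exc:
        print("ERROR", url, exc)
```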
PageSpeed and Performance Analysis
Google confirmed some time ago that page speed is important on desktop and has since clarified that it is also a ranking factor on mobile.
Ignoring this fact is certainly not a good solution. Knowing this, we must try to analyze the performance of the site and identify points of improvement.
PageSpeed Insights, GTmetrix, Pingdom, and Google's Lighthouse extension are the tools I advise you to use in this review.
As this analysis is done by URL, I suggest that you choose a URL for each type of page – homepage, product page, category page, among others – and that this URL be analyzed in the tools.
In addition to identifying the main technical problems of the site, the tools also suggest points of improvement. This analysis should serve to be the basis of all the technical work of SEO of the site.
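The PageSpeed Insights data can also be pulled through its public v5 API, which is handy when you want to test one URL per template in a repeatable way. A minimal sketch with requests follows; the URL list is illustrative, and for frequent runs you would add an API key.

```python
# Sketch: query the public PageSpeed Insights v5 API for a handful of template
# URLs. The URL list is illustrative; an API key is optional for light use.
import requests

PSI_ENDPOINT = "https://www.googleapis.com/pagespeedonline/v5/runPagespeed"
urls = [
    "https://www.example.com/",           # homepage
    "https://www.example.com/product-1",  # product page template
]

for url in urls:
    resp = requests.get(PSI_ENDPOINT, params={"url": url, "strategy": "mobile"}, timeout=60)
    data = resp.json()
    score = data["lighthouseResult"]["categories"]["performance"]["score"]
    print(f"{url}: mobile performance score {score * 100:.0f}/100")
```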
Keep in mind that the tools suggested above assume the older HTTP/1.1 protocol.
If the site you are analyzing supports the newer HTTP/2 protocol, some of the suggested improvements are no longer necessary: because HTTP/2 multiplexes many requests over a single connection, some "good practices" aimed at reducing the number of requests can be skipped.
To check whether or not the site supports HTTP/2 we can use this tool. Just enter the URL and click Test.
There is also the even newer HTTP/3; you can test support for it with this tool.
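If you prefer checking HTTP/2 support from a script rather than a web tool, the httpx library (installed with the http2 extra) can report the negotiated HTTP version. A minimal sketch, with a placeholder URL:

```python
# Sketch: check HTTP/2 support from the command line instead of a web tool,
# using the httpx library (install with: pip install "httpx[http2]").
import httpx

url = "https://www.example.com/"
with httpx.Client(http2=True) as client:
    response = client.get(url)
    # http_version will be "HTTP/2" if the server negotiated it, else "HTTP/1.1"
    print(url, "->", response.http_version)
```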
Structured data
Structured data is a way to give "structure" to the data on the pages of our website. For this, we use the vocabulary from the Schema.org project, in JSON-LD format.
This helps search engines better understand the content of the page and increases the chance of earning rich results on the results page, such as the rich results Google shows when we search for "apple".

In order to check if the site has implemented some structured data markup and if this implementation is well done, we can use the testing tool developed by Google.
It is necessary to keep in mind that this analysis is done at the page level, so if we intend to analyze the implementation throughout the site, we must do this process for one URL per group of pages.
Google has developed a tool that helps us do the markups per page.
To do this, simply choose a type of page – article, product page, book review, and so on – and mark up the data.
When the markup is done, you just have to click “CREATE HTML” to get the structured data in HTML format to be implemented in the code of the page.
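To review the markup that is already live on a page, you can also pull the JSON-LD blocks out of the HTML yourself. The sketch below uses requests and BeautifulSoup and simply lists the schema.org types it finds; the URL is a placeholder, and this is a rough check, not a replacement for Google's testing tool.

```python
# Sketch: extract and inspect the JSON-LD structured data blocks on a page,
# to see at a glance which schema.org types are implemented.
# Requires requests and beautifulsoup4; the URL is illustrative.
import json
import requests
from bs4 import BeautifulSoup

url = "https://www.example.com/product-1"
html = requests.get(url, timeout=10).text
soup = BeautifulSoup(html, "html.parser")

for script in soup.find_all("script", type="application/ld+json"):
    try:
        data = json.loads(script.string or "")
    except json.JSONDecodeError:
        print("Invalid JSON-LD block found")
        continue
    items = data if isinstance(data, list) else [data]
    for item in items:
        print(item.get("@type", "unknown type"))
```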
Content Parity in Mobile and Desktop
The mobile-first index is here! Many sites have already received a notification from Google informing them that they have been migrated to mobile-first indexing.
Today, more than ever, it is important to pay attention not only to the user experience on mobile but also to the experience of Google bots on mobile.
It is crucial that there is a parity of pages and content in the mobile and desktop versions of the site.
What is important to analyze?
In order to ensure consistency in terms of pages and content of our website on mobile and desktop there are two fundamental factors:
- All pages that exist on the desktop must also exist in the mobile version and vice versa.
- The content of these pages must be exactly the same, in both versions, that is, we must ensure that we have the same URLs, titles, descriptions, headings, images, videos and content of the pages.
How to analyze?
To do this analysis, run two crawls of the site in ScreamingFrog, one with the Desktop user agent and another with the Smartphone user agent. Then export both crawls to Excel and start analyzing (a scripted version of this comparison is sketched after the list):
- Is the number of URLs the same in both versions?
- Are the URLs identified in the desktop crawl the same as in the mobile version? And vice versa? What are the URLs that are missing from each of the versions?
- In the URLs that have a match in both versions, how are we at the content level? That is, is there parity in the level of titles, descriptions, headings, images, videos and content of the pages?
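Here is the scripted version mentioned above: a pandas sketch that compares the two exports and flags URLs missing from either version, plus title and H1 mismatches. The file names and column names ("Address", "Title 1", "H1-1") follow a typical ScreamingFrog export but are assumptions here, so adjust them to your files.

```python
# Sketch: compare a desktop crawl export with a smartphone crawl export to
# find URLs missing from either version and title/H1 mismatches.
# File and column names are assumptions.
import pandas as pd

cols = ["Address", "Title 1", "H1-1"]
desktop = pd.read_csv("crawl_desktop.csv", usecols=cols)
mobile = pd.read_csv("crawl_mobile.csv", usecols=cols)

only_desktop = set(desktop["Address"]) - set(mobile["Address"])
only_mobile = set(mobile["Address"]) - set(desktop["Address"])
print(f"{len(only_desktop)} URLs only in the desktop crawl")
print(f"{len(only_mobile)} URLs only in the mobile crawl")

# For URLs present in both versions, flag mismatching titles and H1s
merged = desktop.merge(mobile, on="Address", suffixes=("_desktop", "_mobile"))
mismatches = merged[
    (merged["Title 1_desktop"] != merged["Title 1_mobile"])
    | (merged["H1-1_desktop"] != merged["H1-1_mobile"])
]
mismatches.to_csv("mobile_desktop_mismatches.csv", index=False)
```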
Duplicate Content
The duplication of content is a very serious problem for the performance of websites in search engines.
It is very common to have duplicate titles, descriptions and headings on the site. We must identify these situations and proceed to correct them.
Identifying duplicate content in Google Search Console
- Open Google Search Console and select the property
- Click Search Appearance
- Click HTML improvements
In this report, you can identify the fields that Google has marked as duplicates.
How to identify duplicate content?
The Google Search Console report depends a lot on how Google crawls your site.
There may be situations, such as those described at the beginning of this article, in which Google has not yet crawled or indexed certain URLs; because of this, Google does not flag them as duplicates in Google Search Console.
If we rely blindly on this report, we run the risk of having many duplicate pages without realizing it. To mitigate this, we should also look for duplicate content with other tools. Here is how to do it with the ScreamingFrog export:
- Open the Excel file from the previously made crawl
- Select the title column
- Click Conditional Formatting > Highlight Cells Rules > Duplicate Values
- After that, just make a filter by Color and we have, thus, access to all the duplicate titles.
This analysis should also be done for the descriptions and headings.
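The same check can be run without Excel, straight from the ScreamingFrog export, with a few lines of pandas. The column names ("Title 1", "Meta Description 1", "H1-1") are the usual export headers, but treat them as assumptions and adjust them to your file.

```python
# Sketch: find duplicated titles, descriptions and H1s in a ScreamingFrog
# export with pandas. Column names are assumptions.
import pandas as pd

crawl = pd.read_csv("screamingfrog_internal_html.csv")

for column in ["Title 1", "Meta Description 1", "H1-1"]:
    # keep=False marks every row that shares a value; ignore empty cells
    dupes = crawl[crawl.duplicated(subset=[column], keep=False) & crawl[column].notna()]
    print(f"{column}: {dupes['Address'].nunique()} URLs share a duplicated value")
    dupes[["Address", column]].sort_values(column).to_csv(
        f"duplicate_{column.replace(' ', '_').lower()}.csv", index=False
    )
```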
Analyze organic traffic
Analyzing the traffic coming from the search engines is very important in the first contact with the site.
We must look at organic traffic as a whole, in order to understand traffic trends, the seasonality of the business, and any Google penalties the site may have suffered. It is important to do this analysis for different time periods, to make comparisons between them easier.
We should do a YoY (year over year) comparison for the years in which we have the data available, as well as a WoW (week over week) comparison for the last 12 months.
This way it will be possible to have a general perception of the organic traffic of the site. To perform this analysis we must do the following:
Trends, Seasonality and Penalties
Go to Google Analytics >> Select the “Organic Traffic” segment and do: Acquisition >> All Traffic >> Channels >> Select the Organic Channel. Select the date range for the last 12 months.
Observe the traffic trend. When there are drops, determine whether the drop is related to the seasonality of the business or whether the site was penalized by Google.
A good tip is to keep a calendar with the holidays of the market you are analyzing highlighted, along with the dates on which Google released algorithm updates. That way it is easy to see whether the site dropped because of seasonality or was caught by a Google update.
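If you export daily organic sessions from Google Analytics, a rough YoY view per month can be computed in pandas, which makes seasonal dips easier to spot. The file name and the "Date" and "Sessions" column names below are assumptions.

```python
# Sketch: a rough YoY comparison of organic sessions from a Google Analytics
# daily export. The file and column names ("Date", "Sessions") are assumptions.
import pandas as pd

df = pd.read_csv("ga_organic_daily.csv", parse_dates=["Date"])
df["Year"] = df["Date"].dt.year
df["Month"] = df["Date"].dt.month

# Rows = month, columns = year, values = total organic sessions
monthly = df.groupby(["Year", "Month"])["Sessions"].sum().unstack(level=0)
yoy = monthly.pct_change(axis=1) * 100  # percent change versus the previous year
print(yoy.round(1))
```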
Top Performing Pages
Go to Google Analytics, select the "Organic Traffic" segment and go to Behavior >> Site Content >> Landing Pages. Select the date range for the last 12 months.
With this analysis, it is possible to identify the main pages of the site in terms of organic traffic. Having this information as a basis it is possible to prioritize the work of optimizing the pages, for example, we can start by optimizing the pages of the site that are in the Top 20 of organic traffic.
We can also group the URLs by page type (product pages, blog articles, category pages, etc.) and identify the traffic volume of each group.
Cannibalization of Keywords
All sites, some more and some less, have several competitors on Google.
The last thing we want is for our site to compete with itself in the search results. Although this seems like obvious reasoning, it is a problem that happens often.
Sites are always creating new pages, and this sometimes happens without taking into account the pages that already exist.
Because pages often have very similar content, Google positions them for the same keywords.
In a first analysis, it is important to identify the situations where this happens and propose its correction.
To identify this, go to Ahrefs and export all the organic keywords of the site. Then open the file in Excel, sort by keyword, highlight the duplicate records, and filter by duplicates.
Once we have identified the keywords that are ranking in Google for more than one page, we must choose the page that we want that keyword to be associated with and correct the other pages on which it is appearing.
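The Excel steps above can also be scripted. The sketch below takes an Ahrefs organic-keywords export and lists the keywords that rank with more than one URL; the column names ("Keyword", "Current URL", "Current position") are assumptions based on a typical export, so adjust them as needed.

```python
# Sketch: find keywords that rank with more than one URL, from an Ahrefs
# organic-keywords export. Column names are assumptions.
import pandas as pd

kw = pd.read_csv("ahrefs_organic_keywords.csv")

# Keywords associated with more than one ranking URL are cannibalization candidates
counts = kw.groupby("Keyword")["Current URL"].nunique()
cannibalized = counts[counts > 1].index

report = (
    kw[kw["Keyword"].isin(cannibalized)]
    .sort_values(["Keyword", "Current position"])
    [["Keyword", "Current URL", "Current position"]]
)
report.to_csv("keyword_cannibalization.csv", index=False)
print(f"{len(cannibalized)} keywords rank with more than one URL")
```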
There are two common situations that generate keyword cannibalization problems. Below I present both, along with suggestions for correcting them.
Misuse of the keyword
This situation occurs when a website improperly utilizes a particular keyword, leading to keyword cannibalization problems. It involves actions such as excessive keyword repetition or stuffing in various sections of the website, including meta tags, headers, body text, and anchor texts.
Misusing keywords is often done with the intention of manipulating search engine rankings, but it can have adverse effects. When multiple pages within a website compete for the same keyword, it can result in reduced visibility and confusion for search engines.
To avoid keyword misuse and cannibalization, it is crucial to employ strategic and ethical SEO practices. This includes using keywords naturally and sparingly, focusing on creating high-quality and unique content, and ensuring proper keyword distribution across different pages.
Two or more articles on the same topic
This situation arises when a website publishes multiple articles or blog posts that cover the same topic or target the same keyword. While it is essential to provide comprehensive coverage of a subject, having multiple articles with substantial content overlap can lead to cannibalization issues.
When search engines encounter several pages with similar content or identical targeting, they may struggle to determine which page should rank higher in search results. As a result, the visibility and organic traffic of these pages may be negatively impacted.
To address this problem, it is recommended to strategically organize and optimize the content on the website. This can involve consolidating similar articles into a single comprehensive piece, implementing canonical tags to indicate the preferred version of a page, or implementing proper internal linking to guide search engines and users to the most relevant and authoritative page on a given topic.
Analyze backlinks
The importance of backlinks (links from other sites pointing to ours) as a ranking factor is not what it was a few years ago, but they remain a strong ranking factor.
To cover this topic properly I would have to write another article, so I will not go deep here. While good links help a lot with the site's position in the search engines, bad links also have an influence, in this case a negative one.
This happens because Google harms sites that have created too many spam links.
In this first analysis of the site, what I suggest is to identify in Ahrefs all the backlinks to the site that come from domains with a Domain Rating below 15 (they are probably spam or low-value links) and analyze each of those URLs in detail. We can use the BULK URL OPENER extension to open multiple URLs at the same time.
What we want here is to evaluate, by eye, the quality of the sites linking to us.
I make a habit of checking whether a site has a lot of intrusive advertising and where the link appears on the page (whether it is a contextual link or sits in a less favourable position, such as the footer, among other signals typical of spam links), and I make sure that such links are disavowed in Google Search Console.
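To prepare that manual review, you can first filter the Ahrefs export down to the low-DR links with pandas. The column names ("Domain rating", "Referring page URL") are assumptions; adjust them to your export.

```python
# Sketch: filter an Ahrefs backlinks export down to referring domains with a
# Domain Rating below 15, so they can be reviewed by hand. Column names
# are assumptions.
import pandas as pd

links = pd.read_csv("ahrefs_backlinks.csv")
low_quality = links[links["Domain rating"] < 15]

low_quality[["Referring page URL", "Domain rating"]].to_csv(
    "backlinks_to_review.csv", index=False
)
print(f"{len(low_quality)} backlinks from domains with DR below 15 to review")
```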
This set of analyses lets you see the overall health of the site and how prepared it is for SEO challenges.
Based on the results of each analysis, it is possible to identify points of improvement, assign a priority to each one and get to work on improving the site.
I hope this helped you understand how to do an SEO audit.
Wrapping up
Performing a successful SEO audit is essential for optimizing your website’s visibility and search engine rankings. Keyword misuse and cannibalization can hinder your efforts and negatively impact your online presence.
By following the 12 steps outlined in this blog, you can effectively identify and rectify any SEO issues, ensuring that your website is optimized for search engines.
Whether you choose to perform the audit yourself or enlist the help of professionals like MusePanda for a more technical audit, taking action is crucial.
Don’t let keyword cannibalization hold you back—strategically implement SEO best practices and unlock the full potential of your website. Start your successful SEO audit today!
