Blog

Categorised by 'General Development'.

  • While manually importing data into a Google Sheet to complete the boring chore of data restructuring, I wondered if there was any way that the initial import might be automated. After all, it would be much more efficient to link directly to an external platform to populate a spreadsheet.

    Google App Scripts provides a UrlFetchApp service giving us the ability to make HTTP POST and GET requests against an API endpoint. The following code demonstrates a simple API request to a HubSpot endpoint that will return values from a Country field by performing a GET request with an authorization header.

    function run() {
      apiFetch();
    }
    
    function apiFetch() {
      // API Endpoint options, including header options.
      var apiOptions = {
         "async": true,
         "crossDomain": true,
         "method" : "GET",
         "headers" : {
           "Authorization" : "Bearer xxx-xxx-xxxxxxx-xxxxxx-xxxxx",
           "cache-control": "no-cache"
         }
       };
    
      // Fetch contents from API endpoint.
      const apiResponse = UrlFetchApp.fetch("https://api.hubapi.com/crm/v3/properties/contact/country?archived=false", apiOptions);
      
      // Parse response as as JSON object.
      const data = JSON.parse(apiResponse.getContentText());
    
      // Populate "Sheet1" with data from API.
      if (data !== null && data.options.length > 0) {
          // Select the sheet.
          const activeSheet = SpreadsheetApp.getActiveSpreadsheet();
          const sheet = activeSheet.getSheetByName("Sheet1");
    
          for (let i = 0; i < data.options.length; i++) {
            const row = data.options[i];
    
            // Add value cells in Google sheet.
            sheet.getRange(i+1, 1).setValue(row.label);
          }
      }
    }
    

    When this script is run, a request is made to the API endpoint to return a JSON response containing a list of countries that will populate the active spreadsheet.

    The Google App Script official documentation provides even more advanced options for configuring the UrlFetchAppservice to ensure you are not limited in how you make your API requests.

    In such little code, we have managed to populate a Google Sheet from an external platform. I can see this being useful in a wide variety of use cases to make a Google Sheet more intelligent and reduce manual data entry.

    In the future, I'd be very interested in trying out some AI-related integrations using the ChatGPT API. If I manage to think of an interesting use case, I'd definitely write a follow-up blog post.

  • I've been delving further into the world of Google App Scripts and finding it my go-to when having to carry out any form of data manipulation. I don't think I've ever needed to develop a custom C# based import tool to handle the sanitisation and restructuring of data ever since learning the Google App Script approach.

    In this post, I will be discussing how to search for a value within a Google Sheet and return all columns within the row the searched value resides. As an example, let's take a few columns from a dataset of ISO-3166 Country and Region codes as provided by this CSV file and place them in a Google Sheet named "Country Data".

    The "Country Data" sheet should have the following structure:

    name alpha-2 alpha-3 country-code
    Australia AU AUS 036
    Austria AT AUT 040
    Azerbaijan AZ AZE 031
    United Kingdom of Great Britain and Northern Ireland GB GBR 826
    United States of America US USA 840

    App Script 1: Returning A Single Row Value

    Our script will be retrieving the two-letter country code by the country name - in this case "Australia". To do this, the following will be carried out:

    1. Perform a search on the "Country Data" sheet using the findAll() function.
    2. The getRow() function will return single row containing all country information.
    3. A combination of getLastColumn() and getRange() functions will output values from the row.
    function run() {
      var twoLetterIsoCode = getCountryTwoLetterIsoCode("Australia"); 
    }
    
    function getCountryTwoLetterIsoCode(countryName) {
      var activeSheet = SpreadsheetApp.getActiveSpreadsheet();
      var countryDataSheet = activeSheet.getSheetByName('Country Data');
    
      // Find text within sheet.
      var textSearch = countryDataSheet.createTextFinder(countryName).findAll();
    
      if (textSearch.length > 0) {
        // Get single row from search result.
        var row = textSearch[0].getRow();    
        // Get the last column so we can use for the row range.
        var rowLastColumn = countryDataSheet.getLastColumn();
        // Get all values for the row.
        var rowValues = countryDataSheet.getRange(row, 1, 1, rowLastColumn).getValues();
    
        return rowValues[0][1]; // Two-letter ISO code from the second column.
      }
      else {
        return "";
      }
    }
    

    When the script is run, the twoLetterIsoCode variable will contain the two-letter ISO code: "AU".

    App Script 2: Returning Multiple Row Matches

    If we had a dataset that contained multiple matches based on a search term, the script from the first example can be modified using the same fundamental functions. In this case, all we need to do is use a for loop and pass all row values to an array.

    The getCountryTwoLetterIsoCode() will look something like this:

    function getCountryTwoLetterIsoCode(countryName) {
      var activeSheet = SpreadsheetApp.getActiveSpreadsheet();
      var countryDataSheet = activeSheet.getSheetByName('Country Data');
    
      // Find text within sheet.
      var textSearch = countryDataSheet.createTextFinder(countryName).findAll();
    
      // Array to store all matched rows.
      var searchRows = [];
    
      if (textSearch.length > 0) {
        // Loop through matches.
        for (var i=0; i < textSearch.length; i++) {
          var row = textSearch[i].getRow();  
          // Get the last column so we can use for the row range.
          var rowLastColumn = countryDataSheet.getLastColumn();
          // Get all values for the row.
          var rowValues = countryDataSheet.getRange(row, 1, 1, rowLastColumn).getValues(); 
    
          searchRows.push(rowValues);
        }
      }
    
      return searchRows;
    }
    

    The searchRows array will contain a collection of matched rows as well as the column data. To carry out a similar output as shown in the first App Script example - the two-letter country code, the function can be called in the following way:

    // Get first match.
    var matchedCountryData = getCountryTwoLetterIsoCode("Australia")[0];
    
    // Get the second column value (alpha-2).
    var twoLetterIsoCode = matchedCountryData[0][1];
    

    Conclusion

    Both examples have demonstrated different ways of returning row values of a search term. The two key lines of code that allows us to do this are:

    // Get the last column so we can use for the row range.
    var rowLastColumn = countryDataSheet.getLastColumn();
    
    // Get all values for the row.
    var rowValues = countryDataSheet.getRange(row, 1, 1, rowLastColumn).getValues();
    
  • Whenever there is a need to restructure an Excel spreadsheet to an acceptable form to be used for a SaaS platform or custom application, my first inclination is to build something in C# to get the spreadsheet into a form I require.

    This week I felt adventurous and decided to break the mundane job of formatting a spreadsheet using an approach I've been reading up on for some time but just never got a chance to apply in a real-world scenario - Google App Scripts.

    What Is A Google App Script?

    Released in 2009, Google App Scripts is a cloud-based platform that allows you to automate tasks across Google Workspace products such as Drive, Docs, Sheets, Calendar, Gmail, etc. You could think of App Scripts as similar to writing a macro in Microsoft Office. They both can automate repeatable tasks and extend the standard features of the application.

    The great thing about Google App Script development is being able to use popular web languages (HTML/CSS/JavaScript) to build something custom. Refreshing when compared to the more archaic option of using VBA in Microsoft Office.

    Some really impressive things can be achieved using App Scripts within the Google ecosystem.

    Google Sheets App Script

    The Google App Script I wrote fulfils the job of taking the contents of cells in a row from one spreadsheet to be copied into another. The aim is to carry out automated field mapping, where the script would iterate through each row from the source spreadsheet and create a new row in the target spreadsheet where the cell value would be placed in a different column.

    This example will demonstrate a very simple approach where the source spreadsheet will contain five columns where each row contains numbers in ascending order to then be copied to the target spreadsheet in descending order.

    Before we add the script, we need to create two spreadsheets:

    • Source sheet: Source - Numbers Ascending
    • Target sheet: Destination - Numbers Descending

    The source sheet should mirror the same structure as the screenshot (below) illustrates.

    Google Sheet - Source

    The target sheet just needs to contain the column headers.

    The App Script can be created by:

    1. Navigating to Extensions > App Scripts from the toolbar. This will open a new tab presenting an interface to manage our scripts.
    2. In the "Files" area, press the "+" and select "Script".
    3. Name the script file: "export-cells-demo.gs".

    Add the following code:

    // Initialiser.
    function run() {
      sendDataToDestinationSpreadSheet();
    }
    
    // Copies values from a source spreadsheet to the target spreadsheet.
    function sendDataToDestinationSpreadSheet() {
      var activeSheet = SpreadsheetApp.getActiveSpreadsheet();
    
      // Get source spreadsheet by its name.
      var sourceSheet = activeSheet.getSheetByName('Source - Numbers Ascending');
    
      // Select the source spreadsheet cells.
      var sourceColumnRange = sourceSheet.getRange('A:E');
      var sourceColumnValues = sourceColumnRange.getValues();
    
      // Get target spreadsheet by its name..
      var targetSheet = activeSheet.getSheetByName('Destination - Numbers Descending');
    
      // Iterate through all rows from the source sheet.
      // Start index at 1 to ignore the column header.
      for(var i = 1; i < sourceColumnValues.length; i++) {
        // Get the cell value for the row.
        var column1 = sourceColumnValues[0,i][0];
        var column2 = sourceColumnValues[0,i][1];
        var column3 = sourceColumnValues[0,i][2];
        var column4 = sourceColumnValues[0,i][3];
        var column5 = sourceColumnValues[0,i][4];
        
        // Use getRange() to get the value position by declaring the row and column number.
        // Use setValue() to copy the value into target spreadsheet column.
        targetSheet.getRange(i+1, 1).setValue(column5);
        targetSheet.getRange(i+1, 2).setValue(column4);
        targetSheet.getRange(i+1, 3).setValue(column3);
        targetSheet.getRange(i+1, 4).setValue(column2);
        targetSheet.getRange(i+1, 5).setValue(column1);
      }
    }
    

    The majority of this script should be self-explanatory with the aid of comments. The only part that requires further explanation is where the values in the target sheet are set, as this is where we insert the numbers for each row in descending order:

    ...
    ...
    targetSheet.getRange(i+1, 1).setValue(column5);
    targetSheet.getRange(i+1, 2).setValue(column4);
    targetSheet.getRange(i+1, 3).setValue(column3);
    targetSheet.getRange(i+1, 4).setValue(column2);
    targetSheet.getRange(i+1, 5).setValue(column1);
    ...
    ...
    

    The getRange function accepts two parameters: Row Number and Column Number. In this case, the row number is acquired from the for loop index as we're using the same row position in both source and target sheets. However, we want to change the position of the columns in order to display numbers in descending order. To do this, I set the first column in the target sheet to contain the value of the last column from the source sheet and carried on from there.

    All the needs to be done now is to run the script by selecting our "run()" function from the App Scripts toolbar and pressing the "Run" button.

    The target spreadsheet should now contain the numbered values for each row in descending order.

    Google Sheet - Target

    Voila! You've just created your first Google App Script in Google Sheets with simple field mapping.

    Conclusion

    Creating my first Google App Script in a real-world scenario to carry out some data manipulation has opened my eyes to the possibilities of what can be achieved without investing additional time developing something like a Console App to do the very same thing.

    There is a slight learning curve involved to understand the key functions required to carry out certain tasks, but this is easily resolved with a bit of Googling and reading through the documentation.

    My journey into Google App Scripts has only just begun and I look forward to seeing what else it has to offer!

  • When building React applications, my components would consist of a mixture of React Function and Class components. As much as I love to use React Function components due to their simple, lightweight and concise code construction, I had a misconception they were limited in features.

    React Function components are generally classed as "stateless" where I would go as far as to send props (if required) and return the rendered output in HTML. So whenever there was a need for some form of interaction where state was required, a Class component would be used.

    It was only until recently I was made aware of React Hooks that allowed React Function components to have state. Let's take a look at a simple example where we are increasing and decreasing a number.

    import React, { useState } from 'react';
    
    const CounterApp = () => {
      const [state, setState] = useState({
        count: 0
      });  
    
      const incrementCount = () => {
        setState({
          count: state.count + 1,
        });
      }
    
      const decrementCount = () => {
        setState({
          count: state.count - 1,
        });
      }
    
      return (
        <div>
          <h1>{state.count}</h1>
    
          <button onClick={incrementCount}>Increment</button>
          <button onClick={decrementCount}>Decrement</button>
        </div>
      );
    };
    
    export default CounterApp;
    

    In the code above we have accomplished two things:

    1. Storing our counter in state.
    2. Click handlers

    The useState() hook declaration accepts properties in the way we are accustomed to when used in a Class component. The similarity doesn't stop there, you'll also notice the same approach when setting and getting a state property.

    Multiple states can also be declared and through the array destructuring syntax lets us give different names to the state variables we declared by calling useState. Really cool!

    const [age, setAge] = useState(42);
    const [fruit, setFruit] = useState('banana');
    const [todos, setTodos] = useState([{ text: 'Learn Hooks' }]);
    

    Even though this post has concentrated on state, I've found React Function components offer similar features as a Class component. This is how I will be building my components going forward.

  • I’ll be the first to admit that I very rarely (if at all!) assign a nice pretty share image to any post that gets shared on social networks. Maybe it’s because I hardly post what I write to social media in the first place! :-) Nevertheless, this isn’t the right attitude. If I am really going to do this, then the whole process needs to be quick and render a share image that sets the tone before that will hopefully entice a potential reader to click on my post.

    I started delving into how my favourite developer site, dev.to, manages to create these really simple text-based share images dynamically. They have a pretty good setup as they’ve somehow managed to generate a share image that contains relevant post related information perfectly, such as:

    • Post title
    • Date
    • Author
    • Related Tech Stack Icons

    For those who are nosey as I and want to know how dev.to undertakes such functionality, they have kindly written the following post - How dev.to dynamically generates social images.

    Since my website is built using the Gatsby framework, I prefer to use a local process to dynamically generate a social image without the need to rely on another third-party service. What's the point in using a third-party service to do everything for you when it’s more fun to build something yourself!

    I had envisaged implementing a process that will allow me to pass in the URL of my blog posts to a script, which in turn will render a social image containing basic information about a blog post.

    Intro Into Puppeteer

    Whilst doing some Googling, one tool kept cropping up in different forms and uses - Puppeteer. Puppeteer is a Node.js library maintained by Google Chrome’s development team and enables us to control any Chrome Dev-Tools based browser through scripts. These scripts can programmatically execute a variety of actions that you would generally do in a browser.

    To give you a bit of an insight into the actions Puppeteer can carry out, check out this Github repo. Here you can see Puppeteer is a tool for testing, scraping and automating tasks on web pages. It’s a very useful tool. The only part I spent most of my time understanding was its webpage screenshot feature.

    To use Puppeteer, you will first need to install the library package in which two options are available:

    • Puppeteer Core
    • Puppeteer

    Puppeteer Core is the more lighter-weight package that can interact with any Dev-Tool based browser you already have installed.

    npm install puppeteer-core
    

    You then have the full package that also installs the most recent version of Chromium within the node_modules directory of your project.

    npm install puppeteer
    

    I opted for the full package just to ensure I have the most compatible version of Chromium for running Puppeteer.

    Puppeteer Webpage Screenshot Script

    Now that we have Puppeteer installed, I wrote a script and added it to the root of my Gatsby site. The script carries out the following:

    • Accepts a single argument containing the URL of a webpage. This will be the page containing information about my blog post in a share format - all will become clear in the next section.
    • Approximately screenshot a cropped version of the webpage. In this case 840px x 420px - the exact size of my share image.
    • Use the page name in the URL as the image file name.
    • Store the screenshot in my "Social Share” media directory.
    const puppeteer = require('puppeteer');
    
    // If an argument is not provided containing a website URL, end the task.
    if (process.argv.length !== 3) {
      console.log("Please provide a single argument containing a website URL.");
      return;
    }
    
    const pageUrl = process.argv[2];
    
    const options = {
        path: `./static/media/Blog/Social Share/${pageUrl.substring(pageUrl.lastIndexOf('/') + 1)}.jpg`,
        fullPage: false,
        clip: {
          x: 0,
          y: 0,
          width: 840,
          height: 420
        }
      };
      
      (async () => {
        const browser = await puppeteer.launch({headless: false});
        const page = await browser.newPage()
        await page.setViewport({ width: 1280, height: 800, deviceScaleFactor: 1.5 })
        await page.goto(pageUrl)
        await page.screenshot(options)
        await browser.close()
      })(); 
    

    The script can be run as so:

    node puppeteer-screenshot.js http://localhost:8000/socialcard/Blog/2020/07/25/Using-Instagram-API-To-Output-Profile-Photos-In-ASPNET-2020-Edition
    

    I made an addition to my Gatsby project that generated a social share page for every blog post where the URL path was prefixed with /socialcard. These share pages will only be generated when in development mode.

    Social Share Page

    Now that we have our Puppeteer script, all that needs to be accomplished is to create a nice looking visual for Puppeteer to convert into an image. I wanted some form of automation where blog post information was automatically populated.

    I’m starting off with a very simple layout taking some inspiration from dev.to and outputting the following information:

    • Title
    • Date
    • Tags
    • Read time

    Working with HTML and CSS isn’t exactly my forte. Luckily for me, I just needed to do enough to make the share image look presentable.

    Social Card Page

    You can view the HTML and CSS on JSFiddle. Feel free to update and make it better! If you do make any improvements, update the JSFiddle and let me know!

    Next Steps

    I plan on adding some additional functionality allowing a blog post teaser image (if one is added) to be used as a background and make things look a little more interesting. At the moment the share image is very plain. As you can tell, I keep things really simple as design isn’t my strongest area. :-)

    If all goes to plan, when I share this post to Twitter you should see my newly generated share image.

  • I have been garnering interest in a static-site generator architecture ever since I read Paul Stamatiou’s enlightening post about how he built his website. I am always intrigued to know what goes on behind the scenes of someone's website, especially bloggers and the technology stack they use.

    Paul built his website using Jekyll. In his post, he explains his reasoning to why he decided to go down this particular avenue - with great surprise resonated with me. In the past, I always felt the static-site generator architecture was too restrictive and coming from a .NET background, I felt comfortable knowing my website was built using some form of server-side code connected to a database, allowing me infinite possibilities. Building a static site just seemed like a backwards approach to me. Paul’s opening few paragraphs changed my perception:

    ..having my website use a static site generator for a few reasons...I did not like dealing with a dynamic website that relied on a typical LAMP stack. Having a database meant that MySQL database backups was mission critical.. and testing them too. Losing an entire blog because of a corrupt database is no fun...

    ...I plan to keep my site online for decades to come. Keeping my articles in static files makes that easy. And if I ever want to move to another static site generator, porting the files over to another templating system won't be as much of a headache as dealing with a database migration.

    And then it hit me. It all made perfect sense!

    Enter The Static Site Generator Platform

    I’ll admit, I’ve come late to the static site party and never gave it enough thought, so I decided to pick up the slack and researched different static-site generator frameworks, including:

    • Jekyll
    • Hugo
    • Gatsby

    Jekyll runs on the Ruby language, Hugo on Go (invented by Google) and Gatsby on React. After some tinkering with each, I opted to invest my time in learning Gatsby. I was very tempted by Hugo, (even if it meant learning Go) as it is more stable and requires less build time which is important to consider for larger websites, but it fundamentally lacks an extensive plugin ecosystem.

    Static Generator of Choice: Gatsby

    Gatsby comes across as a mature platform offering a wide variety of useful plugins and tools to enhance the application build. I’m already familiar coding in React from when I did some React Native work in the past, which I haven’t had much chance to use again. Being built on React, it gave me an opportunity to dust the cobwebs off and improve both my React and (in the process) JavaScript skillset.


    I was surprised by just how quickly I managed to get up and running. There is nothing you have to configure unlike when working with content-management platforms. In fact, I decided to create a Gatsby version of this very site. Within a matter of days, I was able to replicate the following website functionality:

    • Listing blog posts.
    • Pagination.
    • Filtering by category and tag.
    • SEO - managing page titles, description, open-graph tags, etc.

    There I such a wealth of information and support online to help you along.

    I am very tempted to move over to Gatsby.

    When to use Static or Dynamic?

    Static site generators isn’t a framework that is suited for all web application scenarios. It’s more suited for small/medium-sized sites where there isn't a requirement for complex integrations. It works best with static content that doesn’t require changes to occur based on user interaction.

    The only thing that comes into question is the build time where you have pages of content in their thousands. Take Gatsby, for example...

    I read one site containing around 6000 posts, resulting in a build time of 3 minutes. The build time can vary based on the environment Gatsby is running on and build quality. I personally try to ensure best case build time by:

    • Sufficiently spec'd hardware is used - laptop and hosting environment.
    • Keeping the application lean by utilising minimal plugins.
    • Write efficient JavaScript.
    • Reusing similar GraphQL queries where the same data is being requested more than once in different components, pages and views.

    We have to accept the more pages a website has, the slower the build time will be. Hugo should get an honourable mention here as the build speed beats its competition hands down.

    Static sites have their place in any project as long as you conform within the confines of the framework. If you have a feeling that your next project will at some point (or immediately) require some form of fanciful integration, dynamic is the way to go. Dynamic gives you unlimited possibilities and will always be the safer option, something static will never measure against.

    The main strengths of static sites are that they’re secure and perform well in Lighthouse scoring potentially resulting favourably in search engines.

    Avenue’s for Adding Content

    The very cool thing is you have the ability to hook up to your content via two options:

    1. Markdown files
    2. Headless CMS

    Markdown is such a pleasant and efficient way to write content. It’s all just plain text written with the help of a simplified notation that is then transformed into HTML. The crucial benefit of writing in markdown is its portability and clean output. If in the future I choose to jump to a different static framework, it’s just a copy and paste job.

    A more acceptable client solution is to integrate with a Headless CMS where a more familiar Rich Text content editing and the storage of media is available to hand.

    You can also create custom-built pages without having to worry about the data layer, for example, landing pages.

    Final Thoughts

    I love Gatsby and it’s been a very long time since I have been excited by a different approach to developing websites. I am very tempted to make the move as this framework is made for sites like mine, providing I can get solutions to areas in Gatsby where I currently lack knowledge, such as:

    • Making URL’s case-insensitive.
    • 301 redirects.
    • Serving different responsive images within the post content. I understand Gatsby does this at templating-level but cannot currently see a suitable approach for media housed inside content.

    I’m sure the above points are achievable and as I have made quite swift progress on replicating my site in Gatsby, if all goes to plan, I could go the full hog. Meaning I don’t plan on serving content from any form of content-management system and cementing myself in Gatsby.

    At one point I was planning on moving over to a headless CMS, such as Kontent or Prismic. That plan was swiftly scrapped when there didn’t seem to be an avenue of migrating my existing content unless a Business or Professional plan is purchased, which came to a high cost.

    I will be documenting my progress in follow up posts. So watch this space!

  • I should start off by saying how much I love TortoiseGit and it has always been the reliable source control medium, even though it's a bit of a nightmare to set up initially to work alongside Bitbucket. But due to a new development environment for an external project, I am kinda forced to use preinstalled Git programs:

    • SourceTree
    • Git Bash

    I am more inclined to use a GUI when interacting with my repositories and use the command line when necessary.

    One thing that has been missing from SourceTree ever since it was released, is the ability to export changes over multiple commits. I was hoping after many years this feature would be incorporated. Alas, no. After Googling around, I came across a StackOverflow post that showed the only way to export changes in Sourcetree based on multiple commits is by using a combination of the git archive and git diff commands:

    git archive --output=archived_changes.zip HEAD $(git diff --diff-filter=ACMRTUXB --name-only hash1 hash2)
    

    This can be run directly using the Terminal window for a repository in Sourcetree. The "hash1" and "hash2" values are the long 40 character length commit ID's.

    The StackOverflow post has helped me in what I needed to achieve and as a learning process, I want to take things a step further in my post to understand what the archive command is actually doing for my own learning. So let's dissect the command into manageable chunks.

    Part 1

    git archive --output=archived_changes.zip HEAD
    

    This creates the archive of the whole repository into a zip file. We can take things further in the next section to select the commits we need.

    Part 2

    git diff --diff-filter=ACMRTUXB
    

    The git diff command shows changes in between commits. The filter option gives us more flexibility to select the files that are:

    • A Added
    • C Copied
    • D Deleted
    • M Modified
    • R Renamed
    • T have their type (mode) changed
    • U Unmerged
    • X Unknown
    • B have had their pairing Broken

    Part 3

    --name-only hash1 hash2
    

    The second part of the git diff command uses the "name-only" option that just shows which files have changed over multiple commits based on the hash values entered.

    Part 4

    The git diff command needs to be wrapped around parentheses to act as a parameter for the git archive command.

  • Being a web developer I am trying to become savvier when it comes to factoring additional SEO practices, which is generally considered (in my view) compulsory.

    Ever since Google updated its Search Console (formally known as Webmaster Tools), it has opened my eyes to how my site is performing in greater detail, especially the pages Google deems as links not worthy for indexing. I started becoming more aware of this last August, when I wrote a post about attempting to reduce the number "Crawled - Currently not indexed" pages of my site. Through trial and error managed to find a way to reduce the excluded number of page links.

    The area I have now become fixated on is the sheer number of pages being classed as "Duplicate without user-selected canonical". Google describes these pages as:

    This page has duplicates, none of which is marked canonical. We think this page is not the canonical one. You should explicitly mark the canonical for this page. Inspecting this URL should show the Google-selected canonical URL.

    In simplistic terms, Google has detected there are pages that can be accessed by different URL's with either same or similar content. In my case, this is the result of many years of unintentional neglect whilst migrating my site through different platforms and URL structures during the infancy of my online presence.

    Google Search Console has marked around 240 links as duplicates due to the following two reasons:

    1. Pages can be accessed with or without a ".aspx" extension.
    2. Paginated content.

    I was surprised to see paginated content was classed as duplicate content, as I was always under the impression that this would never be the case. After all, the listed content is different and I have ensured that the page titles are different for when content is filtered by either category or tag. However, if a site consists of duplicate or similar content, it is considered a negative in the eyes of a search engine.

    Two weeks ago I added canonical tagging across my site, as I was intrigued to see if there would be any considerable change towards how Google crawls my site. Would it make my site easier to crawl and aid Google in understanding the page structure?

    Surprising Outcome

    I think I was quite naive about how my Search Console Coverage statistics would shift post cononicalisation. I was just expecting the number of pages classed as "Duplicate without user-selected canonical" to decrease, which was the case. I wasn't expecting anything more. On further investigation, it was interesting to see an overall positive change across all other coverage areas.

    Here's the full breakdown:

    • Duplicate without user-selected canonical: Reduced by 10 pages
    • Crawled - Currently not indexed: Reduced by 65 pages
    • Crawl anomaly: Reduced by 20 pages
    • Valid : Increased by 60 pages

    The change in figures may not look that impressive, but we have to remember this report is based only on two weeks after implementing canonical tags. All positives so far and I'm expecting to see further improvements over the coming weeks.

    Conclusion

    Canonical markup can often be overlooked, both in its implementation and importance when it comes to SEO. After all, I still see sites that don't use them as the emphasis is placed on other areas that require more effort to ensure it meets Google's search criteria, such as building for mobile, structured data and performance. So it's understandable why canonical tags could be missed.

    If you are in a similar position to me, where you are adding canonical markup to an existing site, it's really important to spend the time to set the original source page URL correctly the first time as the incorrect implementation can lead to issues.

    Even though my Search Console stats have improved, the jury's still out to whether this translates to better site visibility across search engines. But anything that helps search engines and visitors understand your content source can only be beneficial.

  • My day to day version control system is Bitbucket. I never got on with their own Git GUI offering - Sourcetree. I always found using TortoiseGit much more intuitive to use and a flexible way to interact with my git repository. If anyone can change my opinion on this, I am all ears!

    I work with large projects that are around a couple hundred megabytes in size and if I were to clone the same project over different branches, it can use up quite a bit of hard disk space. I like to quickly switch to my master branch after carrying out a merge for testing before carrying out a release.

    Luckily TortoiseGit makes switching branches a cinch in just a few clicks:

    • Right-click in your repository
    • Go to TortoiseGit context menu
    • Click Switch/Checkout
    • Select the branch you wish to switch to and select "Overwrite working tree changes (force)"

    TortoiseGit Switch Branches

    Selecting the "Overwrite working tree changes (force)" tick box is important to ensure all files in your working directory is overwritten with the files directly from your branch. We do not want remnants of files left from the branch we had previously switched from kicking around.

  • Every few weeks, I check over the health of my site through Google Search Console (aka Webmaster Tools) and Analytics to see how Google is indexing my site and look into potential issues that could affect the click-through rate.

    Over the years the content of my site has grown steadily and as it stands it consists of 250 published blog posts. When you take into consideration other potential pages Google indexes - consisting of filter URL's based on grouping posts by tag or category, the number of links that my site consists is increased considerably. It's to the discretion of Google's search algorithm to whether it includes these links for indexing.

    Last month, I decided to scrutinise the Search Console Index Coverage report in great detail just to see if there are any improvements I can make to alleviate some minor issues. What I wasn't expecting to see is the large volume of links marked as "Crawled - Currently not indexed".

    Crawled Currently Not Indexed - 225 Pages

    Wow! 225 affected pages! What does "Crawled - Currently not indexed" mean? According to Google:

    The page was crawled by Google, but not indexed. It may or may not be indexed in the future; no need to resubmit this URL for crawling.

    Pretty self-explanatory but not much guidance on the process on how to lessen the number of links that aren't indexed. From my experience, the best place to start is to look at the list of links that are being excluded and to form a judgement based on the page content of these links. Unfortunately, there isn't an exact science. It's a process of trial and error.

    Let's take a look at the links from my own 225 excluded pages:

    Crawled Currently Not Indexed - Non Indexed Links

    On initial look, I could see that the majority of the URL's consisted of links where users can filter posts by either category or tag. I could see nothing content-wise when inspecting these pages for a conclusive reason for index exclusion. However, what I did notice is that these links were automatically found by Google when the site gets spidered. The sitemap I submitted in the Search Console only list out blog posts and content pages.

    This led me to believe a possible solution would be to create a separate sitemap that consisted purely of links for these categories and tags. I called it metasitemap.xml. Whenever I added a post, the sitemap's "lastmod" date would get updated, just like the pages listed in the default sitemap.

    I created and submitted this new sitemap around mid-July and it wasn't until four days ago the improvement was reported from within the Search Console. The number of non-indexed pages was reduced to 58. That's a 74% reduction!

    Crawled Currently Not Indexed - 58 Pages

    Conclusion

    As I stated above, there isn't an exact science for reducing the number of non-indexed pages as every site is different. Supplementing my site with an additional sitemap just happened to alleviate my issue. But that is not to say copying this approach won't help you. Just ensure you look into the list of excluded links for any patterns.

    I still have some work to do and the next thing on my list is to implement canonical tags in all my pages since I have become aware I have duplicate content on different URL's - remnants to when I moved blogging platform.

    If anyone has any other suggestions or solutions that worked for them, please leave a comment.