The first thing that came to mind when testing the waters for a move to Gatsby was my blog post content. If I could get my content into a form a Gatsby site accepts, then that's half the battle won right there, the theory being that it would simplify the build process.
I opted to go down the local storage route, where Gatsby serves markdown files for my blog post content. Everything else, such as the homepage, archive, about and contact pages, can be static. I am hoping this isn’t something I will live to regret, but I like the idea of my content being nicely preserved in source control, where I have full ownership without relying on a third-party platform.
My site is currently built on the .NET Framework using Kentico CMS. Exporting data is relatively straightforward, but as I transition to a largely CMS-less approach, I need to ensure all fields used within my blog posts are transformed appropriately into the core building blocks of my markdown files.
A markdown file can carry additional fields about a post, declared at the top of the file in a block that opens and closes with triple dashes (---). This block is called frontmatter.
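To make that structure concrete, here is a minimal sketch that splits a markdown file into its frontmatter fields and body. `FrontmatterReader` is a hypothetical helper, not part of Gatsby or any library, and it only understands simple `key: value` lines like the ones in my exported posts; it is not a YAML parser, so values come back raw with their quotes intact.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: splits a markdown file into simple "key: value"
// frontmatter fields and the post body. Not a full YAML parser.
public static class FrontmatterReader
{
    public static (Dictionary<string, string> Fields, string Body) Split(string markdown)
    {
        var fields = new Dictionary<string, string>();

        // Normalise line endings so "\r\n" files behave like "\n" files.
        string[] lines = markdown.Replace("\r\n", "\n").Split('\n');

        // Frontmatter must open on the very first line with "---".
        if (lines[0].Trim() != "---")
            return (fields, markdown);

        for (int i = 1; i < lines.Length; i++)
        {
            if (lines[i].Trim() == "---")
            {
                // Everything after the closing "---" is the post body.
                string body = string.Join("\n", lines, i + 1, lines.Length - i - 1).TrimStart('\n');
                return (fields, body);
            }

            // Split on the first colon only, so ISO dates keep their time part.
            int colon = lines[i].IndexOf(':');
            if (colon > 0)
            {
                string key = lines[i].Substring(0, colon).Trim();
                string value = lines[i].Substring(colon + 1).Trim();
                fields[key] = value;
            }
        }

        // No closing delimiter: treat the whole file as body.
        return (new Dictionary<string, string>(), markdown);
    }
}
```

Splitting on the first colon only matters for fields like `date`, whose value contains colons of its own.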
Here is a snippet of one of my blog posts exported to a markdown file:
```markdown
---
title: "Maldives and Vilamendhoo Island Resort"
summary: "At Vilamendhoo Island Resort you are surrounded by serene beauty wherever you look. Judging by the serendipitous chain of events where the stars aligned, going to the Maldives has been a long time in the coming - I just didn’t know it."
date: "2019-09-21T14:51:37Z"
draft: false
slug: "/Maldives-and-Vilamendhoo-Island-Resort"
disqusId: "b08afeae-a825-446f-b448-8a9cae16f37a"
teaserImage: "/media/Blog/Travel/VilamendhooSunset.jpg"
socialImage: "/media/Blog/Travel/VilamendhooShoreline.jpg"
categories: ["Surinders-Log"]
tags: ["holiday", "maldives"]
---
Writing about my holiday has started to become a bit of a tradition (for those that are worthy of such time and effort!) which seem to start when I went to [Bali last year](/Blog/2018/07/06/My-Time-At-Melia-Bali-Hotel).
I find it's a way to pass the time in airports and flights when making the return journey home. So here's another one...
```
Everything looks well structured, and the way I have formatted the date, categories and tags fields should accommodate the needs of future posts. I decided to keep the slug value free of any directory structure, giving me the flexibility to create the URL structure dynamically.
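As an illustration of what "dynamically creating a URL structure" could look like, here is a sketch that rebuilds a dated URL from the `date` and `slug` frontmatter fields. `PostUrlBuilder` is hypothetical, and the `/Blog/yyyy/MM/dd/` shape is an assumption borrowed from the existing Bali link in the post above. On the Gatsby side this logic would live in JavaScript; it is shown in C# to sit alongside the exporter.

```csharp
using System;
using System.Globalization;

// Hypothetical helper: builds a dated URL such as /Blog/2019/09/21/My-Post
// from the frontmatter "date" and "slug" fields. The "/Blog" prefix is an
// assumption based on the current site's links.
public static class PostUrlBuilder
{
    public static string Build(string isoDate, string slug, string prefix = "/Blog")
    {
        // Parse the ISO 8601 date from the frontmatter and keep it in UTC.
        DateTime date = DateTime.Parse(isoDate, CultureInfo.InvariantCulture,
            DateTimeStyles.AdjustToUniversal | DateTimeStyles.AssumeUniversal);

        // The slug is stored with a leading slash, so trim it before joining.
        string cleanSlug = slug.Trim('/');

        // Invariant culture guarantees "/" separators and Gregorian years.
        string datePath = date.ToString("yyyy/MM/dd", CultureInfo.InvariantCulture);

        return $"{prefix}/{datePath}/{cleanSlug}";
    }
}
```

Because the slug carries no directory structure, changing the URL scheme later only means changing this one function rather than every markdown file.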
Kentico Blog Posts to Markdown Exporter
The quickest way to get the content out was to create a console app to carry out the following:
- Loop through all blog posts in descending post date order.
- Update all image paths used for teasers and within the content.
- Convert rich text into markdown.
- Construct frontmatter key-value fields.
- Output to a text file in the following naming convention: “yyyy-MM-dd---Post-Title.md”.
Tasks 2 and 3 will require the most effort…
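The last task is simple by comparison. A sketch of the “yyyy-MM-dd---Post-Title.md” naming convention might look like the following; `MarkdownFileNamer` is a hypothetical helper, and stripping invalid characters is an extra safety step I am assuming, not something the convention itself requires.

```csharp
using System;
using System.Globalization;
using System.IO;

// Hypothetical helper: builds a file name in the "yyyy-MM-dd---Post-Title.md"
// convention, dropping characters the current OS does not allow in file names.
public static class MarkdownFileNamer
{
    public static string Build(DateTime postDate, string slug)
    {
        string datePart = postDate.ToString("yyyy-MM-dd", CultureInfo.InvariantCulture);

        char[] invalid = Path.GetInvalidFileNameChars();
        string trimmed = slug.Trim('/');

        // Keep only characters that are valid in a file name.
        var safe = new System.Text.StringBuilder();
        foreach (char c in trimmed)
        {
            if (Array.IndexOf(invalid, c) < 0)
                safe.Append(c);
        }

        return $"{datePart}---{safe}.md";
    }
}
```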
When I first started using Kentico, all references to images were made directly via the file path. As I got more familiar with Kentico, this was changed to use permanent URLs, which caused the link to an image to change from “/Surinder/media/Surinder/myimage.jpg” to “/getmedia/27b68146-9f25-49c4-aced-ba378f33b4df/myimage.jpg?width=500”. I need to create additional checks to find these URLs and transform them into a new path.
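Those checks boil down to recognising the permanent URL shape and pulling out the media GUID (which Kentico can resolve back to a file) plus the optional width. The sketch below isolates just that step; `PermanentUrlParser` is hypothetical and its regex is a simplified variant of the one used in the exporter further down.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: extracts the media GUID, file name and optional width
// from a Kentico permanent URL such as "/getmedia/<guid>/myimage.jpg?width=500".
public static class PermanentUrlParser
{
    private static readonly Regex PermanentUrl = new Regex(
        @"/getmedia/(?<guid>[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12})/(?<file>[^/?\s]+)(\?width=(?<width>[0-9]+))?",
        RegexOptions.IgnoreCase);

    public static bool TryParse(string url, out Guid mediaGuid, out string fileName, out int? width)
    {
        mediaGuid = Guid.Empty;
        fileName = null;
        width = null;

        Match match = PermanentUrl.Match(url);
        if (!match.Success)
            return false; // e.g. a legacy direct file path

        mediaGuid = Guid.Parse(match.Groups["guid"].Value);
        fileName = match.Groups["file"].Value;

        // The width group only succeeds when "?width=" is present.
        if (match.Groups["width"].Success)
            width = int.Parse(match.Groups["width"].Value);

        return true;
    }
}
```

A legacy path returns `false`, which is the cue to fall back to a simple string replacement of the old media folder instead.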
Finding a good .NET markdown converter is imperative. Without one, there is a high chance the rich text content would not be translated to a satisfactory standard, resulting in some form of manual intervention to carry out corrections. Combing through 250 posts manually isn’t my idea of fun! :-)
I found the ReverseMarkdown .NET library offered enough options to handle rich text to markdown conversion. I could configure the conversion to pass through any HTML that couldn’t be transformed, thus preserving the content.
Code
```csharp
using CMS.DataEngine;
using CMS.DocumentEngine;
using CMS.Helpers;
using CMS.MediaLibrary;
using Export.BlogPosts.Models;
using ReverseMarkdown;
using System;
using System.Collections.Generic;
using System.Globalization;
using System.IO;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;

namespace Export.BlogPosts
{
    class Program
    {
        public const string SiteName = "SurinderBhomra";
        public const string MarkdownFilesOutputPath = @"C:\Temp\BlogPosts\";
        public const string NewMediaBaseFolder = "/media";
        public const string CloudImageServiceUrl = "https://xxxx.cloudimg.io";

        static void Main(string[] args)
        {
            CMSApplication.Init();

            List<BlogPost> blogPosts = GetBlogPosts();

            if (blogPosts.Any())
            {
                // File.WriteAllText throws if the output folder does not exist.
                Directory.CreateDirectory(MarkdownFilesOutputPath);

                foreach (BlogPost bp in blogPosts)
                {
                    bool isMDFileGenerated = CreateMDFile(bp);

                    Console.WriteLine($"{bp.PostDate:yyyy-MM-dd} - {bp.Title} - {(isMDFileGenerated ? "EXPORTED" : "FAILED")}");
                }

                Console.ReadLine();
            }
        }

        /// <summary>
        /// Retrieve all published blog posts from Kentico, newest first.
        /// </summary>
        /// <returns>A list of BlogPost objects ready for export.</returns>
        private static List<BlogPost> GetBlogPosts()
        {
            List<BlogPost> posts = new List<BlogPost>();

            InfoDataSet<TreeNode> query = DocumentHelper.GetDocuments()
                .OnSite(SiteName)
                .Types("SurinderBhomra.BlogPost")
                .Path("/Blog", PathTypeEnum.Children)
                .Culture("en-GB")
                .CombineWithDefaultCulture()
                .NestingLevel(-1)
                .Published()
                .OrderBy("BlogPostDate DESC")
                .TypedResult;

            if (!DataHelper.DataSourceIsEmpty(query))
            {
                foreach (TreeNode blogPost in query)
                {
                    posts.Add(new BlogPost
                    {
                        Guid = blogPost.NodeGUID.ToString(),
                        Title = blogPost.GetStringValue("BlogPostTitle", string.Empty),
                        Summary = blogPost.GetStringValue("BlogPostSummary", string.Empty),
                        Body = RichTextToMarkdown(blogPost.GetStringValue("BlogPostBody", string.Empty)),
                        PostDate = blogPost.GetDateTimeValue("BlogPostDate", DateTime.MinValue),
                        Slug = blogPost.NodeAlias,
                        DisqusId = blogPost.NodeGUID.ToString(),
                        Categories = blogPost.Categories.DisplayNames.Select(c => c.Value.ToString()).ToList(),
                        // DocumentTags can be null when a post has no tags.
                        Tags = (blogPost.DocumentTags ?? string.Empty)
                            .Replace("\"", string.Empty)
                            .Split(',')
                            .Select(t => t.Trim())
                            .Where(t => !string.IsNullOrEmpty(t))
                            .ToList(),
                        SocialImage = GetMediaFilePath(blogPost.GetStringValue("ShareImageUrl", string.Empty)),
                        TeaserImage = GetMediaFilePath(blogPost.GetStringValue("BlogPostTeaser", string.Empty))
                    });
                }
            }

            return posts;
        }

        /// <summary>
        /// Creates the markdown content (frontmatter followed by body) based on Blog Post data.
        /// </summary>
        private static string GenerateMDContent(BlogPost bp)
        {
            StringBuilder mdBuilder = new StringBuilder();

            #region Post Attributes

            mdBuilder.Append($"---{Environment.NewLine}");
            mdBuilder.Append($"title: \"{bp.Title.Replace("\"", "\\\"")}\"{Environment.NewLine}");
            mdBuilder.Append($"summary: \"{HTMLHelper.HTMLDecode(bp.Summary).Replace("\"", "\\\"")}\"{Environment.NewLine}");
            // Quoted literals and the invariant culture keep "T", ":" and "Z" fixed regardless of locale.
            mdBuilder.Append($"date: \"{bp.PostDate.ToString("yyyy-MM-dd'T'HH:mm:ss'Z'", CultureInfo.InvariantCulture)}\"{Environment.NewLine}");
            mdBuilder.Append($"draft: {bp.IsDraft.ToString().ToLowerInvariant()}{Environment.NewLine}");
            mdBuilder.Append($"slug: \"/{bp.Slug}\"{Environment.NewLine}");
            mdBuilder.Append($"disqusId: \"{bp.DisqusId}\"{Environment.NewLine}");
            mdBuilder.Append($"teaserImage: \"{bp.TeaserImage}\"{Environment.NewLine}");
            mdBuilder.Append($"socialImage: \"{bp.SocialImage}\"{Environment.NewLine}");

            // Categories and tags are written as inline arrays, e.g. ["holiday", "maldives"].
            if (bp.Categories?.Count > 0)
            {
                string categories = string.Join(", ", bp.Categories.Select(c => $"\"{c}\""));
                mdBuilder.Append($"categories: [{categories}]{Environment.NewLine}");
            }

            if (bp.Tags?.Count > 0)
            {
                string tags = string.Join(", ", bp.Tags.Select(t => $"\"{t}\""));
                mdBuilder.Append($"tags: [{tags}]{Environment.NewLine}");
            }

            mdBuilder.Append($"---{Environment.NewLine}{Environment.NewLine}");

            #endregion

            // Add blog post body content.
            mdBuilder.Append(bp.Body);

            return mdBuilder.ToString();
        }

        /// <summary>
        /// Creates a file with a .md extension for the blog post.
        /// </summary>
        /// <returns>True if the file exists after writing.</returns>
        private static bool CreateMDFile(BlogPost bp)
        {
            string markdownContents = GenerateMDContent(bp);

            if (string.IsNullOrEmpty(markdownContents))
                return false;

            string fileName = $"{bp.PostDate:yyyy-MM-dd}---{bp.Slug}.md";
            string filePath = Path.Combine(MarkdownFilesOutputPath, fileName);

            File.WriteAllText(filePath, markdownContents);

            return File.Exists(filePath);
        }

        /// <summary>
        /// Gets the full relative path of a file, resolving Permanent URLs via their GUID.
        /// </summary>
        private static string GetMediaFilePath(string filePath)
        {
            if (filePath.Contains("getmedia"))
            {
                // Get GUID from file path.
                Match regexFileMatch = Regex.Match(filePath, @"(\{){0,1}[0-9a-fA-F]{8}\-[0-9a-fA-F]{4}\-[0-9a-fA-F]{4}\-[0-9a-fA-F]{4}\-[0-9a-fA-F]{12}(\}){0,1}");

                if (regexFileMatch.Success)
                {
                    MediaFileInfo mediaFile = MediaFileInfoProvider.GetMediaFileInfo(Guid.Parse(regexFileMatch.Value), SiteName);

                    if (mediaFile != null)
                        return $"{NewMediaBaseFolder}/{mediaFile.FilePath}";
                }
            }

            // Legacy direct path: swap the old media library base path for the new one.
            return filePath.Replace("/SurinderBhomra/media/Surinder", NewMediaBaseFolder);
        }

        /// <summary>
        /// Relinks images for Cloud Image, then converts the rich text to markdown.
        /// </summary>
        public static string RichTextToMarkdown(string richText)
        {
            if (string.IsNullOrEmpty(richText))
                return string.Empty;

            #region Loop through all images and correct the path

            // Clean up tildes.
            richText = richText.Replace("~/", "/");

            // Extensions allow 3 or 4 letters so ".jpeg" files are not cut off at ".jpe".
            Regex regexFileUrlWidth = new Regex(@"\/getmedia\/(\{{0,1}[0-9a-fA-F]{8}\-[0-9a-fA-F]{4}\-[0-9a-fA-F]{4}\-[0-9a-fA-F]{4}\-[0-9a-fA-F]{12}\}{0,1})\/([\w,\s-]+\.[A-Za-z]{3,4})(\?width=([0-9]*))", RegexOptions.Multiline | RegexOptions.IgnoreCase);

            foreach (Match fileUrl in regexFileUrlWidth.Matches(richText))
            {
                // Group 4 holds the numeric width value.
                string width = fileUrl.Groups[4].Success ? fileUrl.Groups[4].Value : string.Empty;
                string newMediaUrl = $"{CloudImageServiceUrl}/width/{width}/n/https://www.surinderbhomra.com{GetMediaFilePath(ClearQueryStrings(fileUrl.Value))}";

                richText = richText.Replace(fileUrl.Value, newMediaUrl);
            }

            // Any remaining permanent URLs without a width parameter.
            Regex regexGenericFileUrl = new Regex(@"\/getmedia\/(\{{0,1}[0-9a-fA-F]{8}\-[0-9a-fA-F]{4}\-[0-9a-fA-F]{4}\-[0-9a-fA-F]{4}\-[0-9a-fA-F]{12}\}{0,1})\/([\w,\s-]+\.[A-Za-z]{3,4})", RegexOptions.Multiline | RegexOptions.IgnoreCase);

            foreach (Match fileUrl in regexGenericFileUrl.Matches(richText))
            {
                // Construct media URL required by image hosting company - CloudImage.
                string newMediaUrl = $"{CloudImageServiceUrl}/cdno/n/n/https://www.surinderbhomra.com{GetMediaFilePath(ClearQueryStrings(fileUrl.Value))}";

                richText = richText.Replace(fileUrl.Value, newMediaUrl);
            }

            #endregion

            Config config = new Config
            {
                UnknownTags = Config.UnknownTagsOption.PassThrough, // Include the unknown tag completely in the result (default as well)
                GithubFlavored = true, // generate GitHub flavoured markdown, supported for BR, PRE and table tags
                RemoveComments = true, // will ignore all comments
                SmartHrefHandling = true // remove markdown output for links where appropriate
            };

            Converter markdownConverter = new Converter(config);

            // Strip backslash escapes around square brackets (e.g. "[!\[" becomes "[![").
            return markdownConverter.Convert(richText).Replace(@"[!\", @"[!").Replace(@"\]", @"]");
        }

        /// <summary>
        /// Returns media url without query string values.
        /// </summary>
        private static string ClearQueryStrings(string mediaUrl)
        {
            if (mediaUrl == null)
                return string.Empty;

            // Keep everything before the first "?".
            if (mediaUrl.Contains("?"))
                mediaUrl = mediaUrl.Split('?')[0];

            return mediaUrl.Replace("~", string.Empty);
        }
    }
}
```
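The exporter depends on a `BlogPost` model from `Export.BlogPosts.Models` that isn't listed. For anyone following along, here is a minimal reconstruction: the property names come straight from the exporter, but the types are my assumption, inferred from how each property is assigned and used.

```csharp
using System;
using System.Collections.Generic;

namespace Export.BlogPosts.Models
{
    // Hypothetical reconstruction: names match the exporter's usage,
    // types are inferred rather than taken from the original project.
    public class BlogPost
    {
        public string Guid { get; set; }
        public string Title { get; set; }
        public string Summary { get; set; }
        public string Body { get; set; }
        public DateTime PostDate { get; set; }
        public string Slug { get; set; }
        public string DisqusId { get; set; }
        public bool IsDraft { get; set; } // never set by the exporter, so defaults to false
        public string TeaserImage { get; set; }
        public string SocialImage { get; set; }
        public List<string> Categories { get; set; } = new List<string>();
        public List<string> Tags { get; set; } = new List<string>();
    }
}
```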
There is a lot going on here, so let's do a quick breakdown:
GetBlogPosts()
: Gets all blog posts from Kentico and parses them into “BlogPost” objects containing all the fields we want to export.
GetMediaFilePath()
: Takes an image path and carries out all the transformations required to change it to a new file path. This method is used by both GetBlogPosts() and RichTextToMarkdown().
RichTextToMarkdown()
: Takes rich text and relinks images in a format accepted by my image hosting provider, Cloud Image. This is also where ReverseMarkdown is used to convert the result to markdown.
CreateMDFile()
: Creates the .md file based on the blog posts found in Kentico.
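One drawback of RichTextToMarkdown() is that the Cloud Image URL building is tangled up with Kentico lookups, so it can't be checked in isolation. A sketch of pulling the two URL shapes out into a pure function is below; `CloudImageUrl.Build` is hypothetical and simply mirrors the `/width/{width}/n/` and `/cdno/n/n/` formats the exporter already produces, with the service and site URLs passed in rather than read from constants.

```csharp
using System;

// Hypothetical helper: reproduces the two Cloud Image URL shapes built in
// RichTextToMarkdown() so they can be tested without a Kentico connection.
public static class CloudImageUrl
{
    public static string Build(string serviceUrl, string siteUrl, string mediaPath, int? width)
    {
        // Absolute origin URL of the image on the site.
        string origin = siteUrl.TrimEnd('/') + mediaPath;

        // Same shape as the exporter's width-based URL when a width is known,
        // otherwise the same shape as its generic "cdno" URL.
        return width.HasValue
            ? $"{serviceUrl}/width/{width.Value}/n/{origin}"
            : $"{serviceUrl}/cdno/n/n/{origin}";
    }
}
```

With the URL format isolated like this, switching image providers later would mean changing one method rather than two regex loops.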