1 Since my last blog post I've decided that I was not happy with my RSS feed. So I've created a bash script to generate full RSS feeds for my full blog posts and not just the link and title of my posts. How I've made my script work is by taking the directory containing webpage for my blog and parsing the .html files within for the required information. My RSS feed now contains the full contents of my blog posts.
2 I've created an easily-readable 'example script' based on the one that runs on my website. It's the same script as the one that I run on my website except it's more cleaned up and doesn't have my own domain and directory path information in it. This example script is the one that is featured in the picture at the bottom of this blog post.
3 Here is a link to a copy of the script. If you want to use is make sure to change www.example.com
with your domain name and /blog
for posts_directory
with your path to the directory containing the webpages you want to implemented in your RSS feed. Run the script in the terminal and it should generate a file called blog.xml
. Move that file to the root of your website directory (or change the script to generate the file in your website directory) and your RSS feed is done.
4 The script generates most of the RSS feed by grabbing the text between certain html tags with 'awk' and 'sed' for each html file and plugging them into a xml file with 'echo'. The title
for each item in the .xml file is generated from taking what's between the h1
tags of the associated html file in the directory. Pubdate
for each item is generated by taking what's between the time
tags of the associated html file in the directory (dates must be in the YYYY-MM-DD
format in the html documents). Description
is generated from taking from what's between the article
tags of the associated html file in the directory.
5 Some information for script generation come from the file names of the html documents in the directory or can be generated without referring to the contents inside the html documents. The information for the link
tag for each item in the .xml file is generated from taking the associated html file name in the directory and adding a domain name and directory path. The information in the header of the .xml file obviously doesn't need to be generated from another file.
6 Edit: Credit to rvense from hacker news that suggested that my bash scripts should use "bash strict mode". I've updated my script to include that on his advise. I have also added language information and information for atom feed support in the header of the xml file.