Script to Generate RSS Feeds

Since my last blog post, I've decided that I wasn't happy with my RSS feed, so I've written a bash script that generates a full-content feed. My RSS feed now contains the full contents of my blog posts, not just the title and link of each one. The script works by taking the directory that contains my blog's webpages and parsing the .html files in it for the required information.

I've also created an easily readable 'example script' based on the one that runs on my website. It's the same script, except that it's cleaned up and doesn't contain my own domain and directory path information. This example script is the one shown in the picture at the bottom of this blog post.

Here is a link to a copy of the script. If you want to use it, make sure to replace www.example.com with your domain name, and replace /blog in posts_directory with the path to the directory containing the webpages you want to include in your RSS feed. Run the script in a terminal and it should generate a file called blog.xml. Move that file to the root of your website directory (or change the script to generate the file there directly) and your RSS feed is done.
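A minimal sketch of how the settings and the output location might be organised. Only posts_directory is named in the post; `domain`, `url_path`, `output_file` and `generate_feed` are hypothetical names chosen for illustration, and the channel title and description are placeholders. Each .html file becomes one item with only its link filled in, which is enough to show the loop and where the output goes.

```bash
#!/usr/bin/env bash
# Hypothetical settings block: replace these placeholder values.
domain="www.example.com"      # your domain name
posts_directory="/blog"       # where the post .html files live on disk
url_path="/blog"              # matching path on the website (assumption)
output_file="blog.xml"        # point this at your web root to skip the move

# generate_feed: write a bare-bones feed with one <item> per .html file.
# Only <link> is filled in here; title, date and description would come
# from the HTML tags of each post.
generate_feed() {
    local file
    {
        printf '<?xml version="1.0" encoding="UTF-8"?>\n'
        printf '<rss version="2.0">\n<channel>\n'
        printf '<title>%s</title>\n' "$domain"
        printf '<link>https://%s/</link>\n' "$domain"
        printf '<description>Posts from %s</description>\n' "$domain"
        for file in "$posts_directory"/*.html; do
            [ -e "$file" ] || continue    # no .html files: the glob stays unexpanded
            printf '<item><link>https://%s%s/%s</link></item>\n' \
                "$domain" "$url_path" "$(basename "$file")"
        done
        printf '</channel>\n</rss>\n'
    } > "$output_file"
}
```

Setting `output_file` to a path inside your website directory (for example a hypothetical /var/www/html/blog.xml) before calling `generate_feed` makes the script write the feed in place instead of into the current directory.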

The script generates most of the RSS feed by using 'awk' and 'sed' to grab the text between certain HTML tags in each .html file, then writing that text into an XML file with 'echo'. The title of each item in the .xml file is taken from between the h1 tags of the associated HTML file. The pubDate of each item is taken from between the time tags of the associated HTML file (dates must be in YYYY-MM-DD format in the HTML documents). The description is taken from between the article tags of the associated HTML file.
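One way to do this per-file extraction, sketched under a few assumptions rather than copied from the original script's awk and sed commands: each tag appears once per file, the time tag holds a bare YYYY-MM-DD date, and GNU `date` is available. RSS 2.0 expects pubDate in RFC 822 form, which GNU `date -R` produces. The function names `extract_between` and `to_rfc822` are hypothetical.

```bash
#!/usr/bin/env bash
# extract_between TAG FILE
# Prints the text between the first <TAG ...> and the following </TAG>,
# even when it spans several lines (as an <article> usually does).
extract_between() {
    local tag=$1 file=$2
    awk -v tag="$tag" '
        { doc = doc $0 "\n" }                     # read the whole file into one string
        END {
            start = index(doc, "<" tag)           # opening tag; attributes allowed
            if (start == 0) exit 1
            rest = substr(doc, start)
            rest = substr(rest, index(rest, ">") + 1)   # skip past the opening tag
            stop = index(rest, "</" tag ">")      # matching closing tag
            if (stop == 0) exit 1
            printf "%s", substr(rest, 1, stop - 1)
        }
    ' "$file"
}

# to_rfc822 YYYY-MM-DD -> RFC 822-style date for <pubDate> (GNU date only).
# With TZ=UTC, 2023-05-01 becomes "Mon, 01 May 2023 00:00:00 +0000".
to_rfc822() {
    LC_ALL=C date -d "$1" -R
}

# Usage for one post:
#   title=$(extract_between h1 "$file")
#   pubdate=$(to_rfc822 "$(extract_between time "$file")")
#   description=$(extract_between article "$file")
```

Reading the whole file into one string keeps the logic simple for multi-line articles; a sed range such as `/<h1>/,/<\/h1>/` would misbehave when the opening and closing tags share a line.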

Some of the information doesn't come from the contents of the HTML documents at all: it is either derived from their file names or needs no source file. The link for each item in the .xml file is generated by taking the associated HTML file name and prepending the domain name and directory path. The header of the .xml file obviously doesn't need to be generated from another file.
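A sketch of how the link and one complete item could be assembled. The https:// scheme, the separate `url_path` argument, the guid element and the CDATA wrapping around the description are assumptions, not details from the original script, and `make_link` and `print_item` are hypothetical names. CDATA lets a post's HTML sit inside the description unescaped, but it breaks if a post contains the sequence `]]>`, and a title containing `&` or `<` would still need escaping.

```bash
#!/usr/bin/env bash
# make_link DOMAIN URL_PATH FILE -> absolute URL built from the file name
make_link() {
    local domain=$1 url_path=$2 file=$3
    printf 'https://%s%s/%s' "$domain" "$url_path" "$(basename "$file")"
}

# print_item TITLE LINK PUBDATE DESCRIPTION -> one RSS <item> element
print_item() {
    printf '<item>\n'
    printf '  <title>%s</title>\n' "$1"
    printf '  <link>%s</link>\n' "$2"
    printf '  <guid>%s</guid>\n' "$2"            # the post URL doubles as a unique id
    printf '  <pubDate>%s</pubDate>\n' "$3"
    # CDATA keeps the post's HTML from being parsed as XML
    printf '  <description><![CDATA[%s]]></description>\n' "$4"
    printf '</item>\n'
}
```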

Edit: Credit to rvense from Hacker News, who suggested that my bash scripts should use "bash strict mode". I've updated my script to include it on his advice. I have also added language information and Atom feed support to the header of the XML file.
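For reference, "bash strict mode" usually means the `set` options and narrower IFS shown below. The header function sketches where language and Atom information can go; the `en-us` code, the placeholder title and description, and the `print_header` name are assumptions rather than the script's actual values. The atom:link element with rel="self" points back at the feed's own URL and requires the Atom namespace to be declared on the rss element.

```bash
#!/usr/bin/env bash
# Bash strict mode:
#   -e          exit as soon as a command fails
#   -u          treat unset variables as errors
#   -o pipefail a pipeline fails if any command in it fails
set -euo pipefail
IFS=$'\n\t'    # split words only on newlines and tabs, not spaces

# print_header DOMAIN FEED_URL -> channel header with language and Atom self-link
print_header() {
    local domain=$1 feed_url=$2
    printf '<?xml version="1.0" encoding="UTF-8"?>\n'
    printf '<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">\n'
    printf '<channel>\n'
    printf '<title>%s</title>\n' "$domain"
    printf '<link>https://%s/</link>\n' "$domain"
    printf '<description>Blog posts from %s</description>\n' "$domain"
    printf '<language>en-us</language>\n'
    printf '<atom:link href="%s" rel="self" type="application/rss+xml" />\n' "$feed_url"
}
```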