Convert web sites into RSS feeds
Numerous web sites deliver their news (or at least the news that there's news) via an RSS (RDF, Atom, whatever) feed these days. This is quite cool, as you can use a desktop aggregator to collect all the latest headlines from your favorite news sites instead of hitting "reload" every ten minutes.
A lot of sites, however, don't have this ability (yet?!). As a workaround, some desktop aggregators like Liferea and Snownews make it possible to convert HTML pages to feeds on the fly using custom filters. These filters are stand-alone programs that take the HTML code as input and spit out RSS code. While it is not really hard for even moderate programmers (like me) to write such filters, it can always be made easier.
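To make the idea of a filter concrete, here is a minimal hand-written one in Python. It is a sketch under assumptions: it assumes the aggregator passes the downloaded page on standard input and reads the feed from standard output, and it assumes a made-up page layout where every headline is a link of the form `<a class="headline" href="URL">TITLE</a>`. The pattern `HEADLINE_RE`, the function `html_to_rss`, and the channel title and link are hypothetical and would have to be adapted to the real site.

```python
import html
import re
import sys
from xml.sax.saxutils import escape

# Hypothetical pattern: assumes each headline is a link of the form
# <a class="headline" href="URL">TITLE</a>. Adapt it to the real page.
HEADLINE_RE = re.compile(r'<a class="headline" href="([^"]+)">([^<]+)</a>')


def html_to_rss(page: str, title: str, link: str) -> str:
    """Turn an HTML page into a minimal RSS 2.0 document."""
    items = []
    for url, headline in HEADLINE_RE.findall(page):
        # Decode HTML entities first, then re-escape for XML, so that
        # "&amp;" in the page does not become "&amp;amp;" in the feed.
        items.append(
            "  <item>\n"
            f"    <title>{escape(html.unescape(headline.strip()))}</title>\n"
            f"    <link>{escape(html.unescape(url))}</link>\n"
            "  </item>\n"
        )
    return (
        '<?xml version="1.0"?>\n'
        '<rss version="2.0">\n'
        "<channel>\n"
        f"  <title>{escape(title)}</title>\n"
        f"  <link>{escape(link)}</link>\n"
        "  <description>Converted from HTML</description>\n"
        + "".join(items)
        + "</channel>\n</rss>\n"
    )


def main() -> None:
    # Assumed filter contract: HTML in on stdin, RSS out on stdout.
    sys.stdout.write(html_to_rss(sys.stdin.read(), "Example site", "http://example.com/"))


if __name__ == "__main__":
    main()
```

Hooked into an aggregator, it would be run as something like `python filter.py < page.html`. Writing one of these per site is exactly the repetitive work a generator can take over.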
Script4rss takes a plain text file describing how a particular site should be converted and creates a Perl script that performs the conversion in the fastest and most efficient way possible (well, someday). Users don't have to know how to program, but they do need to know regular expressions (although there probably aren't a lot of people who fit that description).
At the moment, script4rss is in early development, which translates to "it can be used, but you have to figure out how yourself" and "if you screw up, the script does so as well". Features include:
- Detect multiple categories within an HTML page.
- Extract information over multiple lines.
- Prepend and append text to the output.
- Attempt to circumvent "variable" HTML.
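The features listed above can be illustrated with plain regular expressions. The following Python sketch is not script4rss's description format or its generated Perl code; it only shows the same ideas under an assumed page layout where categories are `<h1>` headings and items are `<div class="news">` blocks with the title in an `<h2>`. `CATEGORY_RE`, `ITEM_RE`, `clean`, and `extract_items` are hypothetical names. `re.DOTALL` lets a match span several lines, `re.IGNORECASE` together with `[^>]*` and `\s*` tolerates some "variable" HTML (tag case, extra attributes, quote style, whitespace), and the `prepend`/`append` arguments add fixed text around each title.

```python
import re

# Hypothetical markup: categories are <h1> headings, items are
# <div class="news"> blocks whose title sits in an <h2>.
CATEGORY_RE = re.compile(r"<h1[^>]*>(.*?)</h1>", re.IGNORECASE | re.DOTALL)
# DOTALL lets ".*?" cross line breaks; [^>]*, \s* and the optional quotes
# tolerate extra attributes, whitespace and quoting differences.
ITEM_RE = re.compile(
    r"<div[^>]*class=[\"']?news[\"']?[^>]*>\s*<h2[^>]*>(.*?)</h2>(.*?)</div>",
    re.IGNORECASE | re.DOTALL,
)
TAG_RE = re.compile(r"<[^>]+>")


def clean(fragment: str) -> str:
    """Strip tags and collapse runs of whitespace (including newlines)."""
    return " ".join(TAG_RE.sub(" ", fragment).split())


def extract_items(page: str, prepend: str = "", append: str = "") -> list:
    """Return (category, title, description) tuples found in the page.

    Items that appear before the first category heading are ignored.
    """
    results = []
    headings = list(CATEGORY_RE.finditer(page))
    for i, heading in enumerate(headings):
        # A category's section runs until the next heading or the page end.
        end = headings[i + 1].start() if i + 1 < len(headings) else len(page)
        section = page[heading.end():end]
        category = clean(heading.group(1))
        for title, body in ITEM_RE.findall(section):
            results.append((category, prepend + clean(title) + append, clean(body)))
    return results
```

The lazy `(.*?)</div>` stops at the first closing `</div>`, so nested `<div>` elements inside an item would cut it short. This is the kind of fragility that makes hand-tuning such patterns for every site tedious.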
There is some documentation available.
You can get everything from SourceForge: http://sourceforge.net/projects/script4rss/
Pieter Edelman