Cloud Computing

Recently, there has been a push by companies like Microsoft, Salesforce, Amazon, and Google to use their cloud computing services as a platform for building applications. What makes this any different from running a server within your office and storing your business data there? Nothing. You’re just outsourcing number-crunching power and storage.

If you use Gmail, Yahoo! Mail, or Hotmail, guess what? You’re already using a cloud application. You have no knowledge of where the data lives, nor do you (or should you) really care. You can reach that data from any computer with access to the Internet, and the entry cost to that data in terms of hardware is low.

What does a business gain by moving to a cloud computing model? Cost savings on hardware, data storage, and processing power, plus less reliance on internal servers for 100% availability. To take advantage of these savings, you accept three trade-offs: you become reliant on access to your chosen cloud service’s servers, which means reliant on your ISP (as if any of us weren’t reliant on the web already); you have to rebuild your applications; and you no longer have your institutional data in-house.

Rebuilding applications to take full advantage of this technology is no small undertaking. Data that resides in your existing data store will have to be ported into your chosen cloud service and any application logic that speaks to your current data will have to be rewritten.

Why is this?

In terms of Microsoft’s platform, Azure, a developer now has to conform to a new standard of data storage rather than the ORM or ADO.NET model. Azure is made to deliver massive amounts of data and to provide redundancy and recovery features; in order to meet this goal, you have to do things the Azure way. You, as a developer, have a layer of abstraction that sits on top of a network of database servers, and no knowledge of how the data is stored at the most basic level.

I am not going to push for using a SaaS model of development. I, personally, am no salesman. I am not sure I could convince a business owner that 10 years of work should be moved off-site. This is not to say that you must have all of your data off-site; you can pursue a hybrid model as well. I can, however, give one guideline that can ease the transition should it be something that your company wants to do.

Remove all of your business logic from your database.

This, in and of itself, can be a troubling task. I have worked on many applications whose stored procedures performed business logic, and this has to change in order to use a cloud-based platform. You no longer have direct access to the database, so the code that modifies data has to be written as a service that runs in the cloud. Encapsulation is key for the cloud model to work.
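To make the guideline a little more concrete, here is a minimal sketch, written in Perl purely for illustration, of a rule that might once have lived in a stored procedure and now lives in a service-style function. The discount rule, the names `discount_for` and `apply_discount`, and the in-memory hash standing in for the cloud data store are all hypothetical; nothing here is Azure’s actual API.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical in-memory stand-in for a cloud data store.
# The service code below is the only thing that touches it.
my %orders = (
    1001 => { total => 250.00, discount => 0 },
    1002 => { total =>  40.00, discount => 0 },
);

# Business rule (an assumption for illustration): orders of 100 or
# more get 10% off. In the stored-procedure world this would have
# been an UPDATE statement living inside the database.
sub discount_for {
    my ($total) = @_;
    return $total >= 100 ? $total * 0.10 : 0;
}

# Service operation: read the order, apply the rule, write it back,
# and return the amount the customer now owes.
sub apply_discount {
    my ($order_id) = @_;
    my $order = $orders{$order_id}
        or die "Unknown order $order_id\n";
    $order->{discount} = discount_for($order->{total});
    return $order->{total} - $order->{discount};
}

printf "Order %d now costs %.2f\n", $_, apply_discount($_)
    for sort keys %orders;
```

The point of the split is that `discount_for` knows nothing about storage, so swapping the hash for a real cloud data service should only touch `apply_discount`.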

As I am a Microsoft-based developer, I have been focused on the platform that Microsoft has provided (which is said to be getting Java support soon). Some other examples of cloud service hosts:

Perl? I’ve seen those

So this morning I was asked to get content from a website out in plain text.

Visually, this means that the HTML code of the page needs to be converted to straight text, without all of the tags, that can be viewed in Notepad.

As I have done some screen scraping in the past for other jobs, I am familiar with the concept of taking data from a terminal screen and working with it. I have not, however, done any sort of screen scraping for web content. 

My first step in the process is to ask what other people have used. I don’t want to reinvent the wheel if possible. As I am the only developer here, I find it useful to post questions on Twitter, as most of the people I follow have some attachment to the technical industry. I use it as an open messaging system, kind of like shouting down a hall and seeing who answers.

My first response was HTML::Strip. As someone who has been a Windows programmer for most of his career, focused on Microsoft-based products (and not really web-based platforms), this told me absolutely nothing. Google (or Bing… I’m trying…) tells me that this is essentially a Perl module. Huh?

So begins my morning’s quest for knowledge. I do a bit of digging, and I find that what I need to do first is get some type of Perl interpreter. These sorts of things come with Unix/Linux… but Microsoft pays for my life, so I use Windows. The top of my result list was fine for me, so I went with Strawberry Perl as my interpreter of choice.

As a value add, I get a CPAN Client, which is essentially a universal installer utility for modules that can be consumed by Perl scripts. Meaning that if you need to include a library that reads HTML pages and returns their content to you, you just tell CPAN the name of the library and it magically installs!

I need two things to get started:

  • HTML::Strip – the Perl library that strips the HTML tags out of content
  • LWP::Simple – the Perl library that fetches web pages

After a bit more research, I found that all I need to do is launch my CPAN Client and, at its command prompt, run install HTML::Strip and then install LWP::Simple. It’s really just that simple! No messing around with installer files. It just works. Now I can write a script that consumes those libraries.

This post is getting long and rather than just drag on with coding, here’s how we can scrape the text from a web page using Perl:

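#!/usr/bin/perl
use strict;
use warnings;

use HTML::Strip;
use LWP::Simple;

my $hs  = HTML::Strip->new();
my $url = "http://www.google.com";

# get() returns undef if the page could not be fetched
my $content = get($url);
die "Could not fetch $url\n" unless defined $content;

# Strip the tags and print what is left
my $clean_text = $hs->parse($content);
$hs->eof;
print $clean_text;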

Done. That will write the contents of our $url variable’s web site to the screen. I save my script into a text file, C:\strawberry\perl\Scripts\Scraper.pl (I use .pl as the file extension only for convention’s sake).

To execute my script, I open the command prompt and type perl C:\strawberry\perl\Scripts\Scraper.pl and the result is printed to the command window.

Obviously, I still have some work to do to make my little script a viable solution:

  • Scrape a specific target area on the web page rather than the whole page
  • Loop through a list of pages to parse rather than just a single page

…but that’s the bulk of what I needed it to do. For all you Perl experts out there: I probably butchered your favorite language... sorry.
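For the curious, the two to-do items above could be sketched roughly like this, still using only HTML::Strip and LWP::Simple. The helper names `target_area` and `to_text`, the marker strings, and taking the list of pages from the command line are all assumptions for illustration. Matching markers with a regular expression is crude, and a real HTML parser would be sturdier.

```perl
#!/usr/bin/perl
use strict;
use warnings;

use HTML::Strip;
use LWP::Simple qw(get);

# Return the HTML between two marker strings, or the whole page if
# the markers are not found. The match stops at the FIRST end marker,
# so nested tags of the same kind will cut the area short.
sub target_area {
    my ($html, $start, $end) = @_;
    return $1 if $html =~ /\Q$start\E(.*?)\Q$end\E/s;
    return $html;
}

# Strip the tags from a chunk of HTML, using a fresh HTML::Strip
# object each time so no parser state leaks between pages.
sub to_text {
    my ($html) = @_;
    my $hs   = HTML::Strip->new();
    my $text = $hs->parse($html);
    $hs->eof;
    return $text;
}

# Every command-line argument is treated as a URL to scrape.
for my $url (@ARGV) {
    my $content = get($url);
    unless (defined $content) {
        warn "Could not fetch $url\n";
        next;
    }
    # Hypothetical markers; adjust them to the page being scraped.
    my $area = target_area($content, '<div id="content">', '</div>');
    print "== $url ==\n", to_text($area), "\n";
}
```

Saved under a made-up name such as ScrapeMany.pl, it would be run as perl ScrapeMany.pl followed by one or more URLs. With no URLs given it simply does nothing.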

 

Content Management

Who needs revision tracking? I do, and I love it. I want to be able to see the changes made to a document or spreadsheet, along with the comments added and a date. As a programmer, I have used some form of source control for ten years and, without realizing it, have come to rely on it to keep track of changes. Because of it, I have been able to roll a piece of code back to the version before I broke it.

There are many terms for keeping track of versioning within a document. Over the years, our terms have changed and our ability to track changes has grown. DMSs (Document Management Systems) became CMSs (Content Management Systems), which then became ECMSs (Enterprise Content Management Systems). Why just let a document have all the fun? What about spreadsheets, images, and executables?

There are hundreds of solutions to allow you to track versioning in your documents and all of them are better than searching through years of e-mails looking for the one sent by the colleague who had sent the version of the document that you want.

Right now I’m writing this article in Google Docs. If you have not used this solution to simplify your organization’s revision tracking, I suggest you take a look at it. I have found this to be the best solution for my personal documents because of the zero software footprint on my computers.

I can see the changes that were made between two different versions of this article. Should I need to compare the differences, Google Docs allows me to show that information, as well as tag the changes with a comment. Most importantly, this tool scales well from one user to many.

To try to apply CMS concepts to the real world, think of this in terms of a sales proposal: a team of people working on a single document. We would have a technical group gathering requirements for the project, a sales group adding (and revising) the cost of products and services, and a documentation group adding and tailoring verbiage to the specific client.

Over all of this activity, the account manager would be constantly reviewing the document. In our example, and more often than not in practice, our account manager works externally, allowing very little physical contact with the team of people working on the proposal during the sales cycle.

In a world without Content Management, the account manager gets separate e-mails from the technical staff, the documentation team, and the internal sales team, and each e-mail requires changes that will impact the other teams. However, each group is busy on many other internal projects, and finding time to get the team together is difficult.

Now frustrated, the account manager edits each document from his hotel and replies to each team. Unwittingly, the account manager has just added more places to search for a document: by adding revisions and sending an e-mail, he now must search his ‘Sent Items’ each time he looks for a copy of the document. On top of that, no group has access to the others’ changes until they are compiled into the draft version on the internal network.

Enter the concept of content management. Using some sort of CMS, the team works with a single document that can be modified with revision tracking. Our account manager can now see the changes made by each user on the team. Because everyone is now using the same document, each team member’s changes can be seen by all the others.

Collaboration is now inherent to the system. The account manager can make pricing changes owing to some lunchtime feedback from the prospect, and the technical staff can adjust some of their hardware requirements. Rather than using a strikethrough font to tell a team member to remove a sentence, the account manager can make the change and allow the CMS to show the differences between the versions.

At a very high level, a content management system is a package of services that allows users to store and track changes to a piece of information. That piece of information could be a spreadsheet, a web page, or a document.
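As a toy illustration of "store and track changes", here is a minimal sketch in Perl of a revision list for a single document. The function names `save_revision`, `get_revision`, and `changed_lines`, and the sample proposal text, are made up for this example; a real CMS keeps far more than this, and its comparison is much smarter than a line-by-line check.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Every revision keeps the full text plus who saved it, why, and when.
my @revisions;

# Store a new revision and return its 1-based revision number.
sub save_revision {
    my ($author, $comment, $text) = @_;
    push @revisions, {
        author  => $author,
        comment => $comment,
        text    => $text,
        saved   => time,
    };
    return scalar @revisions;
}

# Return the text of a given revision (1-based).
sub get_revision {
    my ($number) = @_;
    die "No revision $number\n"
        if $number < 1 || $number > scalar @revisions;
    return $revisions[ $number - 1 ]{text};
}

# Naive comparison: report each line position whose text differs.
sub changed_lines {
    my ($from, $to) = @_;
    my @old = split /\n/, get_revision($from);
    my @new = split /\n/, get_revision($to);
    my $max = @old > @new ? scalar @old : scalar @new;
    my @changes;
    for my $i (0 .. $max - 1) {
        my $before = $old[$i] // '';
        my $after  = $new[$i] // '';
        push @changes, "line " . ($i + 1) . ": '$before' -> '$after'"
            if $before ne $after;
    }
    return @changes;
}

save_revision('tech',  'Initial requirements', "Servers: 2\nPrice: TBD");
save_revision('sales', 'Added pricing',        "Servers: 2\nPrice: 5000");
print "$_\n" for changed_lines(1, 2);
```

Run as is, it reports that only line 2 changed between the two revisions, and because every revision is kept whole, rolling back is just a matter of calling `get_revision` with an earlier number.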

Examples of ECMS:

To give credit where it is due: this post was written in response to, and perhaps to elaborate on, a post by Brian Caldwell.