Sunday, November 8, 2009

Content Rating Progress

I got a lot of work done on SkimmerAgent today. Overall, 33 files were created and 11 altered, covering interface specifications, code classes, test classes and user interface functionality.

Today’s focus was the content rating feature. To get there I created a number of filters. Under the hood there are filters for removing @usernames and URLs from tweets. The tweet itself is then tokenized, and “stop words” are removed as part of the content rating. I’m using the MySQL stop words list.
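For illustration, here is a rough sketch of what a filter chain like this might look like. It is written in Python purely as an example and is not SkimmerAgent's actual code: the function name, the regular expressions and the tiny stop word set are all assumptions (the real rater uses the full MySQL list).

```python
import re

# Hypothetical stand-in for a stop word list; NOT the actual MySQL list.
STOP_WORDS = {"a", "an", "and", "are", "as", "at", "be", "for",
              "in", "is", "it", "of", "on", "the", "to", "with"}

URL_RE = re.compile(r"https?://\S+")   # assumed URL pattern
MENTION_RE = re.compile(r"@\w+")       # assumed @username pattern
TOKEN_RE = re.compile(r"[a-z0-9']+")   # very simple word tokenizer


def preprocess(tweet):
    """Strip URLs and @usernames, tokenize, then drop stop words."""
    text = URL_RE.sub(" ", tweet)        # filter 1: remove URLs
    text = MENTION_RE.sub(" ", text)     # filter 2: remove @usernames
    tokens = TOKEN_RE.findall(text.lower())  # filter 3: tokenize
    return [t for t in tokens if t not in STOP_WORDS]  # filter 4: stop words


print(preprocess("@bob check out the new Python release at http://example.com/x"))
```

URLs are removed before @usernames here so that an "@" inside a URL is not treated as a mention.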

I implemented a simple content rating heuristic based on a hard-coded goal word list. Despite its simplicity, it has proven effective.
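The exact heuristic isn't spelled out here, so the following is only one plausible version, sketched in Python: score a tweet by the fraction of its filtered tokens that appear in the goal word list. The goal words and the function name are made up for the example.

```python
# Hypothetical goal word list; the real one is hard-coded in SkimmerAgent.
GOAL_WORDS = {"python", "java", "release"}


def rate(tokens, goal_words=GOAL_WORDS):
    """Return the fraction of tokens that are goal words (0.0 to 1.0)."""
    if not tokens:
        return 0.0  # nothing left after filtering, so nothing to rate
    hits = sum(1 for t in tokens if t in goal_words)
    return hits / len(tokens)


print(rate(["check", "out", "new", "python", "release"]))
```

A tweet could then be shown or hidden depending on whether its rating clears some threshold, which would also be a tunable assumption.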

There was also a UI concurrency issue that had to be dealt with. I also changed the main screen to always display the skimmed public timeline in a splitter panel, and added a toggle to turn off public timeline skimming. Much remains to be done with the UI. However, for now the primary focus is the content rating. I plan to address some of the UI needs in December.

Next up I will implement a content rater that takes a vector space model (VSM) approach, scoring content by cosine similarity (cos θ).
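As a minimal sketch of the idea (Python, raw term counts, no TF-IDF weighting; the planned rater may well differ): each token list becomes a term-frequency vector, and the score is the cosine of the angle between the two vectors, cos θ = (A · B) / (|A| |B|).

```python
import math
from collections import Counter


def cosine_similarity(tokens_a, tokens_b):
    """Cosine of the angle between two term-frequency vectors."""
    a, b = Counter(tokens_a), Counter(tokens_b)
    # Dot product only needs the terms the two vectors share.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty vector has no direction; treat as no match
    return dot / (norm_a * norm_b)


goal = ["python", "release"]           # hypothetical goal terms
tweet = ["new", "python", "release"]   # hypothetical filtered tweet
print(cosine_similarity(goal, tweet))
```

With non-negative counts the score falls between 0 (no shared terms) and 1 (same direction), so it can be thresholded the same way as a simpler heuristic.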

With some UI enhancements, I’m thinking about a December alpha release.

[Screenshot: SkimmerAgent split grid UI]
