Introducing LiteShare: SlideShare without the Flash embed

Posted on March 26th, 2009

Access it here: LiteShare

SlideShare is a good service for uploading and distributing your PowerPoint and Keynote presentations, but sometimes embedding its Flash presentation player is overkill.

What if you have a bunch of SlideShare presentations you want to post? A CPU-intensive Flash player for each one would ruin your users’ experience and slow down your entire site.

For this reason, I’ve created a small utility to embed SlideShare presentations as an image thumbnail and a link in place of their usual Flash player.

Use it when you don’t want to bog down your users with the extra baggage of too many Flash players on a single page.



The CRM114 Discriminator – Your own personal secretary on Mac OS X

Posted on March 14th, 2009

Imagine your boss comes in one day and says to you, “We have over 100,000 web pages on our site. Of that figure, 10,000 are from spammers. I need you to go through our list of websites and figure out which ones are spam and which are genuine.”

How do you accomplish this task without going crazy? Wouldn’t it be great if your computer just told you whether a webpage was spam or not?

Well, it can. Just give it some initial training and you’ll have your own digital secretary in no time. This is all possible through the CRM114 discriminator, which is a machine-learning tool to help you classify data according to predetermined samples.

We can use it in our case by first feeding it documents that are known to be “spam”, then feeding it documents that are known to be “genuine”. In these two steps, we are “training” the program to recognize the difference between spam and genuine webpages.

Finally, for any unknown document, we’ll run it through CRM114’s “classify” function, which will estimate the probability that the document belongs to the “spam” or the “genuine” group based on past training data.
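To make “training” and “classifying” concrete, here is a toy sketch in plain Python. It is a tiny naive Bayes word counter, which is not the algorithm CRM114 uses. The ToyClassifier class and the sample sentences are made up for illustration; only the train-then-classify workflow matches the description above.

```python
import math
from collections import Counter


class ToyClassifier:
    """Tiny naive Bayes text classifier (illustration only, not CRM114's algorithm)."""

    def __init__(self, categories):
        self.word_counts = {c: Counter() for c in categories}
        self.doc_counts = {c: 0 for c in categories}

    def learn(self, category, text):
        # "Training": remember which words showed up in which category.
        self.word_counts[category].update(text.lower().split())
        self.doc_counts[category] += 1

    def classify(self, text):
        words = text.lower().split()
        vocab = set()
        for counts in self.word_counts.values():
            vocab.update(counts)
        total_docs = sum(self.doc_counts.values())
        log_scores = {}
        for category, counts in self.word_counts.items():
            # Prior probability of the category, with add-one smoothing.
            score = math.log((self.doc_counts[category] + 1)
                             / (total_docs + len(self.doc_counts)))
            total_words = sum(counts.values())
            for word in words:
                # Add-one smoothing; the extra +1 reserves a slot for unseen words.
                score += math.log((counts[word] + 1)
                                  / (total_words + len(vocab) + 1))
            log_scores[category] = score
        # Turn the log scores into probabilities that sum to 1.
        top = max(log_scores.values())
        weights = {c: math.exp(s - top) for c, s in log_scores.items()}
        best = max(weights, key=weights.get)
        return best, weights[best] / sum(weights.values())


clf = ToyClassifier(["genuine", "spam"])
clf.learn("spam", "cheap viagra pills")
clf.learn("genuine", "lovely movie tonight")
label, probability = clf.classify("viagra pills")
print(label, round(probability, 2))
```

With only one training sentence per group, the guess rests entirely on the two words the unknown text shares with the “spam” sample. More training data gives the probabilities more to go on.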

Trying it out

Take a look at the sample code below. It uses Sam Dean’s wrapper library, which provides an easy-to-use Python interface to the CRM114 Discriminator.

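import crm
# First argument: the folder where the classifier keeps its training data.
# Second argument: the names of the categories to tell apart.
c = crm.Classifier("/Users/iamthecheese/Desktop/crm_test_data", ["genuine", "spam"])
# Train with one known example per category.
c.learn("genuine", "did you see that jean claude van dam movie?")
c.learn("spam", "Jean claude van dam uses viagra, you should too, here's how...")
# Ask for the most likely category of an unseen phrase.
c.classify("I went to see that movie about the dam today")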

If you type that into the Python interactive prompt and all goes well, you should see the last command return:

('genuine', 0.65529999999999999)

This means that, based on the training data given to CRM114, the phrase “I went to see that movie about the dam today” has about a 65.5% chance of being genuine. Pretty cool, huh?

Applying this to our spam problem, just find 20 pages that you know are spam and 20 pages that you know are genuine, train CRM114 with this set of data, and unleash it on the rest of your 99,960 pages. It’ll save you a lot of time, and you can use your “personal classification secretary” for bunches of other problems in the future as well.
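Here is a rough sketch of that batch workflow. It assumes only that the classifier object has the learn(category, text) and classify(text) methods used in the sample code above. The KeywordStub class, the sort_pages helper, and the page names are hypothetical stand-ins so the snippet runs without CRM114 installed.

```python
class KeywordStub:
    """Hypothetical stand-in for a real classifier; it just counts shared words."""

    def __init__(self):
        self.words = {"spam": set(), "genuine": set()}

    def learn(self, category, text):
        self.words[category].update(text.lower().split())

    def classify(self, text):
        tokens = set(text.lower().split())
        hits = {c: len(tokens & seen) for c, seen in self.words.items()}
        best = max(hits, key=hits.get)
        total = sum(hits.values())
        return best, (hits[best] / total if total else 0.5)


def sort_pages(classifier, spam_samples, genuine_samples, unknown_pages):
    """Train on the known samples, then sort every unknown page into a bucket."""
    for text in spam_samples:
        classifier.learn("spam", text)
    for text in genuine_samples:
        classifier.learn("genuine", text)
    results = {"spam": [], "genuine": []}
    for name, text in unknown_pages.items():
        label, probability = classifier.classify(text)
        results[label].append((name, probability))
    return results


results = sort_pages(
    KeywordStub(),
    spam_samples=["buy cheap pills now"],
    genuine_samples=["notes from the python meetup"],
    unknown_pages={
        "page1.html": "cheap pills for sale",
        "page2.html": "slides from the meetup",
    },
)
print(results)
```

Swapping KeywordStub() for the crm.Classifier object from the sample code should be the only change needed, provided the wrapper's methods behave as shown there.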

Installing the CRM114 Discriminator on Mac OSX

Now that you’re hooked, let’s install the program on Mac OS X. Unfortunately, there is currently no MacPorts port for the CRM114 Discriminator, so you’ll have to dig through the Makefile to get everything working. Here’s how to build and install the program from source.

  1. First, install TRE, a regex library that CRM114 depends on, using MacPorts
    sudo port install tre
  2. Get the source code for CRM114
    cd ~/Desktop/some_folder
    wget http://crm114.sourceforge.net/src/
  3. Modify the “Makefile” under the src directory by replacing the following line:
    prefix?=/usr     should become -->    prefix?=/opt/local

    commenting out the following line:

    LDFLAGS += -static -static-libgcc

    and uncommenting the following lines:

    CFLAGS += -I/opt/local/include -I${HOME}/include
    LDFLAGS += -L/opt/local/lib -L${HOME}/lib
    LIBS += -lintl -liconv
  4. Now save the Makefile and run the make and make install commands in the src directory
    make && make install
  5. Congratulations, you now have the CRM114 Discriminator installed on your computer! If everything went correctly, you should be able to run the following command in Terminal to print the installed version
    crm -v
  6. Finally, to use the sample code above, download Sam Dean’s Python CRM114 wrapper library and put it somewhere Python can import it from. (The site’s login/password is “guest”.)
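Once the steps above are done, a quick sanity check from Python can save some head-scratching. This is a minimal sketch that assumes the binary is called crm (as in step 5) and that the wrapper is importable as crm (as in the sample code). The check_crm_install helper is my own name, not part of either project.

```python
import importlib.util
import shutil


def check_crm_install():
    """Report whether the crm binary and the Python wrapper can be found."""
    return {
        # Full path to the crm executable, or None if it is not on the PATH.
        "crm_binary": shutil.which("crm"),
        # True if "import crm" would find a module, False otherwise.
        "python_wrapper": importlib.util.find_spec("crm") is not None,
    }


status = check_crm_install()
print(status)
```

If crm_binary comes back as None, check that /opt/local/bin is on your PATH. If python_wrapper is False, the wrapper is not in a folder that Python searches.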

This piece of software uses many cool machine-learning classification techniques which are beyond my ability to explain here. If you’re interested, you can read more about the algorithms below:



A Processing.js example ( Tears in Darkness )

Posted on March 5th, 2009

Click in the canvas to cry (Firefox & Safari only).

Processing.js makes coding graphics for the HTML <canvas> element a dream come true. Casey Reas and Ben Fry created the original Processing library and programming environment for manipulating graphics with Java, and they have built up a vibrant community around their work. The second revolution, however, comes with John Resig's JavaScript port of those same programming libraries. Because of his port, you can now do things with open technologies such as JavaScript and HTML that were once only possible with Flash. I've been exploring it, and this is what I've learned so far.

How was the visualization above created?

Excuse the emo title for the graphic above, but I was going for something simple so that I could quickly learn the system. Animating a tear came to mind as an easy beginner's project. To begin, let us start with what a primitive animation is: a series of pictures displayed rapidly one after another. Processing.js realizes this notion with two functions, setup() and draw(). You define setup() knowing that Processing.js will call it automatically at the beginning of your visualization to draw the first "picture" of the scene. After that, it repeatedly calls draw() many times per second to generate all of the subsequent "pictures" of your scene.
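If the calling order is hard to picture, this plain-Python analogy may help. It is not how Processing.js is implemented; run_sketch and the calls list are hypothetical names that only mimic the contract described above: setup() once, then draw() once per frame.

```python
def run_sketch(setup, draw, frame_count):
    """Call setup() once, then draw() once for every frame."""
    setup()
    for _ in range(frame_count):
        draw()


calls = []  # records the order in which the two callbacks run
run_sketch(
    setup=lambda: calls.append("setup"),
    draw=lambda: calls.append("draw"),
    frame_count=3,
)
print(calls)
```

The real library keeps calling draw() on a timer for as long as the page is open; the fixed frame_count here just keeps the example finite.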

The snippet below sets up your first "picture" to be 600 pixels wide and 300 pixels high.

    // This function sets up the canvas element
    void setup() {
      size(600, 300);                              // size( width, height ) - sets the canvas size
      background(0);                               // background( lightness ) or background( red, green, blue, alpha ) - sets the background color
    }

Creating a tear

In my world, a tear basically consists of a circle that grows as it falls. It gets to a certain size and then it stops and fades away. That is a tear. To represent that, I create a class to store its current position, size, and various other attributes:

    // This class represents one tear. A global variable called "tears" stores an array of these individual tears
    class Tear {
       int x, y, alpha;
       float size;

       // This constructor is called when a new tear is created with "new Tear(x, y)". You pass it the x & y position of the new tear
       Tear(int xin, int yin) {
          x = xin;                               // sets the x coordinate of the tear
          y = yin;                               // sets the y coordinate of the tear
          alpha = 255;                           // sets the initial alpha (AKA opacity) of the tear
          size = 1;                              // sets the initial tear diameter to 1px
       }

       // This function is called once per frame to update the tear's attributes
       boolean update() {
          y += 1;                                // moves the tear down 1px
          alpha -= 5;                            // decreases the tear's opacity by 5
          size += .5;                            // increases the tear's size by .5px
          if (y > height || alpha < 0) {         // if the tear has fallen off the canvas or become fully transparent, return false
             return false;
          }
          return true;                           // return true while the tear is still visible
       }

       // Draws this tear with its current properties
       void draw() {
          fill(0, 60, 100 + random(0, 100), alpha);    // fill( red, green, blue, alpha ) - sets the fill color, values between 0 and 255
          ellipse(x, y, size, size);                   // ellipse( centerX, centerY, width, height ) - draws a circle
       }
    }

The two major things to notice are 1) the Tear.update() function, which changes the tear's properties according to predefined rules, and 2) the Tear.draw() function, which reads the tear's current properties and does the actual drawing.
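To see how long a tear lives under these rules, here is a plain-Python mirror of the bookkeeping in update(), with the drawing left out. The starting position (100, 50) is an arbitrary example, and canvas_height is passed in explicitly because Python has no global height variable like Processing does.

```python
class Tear:
    """Plain-Python mirror of the Tear bookkeeping (no drawing)."""

    def __init__(self, x, y):
        self.x, self.y = x, y
        self.alpha = 255   # fully opaque at birth
        self.size = 1.0    # starts 1px across

    def update(self, canvas_height):
        self.y += 1        # fall 1px
        self.alpha -= 5    # fade a little
        self.size += 0.5   # grow a little
        # False once the tear is off the canvas or fully transparent.
        return not (self.y > canvas_height or self.alpha < 0)


tear = Tear(100, 50)
frames_alive = 0
while tear.update(canvas_height=300):
    frames_alive += 1

print(frames_alive, tear.size, tear.y)
```

Starting this high on a 300px canvas, the opacity runs out long before the tear reaches the bottom: it is drawn for 51 frames and is 27px across by the time update() returns false.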

Dealing with many tears

Since our scene will have many tears at once, I'll set up an array of tears to keep track of them all, along with a counter for how many have been created.

    Tear[] tears = new Tear[];                     // Create a new array to keep track of all the tears
    int num_tears = 0;                             // Number of slots used so far in the tears array

Running the draw() loop

Now that you've got the basic data structures down, we can stick them into the draw() loop to run them. The code below basically loops through all current tears, and updates and draws them.

    // This function is automatically called by Processing.js every frame
    void draw() {
      fill(0, 6);                                  // fill( lightness, alpha ) or fill( red, green, blue, alpha ) - sets the fill color of any shapes drawn hereafter
      rect(0, 0, width, height);                   // rect( x, y, width, height ) - draws a rectangle covering the whole canvas
      noStroke();                                  // removes the border on shapes drawn after this point

      for (int i = 0; i < num_tears; i++) {
         if (tears[i] == null) {
            // Do nothing if the current object is null
         }
         else if (tears[i].update() == false) {
            // Update the current tear; if update() returns false, set the current object to null
            tears[i] = null;
         }
         else {
            // When the tear is not null and update() returned true, draw() it
            tears[i].draw();
         }
      }
      // draw() continues in the next snippet

mousePressed interactivity

Finally, I add a little interactivity by checking the mousePressed variable on every frame. If someone clicks on the canvas, a new Tear object is created at the mouse location. You'll notice I've adjusted the mouseX and mouseY values to include how far the user has scrolled the window. The mouseX and mouseY positions are relative to the original position of the canvas element and do not reflect changes to its position when scrolling.

      if(mousePressed){
         // Add a new tear to the tears array at the current mouse location
         tears[num_tears] = new Tear(mouseX + window.scrollX ,mouseY+window.scrollY);
         num_tears +=1;
      }

    }

How did you get that nice fading effect?

The fading effect of the tears was created by painting a semi-transparent black rectangle over the whole canvas on each frame. Frame after frame, these rectangles pile up and eventually cover the tears.
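To get a feel for how quickly the overlay erases a tear, here is a small back-of-the-envelope calculation in Python. It assumes ideal blending, where each black overlay with alpha 6 out of 255 keeps a fraction 1 - 6/255 of whatever was underneath. A real canvas rounds colors to whole numbers, so the actual result will differ slightly. The function names are my own.

```python
def brightness_after(frames, overlay_alpha=6):
    """Fraction of the original brightness left after a number of overlays."""
    keep = 1 - overlay_alpha / 255   # share of the old pixel that survives one overlay
    return keep ** frames


def frames_until(fraction, overlay_alpha=6):
    """Number of overlays needed before brightness drops to the given fraction."""
    frames = 0
    while brightness_after(frames, overlay_alpha) > fraction:
        frames += 1
    return frames


half_life = frames_until(0.5)
print(half_life)
```

Under these assumptions a painted pixel loses half its brightness after 30 frames. Raising the alpha passed to fill() makes the trails shorter, and lowering it makes them linger.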

    void draw() {
      fill(0, 6);                                  // fill( lightness, alpha ) - black at a very low opacity
      rect(0, 0, width, height);                   // rect( x, y, width, height ) - covers the whole canvas
      // ... the rest of draw() is unchanged
    }

Summary

In summary, this animation was built by setting up a Tear class that holds attributes like the tear's position. The Tear class has functions to update() its attributes and to draw() it. A Tear is generated every time someone clicks on the canvas and is stored in an array of Tears. Every time Processing.js calls draw(), a loop goes through the list of existing tears to update() their properties and then draw() them. Overall, this is a fun little system that I'll be using in the future to generate some cool infographics.