Introduction to Machine Learning, from data acquisition to a production service

In this post I want to share some machine learning notions and a how-to for getting started with Microsoft Azure ML Studio.
Machine learning is quite a complex topic, especially if you want to understand the theory and algorithms behind it. If you want to go deeper in understanding ML, I would encourage you to register for the free online course by Andrew Ng on coursera.org. I have done it and it is really great, but be ready to invest enough time and brain waves :).

Thankfully, ML Studio offers a simple way to do machine learning without needing to understand any of the algorithms, as long as you follow a workflow. My simplistic workflow is split into 4 steps: data acquisition, data preparation, training and tuning, and web-service deployment. Let me go through all these steps.

Machine learning

Data acquisition

Machine learning needs two things to work: data (lots of it) and models. When acquiring the data, be sure to have enough features (aspects of the data that can help with a prediction, like the surface of a house to predict its price) populated to train your learning model correctly. In general, the more data you have the better, so make sure to come with enough rows!

Data is generally stored as a CSV file (a comma-separated values file, which can be created with Excel) or any other supported type of dataset. Currently the supported max size is 1.95 GB (which is enough for anything that is not “big data”).

Handling the data: data preparation


Once you have the data available in Azure ML, it’s time to prepare it. There are multiple transformation operations supported: filtering, manipulation, sampling, and scaling and reducing (see the full documentation for data transformation). To start, you will want to scrub missing values, edit the columns, and then split your dataset for training and validation. This will be fed to your training model along with the selected learner.
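
To make the preparation steps above more concrete outside of ML Studio, here is a minimal sketch in plain Python (standard library only). The sample data, the column names, and the 75/25 split are my own assumptions for illustration; ML Studio does the same kind of work with its own modules.

```python
import csv
import io
import random

# Hypothetical sample data: house surface, number of rooms, and price.
RAW = """surface,rooms,price
120,4,350000
85,,210000
60,2,150000
,3,275000
200,6,620000
95,3,240000
"""

def load_rows(text):
    """Parse CSV text into a list of dicts (one dict per row)."""
    return list(csv.DictReader(io.StringIO(text)))

def drop_missing(rows):
    """Scrub rows where any column is empty."""
    return [r for r in rows if all(v.strip() for v in r.values())]

def select_columns(rows, columns):
    """Keep only the listed columns and convert them to floats."""
    return [{c: float(r[c]) for c in columns} for r in rows]

def split(rows, train_fraction=0.75, seed=42):
    """Shuffle a copy of the rows and split it into training and validation sets."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

rows = select_columns(drop_missing(load_rows(RAW)), ["surface", "price"])
train, validation = split(rows)
print(len(rows), len(train), len(validation))  # 4 3 1
```

Two of the six sample rows have a missing value and are dropped, which is the simplest scrubbing strategy; replacing missing values with a mean or median is another common option.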

 

Training and tuning

The first thing you need to figure out is what type of analysis you need to run (see full documentation). It can be a classification analysis (spam vs. non-spam), a clustering analysis (automatic grouping of similar items), or a regression analysis (for prediction and forecasting). Once you have this figured out, compare the results of multiple models to know which one is the most efficient for your dataset. The logic here is to:

  1. Select a model with default values
  2. Train the model
  3. Score the model
  4. Evaluate the model (to figure out which model is the most efficient)
  5. Sweep the model (to figure out the best configuration for your model)
  6. Evaluate the model
  7. Save the trained model (to be used in production)
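
The seven steps above can be sketched with a toy example in plain Python. This is not what ML Studio runs internally: the "model" is a deliberately trivial threshold classifier, and the data, function names, and candidate values are all assumptions made up for the illustration.

```python
import json

# Toy labelled data: (feature value, label), e.g. number of links in an email, spam = 1.
train = [(0, 0), (1, 0), (2, 0), (3, 0), (4, 1), (9, 1), (14, 1)]
validation = [(1, 0), (3, 0), (4, 1), (5, 1), (7, 1), (2, 0)]

def train_model(data, threshold=None):
    """Steps 1-2: with no threshold given, use the midpoint between the class means."""
    if threshold is None:
        neg = [x for x, y in data if y == 0]
        pos = [x for x, y in data if y == 1]
        threshold = (sum(neg) / len(neg) + sum(pos) / len(pos)) / 2
    return {"threshold": threshold}

def score_model(model, data):
    """Step 3: predict 1 when the feature is above the threshold."""
    return [1 if x > model["threshold"] else 0 for x, _ in data]

def evaluate_model(predictions, data):
    """Steps 4 and 6: accuracy on the given dataset."""
    return sum(p == y for p, (_, y) in zip(predictions, data)) / len(data)

def accuracy_for(threshold):
    model = train_model(train, threshold)
    return evaluate_model(score_model(model, validation), validation)

# Steps 1-4: default configuration.
default_accuracy = accuracy_for(None)

# Step 5: sweep a few candidate configurations and keep the best one.
candidates = [1.5, 2.5, 3.5, 4.5, 5.5]
best_threshold = max(candidates, key=accuracy_for)

# Step 6: evaluate the tuned model.
best_model = train_model(train, best_threshold)
tuned_accuracy = evaluate_model(score_model(best_model, validation), validation)

# Step 7: save the trained model (here, as a JSON string).
saved_model = json.dumps(best_model)
print(round(default_accuracy, 3), best_threshold, tuned_accuracy)
```

In this toy run the sweep moves the threshold from the default midpoint to 3.5 and the validation accuracy goes up, which is exactly the kind of comparison you do between the two evaluation steps.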

When comparing the evaluation of models, you will want to validate that the main values are going up (like accuracy, precision and recall, AUC, F1 score).
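
If these names are new to you, here is a small sketch of how accuracy, precision, recall, and F1 score are computed for a binary classifier. The labels and predictions are made-up sample values; AUC is left out because it needs the predicted probabilities rather than just the predicted classes.

```python
def confusion_counts(actual, predicted):
    """Count true/false positives and negatives for binary labels (0 or 1)."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)
    return tp, fp, fn, tn

def metrics(actual, predicted):
    """Return accuracy, precision, recall, and F1 score as a dict."""
    tp, fp, fn, tn = confusion_counts(actual, predicted)
    accuracy = (tp + tn) / len(actual)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Hypothetical spam labels (1 = spam) and model predictions.
actual = [1, 1, 1, 0, 0, 0, 1, 0]
predicted = [1, 0, 1, 0, 1, 0, 1, 0]
result = metrics(actual, predicted)
print(result)
```

With this sample there are 3 true positives, 1 false positive, 1 false negative, and 3 true negatives, so all four values come out at 0.75.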

Deployment

Assuming the above steps are completed and you have found satisfactory values for your model evaluation, you can clean up your workflow to keep only one model, with the optimum configuration. Publish your score model as an input and define your output. Run and publish your web service. From there you will have access to the API help page, containing sample code for C#, R, and Python, and a test URL where you can manually set some values.

In your web service configuration tab, you can mark it as ‘ready for production’ and start using it in a real production project!
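
To give an idea of what calling such a web service can look like, here is a hedged sketch in Python. The URL, the API key, the column names, and the exact shape of the JSON payload are all assumptions on my side; the sample code on your own service's API help page is the authoritative version and should be copied instead.

```python
import json
import urllib.request

# Placeholders only: take the real values from your service's API help page.
SERVICE_URL = "https://example.invalid/your-service/execute"  # hypothetical URL
API_KEY = "YOUR_API_KEY"  # hypothetical key

def build_request(url, api_key, column_names, rows):
    """Build (but do not send) a JSON POST request for a scoring service.

    The payload layout below is an assumed example, not a documented contract.
    """
    payload = {
        "Inputs": {"input1": {"ColumnNames": column_names, "Values": rows}},
        "GlobalParameters": {},
    }
    body = json.dumps(payload).encode("utf-8")
    headers = {
        "Content-Type": "application/json",
        "Authorization": "Bearer " + api_key,
    }
    return urllib.request.Request(url, data=body, headers=headers, method="POST")

request = build_request(SERVICE_URL, API_KEY, ["surface", "rooms"], [["120", "4"]])
print(request.get_method(), request.full_url)

# To actually call the service (needs a real URL and key):
# with urllib.request.urlopen(request) as response:
#     print(json.loads(response.read()))
```

The request is only built here, not sent, so the sketch runs without a deployed service.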

Resources

Machine learning for beginners.
If you want to go deeper in understanding ML, Andrew Ng has a fantastic class on coursera.org.
All the video tutorials that you need to get started with Azure ML.
Predictive solution walk-through.

Embracing change

We have all experienced change in our lives. Whether small, like moving to a new home, or larger, like changing jobs or moving to a new country, it always comes with a fair amount of stress. I believe that the stress associated with change comes from all the uncertainty that change triggers (I’m not a psychologist, at all). I have had my fair share of change in my life: moving across multiple countries, changing jobs, becoming a husband and a dad, and more recently going through re-organization and layoffs at my workplace. Looking back, and simplifying a bit, there are really 3 phases of change (psychologists would argue there are multiple stages: precontemplation, contemplation, determination, action, maintenance). In the first phase you become aware of upcoming change, the second phase is the transition period proper, and the third phase is when the change is completed. Below are some tactics I have been using to have a smooth transition.

Upcoming change

It’s crucial to acknowledge when change is coming your way. At this point you might not know where you are heading, but you should not ignore the change. Being aware of the change has a couple of positive aspects: you can stay in motion with the change and actively help influence its direction. Not knowing where the journey will end is stressful, no question about it, but it is important to have a north star to ‘survive’ the cycle. Don’t fall into panic mode (people around you might be freaking out; don’t let this impact you, and help them see the benefits of the change or its temporary nature). Understanding the root of the change will be extremely useful during the next phases.

Transition

Change is ongoing, embrace it! At this point, there is probably a lot of chaos, a lot of uncertainty, and a lot of frustration. As with most things, we can turn this time into a couple of positive outcomes. Get involved in the transition, help make it successful, steer it in the right direction. Gaining knowledge about the change will help you adjust to it. Stay aware that you are in a transition phase, dial down your anxiety, and be patient.

Post change

At this point, you should have adjusted to the change. You can start focusing on building up on the new foundation. Have a look back and be proud of the journey you just had, another one is coming soon…

Career superpowers – book summary

I loved this book, Career Superpowers: Succeeding on Purpose by James Whittaker. It is refreshing to think strategically about your career and to feel in control. The author breaks down the career superpowers into 8 attributes: ambition, passion, specialization, storytelling, imitation, derivation, creativity, and leadership.

Ambition

Ambition sets the bar for your career; you increase your chance of climbing high by aiming high. It means aligning your skills and work ethic to achieve what you want from life. To fulfill your ambition, you will have to identify stupid rules and refuse to allow them to have power over you (loyalty to your company is one stupid rule).
Out-delivering your hard-working colleagues is more likely to burn you out than to help you reach your long-term goal. Working hard is a necessary condition of success but it is not sufficient in itself. You need to work smart: do not invest time in impressing your manager(s), and spend time bettering your situation and yourself. Have a larger view, take the pulse of the industry rather than the pulse of your company. You have to learn to stand out.
Your career has to be handled as a project: keep track of it daily and think about where it is headed. What’s in your way? How is your progress? Are you ever going to get where you want to be on your current course?

Passion

If you are not passionate about your job then you won’t have the additional energy, enthusiasm, and work ethic that come with passion. You should share your passion broadly (blog, papers, manifestos) and advertise the thing that turns you on. Also, being intensely interested in the subject you are presenting makes the material more interesting to your audience.

Specialize

A specialty is the ultimate career aide. Choose one that matters (high value), that is visible, and that has implications across the industry and not just your company. Pick a specialty that is well within your skill set and that you can master completely (being merely good is not enough). Don’t stretch to be the dumbest guy at the level above you; instead be the smartest at the level below.
Once you have a specialty you love, sell it! Have a good elevator pitch (write it, practice it, and promote it) for each of your specialties (don’t be a one-trick pony, be ready to change). It should explain what you do and why it is valuable in one simple sentence.
You will enjoy the dividends of your specialty during the rest of your career. You need to learn, master, and own your specialty.

Storytelling

Directly influence how people think and get them to see your ideas and values in full color.

Imitation

The people around you have a great power over who you become and how high you reach. Choose them well. If you need to learn something from someone, make sure they are really good at it.

Derivation

It is not about making the invention, it is about being the one who makes the invention meaningful. You should be aware of your industry and how you are using the products developed by your industry. You should be able to name at least 10 industry-insider websites that publish news about your industry. Looking at the world and wanting to fix it is part of the derivative superpower.

Creativity

The more you know, the more creative you can become. Find out when and where you can be the most creative, build your day to enable creativity.

Leadership

All the previous attributes make you successful. Being successful creates gravity around you; this is called being a leader. You need to know yourself to be a leader, knowing where you want to go and where you want to be in the future. Work on a product that is important to the company; no career is served by working on a product that the company doesn’t really care about. A leader can build a team where everybody can recite the product elevator pitch, and where each person understands why the product is important, what and who the competition is, and how the product fits in the company’s lineup.

 

The above summary is really short and might not make sense to someone who hasn’t read the full book. I definitely recommend this one for anybody who wants to drive their career.

 

Facebook SDK 4.0.0 for PHP: A working sample to manage sessions

 

Once you have a working sample of Facebook SDK 4.0.0 for PHP, you will notice upon refreshing the page an error:
Fatal error: Uncaught exception ‘Facebook\FacebookAuthorizationException’ with message ‘This authorization code has expired.’

Well, this is quite annoying as it breaks the user navigation on your site. To get around this issue, record the FacebookSession token in the user’s session and use it the next time the page loads.

<?php 
session_start();

require_once( 'Facebook/FacebookSession.php' );
require_once( 'Facebook/FacebookRedirectLoginHelper.php' );
require_once( 'Facebook/FacebookRequest.php' );
require_once( 'Facebook/FacebookResponse.php' );
require_once( 'Facebook/FacebookSDKException.php' );
require_once( 'Facebook/FacebookRequestException.php' );
require_once( 'Facebook/FacebookAuthorizationException.php' );
require_once( 'Facebook/GraphObject.php' );
require_once( 'Facebook/GraphSessionInfo.php' );

use Facebook\FacebookSession;
use Facebook\FacebookRedirectLoginHelper;
use Facebook\FacebookRequest;
use Facebook\FacebookResponse;
use Facebook\FacebookSDKException;
use Facebook\FacebookRequestException;
use Facebook\FacebookAuthorizationException;
use Facebook\GraphObject;
use Facebook\GraphSessionInfo;

$appid = ''; // your AppID
$secret = ''; // your secret

// Initialize app with app id (APPID) and secret (SECRET)
FacebookSession::setDefaultApplication($appid ,$secret);

// login helper with redirect_uri
$helper = new FacebookRedirectLoginHelper( 'http://www.metah.ch/' );

try 
{
  // In case it comes from a redirect login helper
  $session = $helper->getSessionFromRedirect();
} 
catch( FacebookRequestException $ex ) 
{
  // When Facebook returns an error
  echo $ex;
} 
catch( Exception $ex ) 
{
  // When validation fails or other local issues
  echo $ex;
}

// see if we have a token stored in $_SESSION
if( isset($_SESSION['token']))
{
	// We have a token, is it valid? 
	$session = new FacebookSession($_SESSION['token']);	
	try
	{
		$session->Validate($appid ,$secret);
	}
	catch( FacebookAuthorizationException $ex)
	{
		// Session is not valid any more; reset it to null so that
		// isset($session) below is false and the login link is shown.
		$session = null;
	}
}

// see if we have a session
if ( isset( $session ) ) 
{   
	// set the PHP Session 'token' to the current session token
	$_SESSION['token'] = $session->getToken();
	// SessionInfo 
	$info = $session->getSessionInfo();	
	// getAppId
	echo "Appid: " . $info->getAppId() . "<br />"; 
	// session expiration date
	$expireDate = $info->getExpiresAt()->format('Y-m-d H:i:s');
	echo 'Session expire time: ' . $expireDate . "<br />"; 
	// session token
	echo 'Session Token: ' . $session->getToken() . "<br />"; 
} 
else 
{
  // show login url
  echo '<a href="' . $helper->getLoginUrl() . '">Login</a>';
}
?>

The session expire time (getExpiresAt()) can be used to monitor how long the token can be used and potentially request a new token.

Facebook SDK 4.0.0 for PHP: A working sample to get started.

I downloaded the new Facebook SDK 4.0.0, hoping to have a quick sample running. Instead, I had to scratch my head for nearly 1 hour before having a working sample. Not really a great experience (‘thanks’ Facebook for the great getting started document)!
Now at least I got it working, so I’m sharing some information below, as always, hoping to save you the time I just lost :).

 

Assumptions:

  1. You have a working php hosting setup (no, I’m not covering this here).
  2. You have downloaded the SDK and uploaded the “Facebook” folder at the root of your project.
  3. You have a developer account at Facebook: https://developers.facebook.com/apps

Working code:

<?php
session_start();

require_once( 'Facebook/FacebookSession.php' );
require_once( 'Facebook/FacebookRedirectLoginHelper.php' );
require_once( 'Facebook/FacebookRequest.php' );
require_once( 'Facebook/FacebookResponse.php' );
require_once( 'Facebook/FacebookSDKException.php' );
require_once( 'Facebook/FacebookRequestException.php' );
require_once( 'Facebook/FacebookAuthorizationException.php' );
require_once( 'Facebook/GraphObject.php' );

use Facebook\FacebookSession;
use Facebook\FacebookRedirectLoginHelper;
use Facebook\FacebookRequest;
use Facebook\FacebookResponse;
use Facebook\FacebookSDKException;
use Facebook\FacebookRequestException;
use Facebook\FacebookAuthorizationException;
use Facebook\GraphObject;

// init app with app id (APPID) and secret (SECRET)
FacebookSession::setDefaultApplication('APPID','SECRET');

// login helper with redirect_uri
$helper = new FacebookRedirectLoginHelper( 'http://www.metah.ch/' );

try {
  $session = $helper->getSessionFromRedirect();
} catch( FacebookRequestException $ex ) {
  // When Facebook returns an error
} catch( Exception $ex ) {
  // When validation fails or other local issues
}

// see if we have a session
if ( isset( $session ) ) {
  // graph api request for user data
  $request = new FacebookRequest( $session, 'GET', '/me' );
  $response = $request->execute();
  // get response
  $graphObject = $response->getGraphObject();
  
  // print data
  echo  print_r( $graphObject, 1 );
} else {
  // show login url
  echo '<a href="' . $helper->getLoginUrl() . '">Login</a>';
}

Debugging pointers:

As already explained, I had some issues. Below are the ones I faced and how I eventually fixed them:

  1. When using session_start(), I had a php warning:
    Warning: session_start(): Cannot send session cache limiter – headers already sent (output started at …/index.php:1) in …/index.php on line 2
    To fix it, use an editor that supports UTF-8 without BOM (the Notepad++ Encoding menu is useful).
  2. When calling the FacebookSession::setDefaultApplication I had an error:
    Fatal error: Class ‘FacebookRedirectLoginHelper’ not found in …/index.php on line 28
    This one is totally embarrassing (lame excuse: I haven’t used PHP for several years!). The solution is pretty straightforward, load the class first :)

    require_once( 'Facebook/FacebookSession.php' );

     

  3. When clicking on the Login link, I had an OAuth error:

    To fix it, simply go to your app settings (from your Facebook developer account) and correctly set your Site URL and App Domains.
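
Regarding the first pointer (the "headers already sent" warning), a quick way to check whether a file starts with a UTF-8 byte order mark is to look at its first three bytes. Below is a minimal sketch in Python; the function names are my own, and the sample bytes stand in for the content of an index.php file.

```python
import codecs

def has_utf8_bom(data: bytes) -> bool:
    """Return True when the bytes start with the UTF-8 byte order mark."""
    return data.startswith(codecs.BOM_UTF8)

def strip_utf8_bom(data: bytes) -> bytes:
    """Return the bytes without a leading UTF-8 BOM (unchanged if none)."""
    return data[len(codecs.BOM_UTF8):] if has_utf8_bom(data) else data

# Stand-in for the raw content of a PHP file saved with a BOM.
sample = codecs.BOM_UTF8 + b"<?php\nsession_start();\n"
cleaned = strip_utf8_bom(sample)
print(has_utf8_bom(sample), has_utf8_bom(cleaned))  # True False
```

For a real file, read it with open(path, "rb").read(), and write the cleaned bytes back with open(path, "wb") once you have a backup.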

 

Conclusion

Hopefully, you will be able to get your example working in a couple of minutes! Let me know otherwise.

Splitting large CSV in smaller CSV

I had to use a tool today that gave me a frustrating error: “your csv file have more than a 1000 rows”. I actually had multiple CSVs with thousands of rows, so splitting them manually would have been a total waste of time and sanity. That’s where PowerShell came in handy. As it took me more than 5 minutes to get the script right, I’m sharing it here.
The script looks for all CSVs within a defined folder ($location variable in the configuration) and splits them by the number of rows defined in $rowsMax.
It is also removing the quotes in the CSV as by default export-csv generates quotes.

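# Configuration
$location = "C:\csvdrop\" # CSVs location
$rowsMax = 900 # how many rows per CSV?

# Get all CSVs in the configured folder
$allCSVs = Get-ChildItem -Path (Join-Path $location '*') -Include *.csv

# Read and split all of them
$allCSVs | ForEach-Object {
    $file = $_
    Write-Host $file.Name
    # Use the full path so the script works from any current directory;
    # @() keeps a single-row CSV as an array so .Count is reliable
    $content = @(Import-Csv $file.FullName)
    $insertLocation = $file.Name.Length - 4 # position just before ".csv"
    for ($i = 0; $i -lt $content.Count; $i += $rowsMax) {
        $newName = $file.Name.Insert($insertLocation, "splitted_" + ($i + 1))
        # Skip the rows already written, then take the next $rowsMax rows
        $content | Select-Object -Skip $i -First $rowsMax |
            ConvertTo-Csv -NoTypeInformation |
            ForEach-Object { $_ -replace '"', '' } |
            Out-File (Join-Path $location $newName) -Force -Encoding ascii
    }
}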

Hopefully, this works for you too!
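
If you don't have PowerShell at hand, the same chunking idea can be sketched in Python with the standard csv module. This is an alternative illustration, not a port of the script above: it works on CSV text in memory, the function names are my own, and the header row is repeated in every part.

```python
import csv
import io

def split_rows(rows, rows_max):
    """Yield consecutive chunks of at most rows_max rows."""
    for start in range(0, len(rows), rows_max):
        yield rows[start:start + rows_max]

def split_csv_text(text, rows_max):
    """Split CSV text into several CSV texts, repeating the header in each."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    rows = list(reader)
    parts = []
    for chunk in split_rows(rows, rows_max):
        out = io.StringIO()
        writer = csv.writer(out, lineterminator="\n")
        writer.writerow(header)
        writer.writerows(chunk)
        parts.append(out.getvalue())
    return parts

# Hypothetical sample: a header plus 7 data rows, split into parts of 3 rows.
sample = "id,name\n" + "".join(f"{i},item{i}\n" for i in range(1, 8))
parts = split_csv_text(sample, rows_max=3)
print(len(parts))  # 3
```

The csv writer only quotes fields when it has to, so there is no need to strip quotes afterwards as in the PowerShell version.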

The week in search advertising 3/15/2014

Episode #3 of the weekly review of search advertising! Covering the week ending on 3/15/2014.

  • Native Ads are not engaging readers
    • Some data shows that native ads are not getting as much attention from the visitors
    • This can be somewhat improved if the ad creative is of good quality
    • Source: Marketing Land
  • Twitter experiments with click-to-call ads
  • Native ad spotted on The Wall Street Journal
    • Transparent, quality, and rare
    • Source: Digiday
  • Google experiments with desktop-to-mobile retargeting
    • As cookies don’t really work on mobile devices, they use a “hashed tag” dropped when a user visits the advertiser’s site that ties to the cookies and device of the user.
    • Source: Ad Age
  • Addressing the Bing dilemma
    • 1/3 of all US search traffic goes to the Yahoo Bing Network.
    • There is a revenue opportunity to advertise on Bing (disclaimer: I work for Bing Ads)
    • Source: Search Engine Land
  • Facebook is showing autoplay advertising video
    • Facebook wants to display only quality videos, and will have a company gauge the commercials
    • Source: Business Insider
  • Ads Worth Spreading
    • Some beautiful ads, for the creative folks :)
    • Source: TED

The week in online advertising (ending 3/9/2014)

I’m slightly increasing the scope of this series to capture a larger part of the online advertising (instead of focusing on search advertising only).

  • Focusing on mobile is solving yesterday’s problems
    • The focus should now be on ads that work across multiple screens
    • Focusing on reach is where we are trending today
    • Source: Forbes
  • 8 ways to write terrible online ads
    • Blindly using keyword insertion
    • Relying too much on broad keywords
    • Forgetting to spell check your ads
    • Using abbreviations to save space
    • Using technical jargon
    • Including your company name in the headline
    • Using superlatives
    • Promising what you can’t deliver
    • Source: Entrepreneur
  • The good, the bad, and the ugly from last night’s Oscars real-time marketing
  • Why publishers are moving away from mobile banner ads and toward cross-device formats
    • 300×250 ad unit works well on all platforms
    • Placement should be automated, smaller screens tend to have more real estate available on mobile
    • Source: Digiday
  • Apple rolling out full screen video ads within apps
    • Interstitial videos might pop up while you are using your apps on iPhone or iPad (probably at a transition time)
    • This will probably be sold through Apple’s new ad exchange
    • Source: Advertising Age
  • How to take advantage of YouTube Pre-Roll ads
  • Does more targeting work better?
    • Interesting discussion on the future of targeting and its usage
    • Data shows that the user’s context (search vs. social) impacts their response to targeting / advertising
    • Source: Search Engine Land
  • How internet ads work
  • Instagram strikes first big ad deal
    • It was announced a while back and now it’s official, Instagram will be displaying ads
    • Source: Marketing Land

The week in search advertising

I decided to start a new series on this blog about what happened in the small world of search advertising during the week. It will mostly be a collection of links to interesting articles. The primary goal, for me, is to be able to retrieve articles quickly. I hope this series will be useful for more people than just me!

  • How Facebook knows what you looked at on Amazon
    • Simple explanation on how FBX works
    • Source: 25hoursaday
  • Facebook is rolling out new targeting mechanics, the new options are:
    • Location (country & city; country & state; state & city; state & zip code)
    • Demographic (relationship status; life events as relationship status)
    • Interests (people interested in a topic – supposedly better than categories and keywords targeting)
    • Behaviors (based on offline activities: purchases, website visits)
    • Source: Marketing Land
  • Ad tech mergers and acquisitions will rage on in 2014
  • WTF is ad viewability
    • Interesting discussion on the problem of displaying ads on some part of a site that is not actually viewed by the visitor (think the bottom of a site).
    • Source: Digiday
  • How to build effective mobile in-apps ads without irking your users
    • Advice for native advertising (embedding ads within the content)
    • .01% of apps will generate revenue
    • Source: TNW
  • The Ad Industry Reinvents the Hyperlink for the mobile Era
    • How advertisers are thinking of extending the hyperlink to work within apps
    • Source: MIT TR

Some competition in the contextual ads landscape, finally!

I am quite happy to have received an invite to the Yahoo! | Bing network contextual ads (powered by media.net). I just migrated my websites to use these ads.
I don’t yet have information regarding the profitability of this network. I should be able to come back with some numbers in a couple of weeks (or months, given the frequency of my posts on this blog :)).

Setting up ads is super easy; if you are already familiar with AdSense, it should not be a problem at all. Set up your ad unit, copy the code to your website, and start getting paid.

If you are looking for an invite, this is the way: http://contextualads.yahoo.net/