Monthly Archives: September 2014

Introduction to Machine Learning, from data acquisition to a production service

In this post I want to share some machine learning basics and show how to get started with Microsoft Azure ML Studio.
Machine learning is quite a complex topic, especially if you want to understand the theory and algorithms behind it. If you want to go deeper, I encourage you to register for the free online course by Andrew Ng on coursera.org. I have taken it and it is really great, but be ready to invest plenty of time and brain waves :).

Thankfully, ML Studio offers a simple way to do machine learning without needing to understand any of the algorithms, as long as you follow a workflow. My simplistic workflow is split into 4 steps: data acquisition, data preparation, training and tuning, and web service deployment. Let me go through each of these steps.

Machine learning

Data acquisition

Machine learning needs two things to work: data (lots of it) and models. When acquiring the data, be sure to have enough features populated to train your learning model correctly. A feature is an aspect of the data that can help with a prediction, like the surface of a house when predicting its price. In general, the more data you have the better, so make sure to come with enough rows!

Data is generally stored as a CSV file (a comma-separated values file, which can be created with Excel) or any other supported type of dataset. Currently the supported maximum size is 1.95 GB (which is enough for anything that is not “big data”).

Handling the data: data preparation


Once you have the data available in Azure ML, it’s time to prepare it. Multiple transformation operations are supported: filtering, manipulation, sampling, and scaling and reducing (see the full documentation for data transformation). To start, you will want to scrub missing values, edit the columns, and then split your dataset for training and validation. The prepared data will be fed to your training model along with the selected learner.
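To make the preparation steps concrete, here is a minimal sketch in plain Python of scrubbing missing values and splitting a dataset. It is not how ML Studio does it internally: the sample data, the column names and the helper functions are all assumptions made up for illustration, and dropping incomplete rows is only one possible way to handle missing values.

```python
import csv
import io
import random

# Hypothetical sample data: house surface, number of rooms, and price.
RAW_CSV = """surface,rooms,price
50,2,150000
80,3,
120,4,350000
65,,190000
95,3,280000
70,2,205000
"""

def load_rows(text):
    """Parse CSV text into a list of dicts, one per row."""
    return list(csv.DictReader(io.StringIO(text)))

def drop_missing(rows, columns):
    """Keep only the rows where every listed column has a value."""
    return [r for r in rows if all(r[c] not in ("", None) for c in columns)]

def split(rows, train_fraction=0.75, seed=0):
    """Shuffle the rows reproducibly, then split into training and validation sets."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

rows = load_rows(RAW_CSV)
clean = drop_missing(rows, ["surface", "rooms", "price"])
train, validation = split(clean)
print(len(rows), len(clean), len(train), len(validation))
```

Keeping a separate validation set matters because the model should be judged on rows it has not seen during training.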

 

Training and tuning

The first thing you need to figure out is what type of analysis you need to run (see full documentation). It can be a classification analysis (spam vs. non-spam), a clustering analysis (automatic grouping of similar items), or a regression analysis (for prediction and forecasting). Once you have this figured out, compare the results of multiple models to find out which one performs best on your dataset. The logic here is to:

  1. Select a model with default values
  2. Train the model
  3. Score the model
  4. Evaluate the model (to figure out which model performs best)
  5. Sweep the model (to figure out the best configuration for your model)
  6. Evaluate the model again (to check the swept configuration)
  7. Save the trained model (to be used in production)
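The loop above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the data is made up, a tiny k-nearest-neighbours regressor stands in for whichever learner you would pick in ML Studio (for this learner, "training" simply means keeping the training points), and the parameter being swept is the number of neighbours `k`.

```python
# Hypothetical data: (surface, price) pairs. Not real measurements.
TRAIN = [(40, 120), (55, 160), (60, 185), (75, 220), (90, 270), (110, 330), (130, 390)]
VALIDATION = [(50, 150), (80, 240), (120, 360)]

def knn_predict(train, x, k):
    """Score step: average the targets of the k training points closest to x."""
    nearest = sorted(train, key=lambda point: abs(point[0] - x))[:k]
    return sum(y for _, y in nearest) / len(nearest)

def mean_absolute_error(train, validation, k):
    """Evaluate step: mean absolute error of the predictions on the validation set."""
    errors = [abs(knn_predict(train, x, k) - y) for x, y in validation]
    return sum(errors) / len(errors)

def sweep(train, validation, candidates):
    """Sweep step: try each candidate k and return (best_k, best_error)."""
    scored = [(mean_absolute_error(train, validation, k), k) for k in candidates]
    best_error, best_k = min(scored)
    return best_k, best_error

# Steps 1 to 4: evaluate the model with a default configuration (k=1 here).
default_error = mean_absolute_error(TRAIN, VALIDATION, k=1)
# Steps 5 and 6: sweep the parameter and evaluate again.
best_k, best_error = sweep(TRAIN, VALIDATION, candidates=[1, 2, 3, 5])
print(default_error, best_k, best_error)
```

Because the default value is one of the swept candidates, the swept configuration can never score worse than the default on the validation set.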

Some useful notions for evaluating each model (definitions taken from the documentation):

  • MAE (Mean Absolute Error): The average absolute difference between the predicted value and the actual value
  • RMSE (Root Mean Squared Error): The square root of the average of the squared prediction errors on the dataset
  • RAE (Relative Absolute Error): The average of absolute errors relative to the absolute difference between actual values and the average of all actual values
  • RSE (Relative Squared Error): The average of squared errors relative to the squared difference between the actual values and the average of all actual values.
  • CoD (Coefficient of Determination): Also known as the R squared value, this is a statistical metric indicating how well a model fits the data.
  • For each error metric, a smaller value indicates a closer match. For CoD, the closer it is to 1.0, the better the prediction.
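The definitions above translate directly into code. The following is a minimal sketch in plain Python, assuming two equally long lists of actual and predicted values where the actual values are not all identical (otherwise the relative metrics would divide by zero). The function name and the sample numbers are made up for illustration.

```python
import math

def regression_metrics(actual, predicted):
    """Compute MAE, RMSE, RAE, RSE and CoD for paired actual/predicted values."""
    n = len(actual)
    mean_actual = sum(actual) / n
    abs_errors = [abs(p - a) for a, p in zip(actual, predicted)]
    sq_errors = [(p - a) ** 2 for a, p in zip(actual, predicted)]
    # Deviations of the actual values from their own average.
    abs_deviations = [abs(a - mean_actual) for a in actual]
    sq_deviations = [(a - mean_actual) ** 2 for a in actual]
    rse = sum(sq_errors) / sum(sq_deviations)
    return {
        "MAE": sum(abs_errors) / n,
        "RMSE": math.sqrt(sum(sq_errors) / n),
        "RAE": sum(abs_errors) / sum(abs_deviations),
        "RSE": rse,
        "CoD": 1 - rse,  # R squared is one minus the relative squared error
    }

actual = [100.0, 200.0, 300.0, 400.0]
predicted = [110.0, 190.0, 310.0, 390.0]
metrics = regression_metrics(actual, predicted)
for name, value in metrics.items():
    print(name, round(value, 4))
```

Note that CoD is simply 1 minus RSE, which is why a small error and a CoD close to 1.0 go together.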

    When comparing the evaluations of classification models, you will want to check that the main values are going up: accuracy (proportion of correct results out of all cases), precision (proportion of true positives among all positive results), recall (fraction of all actual positives returned by the model), AUC (Area Under the Curve, a single number that lets you compare different models), and F1 score (a measure of accuracy that balances precision and recall). See Metrics used for model evaluation for more details.
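As a rough illustration of the classification metrics, here is a small sketch in plain Python for a binary problem where 1 means positive (for example spam) and 0 means negative. The function name and the labels are made up for illustration. AUC is left out because it needs predicted scores rather than hard 0/1 labels.

```python
def classification_metrics(actual, predicted):
    """Compute accuracy, precision, recall and F1 for binary 0/1 labels."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)  # true positives
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)  # true negatives
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)  # false positives
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)  # false negatives
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Hypothetical labels: 1 = spam, 0 = not spam.
actual = [1, 1, 1, 0, 0, 0, 0, 1]
predicted = [1, 1, 0, 0, 0, 1, 0, 1]
scores = classification_metrics(actual, predicted)
print(scores)
```

Precision and recall often pull in opposite directions, which is why the F1 score, their harmonic mean, is handy when you want a single number.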

    Deployment

    Assuming the above steps are completed and you have found satisfactory values for your model evaluation, you can clean up your workflow to keep only one model, with the optimal configuration. Set the input of your score model as the published input and define your output. Run the experiment and publish your web service. From there you will have access to the API help page, which contains sample code for C#, R, and Python, and to a test URL where you can manually set some values.

    In your web service configuration tab, you can mark it as ‘ready for production’ and start using it in a real production project!

    Resources

    Machine learning for beginners.
    If you want to go deeper in understanding ML, Andrew Ng has a fantastic class on coursera.org.
    All the video tutorials that you need to get started with Azure ML.
    Predictive solution walk-through.

     

    Embracing change

    We have all experienced change in our lives. Whether small, like moving to a new home, or larger, like changing jobs or moving to a new country, it always comes with a fair amount of stress. I believe that the stress associated with change comes from all the uncertainty that change triggers (I’m not a psychologist, at all). I have had my fair share of change in my life: moving across multiple countries, changing jobs, becoming a husband and a dad, and more recently going through re-organization and layoffs at my workplace. Looking back, and simplifying a bit, there are really 3 phases of change (psychologists would argue there are multiple stages: precontemplation, contemplation, determination, action, maintenance). In the first phase you become aware of the upcoming change, the second phase is the transition period proper, and the third phase is when the change is completed. Below are some tactics I have been using to have a smooth transition.

    Upcoming change

    It’s crucial to acknowledge when change is coming your way. At this point you might not know where you are heading, but you should not ignore the change. Being aware of the change has a couple of positive aspects: you can stay in motion with the change and actively help influence its direction. Not knowing where the journey will end is stressful, no question about it, but it is important to have a north star to ‘survive’ the cycle. Don’t fall into panic mode (people around you might be freaking out; don’t let this impact you, and help them see the benefits of the change or that it is temporary). Understanding the root of the change will be extremely useful during the next phases.

    Transition

    Change is ongoing, embrace it! At this point, there is probably a lot of chaos, a lot of uncertainty, and a lot of frustration. As with most things, we can turn this time into a couple of positive outcomes. Get involved in the transition, help make it successful, steer it in the right direction. Learning about the change will help you adjust to it. Stay aware that you are in a transition phase, keep your anxiety down, and be patient.

    Post change

    At this point, you should have adjusted to the change. You can start focusing on building on the new foundation. Look back and be proud of the journey you just had; another one is coming soon…

     

    Career superpowers – book summary

    I loved this book, Career Superpowers: Succeeding on Purpose by James Whittaker. It is refreshing to think strategically about your career and to feel in control. The author breaks down the career superpowers into 8 attributes: ambition, passion, specialization, storytelling, imitation, derivation, creativity, and leadership.

    Ambition

    Ambition sets the bar for your career: you increase your chance of climbing high by aiming high. It means aligning your skills and work ethic to achieve what you want from life. To fulfill your ambition, you will have to identify stupid rules and refuse to allow them to have power over you (loyalty to your company is one stupid rule).
    Out-delivering your hard-working colleagues is more likely to burn you out than to help you reach your long-term goal. Working hard is a necessary condition of success, but it is not sufficient in itself. You need to work smart: do not invest time in impressing your manager(s), and spend time bettering your situation and yourself. Take a larger view and take the pulse of the industry rather than the pulse of your company. You have to learn to stand out.
    Your career has to be handled as a project: keep track of it daily and think about where it is headed. What’s in your way? How is your progress? Are you ever going to get where you want to be on your current course?

    Passion

    If you are not passionate about your job then you won’t have the additional energy, enthusiasm and work ethic that come with passion. You should share your passion broadly (blog, papers, manifestos) and advertise the thing that turns you on. Also, being intensely interested in the subject you are presenting makes the material more interesting to your audience.

    Specialize

    A specialty is the ultimate career aid. Choose one that matters (high value), that is visible, and that has implications across the industry and not just your company. Pick a specialty that is well within your skill set and that you can master completely (being merely good is not enough). Don’t stretch to be the dumbest guy at the level above you; instead, be the smartest at the level below.
    Once you have a specialty you love, sell it! Have a good elevator pitch (write it, practice it, and promote it) for each of your specialties (don’t be a one-trick pony, be ready to change). It should explain what you do and why it is valuable in one simple sentence.
    You will enjoy the dividends of your specialty during the rest of your career. You need to learn, master, and own your specialty.

    Storytelling

    Directly influence how people think and get them to see your ideas and values in full color.

    Imitation

    The people around you have a great power over who you become and how high you reach. Choose them well. If you need to learn something from someone, make sure they are really good at it.

    Derivation

    It is not about making the invention, it is about being the one who makes the invention meaningful. You should be aware of your industry and of how you are using the products developed by your industry. You should be able to name at least 10 industry-insider websites that publish news about your industry. Looking at the world and wanting to fix it is part of the derivative superpower.

    Creativity

    The more you know, the more creative you can become. Find out when and where you can be the most creative, build your day to enable creativity.

    Leadership

    All the previous attributes make you successful. Being successful creates gravity around you; this is called being a leader. You need to know yourself to be a leader, knowing where you want to go and where you want to be in the future. Work on a product that is important to the company; no career is served by working on a product that the company doesn’t really care about. A leader can build a team where everybody can recite the product’s elevator pitch and understands why the product is important, what and who the competition is, and how the product fits in the company’s lineup.

     

    The above summary is really short and might not make sense to someone who hasn’t read the full book. I definitely recommend this one to anybody who wants to drive their career.

     

     