When it comes to recommendation engines, you do not have many choices among libraries that are ready for production use.
Fortunately, Apache Mahout has been around for quite a while. Mahout Taste is a subproject included in the library, focused on non-batch (non-Hadoop) recommendation engines.
I guess it is the right time to publish my article, as Mahout has recently released a new version with some improvements (July 23rd).
These last weeks, I have been working on a real-time recommendation engine backed by MongoDB for long-term storage, with an in-memory data model feeding the Mahout Taste recommendation engine.
The goal of this project was to be able to:
Track every action a website visitor performs (product views)
Recommend products to the visitor upfront, without much change to the website the engine is plugged into
Keep the recommendation engine completely isolated, data-wise, from other systems (no ETL)
Expose the engine through web services
The technical requirements I set for myself on this project were:
Easy to work with (from a developer's perspective)
First, we need a way to log all visitor activity very quickly. The cornerstone of a recommendation engine is having the maximum amount of data available.
The first module of the application will be built using Netty, the very fast socket server. A simple Java 7 asynchronous file channel will store the events from website activity. Also, Google Protocol Buffers was the simplest approach to easily dump semi-structured data.
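As a sketch of the event-logging idea (the class and method names below are hypothetical, and I use a plain length-delimited text payload where the real module uses Protocol Buffers), a Java 7 asynchronous file channel can append visitor events without blocking the request thread:

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.AsynchronousFileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical event logger: appends one serialized event per write.
public class EventLogger {
    private final AsynchronousFileChannel channel;
    private final AtomicLong position = new AtomicLong(0);

    public EventLogger(Path file) throws IOException {
        this.channel = AsynchronousFileChannel.open(
                file, StandardOpenOption.CREATE, StandardOpenOption.WRITE);
    }

    // In the real module the payload would be a Protocol Buffers message;
    // here we just write "visitorId,productId\n" as UTF-8 bytes.
    public Future<Integer> logProductView(String visitorId, String productId) {
        byte[] payload = (visitorId + "," + productId + "\n")
                .getBytes(StandardCharsets.UTF_8);
        // Reserve a slice of the file so concurrent writes do not overlap.
        long writeAt = position.getAndAdd(payload.length);
        return channel.write(ByteBuffer.wrap(payload), writeAt);
    }

    public void close() throws IOException {
        channel.close();
    }

    public static void main(String[] args) throws Exception {
        EventLogger logger = new EventLogger(Paths.get("events.log"));
        logger.logProductView("visitor-42", "product-7").get(); // block only for the demo
        logger.close();
    }
}
```

The `Future` returned by `write` lets the caller fire-and-forget in the hot path and only check completion later.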
What I have seen these past weeks, with Oracle letting Apache leave the JCP Executive Committee, seems to lead back to the “old days” when we had to buy a closed platform to work on. That was before the Internet, the open source movement, Linux, Git, etc.
A world where we had to rely on a company for support, where we could not make our C++ COM objects work properly without an MSDN subscription. A world where Delphi was a good tool for creating GUIs (sic). A world where Visual Basic was one of the most used languages.
Actually, it does matter. It is just not always the right moment to optimize your code. There are several patterns that, applied properly at the start of most projects, let you worry about performance at a later stage. Obviously this won’t help if you need to do highly CPU-demanding processing (neural networks, nuclear bomb simulation, …).
First, a brief overview of the two technologies. OSGi is a framework that aims at creating component/plug-in based software. I’ve already mentioned OSGi in a previous post.
SCA stands for Service Component Architecture. Oracle, IBM and others came to the conclusion that SOA is just a buzzword: without a proper framework, everyone can claim they do SOA development while actually doing something else… They have created several specifications to help produce SOA software. Key features are separation of concerns, component orientation, dynamic binding, and aggregation/composition of services/components.
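To illustrate the component-oriented idea outside any particular SCA or OSGi runtime (all names below are made up for the example), it boils down to coding against a service interface and letting a registry do the dynamic binding:

```java
import java.util.HashMap;
import java.util.Map;

// A service contract: callers depend on this interface only,
// never on a concrete implementation class.
interface QuoteService {
    double quote(String symbol);
}

// One possible implementation; it could be swapped for a remote
// binding (web service, JMS, ...) without touching any caller.
class InMemoryQuoteService implements QuoteService {
    public double quote(String symbol) {
        return "ACME".equals(symbol) ? 42.0 : 0.0;
    }
}

// A toy registry standing in for the dynamic binding an SCA or
// OSGi container would normally perform for you.
public class ComponentDemo {
    private static final Map<Class<?>, Object> registry = new HashMap<Class<?>, Object>();

    static <T> void bind(Class<T> contract, T implementation) {
        registry.put(contract, implementation);
    }

    static <T> T lookup(Class<T> contract) {
        return contract.cast(registry.get(contract));
    }

    public static void main(String[] args) {
        bind(QuoteService.class, new InMemoryQuoteService());
        QuoteService service = lookup(QuoteService.class);
        System.out.println(service.quote("ACME")); // prints 42.0
    }
}
```

The separation of concerns comes from the fact that the caller never names `InMemoryQuoteService`; rebinding the contract is a one-line change.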
C++0x is the standard that describes the new C++ language specification. It is still a draft. The main focus of the C++ committee is the development of the core libraries. In the language itself, the committee will introduce lambda functions (as in Clojure, Scala, Groovy, Ruby, PHP, …), the nullptr literal (interpreted as a pointer, whereas today NULL is most of the time defined as an integer; it depends on the compiler) and a few other minor changes.
Both Visual Studio 2010 and GCC bring support for C++0x in their latest versions. I am very curious to test lambda functions myself. For those of you who want to try them, you will find here the two tables describing which features each compiler complies with.
Java, as a mainstream programming language, evolves to always try to fit the latest needs. One big leap was introduced in Java 5, with several language changes. One of them was the possibility to do some meta-programming. Big word, isn’t it? Indeed: meta-programming.
In an object-oriented language like Java, you will usually build your software using objects (cars, tires, engines…) and methods (move, turn, explode…) to describe your business domain as something a computer will understand. Sounds simple, right? In theory, it is pretty simple. Things become difficult when we need to save data, display things on a screen, or connect to the network. Software engineers invented APIs (Application Programming Interfaces) to help application developers interact with third-party libraries in general, and display, network or system libraries in particular.
This is when our code starts to become unmaintainable, because we mix business code (cars, tires, engines) with technical plumbing (readFile, display, connect).
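Java 5 annotations are one way to keep that plumbing out of the business code. In the sketch below (the annotation and class names are hypothetical), a business method is only marked with metadata, and the technical code reads it back through reflection:

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;

// Hypothetical annotation: declares that a method's result should be
// persisted, without putting any save/connect code inside the method.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface Persist {
    String store();
}

public class AnnotationDemo {
    static class Car {
        @Persist(store = "garage-db")
        public String move() {
            return "moving"; // pure business logic, no plumbing
        }
    }

    public static void main(String[] args) throws Exception {
        // The technical plumbing lives here, driven only by the metadata.
        for (Method m : Car.class.getDeclaredMethods()) {
            Persist p = m.getAnnotation(Persist.class);
            if (p != null) {
                System.out.println(m.getName() + " -> " + p.store());
            }
        }
    }
}
```

The `Car` class never mentions the database; deciding *where* and *how* to persist becomes a framework concern.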
Obviously, this title is a little provocative and takes a lot of shortcuts. I do architecture and software development myself, and I have a recurrent rhetorical question tickling my head: where should a particular artifact be built?
Indeed, a common challenge in software development is to find the right boundaries, the right frontier, the right position, the right size:
Where should I decompose my software into services?
Where should this module be separated from another?
Where is the frontier between meta-data (configuration, …) and data?
Where should a particular class settle: in the GUI part, the back-end part, …?
Buzzwords — we all talk in buzzwords, and SSO is one of them. What is Single Sign-On, by the way?
A brief description would say that Single Sign-On is a solution that allows an end-user to access different applications with the same credentials. To give you an example, when I use modern web sites like Facebook, Dailymotion or Yahoo, I can use OpenID to connect to any of these applications. OpenID keeps my user information, and I may connect to any of these websites with my OpenID identifier.
Another incarnation of SSO, in the enterprise world, is described by OASIS using SAML. Security Assertion Markup Language is an XML-based standard for exchanging authentication and authorization data between security domains, that is, between an identity provider (a producer of assertions) and a service provider (a consumer of assertions).
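To give an idea of what such an assertion looks like (the issuer, subject and timestamps below are made-up example values), a minimal SAML 2.0 authentication assertion carries who the user is and how they were authenticated:

```xml
<saml:Assertion xmlns:saml="urn:oasis:names:tc:SAML:2.0:assertion"
                ID="_example123" Version="2.0"
                IssueInstant="2010-11-01T12:00:00Z">
  <!-- The identity provider that vouches for the user -->
  <saml:Issuer>https://idp.example.com</saml:Issuer>
  <!-- Who the assertion is about -->
  <saml:Subject>
    <saml:NameID>user@example.com</saml:NameID>
  </saml:Subject>
  <!-- How and when the user was authenticated -->
  <saml:AuthnStatement AuthnInstant="2010-11-01T12:00:00Z">
    <saml:AuthnContext>
      <saml:AuthnContextClassRef>urn:oasis:names:tc:SAML:2.0:ac:classes:Password</saml:AuthnContextClassRef>
    </saml:AuthnContext>
  </saml:AuthnStatement>
</saml:Assertion>
```

The service provider trusts this document (once its signature is verified) instead of asking the user for credentials again — that is the whole SSO trick.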
Why choose the Xen hypervisor for your virtualization?
Two reasons. First, on Linux, it is the most advanced open-source virtualization system; even if KVM is under heavy development, many features are still missing. The second reason is that Xen offers paravirtualization.
The first part of the how-to would be to explain how to install Debian and Xen. I have done this many times and will probably write a how-to someday explaining this first step. Let’s assume you are hosted at OVH (a European hosting provider) and everything is already in place.
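To give an idea of where this is heading (the kernel paths, volume names and bridge name below are illustrative, not taken from an actual OVH setup), a classic paravirtualized Xen domU configuration file looks roughly like this:

```
# /etc/xen/vm01.cfg -- minimal paravirtualized guest (illustrative values)
kernel  = "/boot/vmlinuz-2.6.26-2-xen-amd64"
ramdisk = "/boot/initrd.img-2.6.26-2-xen-amd64"
memory  = 512
name    = "vm01"
vif     = ['bridge=xenbr0']
disk    = ['phy:/dev/vg0/vm01-disk,xvda1,w',
           'phy:/dev/vg0/vm01-swap,xvda2,w']
root    = "/dev/xvda1 ro"
```

With paravirtualization the guest boots a Xen-aware kernel supplied by dom0, which is why the `kernel` and `ramdisk` lines point at files on the host rather than inside the guest image.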