Data science, machine learning and artificial intelligence are terms you come across often these days, and they are frequently used interchangeably. But what do these concepts actually mean, and how do they relate to each other? We also show how domain knowledge, computer science, and mathematics & statistics come together here.

Data science, literally the science of data, is the discipline concerned with extracting knowledge and insights from structured and unstructured data using a variety of methods, processes, and systems. These insights can be descriptive (what happened), explanatory (why did it happen), predictive (what will happen), or action-oriented (what can I do). Making data insightful can support decisions that create value for the business. These insights can also open up new opportunities, such as optimizing the customer journey.

All data initiatives, whether machine learning, visualization, or reporting, rely on clean data, which makes data preparation essential to any data-driven organization. Organizations are increasingly adopting new solutions that make data preparation more accessible (and less time-consuming) in a governed, secure manner. Data preparation is no longer considered a job only for IT or highly skilled technical teams; it now spans a variety of users, in particular the data analysts who know the data best.
Given that data preparation is not only a relatively new technology, but also a new process for many organizations,

This blog is the first in a series of three in which we walk you through a recent project of ours on predicting house prices. In this case study we discuss the entire process, from model development to implementation. Although we use house-price prediction as the example, the takeaways apply to any machine learning project, regardless of the domain.

Predicting house prices is a well-known data science problem. Not only is it one of the ‘get started’ cases on the competition website Kaggle, it is also a popular use case for blogs on machine learning techniques.
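To make the problem concrete, here is a minimal sketch of the simplest possible house-price model: ordinary least squares with a single feature. This is not the approach used in this series; the feature (floor area), the function name `fit_simple_linear_regression`, and the toy numbers are all hypothetical and chosen only for illustration.

```python
from statistics import mean


def fit_simple_linear_regression(xs, ys):
    """Hypothetical helper: ordinary least squares for one feature, y ≈ slope * x + intercept."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two paired observations")
    x_bar, y_bar = mean(xs), mean(ys)
    # Sum of squared deviations of x; zero means the slope is undefined.
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    # Sum of cross-deviations of x and y.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


# Hypothetical toy data: floor area in m² and sale price in euros.
areas = [50, 80, 100, 120, 150]
prices = [200_000, 290_000, 350_000, 410_000, 500_000]

slope, intercept = fit_simple_linear_regression(areas, prices)
predicted = slope * 90 + intercept  # estimated price for a 90 m² house
print(f"price ≈ {slope:.0f} * area + {intercept:.0f}; 90 m² → {predicted:.0f}")
```

Real house-price models use many more features (location, age, number of rooms) and more flexible algorithms, but the workflow is the same: fit a model to historical sales, then use it to estimate prices for unseen houses.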

Society loves heroes. Unfortunately, teams can rarely be heroes. Complex stories of discovery and innovation involving teams of hundreds of collaborators all working together are ultimately distilled down to one or two figureheads who become the public face and “heroes” responsible for the team’s success. This has become increasingly true in the world of data science, in which collaborative interdisciplinary teams are the norm, yet the accolades and adulation are typically heaped on single individuals arbitrarily chosen by the press and outside public to represent those teams. What will it take for the public and press to begin seeing data science as the teamwork it typically is?

