Machine learning and the Pythonic buzz
A beginner-friendly introduction to machine learning with Python and scikit-learn, building a first model on the Boston housing dataset.
Introduction
“Predicting the future isn’t magic, it’s artificial intelligence” ~ Dave Waters
Machine learning represents one of artificial intelligence’s most compelling applications.
So what is machine learning?
Machine learning applies statistical modeling to identify patterns in data, enabling predictions on previously unseen information.
Consider a business scenario: you want to forecast quarterly profits across multiple products. While manual pattern recognition works for small datasets, statistical approaches scale better. Machine learning automates these statistical processes across hundreds or thousands of products and millions of observations, delivering reliable predictions when provided quality data.
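As a small illustration of the forecasting idea above, the sketch below fits a straight line to past quarterly profits and extrapolates one quarter ahead. The profit figures and the forecast_next_quarter helper are hypothetical, made up for this example. A straight-line fit is only the simplest statistical model to show here, and real forecasting would need more data and a more careful model.

```python
import numpy as np

def forecast_next_quarter(profits):
    """Fit a straight line to past profits and extrapolate one quarter ahead."""
    quarters = np.arange(len(profits))
    # np.polyfit returns coefficients from the highest degree down: slope, then intercept
    slope, intercept = np.polyfit(quarters, profits, deg=1)
    return slope * len(profits) + intercept

# Hypothetical profits (in thousands) for one product over eight quarters
past_profits = [12.0, 13.5, 13.1, 15.2, 16.0, 16.8, 18.1, 19.0]
print(f"Forecast for next quarter: {forecast_next_quarter(past_profits):.1f}")
```

Machine learning libraries automate this kind of fit across many products and many more input variables than a single time axis.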
Where does Python come in here?
Python excels at rapid prototyping. Libraries like matplotlib, scikit-learn, numpy, pandas, and scipy make “data visualization, large-scale computation, data manipulation and machine learning accessible like never before.” Combined with Jupyter notebooks, Python offers an accessible machine learning workflow.
Talk is cheap. Show me the code.
This tutorial uses scikit-learn and the Boston housing dataset to predict house prices.
Step 0: Installation
Download Python from python.org. Install dependencies via:
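    pip install numpy scikit-learn
Note that load_boston was deprecated in scikit-learn 1.0 and removed in 1.2, so the code below needs an older release (for example, pip install "scikit-learn<1.2").
Steps 1–6: Building your model
    # Imports: the model class, NumPy, and the dataset loader
    from sklearn.tree import DecisionTreeRegressor as dtr
    import numpy as np
    from sklearn.datasets import load_boston

    # Load the dataset and create an untrained decision tree
    data = load_boston()
    model = dtr()
    # Train on the first 400 rows
    model.fit(data.data[:400], data.target[:400])
    # Predict prices for the remaining rows, which the model has never seen
    predictions = model.predict(data.data[400:])
    # Mark each prediction that lands within about ±5 of the actual price
    results = np.isclose(predictions, data.target[400:], atol=5, equal_nan=True)
Here results is a boolean array with one entry per test row, and results.mean() gives the share of predictions inside the tolerance. The target is the median home value in thousands of dollars, so a ±5 threshold means within roughly $5,000 of the actual price. With that threshold, most predictions fall close to the actual values.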
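Because load_boston is not available in recent scikit-learn releases, here is a minimal sketch of the same workflow on synthetic data, so it runs without any dataset download. The make_synthetic_prices and fit_and_score helpers, the three features, and the formula behind the prices are all assumptions invented for illustration. Only the split, fit, predict, and compare steps mirror the code above. The sketch also reports the share of predictions inside the tolerance and the mean absolute error.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def make_synthetic_prices(n_rows=506, seed=0):
    """Return made-up features and prices; this is NOT the Boston data."""
    rng = np.random.default_rng(seed)
    X = rng.uniform(0, 10, size=(n_rows, 3))
    # Hypothetical price formula plus noise, chosen only for illustration
    y = 10 + 3 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(0, 1, size=n_rows)
    return X, y

def fit_and_score(X, y, n_train=400, atol=5):
    """Train on the first n_train rows, then score predictions on the rest."""
    model = DecisionTreeRegressor(random_state=0)
    model.fit(X[:n_train], y[:n_train])
    predictions = model.predict(X[n_train:])
    # Boolean array: True where a prediction is within atol of the true value
    within = np.isclose(predictions, y[n_train:], atol=atol)
    mean_abs_error = np.mean(np.abs(predictions - y[n_train:]))
    return within.mean(), mean_abs_error

X, y = make_synthetic_prices()
share_within, mean_abs_error = fit_and_score(X, y)
print(f"Share of predictions within tolerance: {share_within:.2f}")
print(f"Mean absolute error: {mean_abs_error:.2f}")
```

Slicing off the last rows as a test set keeps the example close to the original. For real data, a shuffled split such as scikit-learn's train_test_split is usually a safer choice.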
Bonus: visualizing the model
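To see how the model makes its decisions, export the fitted tree to Graphviz DOT format (for example with scikit-learn's export_graphviz) and paste the result into WebGraphViz, which renders it as a diagram of the splits the tree learned.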
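One possible way to get text that WebGraphViz can render is scikit-learn's export_graphviz, which returns the tree as a DOT string when out_file=None. The sketch below is an assumption about the workflow, not necessarily what the original tutorial did. The data is synthetic and the feature names feature_a and feature_b are placeholders. The tree is capped at depth 2 to keep the diagram readable.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor, export_graphviz

# Synthetic data for illustration only
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(100, 2))
y = 2 * X[:, 0] + X[:, 1]

# A shallow tree keeps the exported diagram small
model = DecisionTreeRegressor(max_depth=2, random_state=0)
model.fit(X, y)

# out_file=None makes export_graphviz return the DOT source as a string
dot_text = export_graphviz(
    model,
    out_file=None,
    feature_names=["feature_a", "feature_b"],  # placeholder names
    filled=True,
)
print(dot_text)  # paste this output into WebGraphViz
```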
Conclusion
You’ve built your first machine learning model! Explore concepts like underfitting and overfitting next.
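As a starting point for that exploration, here is a rough sketch that compares decision trees of different depths on synthetic data. The data, the train_and_test_r2 helper, and the chosen depths are assumptions made for illustration. A very shallow tree tends to underfit (weak scores on both sets), while a tree with no depth limit tends to overfit (a near-perfect training score that does not carry over to unseen data).

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Synthetic noisy data for illustration only
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(300, 1))
y = np.sin(X[:, 0]) + rng.normal(0, 0.3, size=300)
X_train, y_train = X[:200], y[:200]
X_test, y_test = X[200:], y[200:]

def train_and_test_r2(max_depth):
    """Fit a tree with the given depth limit; return (train R^2, test R^2)."""
    model = DecisionTreeRegressor(max_depth=max_depth, random_state=0)
    model.fit(X_train, y_train)
    return model.score(X_train, y_train), model.score(X_test, y_test)

# max_depth=None lets the tree grow until it fits the training data exactly
for depth in (1, 4, None):
    train_r2, test_r2 = train_and_test_r2(depth)
    print(f"max_depth={depth}: train R^2={train_r2:.2f}, test R^2={test_r2:.2f}")
```

A large gap between the training and test scores is the usual sign of overfitting.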