Python for Data Science and Machine Learning

This document provides basic tools and algorithms for data science and machine learning tasks in Python.

Andrew L. Mackey


Python is a relatively simple language for general programming, and it has a rich set of libraries for data science and machine learning tasks. This combination makes it a practical choice for machine learning work.


Splitting Data for Training and Testing

The following splits the X and y variables into training and testing sets of 80% and 20%, respectively.

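import pandas as pd
from sklearn.model_selection import train_test_split

# Load the features (X) and the labels (y) from CSV files.
X = pd.read_csv("/path/to/xdata")
y = pd.read_csv("/path/to/ydata")

# Hold out 20% of the rows for testing; random_state makes the split reproducible.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)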
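
To check the split without the CSV files, the same call can be tried on synthetic data. The sketch below is only an illustration: the names X_demo and y_demo, the column names, and the data itself are made up, and the stratify argument is an optional addition that keeps the class proportions equal in both splits.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical stand-in data: 100 rows, 3 feature columns, balanced binary labels.
rng = np.random.default_rng(0)
X_demo = pd.DataFrame(rng.normal(size=(100, 3)), columns=["f1", "f2", "f3"])
y_demo = pd.Series(np.arange(100) % 2, name="label")

# Same 80/20 split as above; stratify keeps the 50/50 label balance in each split.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)

print("Training rows:", X_tr.shape[0])
print("Testing rows: ", X_te.shape[0])
```

With 100 rows and test_size=0.2, the training split holds 80 rows and the testing split holds 20.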


Support Vector Machine (SVM) Classifier

Support Vector Machine (SVM) classifiers provide reasonable accuracy for datasets where there is a clear margin of separation between the classes.
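
The idea of a margin can be made concrete with a small sketch. It assumes two made-up, well-separated clusters (X_sep and y_sep are hypothetical names) and a linear kernel, for which the margin width equals 2 / ||w||, where w is the learned weight vector.

```python
import numpy as np
from sklearn import svm

# Two hypothetical, well-separated clusters in 2-D.
rng = np.random.default_rng(0)
class0 = rng.normal(loc=[-3.0, -3.0], scale=0.5, size=(20, 2))
class1 = rng.normal(loc=[3.0, 3.0], scale=0.5, size=(20, 2))
X_sep = np.vstack([class0, class1])
y_sep = np.array([0] * 20 + [1] * 20)

clf = svm.SVC(kernel="linear", C=1.0).fit(X_sep, y_sep)

# For a linear kernel, coef_ holds w, and the margin width is 2 / ||w||.
w = clf.coef_[0]
margin_width = 2.0 / np.linalg.norm(w)

# Only the points nearest the boundary (the support vectors) define the margin.
print("Support vectors:", len(clf.support_vectors_))
print("Margin width:", margin_width)
```

Only a few of the 40 points end up as support vectors here, which is typical when the classes are cleanly separated.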

Hyperparameter Options
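from sklearn import svm
from sklearn import metrics

# Fit a linear-kernel SVM on the training split, then predict on the test split.
model  = svm.SVC(kernel='linear').fit(X_train, y_train)
y_pred = model.predict(X_test)

# precision_score and recall_score assume binary labels by default (average='binary').
print( "Accuracy:"  , metrics.accuracy_score(  y_test, y_pred )  )
print( "Precision:" , metrics.precision_score( y_test, y_pred )  )
print( "Recall:"    , metrics.recall_score(    y_test, y_pred )  )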
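
The kernel argument is one of several SVC hyperparameters; C (regularization strength) and gamma (kernel coefficient) are two others. One common way to compare settings is a cross-validated grid search. The following is a minimal sketch, not a recommended configuration: the synthetic dataset and the candidate values in param_grid are assumptions chosen only for illustration.

```python
from sklearn import svm
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split

# Hypothetical binary classification data standing in for the real X and y.
X_demo, y_demo = make_classification(n_samples=200, n_features=5, random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42
)

# Candidate values are illustrative; gamma has no effect for the linear kernel.
param_grid = {
    "C": [0.1, 1, 10],
    "kernel": ["linear", "rbf"],
    "gamma": ["scale", "auto"],
}

# 5-fold cross-validation on the training split for every combination.
search = GridSearchCV(svm.SVC(), param_grid, cv=5, scoring="accuracy")
search.fit(X_tr, y_tr)

print("Best parameters:", search.best_params_)
print("Held-out accuracy:", search.score(X_te, y_te))
```

The search uses only the training split, so the testing split remains an unbiased check of the selected model.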