Python for Data Science and Machine Learning

This document provides basic tools and algorithms for data science and machine learning tasks in Python.

Andrew L. Mackey

Overview

Python is a relatively simple language for general programming, and it has a rich set of libraries for data science and machine learning tasks. This combination makes it a practical choice for machine learning work.

 

Splitting Data for Training and Testing

The following splits the X and y variables into training and testing sets of 80% and 20%, respectively. Setting random_state makes the split reproducible across runs.

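from sklearn.model_selection import train_test_split
import pandas as pd

# Load the features (X) and the labels (y); the paths are placeholders.
X = pd.read_csv("/path/to/xdata")
y = pd.read_csv("/path/to/ydata")

# Hold out 20% of the rows for testing; random_state fixes the shuffle.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)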
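As a quick check of the proportions described above, the following sketch runs the same split on a small synthetic dataset. The data, the column names, and the `_demo` variable names are assumptions made up for illustration; the `stratify` argument is an optional addition that keeps the class ratio the same in both splits.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical stand-in data: 100 rows, 3 features, balanced binary labels.
rng = np.random.default_rng(0)
X_demo = pd.DataFrame(rng.normal(size=(100, 3)), columns=["f1", "f2", "f3"])
y_demo = pd.Series([0, 1] * 50, name="label")

# stratify=y_demo preserves the 50/50 class ratio in both splits.
Xd_train, Xd_test, yd_train, yd_test = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)

print(Xd_train.shape, Xd_test.shape)      # row counts follow the 80/20 split
print(yd_test.value_counts().to_dict())   # class counts in the test split
```

Stratifying is most useful when the classes are imbalanced, since a purely random split can leave the test set with very few examples of the rare class.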

 

Support Vector Machine (SVM) Classifier

Support Vector Machine (SVM) classifiers tend to provide reasonable accuracy on datasets where there is a clear margin of separation between the classes.

Hyperparameter Options

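from sklearn import svm
from sklearn import metrics

# Train a linear-kernel SVM on the training split.
# Assuming y holds a single label column, ravel() flattens it into the
# 1-D array that fit() expects.
model = svm.SVC(kernel='linear')
model.fit(X_train, y_train.values.ravel())
y_pred = model.predict(X_test)

# precision_score and recall_score default to average='binary', so this
# assumes two classes; pass average='macro' (or similar) for multiclass labels.
print("Accuracy:", metrics.accuracy_score(y_test, y_pred))
print("Precision:", metrics.precision_score(y_test, y_pred))
print("Recall:", metrics.recall_score(y_test, y_pred))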
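The kernel argument used above is one of several SVC hyperparameters; C (regularization strength) and gamma (kernel coefficient for non-linear kernels) are two others that are commonly tuned. The sketch below shows one possible way to compare a few settings with cross-validation using GridSearchCV. The synthetic dataset, the variable names, and the specific grid values are illustrative assumptions, not recommendations from this document.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

# Hypothetical synthetic binary dataset standing in for real data.
Xs, ys = make_classification(n_samples=200, n_features=5, random_state=0)
Xs_train, Xs_test, ys_train, ys_test = train_test_split(
    Xs, ys, test_size=0.2, random_state=42
)

# Illustrative grid: two kernels and three values of C.
# gamma only affects the 'rbf' kernel here; 'scale' is the SVC default.
param_grid = {
    "kernel": ["linear", "rbf"],
    "C": [0.1, 1, 10],
    "gamma": ["scale"],
}

# 5-fold cross-validation on the training split, scored by accuracy.
search = GridSearchCV(SVC(), param_grid, cv=5, scoring="accuracy")
search.fit(Xs_train, ys_train)

print("Best parameters:", search.best_params_)
print("Test accuracy:", search.score(Xs_test, ys_test))
```

Tuning on the training split only, and scoring the chosen model once on the held-out test split, keeps the reported accuracy from being biased by the search itself.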