This document provides basic tools and algorithms for data science and machine learning tasks in Python.
Andrew L. Mackey
Python is a fairly simple language to use for general programming, and it has a rich set of libraries for data science and machine learning tasks. This combination makes it a practical language for machine learning.
The following will split the X and y variables into training and testing splits of 80% and 20%, respectively.
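import pandas as pd
from sklearn.model_selection import train_test_split

# Load the features (X) and the labels (y) from CSV files.
X = pd.read_csv("/path/to/xdata")
# Assumes ydata holds a single label column; squeeze turns it into a 1-D Series.
y = pd.read_csv("/path/to/ydata").squeeze("columns")
# Hold out 20% of the rows for testing; random_state makes the split reproducible.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)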
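To check what an 80/20 split produces without needing the CSV files, here is a minimal sketch on synthetic data. The data from `make_classification`, the column names, and the `*_demo` / `Xd_*` / `yd_*` variable names are assumptions made for illustration and are not part of the original example.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the CSV files (assumption): 100 rows, 4 features, binary labels.
features, labels = make_classification(n_samples=100, n_features=4, random_state=0)
X_demo = pd.DataFrame(features, columns=["f0", "f1", "f2", "f3"])
y_demo = pd.Series(labels, name="label")

# Same call as above: 20% of the rows go to the test split.
Xd_train, Xd_test, yd_train, yd_test = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42
)

print(Xd_train.shape, Xd_test.shape)  # (80, 4) (20, 4)
```

The rows of the feature and label splits stay aligned, so `Xd_train` and `yd_train` share the same index.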
Support Vector Machine (SVM) classifiers provide reasonable accuracy for datasets where there is a clear margin of separation between the classes.
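from sklearn import svm
from sklearn import metrics

# Fit a linear-kernel SVM on the training split.
model = svm.SVC(kernel='linear')
model.fit(X_train, y_train)

# Predict labels for the held-out test split.
y_pred = model.predict(X_test)

# precision_score and recall_score assume binary labels by default;
# pass an average= argument (for example average='macro') for multiclass labels.
print("Accuracy:", metrics.accuracy_score(y_test, y_pred))
print("Precision:", metrics.precision_score(y_test, y_pred))
print("Recall:", metrics.recall_score(y_test, y_pred))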
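The SVM steps above can also be run end to end without external files. The sketch below uses scikit-learn's bundled iris dataset, which is an assumption made for illustration and not the data used in this document. Because iris has three classes, `precision_score` and `recall_score` need an explicit `average` argument; with the default binary setting they raise an error on multiclass labels. The variable names (`X_iris`, `clf`, and so on) are likewise illustrative.

```python
from sklearn import metrics, svm
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Bundled example dataset (assumption): 150 rows, 4 features, 3 classes.
X_iris, y_iris = load_iris(return_X_y=True)
Xi_train, Xi_test, yi_train, yi_test = train_test_split(
    X_iris, y_iris, test_size=0.2, random_state=42
)

# Fit a linear-kernel SVM and predict on the held-out rows.
clf = svm.SVC(kernel="linear")
clf.fit(Xi_train, yi_train)
yi_pred = clf.predict(Xi_test)

# Macro averaging computes each metric per class, then takes the unweighted mean.
acc = metrics.accuracy_score(yi_test, yi_pred)
prec = metrics.precision_score(yi_test, yi_pred, average="macro")
rec = metrics.recall_score(yi_test, yi_pred, average="macro")

print("Accuracy:", acc)
print("Precision (macro):", prec)
print("Recall (macro):", rec)
```

Macro averaging weights every class equally; `average="weighted"` is an alternative when class sizes differ and larger classes should count for more.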