Support Vector Machine (SVM) Example and Visualization - Two-Dimensional

Support Vector Machines (SVMs) are a popular and powerful algorithm for classification problems in machine learning. One of their advantages is that they are easy to understand visually, because the decision boundary can be drawn directly over the data. In this post, we walk through an SVM example with the Python scikit-learn library and visualize the results, so you can see how an SVM works and how its output is represented graphically.

Understanding support vector machines (SVMs)

An SVM is a supervised learning algorithm that finds the best boundary (hyperplane) separating a given set of data into two classes. The idea is to choose the boundary that maximizes the margin, that is, the distance between the boundary and the closest data points (the support vectors).

SVMs are particularly effective when operating in high dimensions, and can even classify non-linear data using a kernel function. In this example, we'll use two-dimensional data to visualize the boundaries along which an SVM classifies the data.
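As a rough illustration of the kernel idea, the sketch below compares a linear kernel and an RBF kernel on ring-shaped data that no straight line can separate. The make_circles dataset and its parameter values are assumptions chosen for illustration; they are not part of the example used in the rest of this post.

```python
from sklearn.datasets import make_circles
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Two concentric rings: the classes cannot be split by a straight line
X_ring, y_ring = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)
Xr_train, Xr_test, yr_train, yr_test = train_test_split(
    X_ring, y_ring, test_size=0.3, random_state=0)

# Fit the same classifier with two different kernels and compare test accuracy
linear_acc = SVC(kernel='linear').fit(Xr_train, yr_train).score(Xr_test, yr_test)
rbf_acc = SVC(kernel='rbf').fit(Xr_train, yr_train).score(Xr_test, yr_test)

print(f"linear kernel accuracy: {linear_acc:.2f}")
print(f"RBF kernel accuracy:    {rbf_acc:.2f}")
```

On data like this, the RBF kernel should score clearly higher than the linear kernel, which is the behavior the paragraph above describes.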

Python code step-by-step

1. Import the required libraries

First, load the necessary libraries to implement the SVM and visualize the results.

import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

numpy: A library for arrays and numerical computation.
matplotlib.pyplot: A library for plotting and visualization.
sklearn.datasets: A module that provides simple datasets and dataset generators; here we use its make_classification function.
train_test_split: A function that separates data for training and testing.
SVC: A class that implements a support vector machine (SVM) classifier.

2. Create sample data

Create simple two-dimensional data to train the SVM on, then split it into training and testing sets.

# Generate simple 2D data
X, y = datasets.make_classification(n_samples=100, n_features=2, n_informative=2, n_redundant=0, random_state=42)

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

n_samples=100: Generate 100 samples.
n_features=2: Creates two-dimensional data with two features.
n_informative=2: Both features are useful information for classification.
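A quick way to confirm what these parameters produce is to inspect the array shapes after generating and splitting the data. This is only a sanity-check sketch that reuses the calls shown above; the use of np.bincount to count samples per class is an addition for illustration.

```python
import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split

X, y = datasets.make_classification(n_samples=100, n_features=2, n_informative=2,
                                    n_redundant=0, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

print(X.shape, y.shape)            # (100, 2) (100,)
print(X_train.shape, X_test.shape) # (70, 2) (30, 2)
print(np.bincount(y))              # number of samples in class 0 and class 1
```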

3. Create and train an SVM model

After separating the generated data into training and testing, train the SVM model.

# Create the SVM model
model = SVC(kernel='linear')

# Train the model
model.fit(X_train, y_train)

kernel='linear': Find a straight line separating the two classes using a linear kernel.
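With a linear kernel, the fitted line can be read directly from the model. The sketch below rebuilds the same data and model, then prints the line's coefficients and the margin width 2 / ||w||. The variable names w, b, and margin_width are illustrative, and the model uses scikit-learn's default soft-margin setting, so some training points may fall inside the margin.

```python
import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = datasets.make_classification(n_samples=100, n_features=2, n_informative=2,
                                    n_redundant=0, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
model = SVC(kernel='linear').fit(X_train, y_train)

w = model.coef_[0]        # normal vector of the separating line (linear kernel only)
b = model.intercept_[0]   # offset of the line
margin_width = 2.0 / np.linalg.norm(w)

print("w =", w, "b =", b)
print("support vectors per class:", model.n_support_)
print("margin width:", margin_width)

# For a linear kernel, decision_function(x) is w . x + b;
# its sign decides the predicted class
scores = model.decision_function(X_train)
manual_scores = X_train @ w + b
print(np.allclose(scores, manual_scores))
```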

Visualize SVM results

Now visualize the decision boundary of the trained SVM model together with the test data. Visualizing a support vector machine is very useful for understanding how the model classifies data.

# Function to visualize data points and the decision boundary
def plot_decision_boundary(X, y, model):
    # Generate a grid for visualizing the boundary
    x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
    y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
    xx, yy = np.meshgrid(np.arange(x_min, x_max, 0.01),
                         np.arange(y_min, y_max, 0.01))

    # Predict the class at each grid coordinate using the model
    Z = model.predict(np.c_[xx.ravel(), yy.ravel()])
    Z = Z.reshape(xx.shape)

    # Fill the predicted regions; the boundary is where the color changes
    plt.contourf(xx, yy, Z, alpha=0.8)

    # Visualize the data points
    plt.scatter(X[:, 0], X[:, 1], c=y, edgecolors='k', marker='o', s=50)
    plt.xlim(xx.min(), xx.max())
    plt.ylim(yy.min(), yy.max())
    plt.title("SVM Decision Boundary")
    plt.show()

# Visualize the boundary with the trained model
plot_decision_boundary(X_test, y_test, model)

np.meshgrid: Generates a grid to visualize the boundaries.
model.predict: Predicts the class at every point of the generated grid.
plt.contourf: Fills the plane with colors according to those predictions, so the classification boundary appears where the color changes.
plt.scatter: Plots the actual data points so you can check whether the model has classified them correctly.
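To make the grid step more concrete, here is a tiny sketch that builds a 3 x 2 grid the same way the function does. The grid size is an assumption chosen so the arrays are small enough to print.

```python
import numpy as np

# A 3-column by 2-row grid: x takes values 0, 1, 2 and y takes values 0, 1
xx, yy = np.meshgrid(np.arange(0, 3), np.arange(0, 2))
print(xx.shape)  # (2, 3)

# Flatten both coordinate arrays and pair them up: one (x, y) row per grid point,
# which is the input format model.predict expects
grid_points = np.c_[xx.ravel(), yy.ravel()]
print(grid_points.shape)  # (6, 2)
print(grid_points)

# After predicting on grid_points, reshape(xx.shape) restores the 2 x 3 layout
# so contourf can color each cell
restored = grid_points[:, 0].reshape(xx.shape)
```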

When you run the code above, you'll see the decision boundary learned by the SVM from the training data, with the test points drawn on top so you can check which side of the boundary each one falls on.
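The plot gives a visual impression, but the same check can be made numerically. The sketch below is one possible way to do it, using accuracy_score and confusion_matrix from sklearn.metrics, which the original code does not import.

```python
from sklearn import datasets
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = datasets.make_classification(n_samples=100, n_features=2, n_informative=2,
                                    n_redundant=0, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
model = SVC(kernel='linear').fit(X_train, y_train)

# Predict on the held-out test set and compare with the true labels
y_pred = model.predict(X_test)
acc = accuracy_score(y_test, y_pred)
cm = confusion_matrix(y_test, y_pred)

print("test accuracy:", acc)
print("confusion matrix (rows = true class, columns = predicted class):")
print(cm)
```

Off-diagonal entries of the confusion matrix count the test points that land on the wrong side of the boundary in the plot.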

Support vector machine visualization example

The classification boundary appears as a linear boundary separating the two classes.
Each data point is displayed in a different color based on its class.
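The support vectors are the training points that lie on the margin, inside it, or on the wrong side of the boundary. Because this plot shows the test data, they are not marked in it, but the fitted model exposes them as model.support_vectors_.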
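One possible way to see the support vectors is to plot the training data, draw the margin from decision_function, and circle the points stored in model.support_vectors_. The helper name plot_margin_and_support_vectors and the styling choices are hypothetical; this is a sketch, not part of the original code.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = datasets.make_classification(n_samples=100, n_features=2, n_informative=2,
                                    n_redundant=0, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
model = SVC(kernel='linear').fit(X_train, y_train)

def plot_margin_and_support_vectors(model, X, y):
    """Hypothetical helper: draw the boundary, the margin edges, and the support vectors."""
    fig, ax = plt.subplots()
    ax.scatter(X[:, 0], X[:, 1], c=y, edgecolors='k', s=50)

    # Evaluate the signed score on a grid covering the data
    xx, yy = np.meshgrid(np.linspace(X[:, 0].min() - 1, X[:, 0].max() + 1, 200),
                         np.linspace(X[:, 1].min() - 1, X[:, 1].max() + 1, 200))
    scores = model.decision_function(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)

    # Level 0 is the decision boundary; levels -1 and +1 are the margin edges
    ax.contour(xx, yy, scores, levels=[-1, 0, 1], colors='k',
               linestyles=['--', '-', '--'])

    # Circle the support vectors stored on the fitted model
    sv = model.support_vectors_
    ax.scatter(sv[:, 0], sv[:, 1], s=150, facecolors='none',
               edgecolors='r', linewidths=1.5)
    ax.set_title("SVM margin and support vectors")
    return ax

# Support vectors are training points, so plot the training set here
ax = plot_margin_and_support_vectors(model, X_train, y_train)
```

Call plt.show() afterwards to display the figure. The circled points should sit on the dashed margin lines, between them, or on the wrong side of the solid boundary.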

Full code

Below is the complete code for the support vector machine (SVM) model. We've added comments where necessary to explain the steps.

import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# 1. Generate data
# Create 100 samples with 2 features, creating a dataset with well-separated distributions across classes
X, y = datasets.make_classification(n_samples=100, n_features=2, n_informative=2,
                                    n_redundant=0, random_state=42)

# 2. Data Partitioning
# Divide the data into 70% for training and 30% for testing.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# 3. Generate SVM model
# Create an SVM model using a linear kernel
model = SVC(kernel='linear')

# 4. Train the model
# Train the SVM model using the training data
model.fit(X_train, y_train)

# 5. Define a visualization function
# Function to visualize the decision boundary and data points of the trained model
def plot_decision_boundary(X, y, model):
    # 5.1 Setting up the grid for boundary visualization
    x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
    y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
    xx, yy = np.meshgrid(np.arange(x_min, x_max, 0.01),
                         np.arange(y_min, y_max, 0.01))

    # 5.2 Compute the value predicted by the model at each point in the grid
    Z = model.predict(np.c_[xx.ravel(), yy.ravel()])
    Z = Z.reshape(xx.shape)

    # 5.3 Drawing classification boundaries based on predicted values
    plt.contourf(xx, yy, Z, alpha=0.8)

    # 5.4 Visualize actual data points
    plt.scatter(X[:, 0], X[:, 1], c=y, edgecolors='k', marker='o', s=50)
    plt.xlim(xx.min(), xx.max())
    plt.ylim(yy.min(), yy.max())
    plt.title("SVM Decision Boundary")
    plt.show()

# 6. Visualize the classification boundary using test data
plot_decision_boundary(X_test, y_test, model)

Code description

Generate data: The datasets.make_classification function generates two-dimensional data that is easy to visualize with an SVM.
Partition data: train_test_split splits the data into training (70%) and testing (30%) sets.
Create an SVM model: The SVC class with kernel='linear' creates a linear support vector machine model.
Model training: The fit method trains the model on the training data.
Visualization function: A function that draws the decision boundary of the trained model. It generates a grid and colors it according to the prediction at each point.
Visualize results: The test data is plotted to check the decision boundary learned by the model.

Frequently asked questions (FAQ)

Q1. What is a support vector machine (SVM)?
A1. SVM is a classification algorithm that finds the best boundary separating the given data into two classes. Support vectors are data points that play an important role in defining this boundary.

Q2. What is a kernel?
A2. A kernel is a function that measures the similarity of two data points as if they had been mapped into a higher-dimensional space, which lets data with non-linear relationships be separated linearly in that space. There are many types, including linear, polynomial, and RBF kernels.

Q3. Is it possible to visualize support vector machines with only two-dimensional data?
A3. While visualization is intuitively possible with two- or three-dimensional data, higher-dimensional data is harder to represent visually. However, SVMs are applicable to higher-dimensional data.

Q4. What are the advantages and disadvantages of SVM?
A4. Advantages include good performance on high-dimensional data and resistance to overfitting. The downside is that training time can be long for large amounts of data.
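Relating to Q2 above, the three kernel types it mentions can be computed by hand for a pair of points. The sketch below does so and compares the RBF value with scikit-learn's rbf_kernel helper. The points a and b and the values of gamma, degree, and coef0 are arbitrary choices for illustration.

```python
import numpy as np
from sklearn.metrics.pairwise import polynomial_kernel, rbf_kernel

a = np.array([1.0, 2.0])
b = np.array([2.0, 0.0])
gamma, degree, coef0 = 0.5, 2, 1.0

# Linear kernel: the plain dot product
k_linear = np.dot(a, b)                               # 2.0

# Polynomial kernel: (gamma * <a, b> + coef0) ** degree
k_poly = (gamma * np.dot(a, b) + coef0) ** degree     # 4.0

# RBF kernel: exp(-gamma * ||a - b||^2), close to 1 for nearby points
k_rbf = np.exp(-gamma * np.sum((a - b) ** 2))         # about 0.0821

# scikit-learn's helpers expect 2D arrays of shape (n_samples, n_features)
k_poly_sklearn = polynomial_kernel(a.reshape(1, -1), b.reshape(1, -1),
                                   degree=degree, gamma=gamma, coef0=coef0)[0, 0]
k_rbf_sklearn = rbf_kernel(a.reshape(1, -1), b.reshape(1, -1), gamma=gamma)[0, 0]

print(k_linear, k_poly, k_rbf)
print(np.isclose(k_poly, k_poly_sklearn), np.isclose(k_rbf, k_rbf_sklearn))
```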

Wrapping up support vector machine visualization

In this post, we used Python's scikit-learn to train a Support Vector Machine (SVM) and visualize the results, and saw firsthand how an SVM sets a decision boundary on data and how that boundary is drawn. In machine learning, visualization is a very useful tool for understanding and improving a model. Try extending your SVM model to more complex data and different kernels!
