Visualizing categorical data in Python: grouped bar graphs and categorical data examples

One of the most basic yet powerful tools in data analysis is categorical data visualization. Visualizing data by groups such as gender, marital status, age range, etc. can help you intuitively understand distributions and differences, as shown in the figure below.

(Figure: graph visualization of categorical data examples)

In this post, we'll look at how to group categorical data and visualize it in bar graphs with matplotlib in Python. Additionally, we'll look at categorical data examples to give you an idea of what you might be dealing with in real life.

What is categorical data?

Categorical data is data that is separated into specific groups or categories.
Here are some typical examples:

  1. Gender: male, female
  2. Age range: Teens, 20s, 30s, 40s and older
  3. Marital status: Single, Married, Divorced, No Response
  4. Region: Seoul, Busan, Daegu, etc.
  5. Education Level: High School, College, Master's, Doctorate

Analyzing this data lets you intuitively understand the differences, distributions, and ratios between groups.
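As a rough sketch of how such categories can be represented in code, the snippet below stores education levels as an ordered pandas categorical. The sample answers are made up for illustration, and using CategoricalDtype is an assumption of this sketch, not something the rest of the post depends on.

```python
import pandas as pd

# Hypothetical survey answers for education level (illustrative values only)
education = pd.Series(["College", "High School", "Master's", "College", "Doctorate"])

# Declare the allowed categories and their order explicitly
levels = ["High School", "College", "Master's", "Doctorate"]
education = education.astype(pd.CategoricalDtype(categories=levels, ordered=True))

# Count the answers per category and sort the result by category order
counts = education.value_counts().sort_index()
print(counts)

# Ordered categories support comparisons such as "Master's or higher"
masters_or_higher = int((education >= "Master's").sum())
print(masters_or_higher)
```

Declaring the order up front is one way to keep tables and charts in a meaningful sequence instead of alphabetical order.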

Categorical data examples

In this example, we'll use marital status and gender data to visualize the distribution of each group. Imagine that you are given the following categorical data:

Gender    Marital status    Number of people
Male      No Response       700
Female    No Response       1000
Male      Divorced          200
Female    Divorced          250
Male      Married           3200
Female    Married           2400
  • Gender: Separate into Male and Female
  • Marital status: No Response, Divorced, Married

You can visualize this data with a separate bar graph for each group.
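Before plotting, it can help to look at the same numbers side by side. The sketch below reshapes the six rows into one row per marital status and computes each gender's share within that status. The variable names (wide, share) are just illustrative choices.

```python
import pandas as pd

# The six rows from the table above, in long format
df = pd.DataFrame({
    "Gender": ["Male", "Female", "Male", "Female", "Male", "Female"],
    "Marital_Status": ["No Response", "No Response", "Divorced",
                       "Divorced", "Married", "Married"],
    "Count": [700, 1000, 200, 250, 3200, 2400],
})

# One row per marital status, one column per gender
wide = df.pivot(index="Marital_Status", columns="Gender", values="Count")

# Share of each gender within a marital status (each row sums to 1)
share = wide.div(wide.sum(axis=1), axis=0)

print(wide)
print(share.round(3))
```

The share table makes ratios easy to compare even when the groups differ a lot in size, as the Married and Divorced groups do here.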

Python code examples

Below is code to visualize categorical data based on the data above.

import matplotlib.pyplot as plt
import pandas as pd
import numpy as np

# Create data (every list must have the same length: one entry per row)
data = {'Gender': ['Male', 'Female', 'Male', 'Female', 'Male', 'Female'],
        'Marital_Status': ['No Response', 'No Response', 'Divorce', 'Divorce', 'Married', 'Married'],
        'Count': [700, 1000, 200, 250, 3200, 2400]}

# Create a DataFrame
df = pd.DataFrame(data)

# Split data by marital status
no_response = df[df['Marital_Status'] == 'No Response']
divorce = df[df['Marital_Status'] == 'Divorce']
married = df[df['Marital_Status'] == 'Married']

# Draw the graphs
fig, axes = plt.subplots(1, 3, figsize=(12, 5), sharey=True)

# First graph: Marital Status = No Response
axes[0].bar(no_response['Gender'], no_response['Count'], color=['#1f77b4', '#aec7e8'])
axes[0].set_title("Marital Status = No Response")
axes[0].set_xlabel("Gender")
axes[0].set_ylabel("Count")

# Second graph: Marital Status = Divorced
axes[1].bar(divorce['Gender'], divorce['Count'], color=['#1f77b4', '#aec7e8'])
axes[1].set_title("Marital Status = Divorce")
axes[1].set_xlabel("Gender")

# Third graph: Marital Status = Married
axes[2].bar(married['Gender'], married['Count'], color=['#1f77b4', '#aec7e8'])
axes[2].set_title("Marital Status = Married")
axes[2].set_xlabel("Gender")

# common settings
for ax in axes:
    ax.set_xticks(np.arange(len(df['Gender'].unique())))
    ax.set_xticklabels(['Male', 'Female'], fontsize=10)
    ax.spines['top'].set_visible(False)
    ax.spines['right'].set_visible(False)

# Adjust the layout
plt.tight_layout()

# Display the graph
plt.show()
(Figure: steps for visualizing categorical data)

Code summary

  1. Generate data
    • Set up the gender, marital status, and headcount data as a dictionary and use pandas to turn it into a DataFrame.
  2. Grouping data
    • Filter the data by marital status: No Response, Divorce, Married.
  3. Create a graph
    • Create three subplots and call bar() on each one to draw a bar graph.
    • Show gender on the x-axis and number of people on the y-axis.
  4. Layout settings
    • Call tight_layout() to adjust the spacing between subplots and organize them neatly.
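The four steps above repeat almost the same bar() call three times. As one possible variation (not the code used in this post), the sketch below loops over the groups with groupby so that each marital status gets its own subplot without copy and paste.

```python
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({
    "Gender": ["Male", "Female", "Male", "Female", "Male", "Female"],
    "Marital_Status": ["No Response", "No Response", "Divorce",
                       "Divorce", "Married", "Married"],
    "Count": [700, 1000, 200, 250, 3200, 2400],
})

# sort=False keeps the groups in their order of first appearance
groups = list(df.groupby("Marital_Status", sort=False))

fig, axes = plt.subplots(1, len(groups), figsize=(12, 5), sharey=True)

# Draw one subplot per marital status instead of repeating the bar() call
for ax, (status, group) in zip(axes, groups):
    ax.bar(group["Gender"], group["Count"], color=["#1f77b4", "#aec7e8"])
    ax.set_title(f"Marital Status = {status}")
    ax.set_xlabel("Gender")
    ax.spines["top"].set_visible(False)
    ax.spines["right"].set_visible(False)

# The y-axis is shared, so one label on the leftmost subplot is enough
axes[0].set_ylabel("Count")
plt.tight_layout()
```

Call plt.show() afterwards to display the figure. The loop also adapts automatically if another marital status is added to the data.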

Tips for working with categorical data

Categorical data shows up in business, statistical analysis, surveys, and more. Let's take a closer look at the examples below to see how categorical data can be used to gain meaningful insights in each of these areas.

1. Derive business insights

Product purchase data or customer behavior data can help you understand the characteristics of different customer groups and shape your business strategy.

Use cases

  • Analyze purchase rates by gender and age group
    • For example, visualize the percentage of purchases of a particular product by gender (male, female) and age group (20s, 30s, 40s, etc.).
    • As a result, you can see which groups are making the most purchases so that you can create a targeted marketing strategy.
  • Analyze revenue by region
    • Visualize sales data for a specific product or service by region to build localized sales strategies.
    • Example: If sales of a particular product are high in the Seoul area, you can focus your ads on that region.
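To make the first use case a bit more concrete, here is a minimal sketch of computing a purchase rate by gender and age group. The visit log is entirely made up, and the column names (Gender, Age_Group, Purchased) are assumptions for illustration.

```python
import pandas as pd

# Hypothetical purchase log: one row per customer visit (made-up values)
visits = pd.DataFrame({
    "Gender": ["Male", "Male", "Female", "Female", "Male", "Female", "Female", "Male"],
    "Age_Group": ["20s", "30s", "20s", "30s", "20s", "20s", "30s", "30s"],
    "Purchased": [1, 0, 1, 1, 0, 1, 0, 0],
})

# The mean of a 0/1 column is the purchase rate within each group
purchase_rate = (
    visits.groupby(["Gender", "Age_Group"])["Purchased"]
    .mean()
    .unstack("Age_Group")
)
print(purchase_rate)
```

A table shaped like this, with one row per gender and one column per age group, can be handed to purchase_rate.plot(kind="bar") to get grouped bars.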

2. Sociodemographic analysis

Demographic data can help us understand social characteristics and change. Marital status, education level, and regional distribution in particular are useful data for policy making and research.

Use cases

  • Marital status analysis
    • Visualize the percentage of married, divorced, and single people by age group or region to understand social changes in specific groups.
    • Example: If a region has a high divorce rate, you might need a welfare policy for that region.
  • Analyze income distribution by education level
    • Visualize income levels by education level (high school, college, master's, etc.) to analyze how education affects income.
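For the marital status use case, percentages per region can be computed with a cross-tabulation. The sketch below uses made-up respondents, and the normalize="index" option is one reasonable choice for this kind of comparison, not the only one.

```python
import pandas as pd

# Hypothetical respondents (made-up values for illustration)
people = pd.DataFrame({
    "Region": ["Seoul", "Seoul", "Seoul", "Seoul",
               "Busan", "Busan", "Busan", "Busan"],
    "Marital_Status": ["Married", "Single", "Single", "Divorced",
                       "Married", "Married", "Single", "Divorced"],
})

# normalize="index" turns each row into shares that sum to 1
status_share = pd.crosstab(
    people["Region"], people["Marital_Status"], normalize="index"
)
print(status_share)
```

Comparing shares instead of raw counts keeps regions with very different populations on the same scale.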

3. Analyze survey results

Survey data can help you get a clearer picture of a particular group's preferences or opinions.

Use cases

  • Product satisfaction surveys
    • Analyze product satisfaction by gender and age group to gather feedback from specific groups.
    • Example: If a group of men in their 30s are less satisfied with your product, you need to make product improvements based on their feedback.
  • Analyze consumer preferences
    • Analyze preferences for a specific product or service by category (region, gender, etc.).
    • Example: If customers in Region A are more satisfied with a particular service than customers in Region B, focus your marketing on that region.

Wrapping up

In this post, we looked at how to visualize categorical data. matplotlib makes it easy to visualize a variety of categorical data in a bar graph. Analyze real data and try different grouping conditions. Visualization will become a powerful tool for interpreting data and gaining insights!

If you're interested in Python visualization, check out the post "How to draw and nest doughnut charts with Python Visualization" and build your knowledge!

Code explained in detail

1. Import the libraries

import matplotlib.pyplot as plt
import pandas as pd
import numpy as np  
  • matplotlib.pyplot: A library for graphing and visualization.
  • pandas: A library that allows you to create dataframes and manipulate data.
  • numpy: A library that deals with numerical operations and array data.

2. Generate data

data = {'Gender': ['Male', 'Female', 'Male', 'Female', 'Male', 'Female'],
        'Marital_Status': ['No Response', 'No Response', 'Divorce', 'Divorce', 'Married', 'Married'],
        'Count': [700, 1000, 200, 250, 3200, 2400]}  
  • data dictionary: Create a dictionary of the data you want to visualize.
    • Gender: Represents gender, with two groups, 'Male' and 'Female'.
    • Marital_Status: Marital status as categorical data ('No Response', 'Divorce', 'Married').
    • Count: Indicates the number of people in each group.

3. Create a dataframe

df = pd.DataFrame(data)  
  • pd.DataFrame(): Converts the data dictionary into a DataFrame.
  • The result is data in the form of a table.

Example output:

Gender    Marital_Status    Count
Male      No Response       700
Female    No Response       1000
Male      Divorce           200
Female    Divorce           250
Male      Married           3200
Female    Married           2400
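One detail worth checking at this step is that every list in the dictionary has the same number of entries, because pandas refuses to build a DataFrame from lists of different lengths. The sketch below shows both cases. The bad_data dictionary is a deliberately broken variant created only for this illustration.

```python
import pandas as pd

data = {
    "Gender": ["Male", "Female", "Male", "Female", "Male", "Female"],
    "Marital_Status": ["No Response", "No Response", "Divorce",
                       "Divorce", "Married", "Married"],
    "Count": [700, 1000, 200, 250, 3200, 2400],
}

# Six entries per column give a table with 6 rows and 3 columns
df = pd.DataFrame(data)
print(df.shape)

# Two extra entries in one column make the lists unequal in length
bad_data = dict(data, Gender=data["Gender"] + ["Male", "Female"])
try:
    pd.DataFrame(bad_data)
    length_error = None
except ValueError as err:
    length_error = str(err)
print(length_error)
```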

4. Split data by marital status

no_response = df[df['Marital_Status'] == 'No Response']
divorce = df[df['Marital_Status'] == 'Divorce']
married = df[df['Marital_Status'] == 'Married']  
  • Filter by condition: Split the data based on the DataFrame's Marital_Status column.
    • df[df['Marital_Status'] == 'No Response']: Extract only rows with a marital status of 'No Response'.
    • df[df['Marital_Status'] == 'Divorce']: Extract only rows with marital status 'Divorce'.
    • df[df['Marital_Status'] == 'Married']: Extract only rows with marital status 'Married'.
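To see what the filter expression does under the hood, it can be split into two steps: first build a True/False mask, then use it to select rows. The variable name mask is just an illustrative choice.

```python
import pandas as pd

df = pd.DataFrame({
    "Gender": ["Male", "Female", "Male", "Female", "Male", "Female"],
    "Marital_Status": ["No Response", "No Response", "Divorce",
                       "Divorce", "Married", "Married"],
    "Count": [700, 1000, 200, 250, 3200, 2400],
})

# Comparing a column to a value gives one True/False per row
mask = df["Marital_Status"] == "Divorce"
print(mask.tolist())

# Indexing with the mask keeps only the rows where it is True
divorce = df[mask]
print(divorce)
```

The selected rows keep their original index labels (2 and 3 here), which is worth remembering if you later access rows by position.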

5. Create a subplot

fig, axes = plt.subplots(1, 3, figsize=(12, 5), sharey=True)  
  • plt.subplots(): Enables multiple graphs (subplots) to be plotted simultaneously.
    • 1, 3: Create a grid of subplots with 1 row and 3 columns.
    • figsize=(12, 5): Sets the width and height of the entire figure in inches.
    • sharey=True: Makes all subplots share the same y-axis scale.
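The effect of sharey=True is easiest to see by checking the axis limits after plotting very different values. In this small sketch, only the first and third subplots get bars, yet all three end up with identical y-axis limits.

```python
import matplotlib.pyplot as plt

fig, axes = plt.subplots(1, 3, figsize=(12, 5), sharey=True)

# Small values on the left, large values on the right
axes[0].bar(["Male", "Female"], [700, 1000])
axes[2].bar(["Male", "Female"], [3200, 2400])

# With sharey=True every subplot reports the same y-axis limits
limits = [ax.get_ylim() for ax in axes]
print(limits)
```

A shared scale makes bar heights directly comparable across subplots. The trade-off is that small groups, such as Divorce in this post's data, appear as short bars.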

6. First graph: No Response

axes[0].bar(no_response['Gender'], no_response['Count'], color=['#1f77b4', '#aec7e8'])
axes[0].set_title("Marital Status = No Response")
axes[0].set_xlabel("Gender")
axes[0].set_ylabel("Count")  
  • axes[0].bar(): Draw a bar graph in the first subplot.
    • no_response['Gender']: Set the gender ('Male', 'Female') as the X-axis value.
    • no_response['Count']: Sets the number of people of each gender as the Y-axis value.
    • color: Sets the color of the bar.
      • '#1f77b4': dark blue (male)
      • '#aec7e8': light blue (female)
  • set_title(): Sets the title of the subplot.
  • set_xlabel(): Sets the X-axis label.
  • set_ylabel(): Sets the Y-axis label.
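Note that the color list is applied by position: the first bar gets the first color, whichever gender it happens to be. If the row order may change, one option is to look colors up by gender instead. The gender_colors dictionary below is a hypothetical helper, and the rows are deliberately reversed to show the effect.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Same 'No Response' numbers as in the post, but with Female listed first
no_response = pd.DataFrame({
    "Gender": ["Female", "Male"],
    "Count": [1000, 700],
})

# Hypothetical lookup table: one fixed color per gender
gender_colors = {"Male": "#1f77b4", "Female": "#aec7e8"}
bar_colors = no_response["Gender"].map(gender_colors).tolist()

fig, ax = plt.subplots(figsize=(4, 5))
ax.bar(no_response["Gender"], no_response["Count"], color=bar_colors)
ax.set_title("Marital Status = No Response")
ax.set_xlabel("Gender")
ax.set_ylabel("Count")
```

With the lookup, Female stays light blue and Male stays dark blue regardless of the order of the rows.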

7. Second graph: Divorce

axes[1].bar(divorce['Gender'], divorce['Count'], color=['#1f77b4', '#aec7e8'])
axes[1].set_title("Marital Status = Divorce")
axes[1].set_xlabel("Gender")  
  • axes[1]: The second subplot.
  • Data: Visualize the gender and number of people in a group whose marital status is 'Divorce'.
  • Setting the X- and Y-Axes: Same as the first graph.

8. Third graph: Married

axes[2].bar(married['Gender'], married['Count'], color=['#1f77b4', '#aec7e8'])
axes[2].set_title("Marital Status = Married")
axes[2].set_xlabel("Gender")  
  • axes[2]: The third subplot.
  • Data: Visualize the gender and number of people in a group whose marital status is 'Married'.

9. Common settings

for ax in axes:
    ax.set_xticks(np.arange(len(df['Gender'].unique())))
    ax.set_xticklabels(['Male', 'Female'], fontsize=10)
    ax.spines['top'].set_visible(False)
    ax.spines['right'].set_visible(False)  
  • for ax in axes: Apply common settings to each subplot.
  • Setting the X-axis ticks:
    • np.arange(len(df['Gender'].unique())): Places ticks at positions 0 and 1, since there are 2 genders ('Male', 'Female').
    • set_xticklabels(): Show the X-axis labels as 'Male' and 'Female'.
  • Remove borders:
    • spines['top'].set_visible(False): Removes the top border.
    • spines['right'].set_visible(False): Removes the right border.
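The tick positions come from np.arange, which is easy to misspell. A quick standalone check of what it produces for the two gender labels might look like this:

```python
import numpy as np
import pandas as pd

genders = pd.Series(["Male", "Female", "Male", "Female", "Male", "Female"])

# unique() keeps each label once, in order of first appearance
labels = genders.unique()

# np.arange(n) returns the integers 0 .. n-1, one tick position per label
ticks = np.arange(len(labels))
print(labels.tolist(), ticks.tolist())
```

Matplotlib places string categories at positions 0, 1, and so on, in the order they are first plotted, so these ticks line up with the two bars.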

10. Layout and graph output

plt.tight_layout()
plt.show()  
  • tight_layout(): Automatically adjusts the spacing between subplots to avoid overlapping graphs.
  • plt.show(): Displays the graph on the screen.
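If you want to keep the chart instead of only viewing it, the figure can also be written out as an image. The sketch below saves a small example figure as PNG bytes in memory. The dpi value and the file name mentioned in the comment are arbitrary choices for illustration.

```python
import io

import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(4, 3))
ax.bar(["Male", "Female"], [700, 1000], color=["#1f77b4", "#aec7e8"])
ax.set_title("Marital Status = No Response")
fig.tight_layout()

# Write the figure as PNG bytes into memory; a file path such as
# "no_response.png" (hypothetical name) could be passed instead of the buffer
buffer = io.BytesIO()
fig.savefig(buffer, format="png", dpi=150)
png_bytes = buffer.getvalue()
print(len(png_bytes) > 0)
```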
