Creating Treemaps with Python (feat. GNI Data Analytics)

Many people like to use data visualizations to intuitively understand complex information, and treemaps are a great tool to do just that.

[Figure: Treemap created with Python]

In this post, I'll show you how to create a treemap in Python and use it to analyze the Gross National Income (GNI) of countries in 2014.

What is a treemap?

A treemap is a way to visualize hierarchical data as nested rectangles. The size of each rectangle represents the value of its data point, and different colors are used to separate groups.
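To make the "area is proportional to value" idea concrete, here is a minimal sketch of the simplest treemap layout, slice-and-dice, in plain Python. It is not the squarified algorithm that squarify uses; it only splits one rectangle into strips whose widths (and therefore areas) are proportional to the values. The function name `slice_and_dice` is hypothetical.

```python
from typing import Dict, List


def slice_and_dice(values: List[float], x: float, y: float,
                   width: float, height: float) -> List[Dict[str, float]]:
    """Split the rectangle (x, y, width, height) into vertical strips.

    Each strip's width, and therefore its area, is proportional to its value.
    This is the simple slice-and-dice layout, not squarify's algorithm.
    """
    total = sum(values)
    if total <= 0:
        raise ValueError("values must sum to a positive number")
    rects = []
    cursor = x
    for v in values:
        w = width * v / total  # strip width proportional to the value
        rects.append({"x": cursor, "y": y, "dx": w, "dy": height})
        cursor += w
    return rects


# Toy values (not real GNI figures): 50% / 30% / 20% of a 100 x 50 area
for r in slice_and_dice([5, 3, 2], 0, 0, 100, 50):
    print(r, "area =", r["dx"] * r["dy"])
```

With hierarchical data, slice-and-dice alternates the split direction at each level; this single-level version only shows the proportionality between value and area.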

Python code walkthrough

1. Install and load the required packages

In Python, we'll use the pandas, matplotlib, squarify, and rpy2 packages. First, install and load them.

# Install the required Python packages first if needed:
# !pip install pandas matplotlib squarify rpy2

import pandas as pd
import matplotlib.pyplot as plt
from matplotlib import font_manager, rc
import squarify
import rpy2.robjects as robjects
from rpy2.robjects import pandas2ri

# Set up a font that can render Korean text on macOS
font_path = '/Library/Fonts/Arial Unicode.ttf'  # Change to the actual font file path
font_manager.fontManager.addfont(font_path)  # Register the font so Matplotlib can find it by name
font = font_manager.FontProperties(fname=font_path).get_name()
rc('font', family=font)

# Install the R treemap package only if it is missing, then load it
robjects.r('if (!requireNamespace("treemap", quietly = TRUE)) install.packages("treemap", repos="http://cran.us.r-project.org")')
robjects.r('library(treemap)')

  • Pandas: A library that provides high-performance data structures and analysis tools for Python. Its DataFrame objects make it easy to manipulate and analyze tabular data.
  • Matplotlib: A library for visualizing data in Python in many forms, including plots, graphs, and histograms.
  • Squarify: A Python library that computes treemap layouts and plots them with Matplotlib, so you can show the relative size of values as rectangles.
  • Rpy2: An interface between R and Python that lets you run R code from Python and convert R objects, such as data frames, into Python objects.
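Before running the rest of the code, it can help to confirm which of these Python packages are importable. The sketch below uses only the standard library (`importlib.util.find_spec`); the helper name `check_packages` is hypothetical.

```python
import importlib.util


def check_packages(names):
    """Return {module_name: bool}: True if the module can be found for import."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


status = check_packages(["pandas", "matplotlib", "squarify", "rpy2"])
for name, ok in status.items():
    print(f"{name:12s} {'installed' if ok else 'missing (pip install ' + name + ')'}")
```

Note that rpy2 also needs a working R installation, which this check does not verify.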

2. Import and preprocess data

Use rpy2 to load the GNI2014 dataset that ships with R's treemap package, then preprocess it. I couldn't find a Python package that provides this data, so I borrowed it from the R package.

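# Convert R data to a pandas DataFrame using a local conversion context
from rpy2.robjects.conversion import localconverter

# Load the GNI2014 dataset from R's treemap package
robjects.r('data("GNI2014", package = "treemap")')
with localconverter(robjects.default_converter + pandas2ri.converter):
    GNI2014 = robjects.r('as.data.frame(GNI2014)')  # returned as a pandas DataFrame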

# Label only the countries in the top 25% by GNI (all others get an empty label)
gni_threshold = GNI2014['GNI'].quantile(0.75)  # compute the 75th percentile once
GNI2014['label'] = GNI2014.apply(lambda x: x['country'] if x['GNI'] > gni_threshold else '', axis=1)
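A row-wise `apply` works here, but the same labels can also be built without a Python-level loop: compute the 75th-percentile threshold once and use `Series.where`, which keeps values where the condition is true and replaces the rest with `''`. The sketch below uses toy numbers (not real GNI figures) and checks that both approaches agree.

```python
import pandas as pd

# Toy values for illustration only (not real GNI figures)
df = pd.DataFrame({
    "country": ["A", "B", "C", "D", "E", "F", "G", "H"],
    "GNI": [10, 20, 30, 40, 50, 60, 70, 80],
})

threshold = df["GNI"].quantile(0.75)  # 75th percentile, computed once

# Row-wise version, in the style of the post
labels_apply = df.apply(lambda row: row["country"] if row["GNI"] > threshold else "", axis=1)

# Vectorized version: keep the country name where GNI > threshold, else ""
labels_where = df["country"].where(df["GNI"] > threshold, "")

print(threshold)                                      # 62.5
print(labels_where.tolist())                          # ['', '', '', '', '', '', 'G', 'H']
print(labels_apply.tolist() == labels_where.tolist())  # True
```

The vectorized form is usually faster on large frames; for a dataset of a few hundred countries, either is fine.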

3. Create a treemap

Now, use squarify and matplotlib to generate the treemap.

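# Create the treemap (squarify expects positive sizes sorted in descending order)
GNI2014 = GNI2014.dropna(subset=['GNI']).sort_values('GNI', ascending=False)
plt.figure(figsize=(12, 8))
colors = plt.cm.viridis(GNI2014['GNI'] / GNI2014['GNI'].max())  # Color each rectangle by GNI relative to the maximum
squarify.plot(sizes=GNI2014['GNI'], label=GNI2014['label'], color=colors, alpha=.8)
plt.axis('off')
plt.title('GNI 2014 dataset treemap', fontsize=20)
plt.suptitle('Gross National Income (GNI) comparison by country', fontsize=16)
plt.show()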
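squarify is based on the squarified treemap algorithm of Bruls, Huizing, and van Wijk: it fills the rectangle one row at a time along the shorter side, and keeps adding items to the current row only while that does not make the row's worst aspect ratio worse. The sketch below is a simplified pure-Python re-implementation for illustration, not squarify's own code; the names `worst_ratio` and `squarified_layout` are hypothetical. It assumes the sizes are already sorted in descending order and scaled to the rectangle's area.

```python
from typing import Dict, List


def worst_ratio(row: List[float], side: float) -> float:
    """Largest aspect ratio among the rectangles in `row` laid along `side`."""
    s = sum(row)
    return max(max(side * side * r / (s * s), (s * s) / (side * side * r)) for r in row)


def squarified_layout(sizes: List[float], x: float, y: float,
                      dx: float, dy: float) -> List[Dict[str, float]]:
    """Simplified squarified layout.

    Assumes `sizes` are positive, sorted in descending order, and scaled so
    that sum(sizes) == dx * dy.
    """
    remaining = list(sizes)
    rects: List[Dict[str, float]] = []
    while remaining:
        side = min(dx, dy)  # lay each row along the shorter side
        row = [remaining.pop(0)]
        # Keep adding items while that does not worsen the worst aspect ratio
        while remaining and worst_ratio(row + [remaining[0]], side) <= worst_ratio(row, side):
            row.append(remaining.pop(0))
        thickness = sum(row) / side
        if dx >= dy:
            # Shorter side is vertical: stack the row as a column at the left edge
            cy = y
            for area in row:
                h = area / thickness
                rects.append({"x": x, "y": cy, "dx": thickness, "dy": h})
                cy += h
            x += thickness
            dx -= thickness
        else:
            # Shorter side is horizontal: lay the row out as a strip across the width
            cx = x
            for area in row:
                w = area / thickness
                rects.append({"x": cx, "y": y, "dx": w, "dy": thickness})
                cx += w
            y += thickness
            dy -= thickness
    return rects


# Toy sizes, already sorted and summing to 6 * 4 = 24 (not real GNI figures)
for r in squarified_layout([6, 6, 4, 3, 2, 2, 1], 0, 0, 6, 4):
    print({k: round(v, 3) for k, v in r.items()})
```

Because each row is filled greedily from the front of the list, feeding the values largest-first is what keeps the rectangles close to square.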

Interpreting the results

When you run the code above, you'll get a treemap like the one at the top of the post, which lets you compare the GNI of different countries at a glance. The size of each rectangle represents that country's GNI, its color is mapped from the GNI value with the viridis colormap (dark purple for low values, yellow for high values), and only the countries in the top 25% are labeled.
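The color expression in the treemap code divides each GNI value by the maximum, so the smallest country maps to min/max rather than exactly 0. If you want the lowest value pinned to the dark end of the colormap, min-max scaling is one alternative. The sketch below compares the two in plain Python; the function names are hypothetical, and the resulting numbers in [0, 1] are what you would pass to `plt.cm.viridis`.

```python
from typing import List


def scale_by_max(values: List[float]) -> List[float]:
    """Divide by the maximum: results lie in (0, 1] for positive data."""
    top = max(values)
    return [v / top for v in values]


def scale_min_max(values: List[float]) -> List[float]:
    """Min-max scaling: the smallest value maps to 0.0 and the largest to 1.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # avoid division by zero when all values are equal
    return [(v - lo) / (hi - lo) for v in values]


toy = [500.0, 1000.0, 2000.0]  # toy values, not real GNI figures
print(scale_by_max(toy))   # [0.25, 0.5, 1.0]
print(scale_min_max(toy))  # [0.0, 0.3333333333333333, 1.0]
```

For strongly skewed data such as income figures, most rectangles can end up in a narrow band of colors with either linear scaling; `matplotlib.colors.LogNorm` is another option worth trying in that case.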

FAQs

  1. What can I do with Pandas?
    - Pandas uses dataframes to read, write, manipulate, and analyze data. You can easily do things like import a CSV file or select specific columns of data.
  2. What is the difference between Matplotlib and other visualization libraries?
    - Matplotlib is very flexible and can create many different types of plots, but it can be complicated to use for beginners. Seaborn builds on Matplotlib to provide simpler visualizations, and Plotly makes it easy to create interactive graphs.
  3. When is Squarify useful?
    - Squarify helps you create treemaps. Treemaps help you compare data intuitively, and are especially useful for visually clarifying hierarchical data or comparisons.
  4. What does the R treemap package provide?
    - The treemap package is an R package for drawing treemaps, and it also ships example datasets such as GNI2014 that are handy for practice. Here, we loaded GNI2014 with R's data("GNI2014") call through rpy2.
  5. How do I set a different color for each rectangle when creating a treemap?
    - Pass one color per rectangle to the color parameter of squarify.plot. A Matplotlib colormap such as plt.cm.viridis can generate those colors from the data values.
  6. What does the apply function do in data preprocessing?
    - The apply function applies a function to each row or column of a DataFrame. Here, it was applied row by row (axis=1) to set each country's label to its name when its GNI is in the top 25%, and to an empty string otherwise.
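To illustrate the apply answer, here is a small sketch with toy data showing how the `axis` argument changes what `DataFrame.apply` passes to your function: `axis=0` (the default) passes each column, and `axis=1` passes each row.

```python
import pandas as pd

# Toy frame for illustration only
df = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})

# axis=0 (default): the function receives each column as a Series
col_sums = df.apply(lambda col: col.sum(), axis=0)

# axis=1: the function receives each row as a Series
row_sums = df.apply(lambda row: row["a"] + row["b"], axis=1)

print(col_sums.tolist())  # [6, 60]
print(row_sums.tolist())  # [11, 22, 33]
```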

Full code and wrap-up

Finally, here is the complete code in one place.

# Install the required Python packages first if needed:
# !pip install pandas matplotlib squarify rpy2

import pandas as pd
import matplotlib.pyplot as plt
from matplotlib import font_manager, rc
import squarify
import rpy2.robjects as robjects
from rpy2.robjects import pandas2ri
from rpy2.robjects.conversion import localconverter

# Set up a font that can render Korean text on macOS
font_path = '/Library/Fonts/Arial Unicode.ttf'  # Change to the actual font file path
font_manager.fontManager.addfont(font_path)  # Register the font so Matplotlib can find it by name
font = font_manager.FontProperties(fname=font_path).get_name()
rc('font', family=font)

# Install the R treemap package only if it is missing, then load it
robjects.r('if (!requireNamespace("treemap", quietly = TRUE)) install.packages("treemap", repos="http://cran.us.r-project.org")')
robjects.r('library(treemap)')

# Load the GNI2014 dataset from R's treemap package and convert it to a pandas DataFrame
robjects.r('data("GNI2014", package = "treemap")')
with localconverter(robjects.default_converter + pandas2ri.converter):
    GNI2014 = robjects.r('as.data.frame(GNI2014)')

# Label only the countries in the top 25% by GNI (all others get an empty label)
gni_threshold = GNI2014['GNI'].quantile(0.75)
GNI2014['label'] = GNI2014.apply(lambda x: x['country'] if x['GNI'] > gni_threshold else '', axis=1)

# Generate the treemap (squarify expects positive sizes sorted in descending order)
GNI2014 = GNI2014.dropna(subset=['GNI']).sort_values('GNI', ascending=False)
plt.figure(figsize=(12, 8))
colors = plt.cm.viridis(GNI2014['GNI'] / GNI2014['GNI'].max())  # Color by GNI relative to the maximum
squarify.plot(sizes=GNI2014['GNI'], label=GNI2014['label'], color=colors, alpha=.8)
plt.axis('off')
plt.title('GNI 2014 dataset treemap', fontsize=20)
plt.suptitle('Gross National Income (GNI) comparison by country', fontsize=16)
plt.show()

In this post, you learned how to create a treemap using Python. With these visualization techniques, you can analyze and communicate your data more efficiently. Explore more data visualization techniques in the future!

If you'd like to see the same visualization done in R instead of Python, check out the post "Visualizing treemaps in R: Comparing Gross National Income (GNI) across countries using the GNI 2014 dataset".
