
Matplotlib: Double Bar Graph

In this tutorial, we’ll create a static grouped bar chart with multiple (double) bars with the help of Python libraries: Pandas, Matplotlib, and Seaborn.

Contents

  1. Prerequisites
  2. Getting Started
  3. Data Preparation
  4. Plotting

Prerequisites

To create a bar plot, we’ll need the following:

  • Python installed on your machine
  • Pip: package management system (it comes with Python)
  • Jupyter Notebook: a browser-based editor for interactive code and data visualization
  • Pandas: a library to create data frames from data sets and prepare data for plotting
  • NumPy: a library for multi-dimensional arrays
  • Matplotlib: a plotting library
  • Seaborn: a plotting library (we’ll only use part of its functionality to add a gray grid to the plot and get rid of borders)

You can download the latest version of Python for Windows from the official website.

To get the other tools, install the scientific Python packages with pip. Type this in your terminal:

    
        
pip install numpy scipy matplotlib ipython jupyter pandas sympy nose seaborn
    
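Once the install finishes, a quick check like the sketch below can confirm which packages are available in the current environment. It uses only the standard-library importlib.metadata module (Python 3.8+); the helper name installed_versions and the REQUIRED list are made up for this example.

```python
from importlib.metadata import PackageNotFoundError, version

# Packages this tutorial relies on (list assembled for this sketch)
REQUIRED = ["numpy", "pandas", "matplotlib", "seaborn", "notebook"]

def installed_versions(packages):
    """Return {package: version string, or None if it is not installed}."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found

for name, ver in installed_versions(REQUIRED).items():
    print(f"{name}: {ver or 'not installed'}")
```

Any package reported as not installed can be added with another pip install call.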

Getting Started

Create a folder that will contain your notebook (e.g. “matplotlib-bar-chart”) and open Jupyter Notebook by typing this command in your terminal (don’t forget to change the path):

    
        
cd C:\Users\Shark\Documents\code\matplotlib-bar-chart
py -m notebook
    

This will automatically open the Jupyter home page at http://localhost:8888/tree. Click on the “New” button in the top right corner, select the Python version installed on your machine, and a notebook will open in a new browser window.

In the first cell of the notebook, import all the necessary libraries:

    
        
import matplotlib.pyplot as plt
import matplotlib as mpl
import numpy as np
import pandas as pd
import seaborn as sns
sns.set()
%matplotlib notebook
    

The last line (%matplotlib notebook) is a Jupyter magic command. You’ll need it to render plots inside the notebook, right below the cell that creates them.

Data Preparation

Let’s create a Matplotlib grouped bar chart that will show the top 10 movies with the highest revenue and their budgets. We’ll plot the bar graph from a .csv file. Download the file named movies_metadata.csv from Kaggle and put it in your “matplotlib-bar-chart” folder.

In the second cell of your Jupyter notebook, type this code to read the file:

    
        
df = pd.read_csv('movies_metadata.csv')
    

Next, create a data frame, sort and format values:

    
        
# Keep only the columns we need
data = pd.DataFrame(df, columns=['title', 'revenue', 'budget'])
data_sorted = data.sort_values(by='revenue', ascending=False)
data_sorted['revenue'] = data_sorted['revenue'] / 1000000  # USD -> USD million
pd.options.display.float_format = '{:,.0f}'.format
data_sorted.set_index('title', inplace=True)
# .copy() makes ranking independent of data_sorted,
# so the assignment below doesn't trigger SettingWithCopyWarning
ranking = data_sorted.head(10).copy()
ranking['budget'] = ranking['budget'].astype(float) / 1000000  # USD -> USD million
ranking
    

The output will look like this:

Pandas output

We’ll use this slice of the data frame, with its revenue and budget columns, to create our Matplotlib bar chart.
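The code above calls astype(float) on budget, which suggests that column is read as text. As a minimal sketch of the same preparation on a tiny made-up data frame (the titles and figures are invented, not taken from the Kaggle file), pd.to_numeric with errors="coerce" converts the text and turns anything unparsable into NaN instead of raising an error:

```python
import pandas as pd

# Made-up stand-in for movies_metadata.csv (not real figures)
df = pd.DataFrame({
    "title": ["Film A", "Film B", "Film C", "Film D"],
    "revenue": [900_000_000.0, 2_500_000_000.0, 0.0, 1_200_000_000.0],
    "budget": ["150000000", "237000000", "not-a-number", "200000000"],  # text column
})

ranking = (
    df[["title", "revenue", "budget"]]
    .assign(
        # errors="coerce" turns unparsable budgets into NaN instead of raising
        budget=lambda d: pd.to_numeric(d["budget"], errors="coerce") / 1_000_000,
        revenue=lambda d: d["revenue"] / 1_000_000,
    )
    .set_index("title")
    .nlargest(3, "revenue")  # top rows by revenue, highest first
)
print(ranking)
```

Chaining the steps this way never modifies a slice of another data frame, so no copy-related warning can appear.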

Plotting

We’ll create a Matplotlib chart with double bars in 5 steps. All the code snippets below should be placed inside one cell in your Jupyter Notebook.

Here’s the list of variables that will be used in our code. You can insert your values or names as you like.

    
        
# Variables
first_bar = ranking['revenue']
first_bar_label = 'Revenue, USD million'
first_bar_color = '#32628d'
second_bar = ranking['budget']
second_bar_label = 'Budget, USD million'
second_bar_color = '#cde01d'
labels = ranking.index
width = 0.4  # the width of the bars
plot_title = 'Top 10 movies by revenue, USD million'
title_size = 18
subtitle = 'Source: Kaggle / The Movies Dataset'
filename = 'barh-plot'
    

1. Create subplots

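First, sort the data by revenue in ascending order. ax.barh() draws the first row at the bottom, so this puts the highest-grossing movie at the top of the chart. Place this line above the variables, because they read their values from ranking at the moment they’re assigned: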

    
        
ranking.sort_values(by='revenue', inplace=True, ascending=True)
    

Next, draw a figure with a subplot.

    
        
fig, ax = plt.subplots(figsize=(10,6), facecolor=(.94, .94, .94))
plt.tight_layout()
    

figsize is given in inches, so at the default 100 dpi, figsize=(10,6) creates a 1000 × 600 px figure.

plt.tight_layout() adjusts the subplot parameters so that the subplot fits neatly inside the figure.
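To see how figsize relates to pixels, you can multiply the size in inches by the figure’s dpi. The sketch below passes dpi=100 explicitly and switches to the non-interactive Agg backend so that it runs outside a notebook; both are choices made for this example only.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

# dpi is passed explicitly here; the tutorial relies on the default value
fig, ax = plt.subplots(figsize=(10, 6), dpi=100)

# Size in inches times dots per inch gives the size in pixels
width_px, height_px = fig.get_size_inches() * fig.dpi
print(width_px, height_px)
plt.close(fig)
```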

2. Create bars

    
        
# Plot double bars
y = np.arange(len(labels))  # Label locations
ax.barh(y + width/2, first_bar, width, label=first_bar_label, color=first_bar_color)
ax.barh(y - width/2, second_bar, width, label=second_bar_label, color=second_bar_color)

# Format ticks
ax.xaxis.set_major_formatter(mpl.ticker.StrMethodFormatter('{x:,.0f}'))
    

ax.bar() creates vertical bars, while ax.barh() draws horizontal ones. We’re using Matplotlib barh, so our chart will have a horizontal layout. Shifting the two series by +width/2 and -width/2 places each pair of bars side by side around the same y position.

We’re also using set_major_formatter() to format ticks with commas (like 1,500 or 2,000).
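Here is a small sketch of the bar geometry and the tick format, using made-up numbers. It shows that the two bars of a pair touch at the group position without overlapping, and what text the formatter produces for a tick at 1500. The variable names are chosen for this example.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.ticker import StrMethodFormatter

# Made-up values, only to show the geometry
revenue = [2500, 1200, 900]
budget = [237, 200, 150]
width = 0.4

fig, ax = plt.subplots()
y = np.arange(len(revenue))                     # one slot per group: 0, 1, 2
upper = ax.barh(y + width / 2, revenue, width)  # upper bar of each pair
lower = ax.barh(y - width / 2, budget, width)   # lower bar of each pair

# Bars are centered on their positions, so each pair spans y - width to y + width
for top, bottom in zip(upper, lower):
    print(bottom.get_y(), top.get_y() + top.get_height())

formatter = StrMethodFormatter("{x:,.0f}")
ax.xaxis.set_major_formatter(formatter)
tick_text = formatter(1500)  # the text the axis would show for a tick at 1500
print(tick_text)
plt.close(fig)
```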

3. Set title and subtitle

    
        
# Set title
title = plt.title(plot_title, pad=20, fontsize=title_size)
title.set_position([.35, 1])

# Adjust subplots
plt.subplots_adjust(left=0.35, top=0.9)

# Set subtitle
tform = ax.get_xaxis_transform() 
ann = ax.annotate(subtitle, xy=(5, 1), xycoords=tform, bbox=dict(boxstyle='square, pad=1.5', fc='#f0f0f0', ec='none'))
    

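pad=20 sets the gap between the title and the top of the axes (in points), and .35 in set_position() sets the title’s horizontal position as a fraction of the axes width.

subplots_adjust() widens the left margin and lowers the top edge of the plot area, which prevents the title and the long y-axis labels from being cropped.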
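The subtitle is positioned with ax.get_xaxis_transform(), a blended transform: the x value is read in data units and the y value as a fraction of the axes height. So xy=(5, 1) means “x = 5 on the data axis, at the top edge of the axes”. The sketch below checks this on an empty plot; the axis limits are arbitrary.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.set_xlim(0, 10)  # arbitrary limits for the example

tform = ax.get_xaxis_transform()  # x in data units, y as a fraction of the axes height

# Pixel position of the point (5, 1) under the blended transform
px = tform.transform((5, 1))
data_x = ax.transData.transform((5, 0))[0]    # pixel x of data value 5
axes_top = ax.transAxes.transform((0, 1))[1]  # pixel y of the axes' top edge
print(px, data_x, axes_top)
plt.close(fig)
```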

4. Create bar labels/annotations

Since we’re creating a Python bar graph with labels, we need to define the label values and their positions.

To label bars with values, we’ll loop over the bar rectangles (ax.patches) and annotate each one:

    
        
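# Create labels
rects = ax.patches  # every bar drawn so far (both series)
for rect in rects:
    # Get X and Y placement of label from rect.
    x_value = rect.get_width()                      # bar length = data value
    y_value = rect.get_y() + rect.get_height() / 2  # vertical center of the bar
    space = 5    # gap between the bar end and the label, in points
    ha = 'left'  # label starts right after the bar end
    if x_value < 0:
        # Negative bars: put the label to the left of the bar end
        space *= -1
        ha = 'right'
    label = '{:,.0f}'.format(x_value)
    plt.annotate(
        label,
        (x_value, y_value),          # point to annotate
        xytext=(space, 0),           # horizontal offset from that point
        textcoords='offset points',
        va='center',
        ha=ha)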
    
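As an alternative to the manual loop, Matplotlib 3.4 and later provide Axes.bar_label(), which labels every bar in a container in one call. This is a different technique from the one used in this tutorial, sketched here with made-up values:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

values = [2500, 1200, 900]  # made-up numbers
fig, ax = plt.subplots()
bars = ax.barh(np.arange(len(values)), values, 0.4)

# bar_label() places one text per bar; padding is the gap in points
texts = ax.bar_label(bars, labels=[f"{v:,.0f}" for v in values], padding=5)
label_strings = [t.get_text() for t in texts]
print(label_strings)
plt.close(fig)
```

With two series, you would call bar_label() once for each container returned by ax.barh().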

Finally, we set y-axis labels and the legend:

    
        
# Put one tick at every bar pair so that each movie gets a label, not just every other one
ax.set_yticks(y)

# Set y-labels and legend
ax.set_yticklabels(labels)
ax.legend()
    

5. Save the chart as a picture

    
        
plt.savefig(filename+'.png', facecolor=(.94, .94, .94))
    

Repeat facecolor in savefig(). Depending on the Matplotlib version, plt.savefig might otherwise fall back to its own default background color instead of the figure’s.
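One way to avoid typing the color twice is to read it back from the figure with fig.get_facecolor(). The sketch below saves a placeholder chart into a temporary folder; the folder and the placeholder bars are assumptions made for this example, while the tutorial itself saves next to the notebook.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(10, 6), facecolor=(.94, .94, .94))
ax.barh([0, 1], [3, 5])  # placeholder bars

# Temporary folder used only for this sketch
out_path = os.path.join(tempfile.mkdtemp(), "barh-plot.png")

# Reuse the figure's own background color instead of repeating the tuple
fig.savefig(out_path, facecolor=fig.get_facecolor())
plt.close(fig)
print(out_path)
```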

That’s it: your Matplotlib double bar chart is ready. You can download the notebook on GitHub to get the full code.

