Visualizing music

Hi, I’m Guillermina, product specialist at Datawrapper, and today I’m combining my love of cello and data vis.

Have you ever wondered what sound looks like? For a long time, I’ve been captivated by the idea of seeing sound. I believe that patterns exist in almost everything we hear — intonation, silences, speed, and volume. But could these patterns be visualized? If so, what would they look like?

After running into Nicholas Rougeux’s visual music scores, I decided to explore this idea further and try to achieve something similar in Datawrapper with one of the most famous pieces ever written for cello.

Here’s a visual representation of Bach’s Cello Suite No. 1 in G major. It has seven movements: Prelude (the most well-known), Allemande, Courante, Sarabande, Menuet I and II, and Gigue. You can listen to the full suite (and Bach’s five other cello suites, too!) interpreted by Pieter Wispelwey here.

How did I make this visualization?

Creating this visualization was a multistep process:

  • First I downloaded a MIDI file of the suite from MuseScore.
  • I then wrote a Python script using pretty_midi to read the file. For each note, I extracted the start and end times, pitch, and velocity, and exported everything as a CSV. I tried to match the data structure that Nicholas explains here.

Here’s the Python script for reading the MIDI file.
import pretty_midi
import pandas as pd
# load MIDI file
midi_data = pretty_midi.PrettyMIDI('BACH_SUITE.mid')
# get notes from every non-drum instrument
notes = []
for instrument in midi_data.instruments:
    if not instrument.is_drum:
        for note in instrument.notes:
            notes.append([note.start, note.end, note.pitch, note.velocity])
# create df, sorted by start time
df = pd.DataFrame(notes, columns=['start_secs', 'end_secs', 'pitch', 'velocity'])
df = df.sort_values('start_secs', ignore_index=True)
# calculate start and duration of notes in seconds and in ticks
# (resolution is ticks per beat, so time_to_tick is used to apply the file's tempo map)
df['start_ticks'] = df['start_secs'].apply(midi_data.time_to_tick)
df['duration_secs'] = df['end_secs'] - df['start_secs']
df['duration_ticks'] = df['end_secs'].apply(midi_data.time_to_tick) - df['start_ticks']
# transform pitch number into note name with octave, e.g. 43 -> 'G2'
df['fullNoteOctave'] = df['pitch'].apply(pretty_midi.note_number_to_name)
# reorder df
df = df[['start_ticks', 'start_secs', 'duration_ticks', 'duration_secs', 'pitch', 'fullNoteOctave', 'velocity']]
# save csv
df.to_csv('BACH_SUITE.csv', index=False)
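A note on the tick columns: in pretty_midi, `resolution` is the number of ticks per quarter note (beat), not per second, so turning seconds into ticks also depends on the tempo. The minimal sketch below assumes a single constant tempo; the function name `seconds_to_ticks` and the `bpm` values are hypothetical, chosen only to show the arithmetic. For files with tempo changes, pretty_midi's `PrettyMIDI.time_to_tick()` uses the file's own tempo map instead.

```python
def seconds_to_ticks(seconds: float, bpm: float, resolution: int) -> int:
    """Convert a time in seconds to MIDI ticks, assuming one constant tempo.

    resolution: ticks per quarter note (PPQ)
    bpm: quarter notes per minute (hypothetical; real files store tempo events)
    """
    beats = seconds * bpm / 60.0  # number of quarter notes elapsed
    return int(round(beats * resolution))


# Hypothetical file with 480 ticks per quarter note
print(seconds_to_ticks(1.0, bpm=120, resolution=480))  # 960
print(seconds_to_ticks(1.0, bpm=60, resolution=480))   # 480
```

Multiplying seconds by `resolution` directly only gives the right tick count when the tempo is exactly 60 bpm, that is, one beat per second.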

  • I also wanted to identify each movement within the cello suite. But I couldn’t find a way of doing that with Python... so I did it manually, going into the CSV file and marking the beginning and end of each movement. This involved reading notes from the CSV out loud and having my boyfriend find that passage in the actual sheet music. Finally, I was able to generate separate files for each of the seven movements.
  • Because I was going to visualize this data as a scatterplot, I needed to calculate coordinates for each note. For the radius of each note, I normalized its pitch to a 0-10 scale, then added 2 to shift everything outwards and leave blank space in the center, so the radii run from 2 to 12. For the angle, I took each note’s start time as a fraction of the whole piece (measured up to the start of its last note) and mapped it onto a full circle, from 0 to 2π. I then converted these polar values to (x,y) coordinates that would work with a Datawrapper scatterplot.
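The manual split into movements described above can be sketched in plain Python once the start time of each movement has been noted down. This is not the script used for the chart: the function name `split_into_movements`, the `(start_secs, pitch)` tuple layout and the boundary times are all hypothetical, chosen only for illustration. In practice, each boundary would come from checking the sheet music by hand.

```python
from bisect import bisect_right


def split_into_movements(notes, boundaries, names):
    """Group notes into movements using manually noted boundary times.

    notes: list of (start_secs, pitch) tuples (hypothetical layout)
    boundaries: sorted start times, in seconds, of movements 2..n
    names: one name per movement, so len(boundaries) + 1 entries
    """
    if len(names) != len(boundaries) + 1:
        raise ValueError("need exactly one more name than boundaries")
    movements = {name: [] for name in names}
    for note in notes:
        # bisect_right counts the boundaries at or before this note's start,
        # which is the index of the movement the note belongs to
        index = bisect_right(boundaries, note[0])
        movements[names[index]].append(note)
    return movements


# Hypothetical notes and boundary times, for illustration only
notes = [(0.0, 43), (1.5, 50), (130.0, 55), (131.2, 59), (300.0, 62)]
movements = split_into_movements(notes, [130.0, 300.0], ['PRELUDE', 'ALLEMANDE', 'COURANTE'])
for name, movement_notes in movements.items():
    print(name, len(movement_notes))
```

A note that starts exactly on a boundary is assigned to the movement that begins there. Each resulting list could then be written to its own CSV, for example with pandas `to_csv` as in the first script.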

Here’s the Python script for calculating the (x,y) coordinates.
import os
import numpy as np
import pandas as pd
files = ['BACH_SUITE.csv', 'PRELUDE.csv', 'ALLEMANDE.csv', 'COURANTE.csv', 'SARABANDE.csv', 'MENUET1.csv', 'MENUET2.csv', 'GIGUE.csv']
for filename in files:
    # load the data
    df = pd.read_csv(filename)
    # normalize pitch into a radius: scale to 0-10, then +2 to shift towards the outer circle
    min_pitch = df['pitch'].min()
    max_pitch = df['pitch'].max()
    df['normalized_pitch'] = ((df['pitch'] - min_pitch) / (max_pitch - min_pitch) * 10) + 2
    # generate angles (start times mapped to 0 to 2pi)
    max_time = df['start_secs'].max()
    df['angle'] = (df['start_secs'] / max_time) * 2 * np.pi
    # calculate coordinates with normalized_pitch as radius: x = r * sin(angle), y = r * cos(angle)
    df['x'] = df['normalized_pitch'] * np.sin(df['angle'])
    df['y'] = df['normalized_pitch'] * np.cos(df['angle'])
    # save next to the input, e.g. PRELUDE.csv -> PRELUDE_circle.csv
    csv_path = '{0}_circle.csv'.format(os.path.splitext(filename)[0])
    df.to_csv(csv_path, index=False)
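To check the orientation of the circle by hand, here is the same mapping for a single note, without pandas. It is a sketch that mirrors the formulas in the script above; the function name `note_to_xy` and the example pitches and times are hypothetical, and it assumes the piece spans more than one pitch (otherwise the normalization divides by zero).

```python
import math


def note_to_xy(pitch, start_secs, min_pitch, max_pitch, max_time):
    """Map one note onto the circular layout.

    Radius: pitch rescaled to 0-10, then shifted out by 2 (so 2 to 12).
    Angle: start time as a fraction of max_time, mapped to 0..2*pi.
    """
    radius = (pitch - min_pitch) / (max_pitch - min_pitch) * 10 + 2
    angle = start_secs / max_time * 2 * math.pi
    # sin for x and cos for y: time 0 sits at 12 o'clock and runs clockwise
    return radius * math.sin(angle), radius * math.cos(angle)


# Hypothetical piece: pitches 36 to 76, last note starting at 100 seconds
print(note_to_xy(36, 0, 36, 76, 100))  # (0.0, 2.0)
```

Using sine for x and cosine for y puts the first note at 12 o'clock and moves clockwise, like a clock face. Because the angle is divided by the latest start time, the last note lands at 2π, on the same spoke as the first; dividing by the end time of the last note instead would leave a small gap where the piece closes.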

  • I uploaded the data into Datawrapper and also created different colored buttons for each movement of the suite. (If you want to add buttons like these to your Datawrapper visualizations, you can learn how here.)

That’s it for my Weekly Chart! Now I invite you to sit back, relax, and enjoy some Bach. See you next week!