Prototyping Tools for Interactive Music in VR and 360 Audio

This page is an archive of my presentation at the Game Sound Conference (research track), Biltmore Hotel Los Angeles, Nov 8th, 2016.

The presentation was a sequel to my paper, Knots in the String of Time: Adapting Computer Game Prototyping Tools for Interactive Music (ProQuest) (ACM), which posed several questions:

This paper documents an effort to adapt computer game prototyping tools as controls for interactive digital music. Within that effort, the social aspects of interactive music were explored. Is it possible to give the game-player-listeners the ability to structure the music interactively and become the composers of their own game experience?

A second point of inspiration came from the contemporary conceptual
artist Joseph Beuys:

Every human being is endowed with creativity, an inborn,
universally distributed faculty which lies fallow in most
people. The social task of professional artists is to liberate
this repressed creative potential until all human labor
deserves to be called artistic. Essential to his belief in
creativity is that it is future-oriented: it has the performative
structure of a promise.

I am a professional artist; thus, Beuys has charged me and others
like me with a ‘social task.’ Could I build an interactive
environment that would allow a non-professional to structure
time, to sequence affect, to create cause-and-effect (i.e., musical
meaning) through the use of harmonic language, rhythm, and
other musical elements, and do what composers do?

Here are the slides, videos, and script from the presentation.

Note: the navigation controls on the bottom right of the following slides are non-functional in this context.  However, the videos display the familiar transport controls: arrows, pause/play button, etc.

Hello. Welcome to Prototyping Tools for Interactive Music. My name is Nick Venden.  I’m an independent developer, composer, sound designer, and coordinator for AudioArtsCollective—specialists in interaction design for music in VR.  I created five working HTC Vive demos of the techniques I want to share with you.  Here are recordings taken directly from the Vive-SteamVR monitor and headphone feed—no post production other than timeline splices and a few crossfades.  The demos also showcase 360 spatialized audio, which cannot be heard here.

I’ll only have time to discuss one of these in depth: the A* Pathfinding Project by Aron Granberg. However, the setup for this project includes:

Everything I’m showing today relies heavily on the work of these expert programmers. Here are four slides showing how to set up NewtonVR and Rapture3D.

So, pathfinding!

What you see above is a 3×3 meter roomscale Vive setup. Hovering in the space at waist height is a horizontal plane — a very large circuit board. The user has an invisible collider attached to her head and holds the two controllers. She can stand and intersect with the circuit board. There is a pathfinding AI agent in the space and a white sphere, the target for the A* pathfinding.

There are many types of splines and graphs for pathfinding–both 2D and 3D. Granberg’s A* project offers several of these.

For this demonstration, I’m going to create a GridGraph similar to the one above left.

Here’s the process:  create an A-star manager game object and an AI seeker game object in the hierarchy.

Here they are in the inspector:

Create a horizontal groundplane with the tag, “ground.” (Here you’re looking down on the Vive’s walkable area in my studio.)

Place cubic obstacles on the “ground” with the tag,  “obstacle.” At this point configure the obstacles to account for Geometric Acoustics, Audio Occlusion, etc.

Scan  from above to identify the “obstacles”  as unwalkable areas.

Here’s a side-view:

Assign the circuit board material with transparent  shader to the “ground.”

Replace the obstacles with ICs, lights, and line renderers.

Create a seeker with Granberg’s Seeker script, AIPath script, etc.

Create a target component (the little white ball) with two NVR scripts, a collider, and a rigid body.

Drag this target gameObject onto the “target” field of the seeker’s AIPath script.

The target is made interactable by adding two NVR scripts: NVRInteractableItem.cs and NVRAttachPoint.cs (both from NewtonVR).

I also built an interactive slider which controls the speed of the seeker–a type of tempo control.

I discovered that if it was fast, the seeker avoided the obstacles completely; if I slowed it down, the seeker hopped over the obstacles (the integrated circuits). Hmm. Useful. So . . .

In Logic Pro X, I created some “hopping up and hopping down” audio clips, which I could synchronize to OnCollisionEnter and OnCollisionExit methods.

So far, a few visual objects, audio clips, collision logic, and scripts for pathfinding and NewtonVR physics are all bound together into one interactive musical object.

Then after a few design iterations, a simple tension-release harmonic scheme developed.

Twenty-four monophonic stems were bounced out of Logic Pro X, then used in 24 Unity3D AudioSources. Blue Ripple’s R3dAudioSource component can create tightly constrained “audio zones” if you use Linear Rolloff for the Distance Model and very small Min/Max Distances of 0.1 and 0.5.
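To see why such small Min/Max Distances produce tightly constrained zones, here is a minimal Python sketch of a Unity-style linear rolloff curve (my own model for illustration, not Blue Ripple's actual DSP):

```python
def linear_rolloff(distance, min_dist=0.1, max_dist=0.5):
    """Unity-style Linear Rolloff: full volume inside min_dist,
    silence beyond max_dist, linear attenuation in between."""
    if distance <= min_dist:
        return 1.0
    if distance >= max_dist:
        return 0.0
    return (max_dist - distance) / (max_dist - min_dist)

# With max_dist = 0.5, a source is inaudible beyond half a meter,
# which is what makes each "audio zone" feel like a discrete region:
print(linear_rolloff(0.05))  # 1.0
print(linear_rolloff(0.3))   # 0.5
print(linear_rolloff(1.0))   # 0.0
```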

I arranged pairs of these audio zones  in adjacent areas of  the grid — areas of “harmonic tension and release.”  (Represented above by the fuzzy spheres.)  I raised the audio zones to head-height.  I put trigger boundaries  around the audio zones and a collider on the Vive headset.  So now, the user can walk through areas with  harmonic tension into areas with gentle resolution — the second chord of the pair.

The Vive controller’s changing audio was created with this routine:


Everything was beautifully spatialized to 3rd order resolution by Richard Furse’s Rapture VR for Unity.

“Oh look!”  The seeker is pulsing.

It must move every time a valid path is sent from the A* manager. Maybe I can . . . yes! assign a percussive click sound to the start of each path. There must be a public variable/threshold for how often a path is calculated. Ohwowee! I can give the target component a constant force in a horizontal x-direction, and the seeker will create a percussion track. Owh! I can vary the tempo by changing the constant force from the Unity physics engine. If I make this precise enough, I can make beats!

“Oh look!” Just before the seeker object starts on a new assigned A* path, it rotates abruptly. How can I synchronize a sound to that rotation? In Granberg’s API, I found that a delegate callback is sent from the AIPath manager script-component to the Seeker script-component each time a valid path is created. There’s a listener over there! C# delegates are multicast, so . . .

. . . it’s possible for me to register additional custom musical functions to the Seeker’s pathCallback.
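Conceptually, that registration looks like this; the Python below stands in for C#'s `+=` on a multicast delegate (the class and method names here are hypothetical, not Granberg's actual API):

```python
class Seeker:
    """Toy model of a multicast path callback: a list of callables
    plays the role of a C# multicast delegate."""
    def __init__(self):
        self.path_callbacks = []

    def register(self, fn):          # like `seeker.pathCallback += fn;`
        self.path_callbacks.append(fn)

    def on_path_complete(self, path):
        # Every registered handler fires each time a valid path arrives.
        for fn in self.path_callbacks:
            fn(path)

events = []
seeker = Seeker()
seeker.register(lambda path: events.append("follow path"))       # built-in behavior
seeker.register(lambda path: events.append("percussive click"))  # custom musical function
seeker.on_path_complete([(0, 0), (1, 0)])
print(events)  # ['follow path', 'percussive click']
```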

This is exciting! It’s also exhausting. But there’s a problem here:

This begins to resemble “Generative Art.” It’s tempting to create autonomous objects, synchronize those objects with Unity3D processes, sprinkle in some random numbers to make it appear “organic,” then let this beautiful little machine run all by itself. Fascinating, but autonomous generative-art systems rob the VR user of any agency; the user becomes a passive observer again, and the user experience is destroyed.

But! If you enable the user to choose from dozens of interrelated generative art systems — to learn where they are in the space and what they do — you give control back to the user and make the user a creative participant again. Can we enable the user to become a musician or composer?

Let’s look at this question . . .
If music composition can be reduced to two simple, inter-related skills —

If these two things are true, this little GridGraph example is a music composition machine. Because . . . “One”: the user can choose to walk in and out of areas of harmonic tension and release — think of this as sequencing affect and emotion. And . . . “Two”: the user can force the seeker into collisions which trigger pairs of percussive events — think of this as “creating musical meaning by re-structuring time into cause-and-effect.”

This is simple  . . . and valid.  But, it would take a lot more complexity and thought before this little GridGraph SoundMachine produces compelling music. Plus, this Circuit Board example requires sooo much logic and computation. There should be a less expensive solution.

Besides sophisticated Distance Models, Reverb Algorithms, and scriptable Buffer Streamed Readers, Blue Ripple offers the ability to import a complete 3D audio scene encoded into a multichannel file. Richard Furse writes, “With Rapture 3D, you can actually place the user in the overall game scene, with an extent, in a similar way to sources. The user can walk around and through them!” Here’s the workflow, which is almost identical to the workflow for 360 video.

To prepare 16-channel beds in Reaper . . .

1- Upsample existing B-format recordings to 16 channels with BRS Harpex (there are upsamplers to bring just about any audio channel format, 5.1, quad, stereo, etc., up to 16 channels).
2- (and/or) Place discrete monophonic audio clips into the 16-channel 360 spatial field.
3- On the timeline, reposition the audio objects dynamically using Reaper’s precise keyframes. Optionally, do this in sync with a 360 video, perhaps something created in After Effects + Mettle by Chris Bobotis.
4- Bounce it out to a 16-channel .wav file.
5- Import it into the Unity project.
6- Use this multichannel .wav file as the AudioClip in the RaptureAudioBed script in the inspector.

The entire bed or scene can be rotated on three axes, as in HMD tracking, and transformed. But if you don’t want this behavior, you can enable the FixedAtHead boolean field. Of course, all of this is possible with discrete AudioSources parented to moving objects. However, because only ONE multichannel audio file has to be rendered by the Unity audio engine, the savings are significant. I learned yesterday that Wwise, in its latest release, has an ambisonic channel with elegant solutions for B-format, 1st- to 3rd-order ambisonics, realtime upsampling, etc.

Some implications of all this: the interaction design can be played out within a larger precomposed spatialized field. For example, a virtual Gamelan instrument can be played while surrounded by the ancient Balinese monkey chant, the Kecak. If there’s time, I’ll show you my velocity-sensitive, multi-sample-layer Virtual Indonesian Gamelan.

Here’s a quick example: if an entire audio clip can be held in working memory, a few DSP-type effects can be created without buffer-error artifacts — radical doppler shifting, for instance. Rapture for Unity is capable of this, and its AudioSource has a scripting property named timeSample. This property lets you get and set the current playback point in the current audio input. This means you can reset playback points while radically shifting pitches. Then, for example, if the controls for these two processes are derived from the user’s movements, they become interfaces to a computer music generator. So, while the left-hand controller is resetting the playback point with whole integers, the right hand is changing the AudioSource component’s dopplerFactor. Someone out there should build this.
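As a sketch of what such an interface might compute — the mappings, ranges, and the 44.1 kHz sample rate are all my own assumptions, not part of the Rapture API:

```python
SAMPLE_RATE = 44100  # assumed clip sample rate

def to_time_sample(norm_pos, clip_seconds):
    """Map a normalized controller position (0..1) to a whole-number
    playback sample, the kind of integer a timeSample property expects."""
    total_samples = int(clip_seconds * SAMPLE_RATE)
    return min(int(norm_pos * total_samples), total_samples - 1)

def to_doppler_factor(norm_pos, lo=0.0, hi=5.0):
    """Map a normalized controller position (0..1) into a doppler range."""
    return lo + norm_pos * (hi - lo)

# Left hand at mid-height jumps to the middle of a 2-second clip;
# right hand at mid-height sets a moderate doppler factor:
print(to_time_sample(0.5, clip_seconds=2.0))  # 44100
print(to_doppler_factor(0.5))                 # 2.5
```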

This next working Vive demo also requires a 3×3 meter room space. Here’s an example of spatialized, doppler-shifted audio effects for a moving rocket. Suppose for a minute there are two presidential candidates who need to practice acting presidential, who need to practice firing rockets into North Korea for instance, a basic American presidential skill. Here are the candidates in a cage together. Let’s see what happens.

The AudioSources with the voices are attached to the two moving balls, causing the panning and doppler shift.

You’re wondering about precision and synchronization?  Here’s a list in descending order of precision.

I’ll skip over most of this. There are 11 things you can access on this controller through SteamVR_TrackedController.cs. But even without using the Vive controllers, you can find other controls for interaction design:

AudioSettings.dspTime
void AudioSource.PlayScheduled(double time);
AudioClip.length
the Collision class
UnityAction (event system / callbacks); I’ll give an example of this later in the presentation
the Animation component

About animation: the Unity AudioSource component exposes 14 members to Mecanim for animation. Here is an animation of the doppler factor.

Here are 11 usable members from the Vive controller, and the 7 from the Collision class.

AudioClip.length can be used as a precise start time for a fade (decrescendo). This is my coroutine for the fades in the video which follows.
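The coroutine itself is on the slide; the timing logic it relies on can be sketched in Python (the linear fade shape and the parameter names are illustrative assumptions, not my actual C# coroutine):

```python
def fade_gain(t, clip_length, fade_duration):
    """Linear decrescendo over the last fade_duration seconds of a clip.
    The fade start time is derived from the clip's total length,
    the way AudioClip.length supplies a precise start time for a fade."""
    fade_start = clip_length - fade_duration
    if t <= fade_start:
        return 1.0
    if t >= clip_length:
        return 0.0
    return (clip_length - t) / fade_duration

# For a 4-second clip with a 2-second fade:
print(fade_gain(1.0, clip_length=4.0, fade_duration=2.0))  # 1.0
print(fade_gain(3.0, clip_length=4.0, fade_duration=2.0))  # 0.5
print(fade_gain(4.0, clip_length=4.0, fade_duration=2.0))  # 0.0
```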

OK, so we have controllers creating pitch, colliders triggering audio zones, and controls derived from AudioClip.length, and we took a detour into the ambisonic workflow. Now, let’s build two velocity-sensitive musical devices: one with velocity-triggered loudness and another with velocity-triggered sample layers.

In order to build velocity-sensitive instruments, you need to use this specific syntax. To create variable loudness control, use Collision.relativeVelocity.sqrMagnitude from the Unity physics engine. This code snippet uses nested if-else conditionals to compare values instead of the preferred switch statement. The script will trigger 8 levels of loudness. Actually, after playing with this, only 3 or 4 levels are needed.
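The threshold comparison itself is simple; here is a Python sketch of a 4-level version (the threshold values are made up, and in Unity the input would come from Collision.relativeVelocity.sqrMagnitude):

```python
def loudness_level(sqr_magnitude, thresholds=(1.0, 4.0, 9.0)):
    """Pick one of 4 loudness levels from a squared collision velocity.
    The thresholds are themselves squared values (1, 2, and 3 m/s,
    squared), so no square root is ever taken."""
    level = 0
    for t in thresholds:
        if sqr_magnitude >= t:
            level += 1
    return level

print(loudness_level(0.5))   # 0 -- softest sample layer
print(loudness_level(5.0))   # 2
print(loudness_level(12.0))  # 3 -- loudest sample layer
```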

Here’s a demonstration using Syrian musical instruments — Kanoun (zither), Nay (flute), Tar (small plucked instrument), and female voice.

The Voice, the Nay, and the Tar are from EastWest’s “SilkRoad” sample library. It’s very responsive to play. The delay you see is a result of gaps at the start of some samples — sloppy sample preparation at EastWest — not the result of the computer calculations. The Kanoun sound is an Arcton instrument from the Kanghaus sample library; for each virtual string, only one sample is retuned to an arpeggio. Instead of an exotic microtuning — and to accommodate the pitches in the vocal samples — I used one of Olivier Messiaen’s tetrachords (E, F, G#, A# interlocked with A#, B, D, E). I played all of the violin notes into Logic Pro X using the tetrachords.

As before, this will compute faster if relativeVelocity.sqrMagnitude is used instead of .magnitude, which requires a much slower square-root call.


So, the object is struck, the sqrMagnitude is compared to a threshold trigger, and the corresponding component’s AudioSource is played. This is the simplest of many ways to accomplish this. Of course, stabilizing the beaters and adding haptic feedback would improve things.

Mostly it needs a moist jungle and a Monkey God.

Here, I tuned the Gamelan to the Indonesian Slendro scale, which has steps of 240 cents instead of the Western 100 cents, giving 5 divisions of an octave instead of 12.
For this one-octave gamelan, the pitch ratios are: 1.0, 1.148698354997035, 1.3195079107728942, 1.5157165665103982, 1.7411011265922482, and 2.0. This division of an octave is called 5-TET. A traditional Mandinka balafon, like classical Thai tuning, uses a 7-TET scale, which has steps of about 171 cents. There are dozens of tuning schemes in the world, and there are a half-dozen historic tuning scales in Western music. What does this mean for interaction design? It is possible to switch tuning scales by enabling and disabling various MonoBehaviour scripts with different conditional statements. “Just strike the air here and switch to a different exotic world.”
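Those ratios are simply 2^(k/5) for k = 0..5, so any n-TET tuning table can be generated the same way; in Python:

```python
def tet_ratios(n):
    """Pitch ratios for one octave of n-tone equal temperament:
    the k-th step is 2**(k/n), and each step spans 1200/n cents."""
    return [2 ** (k / n) for k in range(n + 1)]

print([round(r, 4) for r in tet_ratios(5)])
# [1.0, 1.1487, 1.3195, 1.5157, 1.7411, 2.0]
print(1200 / 5)            # 240.0 cents per Slendro (5-TET) step
print(round(1200 / 7, 1))  # 171.4 cents per 7-TET step
```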

Take the Vive controller’s vertical range (0.1–2.0 meters), multiply it by 100, then use the result as an argument to Unity’s Mathf.RoundToInt(). The result is a usable range of whole integers. Then evaluate those changing integers (10–200) with a switch statement. This will divide the Vive controller’s vertical range into 12 equal vertical spaces in the air. For each case, a floating-point ratio is assigned to newTargetPitch. The switch statement looks like this:

case 20: newTargetPitch = 1.000000f; break;
case 30: newTargetPitch = 1.059463f; break;  // up one half step
case 40: newTargetPitch = 1.122462f; break;  // up one whole step
case 50: newTargetPitch = 1.189207f; break;  // up one minor third
. . . etc. This division of an octave is called 12-TET.

Then, for example, sweeping the Vive controller up from the floor will consecutively step AudioSource.pitch through all 12 ratios, creating a rising chromatic octave.
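A Python sketch of that height-to-pitch mapping (the 0.1–2.0 m range is from the text; the exact bucketing into 12 zones is my own illustrative choice, not the script from the slide):

```python
def pitch_from_height(height_m):
    """Map the Vive controller's vertical range (0.1-2.0 m) to one of
    12 equal-tempered pitch ratios. round(height * 100) mirrors
    Mathf.RoundToInt(height * 100); integer math then buckets the
    resulting 10-200 range into 12 semitone zones."""
    steps = max(10, min(round(height_m * 100), 200))  # clamp to 10..200
    semitone = min((steps - 10) * 12 // 191, 11)      # 0..11
    return 2 ** (semitone / 12)

print(round(pitch_from_height(0.1), 6))  # 1.0 (floor level: unison)
print(round(pitch_from_height(2.0), 6))  # 1.887749 (11 semitones up)
```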

In this project, the Vive controller’s horizontal (x) range and front-back (z) range are about -1.5 to 1.5 meters. Alter the same C# script a bit, and you can divide the horizontal space or the z-plane into 12 pitches. So, for example, the left controller could alter a low-pass filter cutoff frequency or resonance Q while the right controller is altering pitch. The implications are fun: make these divisions of the air a child of a dynamically moving parent transform, or create a complex space full of musical elements or spoken word, then hand this entire three-dimensional grid array over to a choreographer or performance artist.

We all did this kind of thing years ago with the Leap Motion and Microsoft Kinect sensors, but now all this can be more accurate because of the Lighthouse.
Lighthouse tracking precision is about 1.5 mm RMS, accuracy is about 1.9 mm RMS, and jitter is about 0.3 mm — in the sub-millimeter range. Consequently, the Vive controllers feel very responsive, if the computations are optimized.

I see I’m out of time.  Questions?