TL;DR You don’t… at least, not all at the same time.

This post discusses some challenges of visualizing large data sets in the browser and presents a walk-through using hierarchical levels of detail to explore time series data using the D3 library.

Select to zoom in, double click to zoom out

A Time of Innocence

These things always start innocently. You need to display a few data points, so you grab a charting library off the shelf, throw your data into it and… it looks great. Your co-workers are impressed and you high-five yourself. Next time there is some data to be plotted you’re on it. Then you have a few more data points - no problem, your charts easily sail through hundreds of data points. Next thing you know you’re experimenting with a few thousand - and that’s where things start going downhill. Your charts are taking longer and longer to render. Your laptop fans start humming louder - an outward sign of the struggle within. And those people you thought were your friends? Now they’re bringing you data sets with hundreds of thousands of points. You scour the Internet for a solution.

“SVG based libraries can’t possibly handle that - maybe you should try canvas.”

In desperation you switch your plotting package to the best canvas based solution you can find, or roll your own. This gives you some more breathing room, but you’re still facing longer and longer draw times under an ever increasing load. “Maybe I need WebGL…” you start to wonder. Given the number of questions about this around the Internet, you are not alone.

Overdraw

A few thousand data points can be thrown at almost any visualization library and drawn near instantly on modern hardware. But when I try that with 50 million points Chrome hangs for a bit, then gives up. “Aw Snap”. Firefox accuses me of committing an “allocation size overflow”. At least Microsoft gives it the old college try.

Thanks for that helpful life advice Edge...

It’s evident that at some point we need to rethink our approach. Where is that point? To find out we need to answer two questions: “How many elements are there to look at?”, and “How many pixels do we have?”

Let’s assume that we want a line plot to visualize trends in a time-series data set. Further, let’s assume that we have a (pretty nice) hardware setup with an 8K monitor. The resolution of an 8K monitor being 7680 x 4320 means that we have 7680 pixels available to draw a chart across. If we have no margins, no axis, no anti-aliasing, and use only one pixel per element, we physically cannot show more than 7680 columns on this screen.

However, odds are pretty good that if you tried this with 10,000 or even 20,000 points you would get back a fairly reasonable looking graph. What’s happening is that as we collapse the x axis, multiple lines are getting drawn into the same space - each tick on the x axis contains multiple data points with a line stroke going vertically between them.

A line plot of 100 points, and the same plot after being collapsed

Consider the following sequence of data points: [5, 10, 20, 7]. Plotting these would give you the following lines:

If these were drawn with a compressed x axis such that they were all in the same column of pixels, you would only see a single vertical line of pixels colored between 5 and 20. The line from 20 to 7 would be drawing over the pixels already filled in from the earlier draws. Filling pixels that have already been filled in is called overdraw. Despite having a name which makes you feel like you’re about to be taken advantage of by a financial institution, some overdraw is pretty normal and not a huge deal if kept small. If we’re plotting 20,000 elements on our 8K screen, we’ll have an overdraw of about 4000 lines - which can still be pretty fast. If we try to plot 50 million elements like this, we’ll be drawing 49,984,000 lines that no one will ever see - and it won’t be fast.
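To make the idea concrete, here is a minimal sketch that groups a series into pixel columns and reports the vertical extent that actually ends up visible in each one. The helper name `columnExtents` is my own invention for illustration and is not part of any charting library.

```javascript
// Group a series into pixel columns and report the vertical extent drawn in
// each column. `columnExtents` is a hypothetical helper, not part of D3.
function columnExtents(values, columns) {
  const extents = [];
  const perColumn = values.length / columns;
  for (let c = 0; c < columns; c++) {
    const start = Math.floor(c * perColumn);
    const end = Math.max(start + 1, Math.floor((c + 1) * perColumn));
    let min = Infinity;
    let max = -Infinity;
    for (let i = start; i < end && i < values.length; i++) {
      min = Math.min(min, values[i]);
      max = Math.max(max, values[i]);
    }
    extents.push({ min, max, points: end - start });
  }
  return extents;
}

// All four points land in one pixel column: only the span 5..20 is visible.
const oneColumn = columnExtents([5, 10, 20, 7], 1);
console.log(oneColumn); // [ { min: 5, max: 20, points: 4 } ]
```

However many points fall into a column, the viewer only ever sees its min and max, which is the observation the rest of this post builds on.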

So… let’s just not draw those.

Levels of Detail

Besides being a great band name, hierarchical levels of detail is a concept that has been around forever and goes by different names depending on where it is used. Geospatial and computer vision applications routinely use this technique in the form of image pyramids. 3D games and tools call them MIP maps when applied to textures or levels of detail when applied to models. Probably one of the more familiar applications of this technique is Google Maps, which is estimated to have a database of imagery in the hundreds of terabytes yet shows us a globe in seconds.

Whatever the application, the concept is the same - to squeeze data into a smaller visual space, we downsample it to a lower resolution and display that instead. This trades some up front pre-processing and additional storage space for faster access and display. If the number of elements is large, then multiple levels of detail may be needed, each providing a higher level summary of the level below it.

The method used for downsampling should be chosen with some care since it determines which aspects of the data set are hidden at higher view levels. For this walk-through, we’re going to use a min/max function for summarizing our time series data: for each window of N elements we take the min and max value for the next summary level. This function preserves the extremities of the data and is close to what we would see if we plotted all the points directly.

For example, consider a data set of 16 elements, with a window size of 4:

|5 2 4 8 1 2 3 7 2 0 9 1 5 2 8 1|  // Level 0: 16 elements
|-------|-------|-------|-------|
|  2   8|1     7|  0 9  |    8 1|  // Level 1:  8 elements
|---------------|---------------|
|      8 1      |  0 9          |  // Level 2:  4 elements
...

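Notice that in this example each level is half the size of the level before it. More generally, every window of N values collapses to two values (a min and a max), so each level is 2/N the size of the previous one, and the levels keep shrinking for any window size greater than 2. This is nice since the sizes form a geometric series, which allows us to compute the upper bound for the storage space we’ll need for our data and summary levels: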

$$ a+ar+ar^{2}+ar^{3}+ar^{4}+\cdots =\sum _{k=0}^{\infty }ar^{k}={\frac {a}{1-r}} \\ {\text{ for }}|r|<1 $$

In code:

const element_count = 16;
const window_size   = 4;
const r = 1/(window_size/2);
const max_size = Math.floor(element_count/(1 - r));
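As a sanity check, here is a small sketch that builds the summary levels for the 16 element example and compares the total storage against that bound. The `summarize` and `buildLevels` helpers are hypothetical names of my own, and this version stores each pair as min then max rather than keeping the values in their original positions.

```javascript
// Collapse each window of `windowSize` values into a [min, max] pair.
// The result is a flat array, so it can be summarized again for the next level.
function summarize(values, windowSize) {
  const out = [];
  for (let i = 0; i < values.length; i += windowSize) {
    const chunk = values.slice(i, i + windowSize);
    out.push(Math.min(...chunk), Math.max(...chunk));
  }
  return out;
}

// Build successive levels until a level stops shrinking.
function buildLevels(values, windowSize) {
  const levels = [values];
  let current = values;
  while (current.length > 2) {
    const next = summarize(current, windowSize);
    if (next.length >= current.length) break;
    levels.push(next);
    current = next;
  }
  return levels;
}

const level0 = [5, 2, 4, 8, 1, 2, 3, 7, 2, 0, 9, 1, 5, 2, 8, 1];
const levels = buildLevels(level0, 4);
const totalSize = levels.reduce((sum, level) => sum + level.length, 0);
const bound = Math.floor(level0.length / (1 - 2 / 4));

console.log(levels[1]); // [ 2, 8, 1, 7, 0, 9, 1, 8 ]
console.log(levels[2]); // [ 1, 8, 0, 9 ]
console.log(totalSize, bound); // 30 32
```

The levels come out at 16, 8, 4 and 2 values, for a total of 30, which sits just under the bound of 32.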

Data Layout

The data set I’ve chosen for this walk-through is a rendition of Beethoven’s Moonlight Sonata sampled at 192 kHz. Putting aside the fact that 192 kHz is not that useful for fidelity purposes, it does provide us a nice time series data set to use with plenty of points of interest. To keep things simple, I used Audacity to collapse the stereo channels to a mono track and exported it as uncompressed signed bytes - each sample is a single byte in the range (-128 to 127). The result is a base data file, MoonlightSonata.raw which is approximately 60 MB in size (63,897,600 samples).

This has then been pre-processed with a script that builds a set of summary files containing elements that are the min and max of the corresponding range from the previous level. Since we’re using a single byte for each data point, the min and max can also be represented in a byte each. The HLOD files are written out next to the raw data file, and a descriptor file is generated that gives us some important details regarding the contents of the files:

{
    "fileName": "data/MoonlightSonata.raw",
    "nElements": 63897600,
    "fileSize": 63897600,
    "maxElements": 8000,
    "windowSize": 16,
    "lodFiles": [
        {
            "fileName": "data/MoonlightSonata_1.raw",
            "fileSize": 7987200,
            "level": 1,
            "nElements": 3993600
        },
        {
            "fileName": "data/MoonlightSonata_2.raw",
            "fileSize": 499200,
            "level": 2,
            "nElements": 249600
        },
        {
            "fileName": "data/MoonlightSonata_3.raw",
            "fileSize": 31200,
            "level": 3,
            "nElements": 15600
        },
        {
            "fileName": "data/MoonlightSonata_4.raw",
            "fileSize": 1950,
            "level": 4,
            "nElements": 975
        }
    ]
}

descriptor.json
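The byte layout of the summary files can be sketched as follows. The pre-processing script itself isn't shown in this post, so `encodePairs` and `decodePairs` are hypothetical helpers that only assume what is described above: one signed byte for the min followed by one signed byte for the max.

```javascript
// Pack {min, max} pairs into signed bytes: [min0, max0, min1, max1, ...].
function encodePairs(pairs) {
  const bytes = new Int8Array(pairs.length * 2);
  pairs.forEach((p, i) => {
    bytes[2 * i] = p.min;
    bytes[2 * i + 1] = p.max;
  });
  return bytes;
}

// Read the pairs back out of the byte array.
function decodePairs(bytes) {
  const pairs = [];
  for (let i = 0; i + 1 < bytes.length; i += 2) {
    pairs.push({ min: bytes[i], max: bytes[i + 1] });
  }
  return pairs;
}

const encoded = encodePairs([{ min: -128, max: 127 }, { min: -3, max: 42 }]);
const decoded = decodePairs(encoded);
console.log(encoded.byteLength); // 4
console.log(decoded[1]); // { min: -3, max: 42 }
```

This is why each fileSize in the descriptor's lodFiles is exactly twice its nElements, for example 7987200 bytes for the 3993600 elements of level 1.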

To display a visual profile for the whole data set, we could download and plot any one of these files. The question then, is which one do we choose for our view? To answer this we trade off resolution and fidelity for size. If we show the file from level 3, we’ll have 15600 elements that need to be plotted, probably with some overdraw. If we use the data from level 4, we’ll only be plotting 975 elements, which might be a coarser detail than we might otherwise show. The windowSize and maxElements parameters are how we tune these tradeoffs. The larger the windowSize, the more space we’ll conserve, but the larger the differences between HLOD levels. The maxElements parameter is an upper bound on the maximum number of elements that we want to draw at any given time. Since we’ve chosen maxElements to be 8000, we’ll use HLOD file level 4, with 975 elements.
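For the fully zoomed out view, that choice can be sketched as a simple scan over the descriptor. This is only an illustration with a hypothetical `pickFullViewLevel` helper, and it assumes lodFiles is non-empty and sorted from the most detailed level to the coarsest, as in the descriptor above.

```javascript
// A trimmed copy of the descriptor with just the fields this sketch needs.
const descriptor = {
  nElements: 63897600,
  maxElements: 8000,
  lodFiles: [
    { level: 1, nElements: 3993600 },
    { level: 2, nElements: 249600 },
    { level: 3, nElements: 15600 },
    { level: 4, nElements: 975 }
  ]
};

// Return the most detailed level whose element count fits within maxElements.
// Level 0 is the raw data. Falls back to the coarsest level if none fit.
function pickFullViewLevel(desc) {
  if (desc.nElements <= desc.maxElements) return 0;
  const fit = desc.lodFiles.find(f => f.nElements <= desc.maxElements);
  return fit ? fit.level : desc.lodFiles[desc.lodFiles.length - 1].level;
}

console.log(pickFullViewLevel(descriptor)); // 4
```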

Visualization

To turn our data into pixels, I’m using the inimitable D3 library. While the concept of using hierarchical levels of detail for different zoom levels could almost certainly be shoe-horned into many different plotting libraries, D3 hits a nice balance in its level of abstraction. It provides the tools we need for placing, rendering, and animating the data and axes while giving us enough low level control to handle the data logistics ourselves.

Setup and Basic Plot

Moonlight Sonata rendered at level 4 (no zooming)
basic_plot(d3.select("#basic-plot"));

function basic_plot(svg) {
  const DESCRIPTOR_FILE = "data/descriptor.json";

  // Standard D3 plot setup with margins for the axes.
  const margin = {
    top: 20,
    right: 20,
    bottom: 20,
    left: 30
  };
  const width = +svg.attr("width") - margin.left - margin.right;
  const height = +svg.attr("height") - margin.top - margin.bottom;
  const g = svg
    .append("g")
    .attr("transform", "translate(" + margin.left + "," + margin.top + ")");

  const Y_DOMAIN = [-128, 127];
  const yScale = d3
    .scaleLinear()
    .domain(Y_DOMAIN)
    .range([height, 0]);

  const X_FULL_DOMAIN = [0, 1];
  const xDataScale = d3
    .scaleLinear()
    .domain(X_FULL_DOMAIN)
    .range([0, width])
    .clamp(true);

  // Setup the axes.
  const xAxis = d3.axisBottom(xDataScale).ticks(10);
  const yAxis = d3.axisLeft(yScale).ticks(4);

  // The charting function
  const area = d3
    .area()
    .y0(d => yScale(d.min))
    .y1(d => yScale(d.max));

  // This is the data descriptor that will be filled in later.
  let dataDescriptor;

  // X-axis
  g.append("g")
    .attr("class", "x-axis")
    .attr("transform", "translate(0," + height + ")")
    .call(xAxis);

  // Y-axis
  g.append("g")
    .attr("class", "y-axis")
    .call(yAxis);

  // Data view
  const gDataView = g.append("g").attr("class", "data-view");

  main();

  // Setup and draw the initial view
  async function main() {
    // First download the descriptor file for our data.
    await fetchDescriptor();

    // Then fetch the data that we want to plot.
    const data = await fetchData();

    // Then plot it
    const xViewScale = d3
      .scaleLinear()
      .domain([0, data.elements.length - 1])
      .range([0, width]);
    area.x((d, i) => xViewScale(i));

    gDataView
      .insert("path")
      .attr("class", "dataView area")
      .attr("d", area(data.elements));
  }

  // Fetch the descriptor file
  async function fetchDescriptor() {
    const response = await fetch(DESCRIPTOR_FILE);
    dataDescriptor = await response.json();
  }

  // Fetch data to be plotted.
  async function fetchData() {
    let level = 4;
    const lodFile = dataDescriptor.lodFiles[level - 1];
    const response = await fetch(lodFile.fileName);

    // Convert the raw byte array back to min/max elements
    const buf = await response.arrayBuffer();
    const view = new Int8Array(buf);

    const elements = [];
    for (let i = 0; i < view.byteLength - 1; i += 2) {
      elements.push({
        min: view[i],
        max: view[i + 1]
      });
    }

    return { level, elements };
  }
}

basic_plot.js

/* Style definitions for our plots */

svg {
    display: block;
    margin: 0 auto;
}

.area {
    fill: steelblue;
    stroke: steelblue;
    stroke-width: 0.5;
    clip-path: url(#clip);
}

.line {
    fill: none;
    stroke: steelblue;
    stroke-linejoin: round;
    stroke-linecap: round;
    stroke-width: 1.5;
    clip-path: url(#clip);
}

style.css

Before taking a tour through this code, note that we’re using a few modern (in 2018) JavaScript techniques to make our lives easier, such as promises with async/await, and the fetch API for downloading data. These are supported natively in most modern browsers, but you may need to use something more sophisticated such as Babel if you’re targeting older platforms.

Most of the code here is fairly standard D3 chart setup, with a few points of interest. First, you may notice that we’re using an area chart for our visualization instead of the more conventional line chart. The reason for this is that when showing elements from the summary levels we don’t have individual points but the min and max of a range of points. In essence, we’re looking at a low resolution contour of our data set. When the number of elements closely matches the number of pixels we’re using, there is no visual difference between an area chart and line chart; but if there are fewer elements than pixels, the area chart appropriately reflects the resolution of the data we have available.

The chart below allows you to zoom the x axis of the level 4 data - notice that when fully zoomed out it looks like a closely packed line chart, but upon zooming in the contours of the area chart become apparent.

Zooming, but only changing scale - level 4 LOD

Next let’s review a pair of terms that we will be manipulating quite a bit to explore our data: domain and range. A quick Google of these terms results in a number of different technical and sometimes confusing descriptions for what is a fairly elementary concept: for pure functions, the domain is the set of allowed inputs and the range is the resulting set of outputs. A pure function is one that doesn’t keep state or have side effects and always returns the same value for a given input.

In this application, our use of the terms domain and range refers to a linear mapping of values from the domain to the range. Consider our setup of the y axis scale:

const Y_DOMAIN = [-128, 127];
const yScale = d3.scaleLinear().domain(Y_DOMAIN).range([height, 0]);

Since each of our data points is encoded as a signed byte (in 2’s complement), it falls in the range -128 to 127. The way we set up the yScale indicates that we want a function that linearly maps data values between -128 and 127 to values between height and 0. Don’t forget that in screen coordinates, the origin is in the upper left hand corner with y increasing downward, which is why we want the highest data point (127) to be at the top of the chart (y == 0), and the smallest data point (-128) to be at the bottom of the chart (y == height).
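If it helps to see the arithmetic, here is a hand-rolled stand-in for the kind of mapping a linear scale performs. It is a sketch for illustration only, not D3's implementation, and the height of 255 pixels is a made-up value chosen to keep the numbers easy to follow.

```javascript
// A hand-rolled stand-in for what a linear scale computes (not D3's code).
function linearScale([d0, d1], [r0, r1]) {
  return value => r0 + ((value - d0) / (d1 - d0)) * (r1 - r0);
}

const height = 255; // hypothetical plot height in pixels
const y = linearScale([-128, 127], [height, 0]);

console.log(y(127));  // 0   (top of the chart)
console.log(y(-128)); // 255 (bottom of the chart)
```

Flipping the range to [height, 0] is all it takes to turn "bigger value" into "higher on the screen".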

Unlike the y axis that doesn’t change over the entirety of our data set, we will need to manipulate the x axis when we start zooming around on our data in the next section. For this, we will start with two different scales along the x axis, the first of which is defined in the plot setup:

const X_FULL_DOMAIN = [0,1];
const xDataScale = d3.scaleLinear().domain(X_FULL_DOMAIN).range([0, width]).clamp(true);

We’re using the xDataScale to keep track of our view within the data set. Using the domain 0 to 1 is a useful approach in that it allows us to refer to a range of the data set irrespective of the scale. For example, a window of [0, 0.25] covers the first quarter of the data set, while [0.75, 1] is the last quarter of it. In a real plot we could convert this to a descriptive set of labels for the x axis (such as index, date, time, etc.) but for simplicity we’re going to just use this directly as our x axis labels.

The next x scale that we’ll need to handle is the one that maps data elements to their place on the screen after we’ve fetched the data we’d like to render:

const xViewScale = d3.scaleLinear().domain([0,data.elements.length-1]).range([0, width]);

Similar to the y scale, here we define the function that D3 is going to use to lay out our data along the x axis. This line indicates that we want every element in our array of data to be distributed between 0 and the width of the plot.

Getting Closer

At this point we have a chart showing the outline of the data set and we can download and draw it relatively fast (it’s less than 2k). But we still want to explore our data, and to do that we’ll need to set up a bit more infrastructure. Building on what we already have, let’s implement zooming:

zoom_plot(d3.select("#zoom-plot"));

function zoom_plot(svg) {
  const DESCRIPTOR_FILE = "data/descriptor.json";

  // Standard D3 plot setup with margins for the axes.
  const margin = { top: 20, right: 20, bottom: 20, left: 30 };
  const width = +svg.attr("width") - margin.left - margin.right;
  const height = +svg.attr("height") - margin.top - margin.bottom;
  const g = svg
    .append("g")
    .attr("transform", "translate(" + margin.left + "," + margin.top + ")");

  const Y_DOMAIN = [-128, 127];
  const yScale = d3
    .scaleLinear()
    .domain(Y_DOMAIN)
    .range([height, 0]);

  const X_FULL_DOMAIN = [0, 1];
  const xDataScale = d3
    .scaleLinear()
    .domain(X_FULL_DOMAIN)
    .range([0, width])
    .clamp(true);

  const xAxis = d3.axisBottom(xDataScale).ticks(10);
  const yAxis = d3.axisLeft(yScale).ticks(4);

  // Area chart function
  const area = d3
    .area()
    .y0(d => yScale(d.min))
    .y1(d => yScale(d.max));

  // Line chart function
  const line = d3.line().y(d => yScale(d));

  // Create a brush for selecting regions to zoom on.
  const brush = d3
    .brushX()
    .extent([[0, 1], [width, height - 1]])
    .on("end", brushEnded);

  let idleTimeout;
  const IDLE_DELAY = 350;
  const MIN_ZOOM_ELEMENTS = 5;

  let dataDescriptor;

  // X-axis
  g.append("g")
    .attr("class", "x-axis")
    .attr("transform", "translate(0," + height + ")")
    .call(xAxis);

  // Y-axis
  g.append("g")
    .attr("class", "y-axis")
    .call(yAxis);

  // Data view
  const gDataView = g.append("g").attr("class", "data-view");

  // Zoom brush
  g.append("g")
    .attr("class", "brush")
    .call(brush);

  main();

  // Setup and draw the initial view
  async function main() {

    // First download the descriptor file for our data.
    await fetchDescriptor();

    // Then fetch the data that we want to plot.
    const data = await fetchData(X_FULL_DOMAIN);

    // Then plot it
    const xViewScale = d3
      .scaleLinear()
      .domain([0, data.elements.length - 1])
      .range([0, width]);

    const pathFunc = getPathFunction(data);
    pathFunc.x((d, i) => xViewScale(i));

    gDataView
      .insert("path")
      .attr("class", getClass(data))
      .attr("d", pathFunc(data.elements));
  }

  // Choose the function to draw, either an area or a line chart, depending on the level
  function getPathFunction(data) {
    return data.level > 0 ? area : line;
  }

  // Choose the CSS class for an area or line chart, depending on the level.
  function getClass(data) {
    return data.level > 0 ? "dataView area" : "dataView line";
  }

  // Handler for the end of a brush event from D3.
  function brushEnded() {
    const s = d3.event.selection;

    // Consume the brush action
    if (s) {
      svg.select(".brush").call(brush.move, null);
    }

    if (s) {
      zoom(s.map(xDataScale.invert, xDataScale));
    } else {
      // Rudimentary double-click detection
      if (!idleTimeout) {
        return (idleTimeout = setTimeout(() => {
          idleTimeout = null;
        }, IDLE_DELAY));
      }

      zoom(X_FULL_DOMAIN);
    }
  }

  // Zoom the view to a given domain within the data domain 0..1
  async function zoom(newDomain) {

    // Check to see if we're trying to go lower than our minimum.
    if (
      newDomain[1] - newDomain[0] <
      MIN_ZOOM_ELEMENTS / dataDescriptor.nElements
    ) {
      console.log("Max Zoom");
      return;
    }

    // Adjust the X scale
    xDataScale.domain(newDomain);

    // Render the axis on the new domain.
    svg.select(".x-axis").call(xAxis);

    // Remove the old data
    gDataView.select("*").remove();

    // Get the new data
    const data = await fetchData(newDomain);
    const xViewScale = d3
      .scaleLinear()
      .domain([0, data.elements.length - 1])
      .range([0, width]);

    const pathFunc = getPathFunction(data);
    pathFunc.x((d, i) => xViewScale(i));

    // Draw it
    gDataView
      .append("path")
      .attr("class", getClass(data))
      .attr("d", pathFunc(data.elements));
  }

  // Fetch data to be plotted.
  async function fetchData(domain) {
    const level = levelFromDomain(domain);

    let nElements;
    if (level === 0) {
      nElements = dataDescriptor.nElements;
    } else {
      nElements = dataDescriptor.lodFiles[level - 1].nElements;
    }

    // Convert from the domain space 0..1 to actual elements in this scale level
    const elementStart = Math.max(Math.floor(domain[0] * nElements), 0);
    const elementEnd = Math.min(
      Math.ceil(domain[1] * nElements),
      nElements - 1
    );

    if (level > 0) {
      const lodFile = dataDescriptor.lodFiles[level - 1];

      // Determine byte offsets for these elements:
      // Each element is 2 bytes (min, max)
      const ELEMENT_SIZE = 2;

      const rangeStart = elementStart * ELEMENT_SIZE;
      const rangeEnd = elementEnd * ELEMENT_SIZE + ELEMENT_SIZE - 1;

      const view = await fetchByteRange(lodFile.fileName, rangeStart, rangeEnd);
      let elements = [];
      for (let i = 0; i < view.byteLength - 1; i += 2) {
        elements.push({
          min: view[i],
          max: view[i + 1]
        });
      }

      return { domain, level, elements };
    } else {

      // At level 0 we have actual data points (not min/max aggregates)
      const elements = await fetchByteRange(
        dataDescriptor.fileName,
        elementStart,
        elementEnd
      );
      return { domain, level, elements };
    }
  }

  // Determine which level to use for a view, given a domain span.
  function levelFromDomain(domain) {
    const domainSpan = domain[1] - domain[0];

    // Check level 0
    const nElements = Math.ceil(dataDescriptor.nElements * domainSpan);
    if (nElements <= dataDescriptor.maxElements) return 0;

    // Then check the LOD levels.
    let a = Math.log(nElements / dataDescriptor.maxElements);
    let b = Math.log(dataDescriptor.windowSize);
    return Math.ceil(a / b);
  }

  // Fetch a byte range for a file.
  async function fetchByteRange(file, rangeStart, rangeEnd) {
    const headers = { Range: `bytes=${rangeStart}-${rangeEnd}` };
    const response = await fetch(file, { headers });

    const buf = await response.arrayBuffer();
    let byteOffset = 0;
    let length = rangeEnd - rangeStart + 1;

    // If the server sends back the whole file for some reason,
    // then we'll handle it by doing our own offset into it.
    if (response.status === 200) {
      byteOffset = rangeStart;
    }

    const view = new Int8Array(buf, byteOffset, length);
    return view;
  }

  // Fetch the descriptor file
  async function fetchDescriptor() {
    const response = await fetch(DESCRIPTOR_FILE);
    dataDescriptor = await response.json();
  }
}

zoom_plot.js

This code segment introduces a few new aspects to our approach, the first being that we now have a line chart function hanging around in addition to the area chart function. As we discussed earlier, when viewing summary levels we are looking at min/max elements so an area plot is appropriate. However, when we zoom in far enough that we see individual data points, we’ll want to switch back to a traditional line chart to show the series.

Additionally, we’re using the D3 brush facility which implements a nice selector visual and allows us to get the selected region. The code in brushEnded:

s.map(xDataScale.invert, xDataScale)

uses the inverse of the data scale and the selected region, s, to turn the pixel coordinates from the brush into a fractional window within our data domain when the user selects an area. When the user double-clicks on the chart area, then we reset the view domain to the full area, 0 to 1.
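The inversion is easy to work through by hand. The sketch below mirrors the idea of a clamped inverse scale with a hypothetical `invertSelection` helper. It is not D3's code, and the plot width and current domain are made-up numbers.

```javascript
// Map a pixel selection back to a fractional data window. This mirrors the
// idea of scale.invert with clamping, written by hand for illustration.
function invertSelection(selection, domain, width) {
  const [d0, d1] = domain;
  return selection.map(px => {
    const t = Math.min(Math.max(px / width, 0), 1);
    return d0 + t * (d1 - d0);
  });
}

// Hypothetical numbers: a 400px wide plot currently showing [0.25, 0.75].
console.log(invertSelection([100, 200], [0.25, 0.75], 400)); // [ 0.375, 0.5 ]
```

Because the current domain is part of the calculation, repeated selections keep narrowing the window within the same 0 to 1 data space.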

The code in the zoom function is fairly close to what we already have in main - we set up the view of the x axis, fetch the data, and then display it.

The fetchData function is where things start to get interesting. In the previous version of this function, we just grabbed the full top level file, converted its bytes to min/max elements in a data structure and were done. We still need to do that here, but again, we do things a bit differently based on whether we’re looking at a zoomed out level, or the actual data points which are not min/max pairs. However, before we get there we need to work out which level we’re going to use.

We can derive an analytic solution using our parameters:

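Each level that we move up is a factor of windowSize smaller than the level below it. With N as the number of level 0 elements in the view, w as windowSize, m as maxElements, and l as the level, we want the smallest l whose element count fits within our limit:

$$ {\frac {N}{w^l}} \leq m $$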

and solving for l:

$$ l = \lceil \log_{w}{\frac {N}{m}} \rceil $$

While JavaScript’s Math library doesn’t have a log function that takes an arbitrary base, recall that we can obtain the same thing with division:

$$ {\displaystyle \log _{b}x={\frac {\log _{k}x}{\log _{k}b}}\,} $$

Putting this into code gives us the levelFromDomain function:

function levelFromDomain(domain) {
    const domainSpan = domain[1] - domain[0];

    const nElements = Math.ceil(dataDescriptor.nElements * domainSpan);
    if (nElements <= dataDescriptor.maxElements)
        return 0;

    let a = Math.log(nElements/dataDescriptor.maxElements);
    let b = Math.log(dataDescriptor.windowSize);
    return Math.ceil(a/b);
}
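Plugging in the numbers from our descriptor shows how the level falls as the view narrows. This is a standalone sketch of the same calculation, with a hypothetical `levelFor` helper that takes its parameters as arguments instead of reading them from dataDescriptor.

```javascript
// Same calculation as levelFromDomain, with the parameters passed in.
function levelFor(totalElements, maxElements, windowSize, domainSpan) {
  const visible = Math.ceil(totalElements * domainSpan);
  if (visible <= maxElements) return 0;
  return Math.ceil(Math.log(visible / maxElements) / Math.log(windowSize));
}

const N = 63897600; // nElements
const m = 8000;     // maxElements
const w = 16;       // windowSize

console.log(levelFor(N, m, w, 1));      // 4
console.log(levelFor(N, m, w, 0.01));   // 2
console.log(levelFor(N, m, w, 0.0001)); // 0
```

Viewing 1% of the data covers about 639,000 raw samples, which is still too many, but the same slice of level 2 holds only about 2,500 elements.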

After figuring out which zoom level to use, we’re now ready to go get the data. This is where we get to fully exploit the data layout we’ve chosen. Since our files are simple arrays of data, we can exactly compute the section of the file we need. Further, if our web server supports byte range requests (and most do), then we can request just the bytes we need from the file.
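Here is a small sketch of that byte range calculation on its own, separated from the network call. The `byteRangeFor` helper is a hypothetical name. It follows the same arithmetic as fetchData and relies on HTTP byte ranges being inclusive at both ends.

```javascript
// Convert a fractional domain into an inclusive byte range for one level.
function byteRangeFor(domain, nElements, elementSize) {
  const first = Math.max(Math.floor(domain[0] * nElements), 0);
  const last = Math.min(Math.ceil(domain[1] * nElements), nElements - 1);
  const start = first * elementSize;
  const end = last * elementSize + elementSize - 1; // inclusive last byte
  return { start, end, header: `bytes=${start}-${end}` };
}

// Level 2 of the descriptor has 249600 two-byte elements.
const range = byteRangeFor([0.25, 0.5], 249600, 2);
console.log(range.header); // bytes=124800-249601
```

The resulting string is what goes into the Range request header, so the server only has to send back that slice of the file.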

We can now interactively explore our data set, downloading only the pieces we need, using just a standard web server or CDN.

Zooming between levels - instant view change

Because we’ve placed an upper bound on the number of elements we display at any given view, we also have a bound on the size of data that needs to be retrieved from the server when zooming. In our example, we’ve set maxElements to 8000, which means that any time the user zooms on a region, we’ll need to download at most a 16k chunk (8000 min/max elements at 2 bytes each). This lazy loading approach gives our user near instant access to any part of their data, without needing to download the whole set to them first.

Show Your Work

There are many things that young me complained about regarding my elementary education, but at the top of my complaint list would be getting math problems wrong for not showing my work. The reason for this of course, is that sometimes the journey is almost as important as the destination.

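At this point we have the fundamentals in place and working that allow us to explore our large data set by streaming it in a chunk at a time, but the experience is a bit jarring. When a user selects a region to zoom into, several things happen in quick succession: the x axis is rescaled to the new domain, the old data is removed, the new chunk is fetched from the server, and the new data is drawn.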

In the best case, you have a fast pipe to the data and this transition happens almost instantly, in which case you see a flash and are left trying to orient yourself with a new view of data (and a new scale). In the worst case the scale changes immediately, but it takes some time for the new data to show up, so you’re left looking at a blank screen while the data loads. Then when it does arrive, it gets flashed into view. In either scenario, the viewer is left trying to guess what happened. This is where transitions come to our aid. Instead of letting them wonder, we can take a little time and show them what we’re doing.

We’re going to use two transitions to accomplish this. In the first transition, we motion animate zooming the scale and the currently visible data set. This keeps the viewer oriented with both the scale of the zoom as well as where in the data set we’re zooming to. For the second transition, we’re going to cross fade the lower resolution data with the higher resolution data when it arrives from the server. That orients the viewer to the fact that we are giving them a higher resolution view. An additional advantage of using transitions is that they provide us a window to hide the latency it takes to download the new chunk from the server. Ideally, the new chunk can be downloaded before the zoom transition finishes and the cross fade can be scheduled to finish at the same time to give us one smooth transition.

This is an application of our approach, slowed down to emphasize the transition behavior:

Zooming with transitions - slowed 10x

On to the code - much of our setup and data retrieval code remains the same, but the transitions have added enough heft to our zoom logic that it has been split into two functions.

zoom_transition_plot(d3.select("#zoom-slow"), false);
zoom_transition_plot(d3.select("#zoom-fast"));

function zoom_transition_plot(svg, fast_zoom = true) {
  const DESCRIPTOR_FILE = "data/descriptor.json";

  // Standard D3 plot setup with margins for the axes.
  const margin = { top: 20, right: 20, bottom: 20, left: 30 };
  const width = +svg.attr("width") - margin.left - margin.right;
  const height = +svg.attr("height") - margin.top - margin.bottom;
  const g = svg
    .append("g")
    .attr("transform", "translate(" + margin.left + "," + margin.top + ")");

  const Y_DOMAIN = [-128, 127];
  const yScale = d3
    .scaleLinear()
    .domain(Y_DOMAIN)
    .range([height, 0]);

  const X_FULL_DOMAIN = [0, 1];
  const xDataScale = d3
    .scaleLinear()
    .domain(X_FULL_DOMAIN)
    .range([0, width])
    .clamp(true);

  const xAxis = d3.axisBottom(xDataScale).ticks(10);
  const yAxis = d3.axisLeft(yScale).ticks(4);

  // Area chart function
  const area = d3
    .area()
    .y0(d => yScale(d.min))
    .y1(d => yScale(d.max));

  // Line chart function
  const line = d3.line().y(d => yScale(d));

  // Create a brush for selecting regions to zoom on.
  const brush = d3
    .brushX()
    .extent([[0, 1], [width, height - 1]])
    .on("end", brushEnded);

  const ZOOM_TIME = fast_zoom ? 500 : 5000;
  const CROSS_FADE_TIME = 150;
  const TRANSITION_EASE = d3.easeSin;
  const MIN_ZOOM_ELEMENTS = 5;

  let idleTimeout;
  const IDLE_DELAY = 350;

  let dataDescriptor;

  // Track the top level and current data sets
  let topData;
  let currentData;

  // Keep track of our current zooming status.
  let zoomTargetDomain;
  let zoomInProgress;
  const startZoom = () => {
    zoomInProgress = true;
  };
  const endZoom = () => {
    zoomInProgress = false;
  };

  // A clip path is needed to mask the chart so it doesn't overflow the axes while zooming.
  svg
    .append("defs")
    .append("svg:clipPath")
    .attr("id", "clip")
    .append("svg:rect")
    .attr("width", width)
    .attr("height", height)
    .attr("x", 0)
    .attr("y", 0);

  // X-axis
  g.append("g")
    .attr("class", "x-axis")
    .attr("transform", "translate(0," + height + ")")
    .call(xAxis);

  // Y-axis
  g.append("g")
    .attr("class", "y-axis")
    .call(yAxis);

  // Data view
  const gDataView = g.append("g").attr("class", "data-view");

  // Zoom brush
  g.append("g")
    .attr("class", "brush")
    .call(brush);

  main();

  // Setup and draw the initial view
  async function main() {
    // First download the descriptor file for our data.
    await fetchDescriptor();

    // Then fetch the data that we want to plot.
    currentData = await fetchData(X_FULL_DOMAIN);
    topData = currentData;

    // Then plot it
    const xViewScale = d3
      .scaleLinear()
      .domain([0, currentData.elements.length - 1])
      .range([0, width]);

    gDataView
      .insert("path")
      .attr("class", getClass(currentData))
      .attr("d", drawPath(currentData, xViewScale));
  }

  // Draw a path, either an area or a line chart, depending on the level
  function drawPath(data, scale) {
    const pathFunc = data.level > 0 ? area : line;
    return pathFunc.x((d, i) => scale(i))(data.elements);
  }

  // Choose the CSS class for an area or line chart, depending on the level.
  function getClass(data) {
    return data.level > 0 ? "dataView area" : "dataView line";
  }

  // Handler for the end of a brush event from D3.
  function brushEnded() {
    const s = d3.event.selection;

    // Consume the brush action
    if (s) {
      svg.select(".brush").call(brush.move, null);
    }

    // Lock out interactions while a zoom is in progress.
    if (zoomInProgress) {
      return;
    }

    if (s) {
      zoomIn(s);
    } else {
      // Rudimentary double-click detection
      if (!idleTimeout) {
        return (idleTimeout = setTimeout(() => {
          idleTimeout = null;
        }, IDLE_DELAY));
      }

      zoomOut();
    }
  }

  async function zoomIn(s) {
    // Convert the span from screen coordinates to data space values.
    const newDomain = s.map(xDataScale.invert, xDataScale);

    // Check to see if we're trying to go lower than our minimum.
    if (
      newDomain[1] - newDomain[0] <
      MIN_ZOOM_ELEMENTS / dataDescriptor.nElements
    ) {
      console.log("Max Zoom");
      return;
    }

    zoomTargetDomain = newDomain;

    // Adjust the X scale
    xDataScale.domain(newDomain);

    // Setup a transition for the axis
    const zoomTransition = svg
      .transition("zoomTransition")
      .ease(TRANSITION_EASE)
      .duration(ZOOM_TIME)
      .on("start", startZoom)
      .on("end", endZoom);

    // Render the axis on the new domain with the transition.
    svg
      .select(".x-axis")
      .transition(zoomTransition)
      .call(xAxis);

    const lowResView = gDataView.selectAll(".dataView");
    if (currentData != null) {
      // Work out the new scale for the old data set and start the transition to it.
      const v = rangeFraction(currentData.domain, newDomain);
      const N = currentData.elements.length - 1;
      const newViewDomain = [v[0] * N, v[1] * N];
      const xNewViewScale = d3
        .scaleLinear()
        .domain(newViewDomain)
        .range([0, width]);

      lowResView
        .transition(zoomTransition)
        .attr("d", drawPath(currentData, xNewViewScale));

      // If the zoom is within the same level, then we're done.
      if (currentData.level === levelFromDomain(newDomain)) {
        return;
      }
    }

    // If the zoom was not within the same level, then we're off to grab
    // some higher resolution data.
    const zoomTimeStarted = Date.now();

    let newData;
    try {
      newData = await fetchData(newDomain);

      // ... and we're back! Time to check in on the state of the world.

      // First, check that this data we've gotten back is still what we want.
      // If the network was slow getting this chunk back to us, the user might
      // have already zoomed to some other view.
      if (newDomain !== zoomTargetDomain) {
        return;
      }
    } catch (ex) {
      // If we can't get the data we want, then stick with what we've got.
      console.warn(ex);
      return;
    }

    // At this point we can be in one of two places:
    //
    // 1. The zoom transition could still be going with a long time left.
    //    In this case, we'll synchronize the cross fade transition with the
    //    zoom so they finish at the same time.
    //
    // 2. The zoom transition may be almost done, or already finished.
    //    We still want a cross fade transition, but we'll schedule it on its
    //    own timeline.

    // Find out how long we've been waiting for data.
    const timeElapsed = Date.now() - zoomTimeStarted;
    const zoomTimeRemaining = ZOOM_TIME - timeElapsed;

    const fadeTime = Math.max(CROSS_FADE_TIME, zoomTimeRemaining);

    const fadeTransition = svg
      .transition("fadeTransition")
      .ease(TRANSITION_EASE)
      .duration(fadeTime);

    const xEndDomain = [0, newData.elements.length - 1];
    const xStartViewScale = d3
      .scaleLinear()
      .domain(xEndDomain)
      .range(s);
    const xEndViewScale = d3
      .scaleLinear()
      .domain(xEndDomain)
      .range([0, width]);

    const highResView = gDataView
      .insert("path", ":first-child")
      .attr("class", getClass(newData))
      .attr("opacity", "0");

    // If we're still zooming in, then animate the path coming in.
    // Otherwise, we'll fade in directly at the end position.
    if (zoomTimeRemaining > CROSS_FADE_TIME) {
      highResView
        .attr("d", drawPath(newData, xStartViewScale))
        .transition(zoomTransition)
        .attr("d", drawPath(newData, xEndViewScale))
        .attr("opacity", "1");
    } else {
      highResView
        .attr("d", drawPath(newData, xEndViewScale))
        .transition(fadeTransition)
        .attr("opacity", "1");
    }

    // Fade opacity from 1..0 then remove the plot.
    lowResView
      .attr("opacity", "1")
      .transition(fadeTransition)
      .attr("opacity", "0")
      .remove();

    currentData = newData;
  }

  function zoomOut() {
    const oldDomain = xDataScale.domain();

    // Don't zoom out if we're already zoomed out.
    if (
      oldDomain[0] === X_FULL_DOMAIN[0] &&
      oldDomain[1] === X_FULL_DOMAIN[1]
    ) {
      return;
    }

    zoomTargetDomain = X_FULL_DOMAIN;

    // Adjust the X scale
    xDataScale.domain(X_FULL_DOMAIN);

    // Setup the transition
    const zoomTransition = svg
      .transition("zoomTransition")
      .ease(TRANSITION_EASE)
      .duration(ZOOM_TIME)
      .on("start", startZoom)
      .on("end", endZoom);

    // Transition the axis
    svg
      .select(".x-axis")
      .transition(zoomTransition)
      .call(xAxis);

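    if (currentData) {
      // Zoom out to the top level. Position the old data by its own domain,
      // which differs from the view domain after a same-level zoom.
      const oldRange = [
        currentData.domain[0] * width,
        currentData.domain[1] * width
      ];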
      const oldViewScale = d3
        .scaleLinear()
        .domain([0, currentData.elements.length - 1])
        .range(oldRange);

      gDataView
        .selectAll(".dataView")
        .attr("opacity", 1)
        .transition(zoomTransition)
        .attr("d", drawPath(currentData, oldViewScale))
        .attr("opacity", 0.4)
        .remove();
    }

    // Zoom back in the top level data
    const N = topData.elements.length - 1;
    const xStartDomain = [N * oldDomain[0], N * oldDomain[1]];
    const xEndDomain = [0, N];
    const xStartViewScale = d3
      .scaleLinear()
      .domain(xStartDomain)
      .range([0, width]);
    const xEndViewScale = d3
      .scaleLinear()
      .domain(xEndDomain)
      .range([0, width]);

    gDataView
      .insert("path", ":first-child")
      .attr("class", getClass(topData))
      .attr("opacity", -1)
      .attr("d", drawPath(topData, xStartViewScale))
      .transition(zoomTransition)
      .attr("d", drawPath(topData, xEndViewScale))
      .attr("opacity", 1);

    currentData = topData;
  }

  // Find the fractional range of b inside a.
  function rangeFraction(a, b) {
    const span = 1 / (a[1] - a[0]);
    return [(b[0] - a[0]) * span, 1 - (a[1] - b[1]) * span];
  }

  // Fetch data to be plotted.
  async function fetchData(domain) {
    const level = levelFromDomain(domain);

    let nElements;
    if (level === 0) {
      nElements = dataDescriptor.nElements;
    } else {
      nElements = dataDescriptor.lodFiles[level - 1].nElements;
    }

    // Convert from the domain space 0..1 to actual elements in this scale level
    const elementStart = Math.max(Math.floor(domain[0] * nElements), 0);
    const elementEnd = Math.min(
      Math.ceil(domain[1] * nElements),
      nElements - 1
    );

    if (level > 0) {
      const lodFile = dataDescriptor.lodFiles[level - 1];

      // Determine byte offsets for these elements:
      // Each element is 2 bytes (min, max)
      const ELEMENT_SIZE = 2;

      const rangeStart = elementStart * ELEMENT_SIZE;
      const rangeEnd = elementEnd * ELEMENT_SIZE + ELEMENT_SIZE - 1;

      const view = await fetchByteRange(lodFile.fileName, rangeStart, rangeEnd);
      let elements = [];
      for (let i = 0; i < view.byteLength - 1; i += 2) {
        elements.push({
          min: view[i],
          max: view[i + 1]
        });
      }

      return { domain, level, elements };
    } else {
      // At level 0 we have actual data points (not min/max aggregates)
      const elements = await fetchByteRange(
        dataDescriptor.fileName,
        elementStart,
        elementEnd
      );
      return { domain, level, elements };
    }
  }

  // Determine which level to use for a view, given a domain span.
  function levelFromDomain(domain) {
    const domainSpan = domain[1] - domain[0];

    // Check level 0
    const nElements = Math.ceil(dataDescriptor.nElements * domainSpan);
    if (nElements <= dataDescriptor.maxElements) return 0;

    // Then check the LOD levels.
    let a = Math.log(nElements / dataDescriptor.maxElements);
    let b = Math.log(dataDescriptor.windowSize);
    return Math.ceil(a / b);
  }

  // Fetch a byte range for a file.
  async function fetchByteRange(file, rangeStart, rangeEnd) {
    const headers = { Range: `bytes=${rangeStart}-${rangeEnd}` };
    const response = await fetch(file, { headers });

    // fetch() only rejects on network failure, so check the HTTP status too.
    if (!response.ok) {
      throw new Error(`Failed to fetch ${file}: HTTP ${response.status}`);
    }

    const buf = await response.arrayBuffer();
    let byteOffset = 0;
    const length = rangeEnd - rangeStart + 1;

    // If the server ignores the Range header and sends back the whole file
    // (200 instead of 206), then we'll handle it by doing our own offset into it.
    if (response.status === 200) {
      byteOffset = rangeStart;
    }

    return new Int8Array(buf, byteOffset, length);
  }

  // Fetch the descriptor file
  async function fetchDescriptor() {
    const response = await fetch(DESCRIPTOR_FILE);
    dataDescriptor = await response.json();
  }
}

zoom_transition_plot.js
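
As a rough check of how a zoom span turns into a level and a byte range, the arithmetic from `levelFromDomain` and `fetchData` can be pulled out into standalone functions. This is only a sketch: the descriptor values below (`nElements`, `maxElements`, `windowSize`) are made-up numbers for illustration, not the ones in the real descriptor file, and `levelForDomain` and `byteRangeForDomain` are hypothetical helper names.

```javascript
// Hypothetical descriptor values, for illustration only. The real ones come
// from the descriptor file loaded by fetchDescriptor().
const descriptor = {
  nElements: 50000000,
  maxElements: 2000,
  windowSize: 8
};

// Same arithmetic as levelFromDomain, with the descriptor passed in explicitly.
function levelForDomain(domain, desc) {
  const visible = Math.ceil(desc.nElements * (domain[1] - domain[0]));
  if (visible <= desc.maxElements) return 0;
  return Math.ceil(
    Math.log(visible / desc.maxElements) / Math.log(desc.windowSize)
  );
}

// Same arithmetic as fetchData: map a 0..1 domain onto element indices at a
// level, then onto an inclusive byte range for the HTTP Range header.
function byteRangeForDomain(domain, elementsAtLevel, bytesPerElement) {
  const first = Math.max(Math.floor(domain[0] * elementsAtLevel), 0);
  const last = Math.min(
    Math.ceil(domain[1] * elementsAtLevel),
    elementsAtLevel - 1
  );
  return [first * bytesPerElement, last * bytesPerElement + bytesPerElement - 1];
}

console.log(levelForDomain([0, 1], descriptor));         // 5
console.log(levelForDomain([0.25, 0.26], descriptor));   // 3
console.log(levelForDomain([0.5, 0.50001], descriptor)); // 0
console.log(byteRangeForDomain([0.25, 0.5], 1000, 2));   // [ 500, 1001 ]
```

With these assumed numbers the full view lands on level 5, where each element summarizes 8^5 raw samples, and only a span of about 2,000 raw samples or fewer drops down to the raw data at level 0.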

Speed it up and we’re back to the interactive plot we started out with. While this walk-through is handling a one-dimensional data set, using levels of detail to summarize large quantities of data is a technique that can also be applied to multi-dimensional data.
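
For a sense of what generating those levels might look like, here is a minimal sketch of min/max downsampling. It is not the script from the repository. It simply assumes the layout that `fetchData` reads (one signed byte each for min and max per element) and that each level summarizes `windowSize` elements of the level below. `buildFirstLevel` and `buildNextLevel` are hypothetical names.

```javascript
// Summarize raw Int8 samples: each output element is a (min, max) pair
// covering `windowSize` consecutive samples, stored interleaved.
function buildFirstLevel(samples, windowSize) {
  const nOut = Math.ceil(samples.length / windowSize);
  const out = new Int8Array(nOut * 2);
  for (let i = 0; i < nOut; i++) {
    let min = 127;
    let max = -128;
    const end = Math.min((i + 1) * windowSize, samples.length);
    for (let j = i * windowSize; j < end; j++) {
      min = Math.min(min, samples[j]);
      max = Math.max(max, samples[j]);
    }
    out[2 * i] = min;
    out[2 * i + 1] = max;
  }
  return out;
}

// Summarize an existing (min, max) level into the next, coarser level.
function buildNextLevel(prev, windowSize) {
  const nIn = prev.length / 2;
  const nOut = Math.ceil(nIn / windowSize);
  const out = new Int8Array(nOut * 2);
  for (let i = 0; i < nOut; i++) {
    let min = 127;
    let max = -128;
    const end = Math.min((i + 1) * windowSize, nIn);
    for (let j = i * windowSize; j < end; j++) {
      min = Math.min(min, prev[2 * j]);
      max = Math.max(max, prev[2 * j + 1]);
    }
    out[2 * i] = min;
    out[2 * i + 1] = max;
  }
  return out;
}

const raw = Int8Array.from([1, -3, 5, 7, 2, 0, -8]);
const level1 = buildFirstLevel(raw, 3);   // pairs: (-3, 5), (0, 7), (-8, -8)
const level2 = buildNextLevel(level1, 2); // pairs: (-3, 7), (-8, -8)
```

Because the min of mins and the max of maxes are exact, each level can be built from the one below it without rereading the raw samples.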

If you would like to take a closer look at the code and data discussed here (including the LOD generation script), you can find a standalone example on GitHub.

Have fun exploring!