Visualizing Sloan Digital Sky Survey Data

I got inspired this evening after learning about the Boötes Void, also known as the Great Nothing. It’s a huge empty region in the distribution of galaxies, roughly 330 million light-years across, with only about 60 galaxies where we would expect roughly 2,000. I was a bit annoyed that the image I saw appeared to be an artist’s conception without being labeled as such, so I thought it would be interesting to visualize real data instead.

This notebook plots over 2.3 million galaxies and 750,000 quasars (quasi-stellar objects, or QSOs) from the Sloan Digital Sky Survey (SDSS). You should also check out The Map of the Universe for a view with more context.

Objects are colored by type and redshift. Nearby galaxies cluster into filaments and fade from white to red with distance. Farther out, quasars extend to redshifts near 7 and reveal large-scale structure across cosmic time, shifting from blue nearby to red at great distances. The most distant objects show the universe as it was less than a billion years after the Big Bang.

Object positions below are computed from their redshifts by converting them to comoving distance and then projecting those distances along the observed sky directions. This factors out the overall expansion of the universe and lets large-scale structure appear in a roughly fixed coordinate system. Because each object is seen at a different lookback time, the full configuration shown here never existed all at once at any single moment in cosmic history.

The most obvious feature is the radial, wedge-shaped pattern. This comes from how SDSS scans the sky, shaped by the telescope’s location in New Mexico and by avoiding regions blocked by dust and stars in our own galaxy. The universe itself surrounds us in all directions and may be much larger than what we can observe, or even infinite.

Data and methods

Data was fetched from SDSS DR18 using astroquery. The query selects spectroscopically confirmed objects from the SpecObj table, filtered by object class and restricted to zWarning = 0. Queries are split into redshift bins to stay under SDSS’s 500k row limit. Each query looks roughly like:

from astroquery.sdss import SDSS
from astropy.cosmology import Planck18 as cosmo

# Spectroscopically confirmed galaxies in a single redshift bin
result = SDSS.query_sql("""
    SELECT s.ra, s.dec, s.z
    FROM SpecObj s
    WHERE s.class = 'GALAXY'
      AND s.z BETWEEN 0.02 AND 0.08
      AND s.zWarning = 0
""", data_release=18)

# Comoving distance in Mpc under the Planck 2018 cosmology
distance = cosmo.comoving_distance(result['z']).to('Mpc').value

Obtaining the data was surprisingly easy.
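The binning itself can be sketched in a few lines. This is only an illustration, not the notebook's actual code: the helper names `redshift_bins` and `build_query`, the uniform bin widths, and the choice of three bins are all assumptions. The sketch uses half-open intervals rather than BETWEEN so that an object sitting exactly on a bin edge is not fetched twice.

```python
def redshift_bins(z_min, z_max, n_bins):
    """Split [z_min, z_max] into n_bins contiguous (lo, hi) intervals."""
    step = (z_max - z_min) / n_bins
    edges = [z_min + i * step for i in range(n_bins)] + [z_max]
    return list(zip(edges[:-1], edges[1:]))


def build_query(object_class, z_lo, z_hi):
    """Build the SpecObj query for one class and one half-open redshift bin."""
    return (
        "SELECT s.ra, s.dec, s.z FROM SpecObj s "
        f"WHERE s.class = '{object_class}' "
        f"AND s.z >= {z_lo:.4f} AND s.z < {z_hi:.4f} "
        "AND s.zWarning = 0"
    )


# Hypothetical example: three equal-width bins over the range used above
queries = [build_query("GALAXY", lo, hi) for lo, hi in redshift_bins(0.02, 0.08, 3)]
```

Each string in `queries` could then be passed to `SDSS.query_sql`. Equal-width bins will not hold equal numbers of objects, so in practice the edges would need tuning until every bin comes back under the row limit.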

Redshifts are converted to 3D Cartesian coordinates using Astropy’s Planck 2018 cosmology. Comoving distance $d$ in Mpc is computed from the redshift, then projected: $x = d \cos\delta \cos\alpha$, $y = d \cos\delta \sin\alpha$, $z = d \sin\delta$, where $\alpha$ and $\delta$ are right ascension and declination.
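The projection step is the usual spherical-to-Cartesian conversion. Here is a minimal sketch using only the standard library. The function name `to_cartesian` and the axis convention (x toward RA = 0°, Dec = 0°, z toward the north celestial pole) are assumptions, and the notebook may orient its axes differently.

```python
import math


def to_cartesian(distance_mpc, ra_deg, dec_deg):
    """Project a comoving distance along a sky direction (RA/Dec in degrees)."""
    ra = math.radians(ra_deg)
    dec = math.radians(dec_deg)
    x = distance_mpc * math.cos(dec) * math.cos(ra)
    y = distance_mpc * math.cos(dec) * math.sin(ra)
    z = distance_mpc * math.sin(dec)
    return x, y, z


# An object 100 Mpc away at RA = 90 deg, Dec = 0 deg lies along the +y axis
x, y, z = to_cartesian(100.0, 90.0, 0.0)
```

For millions of rows the same three expressions would normally be applied to whole NumPy arrays at once instead of one object at a time.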

Each object is stored as four float16 values (x, y, z, color_param). The color parameter encodes both object type and redshift: galaxies map to [0, 0.5) and QSOs to [0.5, 1.0], allowing the shader to distinguish types and interpolate colors. Data is gzip-compressed and loaded progressively by redshift range. It would probably be more efficient to work out positions in the browser, but I opted instead to do the work up front and transfer data in a format that would allow directly uploading to the GPU and piping to the screen.
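To make the storage layout concrete, here is a sketch of one way to encode and pack the points. It is an assumption-laden illustration, not the notebook's code: the linear mapping from redshift to color parameter, the `z_max` normalization argument, and the names `color_param` and `pack_points` are all hypothetical. Only the four-float16 layout, the [0, 0.5) and [0.5, 1.0] ranges, and the gzip compression come from the description above.

```python
import gzip
import struct

GALAXY, QSO = 0, 1

# Largest float16 value strictly below 0.5, so galaxies never spill into the QSO range
GALAXY_MAX = 0.5 - 2 ** -12


def color_param(kind, redshift, z_max):
    """Map (type, redshift) to one float: galaxies in [0, 0.5), QSOs in [0.5, 1.0]."""
    t = min(max(redshift / z_max, 0.0), 1.0)  # normalized redshift in [0, 1]
    if kind == GALAXY:
        return min(0.5 * t, GALAXY_MAX)
    return 0.5 + 0.5 * t


def pack_points(points):
    """Pack (x, y, z, color_param) tuples as little-endian float16 and gzip them."""
    raw = b"".join(struct.pack("<4e", *p) for p in points)
    return gzip.compress(raw)


blob = pack_points([
    (1.0, 2.0, 3.0, color_param(GALAXY, 0.05, 0.2)),
    (4.0, 5.0, 6.0, color_param(QSO, 7.0, 7.0)),
])
```

Each point takes 8 bytes before compression, and the decompressed buffer is already in the interleaved layout a vertex buffer would expect.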

Rendering uses WebGPU with HDR accumulation and tonemapping. Points are rendered as single-pixel point primitives (there was some quad sprite rendering in place for larger points, but single points seem fine when you have 3M points!). To maintain consistent perceived brightness across zoom levels and point sizes, the shader scales intensity inversely with both camera distance and point size. Without this correction, zooming out would cause the image to blow out as millions of points overlap, while zooming in would make sparse regions too dim.
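The brightness correction can be illustrated outside the shader. The sketch below is a Python stand-in for what would really be shader code, and it rests on assumptions: the exact scaling law (here a plain product of distance and point size), the `base` constant, and the Reinhard-style tonemapping operator are guesses for illustration, since the text only says that intensity scales inversely with both quantities.

```python
def point_intensity(base, camera_distance, point_size):
    """Per-point intensity, scaled inversely with camera distance and point size."""
    return base / (camera_distance * point_size)


def tonemap(accumulated):
    """Reinhard-style operator mapping accumulated HDR intensity into [0, 1)."""
    return accumulated / (1.0 + accumulated)


# Doubling the camera distance halves what each point contributes,
# which offsets the extra overlap when many points land on the same pixel.
near = point_intensity(1.0, 1.0, 1.0)
far = point_intensity(1.0, 2.0, 1.0)
pixel = tonemap(1000 * far)  # 1000 overlapping points still map below 1.0
```

Accumulating in HDR first and tonemapping afterwards means dense filaments compress smoothly toward white instead of clipping.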