Sparrow WebGL Devlog 7: Pixel Perfect 3D Object Selection

Implementing color picking in JavaScript and WebGL

29.06.2023 - 20:41
Imagine a video game in which you couldn't interact with any objects in the 3D environment. It probably would be very boring.

Attacking enemy units or champions in League of Legends, clicking on your buildings in RTS games, talking to an NPC in front of you in RPGs: all of these require the game to determine which object you have clicked on.

There are two main methods of implementing 3D picking: Ray casting and color picking. In this blog post, I am going to talk about their differences and why and how I added color picking to my Sparrow WebGL engine.

Ray Casting vs. Color Picking

Let's start by talking about ray casting. This technique casts a ray from the camera through your mouse position into the 3D scene and calculates which objects it intersects with. This can be done with the inverse of the view-projection matrix. The view-projection matrix converts from 3D world space to 2D screen space, so its inverse can be used to go the other way, from 2D to 3D.
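As a rough sketch of that unprojection step (not code from the engine), the mouse position can be converted to normalized device coordinates and pushed through the inverse matrix once at the near plane and once at the far plane. The helper names `transformVec4` and `mouseToRay` are made up for this example, and `invViewProj` is assumed to be a column-major 4x4 matrix that has already been inverted elsewhere.

```javascript
// Multiply a column-major 4x4 matrix (the layout WebGL uses) with a 4D vector.
function transformVec4(m, v) {
    const out = [0, 0, 0, 0];
    for (let row = 0; row < 4; row++) {
        out[row] = m[row] * v[0] + m[4 + row] * v[1] + m[8 + row] * v[2] + m[12 + row] * v[3];
    }
    return out;
}

// Hypothetical helper: turn a mouse position into a world-space ray.
// invViewProj is assumed to be the inverse of (projection * view).
function mouseToRay(mouseX, mouseY, width, height, invViewProj) {
    // pixel coordinates -> normalized device coordinates in [-1, 1], y flipped
    const ndcX = (mouseX / width) * 2 - 1;
    const ndcY = 1 - (mouseY / height) * 2;

    // unproject one point on the near plane (z = -1) and one on the far plane (z = 1)
    const near = transformVec4(invViewProj, [ndcX, ndcY, -1, 1]);
    const far = transformVec4(invViewProj, [ndcX, ndcY, 1, 1]);

    // perspective divide
    const origin = near.slice(0, 3).map((c) => c / near[3]);
    const end = far.slice(0, 3).map((c) => c / far[3]);

    // normalized direction from the near point to the far point
    const dir = end.map((c, i) => c - origin[i]);
    const len = Math.hypot(dir[0], dir[1], dir[2]);
    return { origin, direction: dir.map((c) => c / len) };
}
```

With an identity matrix the ray simply points along +z, which is a quick way to sanity-check the coordinate conversion before plugging in a real camera.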

With an equation for the 3D line and some fundamental linear algebra from school, it's possible to calculate the intersection between a line and many different 3D shapes. However, some of the intersection calculations are a lot faster than others. Two shapes that are especially easy and fast to test are spheres and boxes (especially axis-aligned boxes). This is where the term hitbox comes from. Complicated 3D shapes like characters or trees are approximated by boxes around them and it's quite fast to check whether the mouse ray intersects with that box.
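To give an idea of how little math the simple shapes need, here is a minimal ray-sphere test. It is a sketch under the assumption that the ray direction is normalized; the function name `intersectRaySphere` and the array-based vectors are just choices for this example.

```javascript
function dot(a, b) {
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
}

// Returns the distance along the ray to the first intersection with the
// sphere, or null if the ray misses. `direction` is assumed to be normalized.
function intersectRaySphere(origin, direction, center, radius) {
    // vector from the sphere center to the ray origin
    const oc = [origin[0] - center[0], origin[1] - center[1], origin[2] - center[2]];

    // solve |oc + t * direction|^2 = radius^2 for t (a quadratic in t)
    const b = dot(oc, direction);
    const c = dot(oc, oc) - radius * radius;
    const discriminant = b * b - c;
    if (discriminant < 0) return null; // the line never touches the sphere

    const sqrtD = Math.sqrt(discriminant);
    const tNear = -b - sqrtD;
    const tFar = -b + sqrtD;
    if (tFar < 0) return null; // the sphere is entirely behind the ray origin

    // if the origin is inside the sphere, tNear is negative and tFar is the exit point
    return tNear >= 0 ? tNear : tFar;
}
```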

However, this is where the problems with this solution start. While it's easy to calculate a ray-box intersection, this will give us a lot of false positives because most shapes are not very box-like, so a surrounding box covers a lot of empty space. This can be mitigated by using more boxes to approximate the shape of the object (e.g. one box for the head, one for the torso, boxes for each leg and arm, etc.). While this makes the hitbox calculations more accurate, it introduces a lot more boxes, and the more boxes we have to check, the slower the algorithm gets. The number of checks can be reduced again by storing the boxes in a recursive spatial data structure such as an octree or a bounding volume hierarchy, but this adds more complexity too. Another issue with hitboxes is depth. Even when we find an object that the mouse ray intersects with, we still have to check all other objects, because there could be another object in front of the first one that the ray intersects with too.
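The depth issue can be illustrated with a small sketch: test the ray against every axis-aligned box and keep the hit with the smallest distance. This uses the common "slab" method for the box test. The function names and the `{ box: { min, max } }` object layout are assumptions for this example only.

```javascript
// Slab test: returns the distance at which the ray enters the axis-aligned
// box, or null on a miss. box.min and box.max are [x, y, z] arrays.
function intersectRayBox(origin, direction, box) {
    let tMin = 0;
    let tMax = Infinity;
    for (let axis = 0; axis < 3; axis++) {
        if (direction[axis] === 0) {
            // ray is parallel to this slab: it misses unless the origin lies inside it
            if (origin[axis] < box.min[axis] || origin[axis] > box.max[axis]) return null;
            continue;
        }
        let t1 = (box.min[axis] - origin[axis]) / direction[axis];
        let t2 = (box.max[axis] - origin[axis]) / direction[axis];
        if (t1 > t2) [t1, t2] = [t2, t1];
        tMin = Math.max(tMin, t1);
        tMax = Math.min(tMax, t2);
        if (tMin > tMax) return null; // the slab intervals do not overlap
    }
    return tMin;
}

// Every object has to be tested, because a later one may be closer to the camera.
function pickNearest(origin, direction, objects) {
    let nearest = null;
    let nearestDistance = Infinity;
    for (const object of objects) {
        const distance = intersectRayBox(origin, direction, object.box);
        if (distance !== null && distance < nearestDistance) {
            nearest = object;
            nearestDistance = distance;
        }
    }
    return nearest;
}
```

This is the brute-force version that checks every box on every click; a spatial data structure would replace the loop in `pickNearest`.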

In general, ray casting requires a lot of math to calculate the line equation and the intersections for different 3D shapes. I have implemented ray casting in the past and I could convert my existing implementation to JavaScript and WebGL, but why add complicated math when there are other alternatives?

Another method of selecting objects in 3D space, and the method I prefer, is color picking. In this method, all selectable objects are rendered to a texture with a unique color. When clicking anywhere on the screen, the color of the texture at that position is read and because every object has a unique color, we know which object has been picked.
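The mapping between objects and colors is just a matter of packing an integer ID into the three 8-bit color channels and unpacking it again. A minimal sketch of that round trip, with hypothetical helper names and white reserved as the background color, could look like this:

```javascript
// White (255, 255, 255) is assumed to be the clear color of the picking
// texture, so the ID 0xffffff is reserved to mean "nothing was hit".
const BACKGROUND_ID = 0xffffff;

// Pack an ID into [r, g, b], 8 bits per channel, lowest byte in red.
function idToColor(id) {
    return [id & 0xff, (id >> 8) & 0xff, (id >> 16) & 0xff];
}

// Unpack the ID from the color that was read back from the texture.
function colorToId(r, g, b) {
    return r + g * 256 + b * 65536;
}
```

With 24 bits and one reserved value, this leaves room for 16,777,215 distinct objects, which should be enough for most scenes.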

The biggest advantage of this technique is that you can select objects pixel perfectly (or technically almost pixel perfectly, because it's generally good enough to do the extra render pass at half or even lower resolution). With ray casting, you would need to go down to per-triangle intersections to achieve a similar level of precision. The biggest drawback of color picking is that you need another render pass to render the objects to a texture. In the worst case, where all objects are selectable, this will double the number of draw calls. If you don't need hoverable objects, you can avoid most of this cost by only rendering the objects to the texture when the mouse is actually clicked.

Performance-wise, it probably depends on the exact situation whether ray casting or color picking is faster, but in my opinion color picking is more versatile and more precise with less effort (and certainly a lot less math).

A normal 3D scene with the color picking texture at the top:

Implementation

As mentioned above, color picking requires rendering the selectable objects to a texture. Rendering to textures in WebGL is done with framebuffers. Framebuffers can be used to render objects off-screen, which is important for many features like shadow mapping, post-processing effects, dynamic reflections, and more. I hadn't added framebuffers to my WebGL engine yet, but implementing them wasn't difficult because I could just port my C++/OpenGL version to JavaScript/WebGL.
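For readers who haven't used framebuffers before, this is roughly what setting one up for color picking involves in WebGL 2. It is a generic sketch and not the framebuffer class from my engine; `createPickingFramebuffer` is a hypothetical helper and `gl` is assumed to be a WebGL2RenderingContext.

```javascript
// Hypothetical helper: creates a framebuffer with a color texture to render
// the IDs into and a depth renderbuffer so that closer objects win.
function createPickingFramebuffer(gl, width, height) {
    // color attachment: a plain 8-bit RGBA texture without mipmaps
    const texture = gl.createTexture();
    gl.bindTexture(gl.TEXTURE_2D, texture);
    gl.texImage2D(gl.TEXTURE_2D, 0, gl.RGBA8, width, height, 0, gl.RGBA, gl.UNSIGNED_BYTE, null);
    // NEAREST avoids blended (and therefore wrong) IDs if the texture is ever sampled
    gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_MIN_FILTER, gl.NEAREST);
    gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_MAG_FILTER, gl.NEAREST);

    // depth attachment: needed for depth testing in the off-screen pass
    const depthBuffer = gl.createRenderbuffer();
    gl.bindRenderbuffer(gl.RENDERBUFFER, depthBuffer);
    gl.renderbufferStorage(gl.RENDERBUFFER, gl.DEPTH_COMPONENT16, width, height);

    const framebuffer = gl.createFramebuffer();
    gl.bindFramebuffer(gl.FRAMEBUFFER, framebuffer);
    gl.framebufferTexture2D(gl.FRAMEBUFFER, gl.COLOR_ATTACHMENT0, gl.TEXTURE_2D, texture, 0);
    gl.framebufferRenderbuffer(gl.FRAMEBUFFER, gl.DEPTH_ATTACHMENT, gl.RENDERBUFFER, depthBuffer);

    const status = gl.checkFramebufferStatus(gl.FRAMEBUFFER);
    // switch back to the default framebuffer (the canvas)
    gl.bindFramebuffer(gl.FRAMEBUFFER, null);
    if (status !== gl.FRAMEBUFFER_COMPLETE) {
        throw new Error("Picking framebuffer is incomplete: " + status);
    }
    return { framebuffer, texture, depthBuffer, width, height };
}
```

For a half-resolution picking texture, it would be called with half the canvas width and height.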

I have also previously implemented color picking in C++/OpenGL, so I was able to use it as a starting point. However, that implementation was very specific to my map editor and I had to make a lot of changes to make it more flexible for my WebGL engine. In my existing version, all objects in the editor were clickable and the backend of the editor handled all of the logistics for it. One of the changes I wanted to make was making the pickability optional and easily controllable on a per-object basis. I added a superclass for all 3D objects which has an addOnClickHandler( onClickCallback ) function. When the function is called, the object is added to the color picker, and only the added objects are rendered to the texture (actually, now that I'm writing this, I might have to render all objects anyway, because a non-selectable object could cover a selectable one, and otherwise it would be possible to click through it).
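Stripped of all rendering code, the registration idea can be sketched like this. The class names `ColorPicker` and `Object3D`, the `register` and `dispatchClick` methods, and the `pickID` field are placeholders for this example and not the actual engine classes; only addOnClickHandler matches the function described above.

```javascript
// Hypothetical sketch of the bookkeeping, without any WebGL calls.
class ColorPicker {
    constructor() {
        this.objects = [];
    }

    // Adds an object once and returns its ID, which is simply its list index.
    register(object) {
        if (!this.objects.includes(object)) this.objects.push(object);
        return this.objects.indexOf(object);
    }

    // Called with the ID decoded from the picking texture.
    // Returns true if a click handler was found and triggered.
    dispatchClick(id) {
        const object = this.objects[id];
        if (object === undefined || object.onclick === undefined) return false;
        object.onclick();
        return true;
    }
}

// Stand-in for the superclass of all 3D objects.
class Object3D {
    constructor(picker) {
        this.picker = picker;
        this.onclick = undefined;
    }

    // Objects only become pickable once a handler has been added.
    addOnClickHandler(onClickCallback) {
        this.onclick = onClickCallback;
        this.pickID = this.picker.register(this);
    }
}
```

One possible way to handle the click-through problem would be to also draw non-selectable objects in the picking pass, but with the background color, so that they block whatever is behind them without ever triggering a callback.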

var testCube = new Sparrow.Cube( engine , 0 , 0 , 0 , 1 , 1 , 1 , {color: Color4b.WHITE} );
testCube.addOnClickHandler( () => { console.log( "testCube clicked" ); } );

One of the annoying problems was that there are different ways of rendering objects, most notably vertex-skinned objects which have to be animated in the vertex shader. This meant that the color picker had to have different shaders for different types of objects. Luckily, the shaders are very simple because they only care about the positions. The vertex shader also has a uniform for the modelID. The ID is converted to a color that is passed to the fragment shader, which is quite literally the simplest fragment shader you can have.

var vs = `#version 300 es
layout ( location = 0 ) in vec3 vertex_position;

uniform mat4 u_M_mvp;
uniform uint u_modelID;

out vec4 color;

void main()
{
    gl_Position = u_M_mvp * vec4( vertex_position , 1 );

    // convert the model ID to a color
    color.r = float( ( u_modelID & uint(0x000000ff) ) >> 0  )/255.0;
    color.g = float( ( u_modelID & uint(0x0000ff00) ) >> 8  )/255.0;
    color.b = float( ( u_modelID & uint(0x00ff0000) ) >> 16 )/255.0;
    color.a = 1.0;
}`;

var fs = `#version 300 es
precision mediump float;

in vec4 color;

out vec4 fragColor;

void main()
{
    fragColor = color;
}`;

When the mouse is clicked, the color of the texture at that position is read and converted back into an ID, which is just the index of the model in the list. Then the callback function of that model is triggered.

pick( mouseX , mouseY )
{
    var gl = this.engine.gl;
    this.framebuffer.use();

    // readPixels is controlled by PACK_ALIGNMENT (UNPACK_ALIGNMENT only affects uploads)
    gl.pixelStorei( gl.PACK_ALIGNMENT , 1 );

    var data = new Uint8Array( 4 );
    // the picking texture has half the resolution of the canvas, so halve the mouse
    // position and flip the y coordinate (WebGL's origin is the bottom-left corner);
    // the -1 keeps the top row of the canvas inside the texture
    var x = Math.floor( mouseX/2 );
    var y = Math.floor( (this.framebuffer.height*2-1-mouseY)/2 );
    gl.readPixels( x , y , 1 , 1 , gl.RGBA , gl.UNSIGNED_BYTE , data );

    // restore the default framebuffer before any callback runs
    gl.bindFramebuffer( gl.FRAMEBUFFER , null );

    // if the white background is hit, return
    if ( data[0] == 255 && data[1] == 255 && data[2] == 255 ) return;

    // convert the color back to an index
    var objectIndex = data[2]*65536 + data[1]*256 + data[0];
    var object = this.objects[objectIndex];
    // guard against colors that don't belong to a registered object
    if ( object !== undefined && object.onclick !== undefined ) object.onclick();
}
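The coordinate handling in the pick function is hardcoded for a half-resolution texture. A more general version of that mapping might look like the sketch below. `toPickingCoords` is a hypothetical helper, and it assumes that the mouse position is relative to the top-left corner of the canvas and measured in the same pixels as the canvas height.

```javascript
// Hypothetical helper: maps a mouse position in canvas pixels to a pixel in a
// picking texture that is `scale` times the canvas size (0.5 = half resolution).
function toPickingCoords(mouseX, mouseY, canvasHeight, scale) {
    const x = Math.floor(mouseX * scale);
    // mouse events count rows from the top, WebGL counts them from the bottom;
    // row 0 of the canvas is row (canvasHeight - 1) in WebGL coordinates
    const y = Math.floor((canvasHeight - 1 - mouseY) * scale);
    return { x, y };
}
```

For a 600 pixel high canvas and a scale of 0.5, the top row of the canvas maps to row 299 of the texture and the bottom row maps to row 0.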

For the most part, my color picking implementation works quite well, but there are some code design aspects that I would like to change. In particular, the support for imported GLTF models is a hacky solution at the moment. As I mentioned in the two previous blog posts, GLTF is a great format, but it goes against some of my preferences for graphics programming. I'll have to make a decision on how I want to handle GLTF imports in my engine.

The color picking texture in higher resolution, where you can see the different shades of red of the cubes:

Color picking is a very versatile method to select objects in 3D space. Like everything, there are pros and cons and there are situations where ray casting would be better, but overall it's a good solution that's going to work for most projects I want to work on.

by Christian - 29.06.2023 - 20:41

#WebGL

#JavaScript

#WebDev

#GameDev
