Our quest to build Zoom in web: 3 treasure chests we unlocked along the way
The Zoom Video SDK documentation left us looking for answers, but their React sample app has nearly everything you need. Read on to find out what it’s missing!
In September this year, the TypeScript engineering team here at Remotion embarked on an ambitious project: prototype video calls using the Zoom Video SDK for Web in two weeks. We're building a video app for remote collaboration, so we wanted to see if Zoom could provide a better video experience for our users than our current provider, Agora. Zoom has slower call starts but better screen sharing and video quality autoscaling. The only way to know for sure which call provider performs better was to build it out.
Nearly 5 weeks later, we emerged from a meandering journey through documentation deserts, misleading sample code swamps, and endless timing error labyrinths with a fully functional Zoom web call experience we’re proud to share with our users. In the process, we pivoted several times on our technical approach. Here are three lessons that we wish we’d known from the get-go.
1. The proof is in the pudding. Rather, the working sample app.
In the first couple of weeks of our work, we based our implementation on the official documentation, written for vanilla JS. However, we discovered that the React sample app contains answers to edge-case questions that had blocked us for several days, such as:
- How do we handle gallery view in browsers that don’t support SharedArrayBuffer and the OffscreenCanvas API (e.g., Safari and Firefox)?
- Which events should we subscribe to in order to handle users joining, leaving, and changing their mic/camera settings?
- How do we render videos of remote participants already in the call for a new joiner?
- How do we handle users clicking on the “Stop sharing” button provided by the browser?
The React sample app subscribes to three events (user-added, user-updated, user-removed) to handle participants joining and leaving, as well as participants muting or unmuting their camera and microphone. Though the documentation suggests peer-video-state-change for rendering remote participant videos, this isn’t the full picture: when a user joins while others are already present in the call, no events are triggered for those existing participants.
To solve this problem, we looked to the React sample app, where client.getAllUser() is called shortly after joining the session. We fetch all participants with this function and check each remote participant’s bVideoOn property to decide whether to render their video.
See more info on the Participant interface here.
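As a rough sketch of the flow described above, the snippet below seeds state from getAllUser() right after joining and then keeps it current with the three participant events. ParticipantLike and ClientLike are simplified stand-ins we wrote for illustration, not the SDK's own types: the userId field, the on() signature, and the array-shaped event payloads are assumptions, so check them against the SDK typings before relying on them.

```typescript
// Simplified stand-ins for the SDK types; only the fields used here are modeled.
// Assumption: each participant carries a numeric userId alongside bVideoOn.
interface ParticipantLike {
  userId: number;
  bVideoOn: boolean;
}

type ParticipantUpdate = Partial<ParticipantLike> & { userId: number };
type ParticipantEvent = "user-added" | "user-updated" | "user-removed";

// Assumption: handlers receive an array of (possibly partial) participants.
interface ClientLike {
  getAllUser(): ParticipantLike[];
  on(event: ParticipantEvent, handler: (payload: ParticipantUpdate[]) => void): void;
}

// Reports the ids of remote users whose video should currently be rendered.
function trackRemoteVideo(
  client: ClientLike,
  selfUserId: number,
  onChange: (userIdsWithVideo: number[]) => void
): void {
  const videoOn = new Map<number, boolean>();
  const emit = (): void =>
    onChange(
      Array.from(videoOn.entries())
        .filter(([, on]) => on)
        .map(([id]) => id)
    );

  // 1. Seed from participants who were already in the call when we joined,
  //    since no event fires for them.
  for (const user of client.getAllUser()) {
    if (user.userId !== selfUserId) videoOn.set(user.userId, user.bVideoOn);
  }
  emit();

  // 2. Keep the map current as people join or toggle their camera.
  const upsert = (payload: ParticipantUpdate[]): void => {
    for (const p of payload) {
      if (p.userId === selfUserId) continue;
      const previous = videoOn.get(p.userId) ?? false;
      videoOn.set(p.userId, p.bVideoOn ?? previous);
    }
    emit();
  };
  client.on("user-added", upsert);
  client.on("user-updated", upsert);

  // 3. Forget participants who leave.
  client.on("user-removed", (payload) => {
    for (const p of payload) videoOn.delete(p.userId);
    emit();
  });
}
```

In a React app, onChange would typically set state that drives which remote videos get rendered.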
2. Rendering remote user video streams: one canvas for everyone or one canvas per person?
Let’s get one thing out of the way: one canvas per person is not officially supported by the Zoom Video SDK and has significant performance drawbacks. You should not do this, but we did it anyway because we love bubbles and want to provide a beautiful user experience. It’s important for the web experience to be as elegant as our macOS app because web calls are many users’ first impression of Remotion.
The supported implementation (a single HTML canvas element for all remote participants) is far more performant for calls with more than ~4 users, can show up to 25 users in a grid, and, specifically for React, has the benefit of only needing to track a single ref created through the useRef() hook (stay tuned for our post on creating, forwarding and using refs in React). However, our existing implementation is bubble-based rather than rectangular. We were concerned that rectangles in the web call, with bubbles everywhere else, would make for an inconsistent user experience.
We built this Zoom video call project with the intention of flipping as seamlessly as possible between video call providers, as a means of testing whether Zoom would even be a viable option for Remotion. Rendering each remote participant on a separate canvas allowed us to save valuable time and avoid splitting our sizing/positioning algorithms into separate implementations. Unfortunately, requesting multiple streams instead of one from the Zoom Video SDK is computationally heavy, so we mitigated this by changing pagination to a maximum of five video bubbles per page.
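The pagination cap mentioned above amounts to chunking the participant list into pages of at most five. Here is a minimal sketch; paginate and MAX_BUBBLES_PER_PAGE are illustrative names, not our production code.

```typescript
// Cap taken from the text above: at most five video bubbles per page.
const MAX_BUBBLES_PER_PAGE = 5;

// Splits a list into consecutive pages of at most pageSize items.
function paginate<T>(items: readonly T[], pageSize: number = MAX_BUBBLES_PER_PAGE): T[][] {
  if (pageSize < 1) throw new Error("pageSize must be at least 1");
  const pages: T[][] = [];
  for (let i = 0; i < items.length; i += pageSize) {
    pages.push(items.slice(i, i + pageSize));
  }
  return pages;
}
```

Only the participants on the visible page need a video stream, which keeps the number of simultaneous streams bounded regardless of call size.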
We are now working on a second iteration of Zoom video calls, using a single-canvas approach. The main difficulty is how we will render users, as we use bubbles (diameter-based spacing and calculation) instead of rectangles (height/width-based spacing). Two methods to achieve our existing designs using a single canvas are being hotly debated: the cookie cutter method vs. the swiss cheese method.
Cookie Cutter Method
This method was recommended to us by @tommygaessler, Lead Developer Advocate and our contact at Zoom. The approach uses a single virtual canvas hidden offscreen to render all participant videos. We then stamp out remote participant “video cookies” from this offscreen canvas to render on the page with the CanvasRenderingContext2D.drawImage() function. MDN once again has a great explanation of how this works, though this JSFiddle demonstrates it more practically. Notably missing from the JSFiddle, though, is the use of the sx, sy, sWidth, and sHeight arguments to selectively copy over only a single remote user for each destination canvas.
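To make the role of sx, sy, sWidth, and sHeight concrete, here is a sketch of stamping one participant out of the offscreen canvas. It assumes the offscreen canvas lays participants out in equal-sized cells, row by row from the top-left corner; that layout is our assumption and must match however the videos were actually positioned. Ctx2DLike is a hypothetical, minimal stand-in for the CanvasRenderingContext2D methods used, so the snippet does not depend on browser typings.

```typescript
interface SourceRect {
  sx: number;
  sy: number;
  sWidth: number;
  sHeight: number;
}

// Hypothetical subset of CanvasRenderingContext2D used by stampBubble.
interface Ctx2DLike {
  save(): void;
  restore(): void;
  beginPath(): void;
  arc(x: number, y: number, radius: number, startAngle: number, endAngle: number): void;
  clip(): void;
  drawImage(
    image: unknown,
    sx: number, sy: number, sWidth: number, sHeight: number,
    dx: number, dy: number, dWidth: number, dHeight: number
  ): void;
}

// Finds the centered square inside the participant's cell on the offscreen canvas.
function centeredSquareForCell(
  index: number,
  columns: number,
  cellWidth: number,
  cellHeight: number
): SourceRect {
  const col = index % columns;
  const row = Math.floor(index / columns);
  const side = Math.min(cellWidth, cellHeight);
  return {
    sx: col * cellWidth + (cellWidth - side) / 2,
    sy: row * cellHeight + (cellHeight - side) / 2,
    sWidth: side,
    sHeight: side,
  };
}

// Copies that square onto a destination canvas, clipped to a circle.
function stampBubble(ctx: Ctx2DLike, source: unknown, rect: SourceRect, diameter: number): void {
  const radius = diameter / 2;
  ctx.save();
  ctx.beginPath();
  ctx.arc(radius, radius, radius, 0, Math.PI * 2);
  ctx.clip();
  ctx.drawImage(
    source,
    rect.sx, rect.sy, rect.sWidth, rect.sHeight,
    0, 0, diameter, diameter
  );
  ctx.restore();
}
```

In a browser, source would be the offscreen canvas and ctx the 2D context of a participant's destination canvas, with stampBubble called once per frame per visible participant.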
A notable downside of this approach is that canvases will be created and destroyed frequently, triggering frequent re-render cycles in React.
Swiss Cheese Method
In the swiss cheese approach, a single canvas holding all remote users is rendered with z-index: -1. Overlaid on top of this canvas, we place a grid that effectively crops remote participants into bubbles. This is very similar to the approach taken in the React sample app, in which a grid of div elements is placed on top of the remote participant canvas to outline each participant and display their name along the bottom edge. Though this method avoids the frequent re-renders that the cookie cutter method would trigger, it has limitations of its own.
The main limitation with the swiss cheese method is that the video capture aspect ratio for Zoom is 16:9. Since the rendered remote participants are rectangular, the cropped bubbles would have larger gaps between them horizontally than vertically. Moving in this direction would likely entail a redesign of our web call UI, based on rectangles instead of circles (a move currently being hotly debated)!
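The gap problem is easy to quantify. The sketch below assumes each bubble is the largest circle centered in its cell and that cells tile edge to edge with no padding; bubbleGaps is an illustrative helper, not part of any SDK.

```typescript
// Gaps between neighboring bubbles when each bubble is the largest circle
// that fits in its rectangular cell and cells tile edge to edge.
function bubbleGaps(cellWidth: number, cellHeight: number): { horizontal: number; vertical: number } {
  const diameter = Math.min(cellWidth, cellHeight);
  return {
    horizontal: cellWidth - diameter,
    vertical: cellHeight - diameter,
  };
}

// A 16:9 cell of 320x180 leaves a 140px horizontal gap and no vertical gap.
const gapsFor16by9 = bubbleGaps(320, 180);
```

In general, a 16:9 cell of height h leaves a horizontal gap of 7h/9 between bubbles while vertically adjacent bubbles touch, which is the unevenness described above.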
3. Scale down user video stream while screen sharing is active
While testing screen sharing, we found that calls with more than roughly 5 participants were susceptible to having some of the participant videos freeze unexpectedly. This is another performance limitation of having separate canvases for each remote user (and thus separate video streams). To address this issue, we lowered the resolution of remote participant video streams while screen sharing was active. This had the happy side effect of stabilizing audio in calls with many users.
To accomplish this, we used our own isRemoteUserScreensharing field in our database, but the same could be done by listening to the active-share-change event in Zoom. While a screen share is active, we scale remote video resolution down to 90p, which solved our video freezing problem. However, our more permanent solution remains the same as above: move to a single-canvas implementation for remote user video.
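Here is a sketch of the event-based variant. The quality constants, the payload shape of active-share-change, and the 360p default are all assumptions made for illustration (VideoQualityLike, ShareEventSource, and rerenderAll are hypothetical names); verify them against the SDK reference before use.

```typescript
// Assumption: mirrors the quality levels the SDK exposes; values are illustrative.
const VideoQualityLike = {
  Video_90P: 0,
  Video_180P: 1,
  Video_360P: 2,
  Video_720P: 3,
} as const;

type Quality = (typeof VideoQualityLike)[keyof typeof VideoQualityLike];

// Assumption: the event payload reports whether a share is active.
interface ShareChangePayload {
  state: "Active" | "Inactive";
  userId: number;
}

interface ShareEventSource {
  on(event: "active-share-change", handler: (payload: ShareChangePayload) => void): void;
}

// Drop to 90p while someone is sharing; the 360p default is our own choice.
function qualityFor(isScreenShareActive: boolean): Quality {
  return isScreenShareActive ? VideoQualityLike.Video_90P : VideoQualityLike.Video_360P;
}

// Re-renders all remote videos at the appropriate quality whenever sharing
// starts or stops. rerenderAll is a hypothetical callback supplied by the app.
function watchShareState(client: ShareEventSource, rerenderAll: (quality: Quality) => void): void {
  client.on("active-share-change", (payload) => {
    rerenderAll(qualityFor(payload.state === "Active"));
  });
}
```

Keeping the quality decision in a pure function makes it easy to drive from either source: the SDK event, as here, or an application-level flag such as isRemoteUserScreensharing.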
Would we do this again?
In a word, absolutely. Integrating with third parties is always challenging, and Zoom was no exception. However, this project allowed us to take a good look at our web video call architecture, a system that had grown organically over time to include an ever-expanding feature set. It also allowed us to refactor a big chunk of our existing web call codebase and introduce abstraction layers via a shared React context and several custom hooks. We expect that once we complete our second iteration of Zoom web calls with a single canvas, we will be able to address the performance issues brought up in both the remote user rendering and screen share scaling sections.
As a final note to readers working with the Zoom Video SDK, these three gotchas were just a few of the beasts we encountered on our project journey. Our boss battle was actually rendering the local user’s video on joining a call, a deceptively difficult task (it required listening to video encode/decode ready events, as well as ensuring that the video element is mounted in the DOM). Please let us know @remotionco if you’d be interested in a part two, or a comprehensive step-by-step guide to working with the Zoom Video SDK on React!
The case for virtual coworking: build a connected remote culture.
Regularly coworking with your hybrid or remote team can help you build the social cohesion that makes work feel less like work.
Here are the biggest reasons we think virtual coworking is an effective way to create a close-knit remote culture:
1. It fosters casual conversations.
Building a connected remote culture is all about fostering 1:1 or small group organic conversations. Virtual coworking makes space for those conversations. When you spend time together outside of agenda-driven meetings, spontaneous chats naturally occur, as they would in a traditional office.
2. It's more inclusive than scheduled social events.
It can be draining for introverts to have to participate in scheduled, purely social conversations. Virtual coworking allows the team to spend time together and occasionally chat without having to constantly be "on," making it more inclusive for introverts and extroverts alike.
3. It's easy to say yes to.
Purely social events are important, but if your remote team is busy or on a tight deadline, it's tough to find the time for social chats without it feeling like an obligation. Coworking is much easier to get your distributed team onboard with because it doesn't take time away from getting work done.
4. It improves remote collaboration.
Coworking can lead to faster unblocking, shorter feedback loops, and stronger remote collaboration. Quick questions get answered easily and in the moment, without having to schedule a meeting or go back and forth in messages. Coworking also builds peer accountability.
5. It's scalable.
Coworking works for teams of all sizes and is a great way to scale your remote culture as your team grows. It's helpful to create opportunities for teammates from different functions to get to know one another.
6. It creates shared momentum.
Virtual coworking helps remote workers for the same reason you might get a membership at a traditional coworking space: the feeling of togetherness is motivating!
Get started with virtual coworking: choose the type most aligned with your priorities.
It takes intentionality to make virtual coworking feel natural and energizing enough to stick—it's not as simple as leaving a Zoom call open all day.
Here are a few of the ways we've set coworking up for our team. We recommend choosing one to start with. If it works, make it routine and experiment with other types from there.
Try independent coworking.
Try project-based coworking.
Best practices for virtual coworking.
Keep group sizes small.
Limit your work sessions to 4-6 people to minimize distraction and help make introverted teammates comfortable chatting.
Signal boost coworking.
Set a norm of letting the entire team know when you're hopping into a coworking room or session.
Make it routine.
Once you've figured out what kind of coworking works for your team, make it a regular, opt-in event. Set up a recurring calendar event to do it at the same time each week to maximize the impact.
Set expectations ahead of time.
When you're first introducing coworking to your remote team, share what you're imagining in your calendar invite and at the top of work sessions to get everyone on the same page. For example:
Let's try virtual coworking! We'll work independently on our own projects with our cameras off, but we'll share virtual space and listen to music together — like we might work side-by-side at a physical office.
Listen to music together.
Play music while you work in a virtual room to create a shared environment and add a little bit of personality to your virtual coworking session.
Set up Coworking Rooms in Remotion.
Most of the above is doable with any video chat app or virtual office, but much easier with Remotion—which we designed with a lightweight, smooth coworking experience in mind. Remotion is the perfect virtual coworking platform—easily set up virtual rooms that your teammates can hop into for different styles of coworking.
While Remotion's virtual workspace is free to use with your remote team, if you're curious about joining a virtual coworking community built on our platform—check out Swift Remote Studio for iOS, Mac, and Swift developers.