In late 2022, I quit my job at Meta Reality Labs.
I’m not looking for any commercial projects until at least 2024. In academia, a year outside your normal work environment is formalized as a “sabbatical”; this is my DIY Sabbatical.
Gestural Input, 2008-2023
At Meta, I had been working on EMG input for AR since the CTRL Labs acquisition nearly 4 years ago. I’ve worked in gestural interaction research since 2008, starting with the DiamondTouch. After large multitouch tables, I segued into whole-body mid-air interaction with devices like the Kinect, with a significant detour into realtime SLAM. Towards the middle of my PhD, my work on practical gestural interaction hit a wall: while gesture detectors showed excellent results in published papers, they never worked as well in real applications. This is because users perform gestures differently during real in-the-wild use than they do during prompted training data collection (see Background Activity on my academic page). With EMG at Meta, we solved this by implementing multiple levels of training data capture and closed-loop evaluation along the spectrum of ecological validity. While Meta’s EMG product isn’t released yet as of today, I have direct knowledge that what remains is chasing the long tail of edge cases; it really, really “just works”. You’ll be shocked how well it works when you get to try it. After 15 years, I feel confident the on-the-go gesture input domain is “solved”, and I feel privileged to have participated in bringing gesture input into reality.
This sabbatical is about me shopping for a theme for the next 15 years.
Generative AI is very hot right now, but I am surprisingly unexcited about working in that domain myself. It’s possible that I’m being a bit of a hipster and I’m not excited because everyone else is; avoiding the current gold rush so you can look for the next one is a valid instinct. I have used generative tools for quite some time, including procedural generation for creative work, and I even had a short stint in 2017 working at a startup using GANs for exposure therapy (NDA’d, like much of my work). I tried several of the generative AI tools in 2023, and I can’t deny that the outcomes are better, but they are still simply impressive toys. Regardless of the quality of an algorithm, the best contribution around a generative system is always the wrapper that incorporates it into an activity with real-world value. Generative algorithms, right now, are a big, shiny, impressive hammer in search of real nails, and everyone is fantasizing that it will (eventually) hit every nail perfectly well. But how we get there is unimagined and unmapped.
All of the themes I’m going to focus on this year feel like unaddressed nails.
(links go to sections below)
- Asynchronous Presence & Interaction
- Body Language for Longitudinal Input/Output
- Algorithm Gardening
- Animals with Jobs
- Post-Rectilinear Photography
- Alternative R&D Organizational Models
I am exploring these through readings, collaborative and solo projects, and hands-on practicums where I help out others. I’m starting broad instead of deep, and I’ll share more later about how I’ve managed my time (hint: pretend I’m in school again and taking courses where I am both the instructor and the student).
Asynchronous Presence & Interaction
Creating joyful, intimate, and productive ways of relating, without requiring synchronicity.
The COVID pandemic kickstarted access to remote work and play, but we still retain a cargo-cult assumption that the best activities are synchronous, and those that aren’t are less valid; e.g. the famous phrase “this meeting could have been an email”.
On the other hand, the UX of today’s asynchronous activities lacks ambition for creating the sense of presence, or cognitive empathy with other participants, that happens for free when the activity is synchronous. Consider the sensation of co-editing a document, or co-watching a movie with a friend – can we produce the same feeling of intimacy asynchronously?
I believe improving the value of asynchronous interaction will make work more accessible to people currently excluded by time zone or personal constraints. However, I’m going to explore this theme entirely in game-like contexts. As I’ve pitched this async theme in the context of the workplace, it has exposed a lot of baggage associated with revealing presence and availability in hierarchical workplace relationships. Many people don’t like the idea of others knowing when they did work, or when they are online. This consideration feels like a distraction I want to ignore to start. Since a workplace has a lot of rigidity around (performance of) value, there’s a conservativeness associated with new activity models. Here I get to provide one of my favourite facts: Slack came out of a team originally trying to develop an MMO; they built Slack as a collaboration tool for their own internal use.
I’ve already completed one project: asynchronous worldbuilding, in a forum, for my birthday party.
Body Language for Longitudinal Input/Output
Centering communication via body language for effective wearables in real life.
I have made many, many prototypes using novel hardware and form factors for input and output. However, these have primarily had visual UIs, and been intended for short sessions of focused use. There’s a general bias in academia and industry towards this type of prototype, since new work is often ideated and shared in short, visual presentations. During my time at Meta, I worked specifically on all-day wearables, and discovered:
- Haptics are very impactful, but have always been an afterthought for me.
- There are many body language features we could use for implicit input, but don’t take advantage of yet. My personal favourite discovery is NIMI.
- R&D (in my circles) lacks best practices for studying longitudinal interfaces that augment everyday life. One such method is experience sampling.
In the past couple of years, I’ve started to use the term “interleaved interaction” to describe how an in-the-wild user will alternate between attention or interaction with a system and other elements of the real world. This is quite poorly studied, simply because it is a pain and generates vaguer metrics than studies that have participants focus entirely on a sterilized task. Check out several examples of interleaved interaction in the latter half of the video we made for Background Activity.
Currently, I’m assembling a hardware platform for prototyping.
Algorithm Gardening
Empowering naïve users to build effective relationships with algorithm owners.
Your average digital consumer is aware that the large-scale digital services they engage with, from social media to credit scores to photography to self-driving, are “algorithms” [pejorative jazz hands]. Consumers think of algorithms as black boxes they have no meaningful control over, or feedback channel to. This is in large part due to algorithm owners (i.e. large tech corporations) having motivations that are not in exact alignment with those of their consumers.
Sidebar on AI: I’m using the term algorithm as an umbrella, covering any automated decision making, as AI carries distracting anthropomorphization baggage. The problems associated with misaligned AI are not new compared to misaligned algorithms, merely intensified.
Even in an ideal relationship, where consumers and algorithm owners were in perfect alignment:
- we don’t have any good examples of UX that allow for effective feedback from consumers to algorithm owners; the state of the art is ratings, Like buttons, upvotes/downvotes, attention, or in-situ bug reports (please write a paragraph to describe the bug).
- algorithm owners don’t have a great way to include consumers in the loop when updating their algorithm, including A/B tests. Sometimes this testing should be blind, but if a consumer suspects they’re being included in an A/B condition, this can create a feeling of being gaslit, generating erratic behaviour which pollutes the actionability of the A/B test.
- consumers and algorithm owners don’t have a great way to articulate ephemeral changes. The current method is for consumers to have the presence of mind to manually mute words like “Apple” during Apple events, and then to unmute them later.
I’m using the metaphor of “gardening” to indicate that I hope your average consumer develops a degree of actionable consideration and control over an algorithm’s behaviour, without setting the unrealistic goal of every consumer becoming an expert in every algorithm they use. Our strange status quo is that algorithms are trained on large-scale data generated and labelled by humans in low-income areas, for use by humans in high-income areas; this can lead to strange cultural side effects, where what is considered rude language in India is not considered rude in Canada. Algorithm owners, and today’s consumers, see it as an important value that a new algorithm is “pick up and use”; I would like to argue (tangibly) that it is better for consumers to become used to the idea of collaborating with (gardening) a new algorithm, instead of viewing it as a magic black box. It’s important to know just a little bit of how the sausage is made, for all stakeholders. By solving this gardening UX, my hope is that algorithm owners can develop better user experiences overall by figuring out which knobs to expose to consumers, and how.
Algorithm Gardening isn’t just a set of tools, but a state of mind. Part of effectively providing feedback to a complex system is having a critical, scientific state of mind. Sometimes an issue is just an anecdote, but sometimes a collection of similar anecdotes is dismissed even when it indicates a new trend. As such, effective algorithm gardening overlaps with themes of “Citizen Science”. I’d love to create a toolkit to help your average user do effective sousveillance without getting trapped in a Skinner Box.
This theme is heavily inspired by my work at Meta Reality Labs, where I was deeply involved in user study design and analysis of frequently-updated machine learning models. User feedback was limited, and even when we introduced some feedback options, consumers and algorithm owners would use different definitions, making effective action difficult.
I have been using a fairly fixed toolset and domain for machine learning for the past few years, so to start with this theme, I’m doing a refresher on ML/control theory tools and research outside what I used at Meta, and will then get deeper into the Algorithm Gardening theme in the fall.
Animals with Jobs
Exploring communication and deal-making between humans and non-human infrastructure.
If we think of each sabbatical theme as part of a risk/reward portfolio, Animals with Jobs is the highest risk and highest reward. You may have heard me say a couple times over the past year that I really want to get into falconry; this is that.
Backstory: When I moved into my apartment, to my delight I discovered I had access to a large roof. However, I found out I lived too close to the World Trade Center, and could not fly a drone from there, because 9/11. I sidelined this dream until a couple of years later, when by day I was working with smaller and smaller AR hardware. Then it came to me: what if I put an AR headset on a bird, and could send it visual directions on where to fly? There is precedent in myth for nearly exactly this.
Pragmatically, the use case of controlling the realtime flight of a bird wearing a camera and streaming a stable image is impractical except on larger birds of prey. If you could control the flight of a bird, you would more likely use it for surveying: generating imagery after the fact, and instructing the bird where to fly using haptics instead of the tiniest heads-up display.
When something seems like a cool idea, but doesn’t currently exist, one should ask why not. Animals have been used for labour throughout history without direct human supervision, notably carrier pigeons. Today, animals can have the role of pet, food, or “wild”. With the advent of radio, we stopped using carrier pigeons, yet several decades later, through miniaturization and GPS, we’ve invented turn-by-turn directions – yet haven’t turned them to use on animals. There are wearable trackers that cat and dog owners use to see where their pets have gone in the neighbourhood during the day; why not also enable these to deliver instructions like “come home for dinner” or “meet me at the park”?
I trust the failure modes of an animal much more than those of a drone; while a drone could fall out of the sky or crash into something due to a bug or a rogue human operator, a bird will be imperfect but has a strong sense of self-preservation.
There has been a lot of research across various animals attempting to get them to speak human language in sentences; I feel this is a bit sad, motivated by making them entertaining, or empathizable, like real people. However, to give animals jobs, it’s a much more scope-reduced research problem to determine the medium, training, and feedback for delivering instructions and compensation to them.
Part of this theme’s exploration will be a philosophical framework. Many people react with concern to this pitch: discomfort at the thought of making animals slaves, or of tainting animals’ “wildness”. I’d like to think deeply about this. In terms of biomass, humans and domesticated animals now outweigh wild animals; we can attempt to preserve some portion of earth and its species as “wild”, but this meaning of wild will be increasingly inaccurate. A future where many animals have jobs is a better future than one where there’s almost no wild left and most of our infrastructure is robots. I would also like to challenge the idea that the only relationships we can have with lesser beings are complete domination (domestication) or none (wild). I would like to explore the language of negotiating deals with animals for gig work. One key aspect of this is not paying animals in food; while I may be well-intentioned, someone who comes later may not be, and could be incentivized to disrupt a local animal population’s food supply so that they can force more animals to work. Notably, many birds experience a rush of dopamine when listening to birdsong; perhaps I could pay birds for their work with tickets to a bird nightclub.
At worst, this theme is a nice excuse for me to leap into a deep paper-reading hole and have some very interesting conversations with (hopefully grateful) academics.
Post-Rectilinear Photography
Envisioning next-generation capture & share of photos, where the tokens are not restricted to a static rectangle.
As cameras get mounted in smaller wearables, consumers take more pictures, more casually. When I went on a backpacking trip in Germany in 2022, I reviewed the few hundred photos I took, and aesthetic quality was not a consideration for >95% of them. This is a hint that we should re-think how photographic data is used.
Snapchat Spectacles were one of the first casual picture-taking wearables, and instead of a photo, when you tapped the capture button it took a short video. Meta’s Ray-Ban Stories use the same approach. For both, the use of a video instead of a photo is to avoid the poorly-framed photos that result from someone trying to take a picture of a desired object without live visual feedback; if you take a video instead, there’s probably a frame in there somewhere that has what you need, right? Even iOS’s Live Photos feature makes each photo actually a short video.
I also noticed I would take clusters of related photos. A selfie with me and a friend at a location, followed by a picture of the location. A historical object, then a plaque describing it, and then, after I stepped backward, a picture showing the object in context on a stone wall. If I were to share these photos, they would always be shared as an atomic group.
There are existing techniques to aggregate multiple photos into a single artifact, such as panoramas or 3D scans. But these are time-consuming and too finicky for the average consumer, thus intolerable for casual photo-taking. Currently, to share a view that doesn’t fit into a single frame, like an apartment, you get a 10-60 second video. The recipient is not expected to watch this video in realtime, but instead scrubs a finger back and forth along the timeline – there is a clear gap in UX here. Panoramas and 3D scans aim for accurate spatial literalism in a way that the consumer doesn’t really care about for casual photo taking. There are well-established techniques like seam carving that aren’t yet exploited for your average consumer’s benefit.
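For the unfamiliar, seam carving resizes an image by repeatedly removing the lowest-energy connected path of pixels, rather than cropping or uniformly scaling. Here is a minimal toy sketch of my own (a simple gradient energy function and dynamic-programming seam removal on a grayscale image; real implementations differ) using numpy:

```python
import numpy as np

def energy(img):
    # Simple gradient-magnitude energy: edges are "important" pixels.
    gy, gx = np.gradient(img.astype(float))
    return np.abs(gx) + np.abs(gy)

def remove_vertical_seam(img):
    """Remove one lowest-energy vertical seam from a 2-D grayscale image."""
    e = energy(img)
    h, w = e.shape
    # Dynamic programming: cumulative minimum energy to reach each pixel,
    # moving down one row and at most one column left or right per step.
    cost = e.copy()
    for i in range(1, h):
        left = np.r_[np.inf, cost[i - 1, :-1]]
        up = cost[i - 1]
        right = np.r_[cost[i - 1, 1:], np.inf]
        cost[i] += np.minimum(np.minimum(left, up), right)
    # Backtrack the cheapest seam from the bottom row up.
    seam = np.empty(h, dtype=int)
    seam[-1] = int(np.argmin(cost[-1]))
    for i in range(h - 2, -1, -1):
        j = seam[i + 1]
        lo, hi = max(j - 1, 0), min(j + 2, w)
        seam[i] = lo + int(np.argmin(cost[i, lo:hi]))
    # Delete exactly one pixel per row; image is now one column narrower.
    mask = np.ones((h, w), dtype=bool)
    mask[np.arange(h), seam] = False
    return img[mask].reshape(h, w - 1)
```

Repeatedly calling this shrinks the image width while preserving high-energy content, which is exactly the kind of content-aware aggregation trick that casual photo tooling could be using and mostly isn’t.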
For this theme, I’m going to dig into some tools and UX for casual capture of photo and video, where the end-result is aggregated into a novel artifact.
Alternative R&D Organizational Models
Building experience in how new, useful knowledge is generated outside my familiar models: startups, academia and megacorp research sub-orgs.
There are many models of Research & Development, each of which has positives and negatives. I’ve made a career of creating knowledge in new areas, and don’t want to get stuck applying my familiar organizational models when they don’t suit the research area. I want to fill my gaps in experience: for-profit think tanks, state-funded research (e.g. NSERC or ARPA), focused research organizations (FROs), third-party research contractors, and non-corporate forms like co-ops and volunteer organizations. Of particular interest to me are:
- how research is funded and its intellectual property licensed. Research is expensive and risky, so it makes sense to repay the research investment; however, the downside is that for-profit entities often keep outcomes private indefinitely, on the off-chance they turn out to be profitable.
- publishing, internally and externally, in a reusable form. Often, the output is a plain PDF with hand-waving over the hard technical bits. I’m not saying the solution is to “publish code”, as lots of code is bad and one should instead start from scratch. Instead, I’m curious why research is considered done when its outcome is announced, with no support mechanism for further work. In general, this is a problem of tech transfer.
Stay tuned! I’ll share outcomes here on this blog. Thoughts? Want to collaborate? Email me or comment here.