THE WALLS HAVE EARS: Privacy and Risk in the Clubhouse
The views and opinions expressed herein are those of the author alone, and do not reflect the views, opinions, or endorsement of any other entity or organization. The content and ideas expressed herein do not constitute legal advice, are not intended to constitute legal advice, and should not be interpreted as legal advice.
Clubhouse, the drop-in, audio-only social media app, is the trendiest social media platform to come out of the United States in years. Recently it has also come under intense scrutiny from the privacy community due to fairly blatant, and perhaps intentional, privacy deficiencies. Clubhouse is an exciting new platform with the potential to expand the way people around the world interact and engage with each other. But its privacy shortcomings in view of its data collection and processing capabilities present legitimate risks to individuals and businesses, whether or not they use Clubhouse.
Clubhouse will be successful regardless of its privacy posture; it should embrace its opportunity to be a privacy leader among the next generation of social media platforms.
Clubhouse’s novelty and exclusivity are a potent combination for growth. The platform is already reshaping the social media landscape and the way people around the world engage with each other, but it falls short on individual privacy protections.
The general premise of Clubhouse is that it allows users to create rooms where they can engage in conversations with other users. There is no video or text, just voice. Each room is hosted within the Clubhouse platform, and, assuming it’s not a closed room, users can enter a room to “drop in” on a conversation as a listener, and potentially a participant, much as if they were wandering through a private club, hence the name. The experience feels a bit like scanning talk radio for an interesting topic, except the listener does not have to call in to the station and wait on hold to add their voice to the conversation. (They may still have to be called on by a moderator through the user interface of the app.)
People around the world are flocking to Clubhouse in droves. Or at least they’re trying to. At present, Clubhouse is still in “beta” and is invitation only. This exclusivity, in combination with the unprecedented level of access it offers to “drop in” on conversations with high-profile and celebrity users, has created a surge in demand to the point where people are selling invitations on online marketplaces. The exclusive nature paired with the novelty of the interactions, including, for better or worse, the potential for real-time, stream-of-consciousness, unfiltered, unedited, think-it-say-it tangents, is creating an air of excitement reminiscent of the early days of Facebook, when access was limited to users with a “.edu” email address.
Clubhouse’s surge over the past few months has attracted attention from more than just eager users, as technology watchdogs are increasingly voicing concerns and criticisms of the platform. Some of the most consistent criticisms of Clubhouse are with regard to its privacy practices, or rather, what seem to be its glaring privacy deficiencies. Maybe these early privacy foibles can be chalked up to Clubhouse still being in beta. Maybe. But more likely is that Clubhouse’s success despite obvious privacy flaws is an exploitation of how interwoven social media is into our cultural fabric.
In other words, we have a high tolerance for how our personal data is collected, used, stored, and otherwise processed by certain technology offerings; we might not like it, but we have been conditioned to expect it, and are all too quick to accept obvious privacy shortcomings in order to not miss out on new tech trends.
Clubhouse’s privacy issues are not unique, but are surprising considering the modern global emphasis on, and demand for, better privacy controls in consumer technologies.
It is interesting to consider Clubhouse’s present growth-surge in contrast to what Facebook experienced in its early years. Facebook’s mass adoption was driven at least in part by its strong user privacy controls. Though Facebook’s policies and practices with regard to its collection, processing, and use of personal data have changed since, at the time, Facebook was able to point to its privacy controls as a differentiator from its competitors. One of Facebook’s competitive advantages early on was that it offered its users a level of control over their profile data they were not getting elsewhere.
Fast forward to today, still riding the wake of the Cambridge Analytica scandal surrounding the 2016 U.S. presidential election, “big tech” companies in general, and in particular Facebook, are treated like public enemy number one when it comes to consumer privacy. Couple that with an increasingly privacy-savvy global population and a rapidly evolving global body of privacy law, and it seems that any emerging social media platform would have to make privacy a priority in order to be successful.
Yet, rather than implementing privacy-by-design principles, Clubhouse’s approach to date feels more like privacy takes a back seat to its data collection activities. Privacy is acknowledged, but it does not seem to be a high priority.
Global privacy considerations given short shrift.
At this point, the bulk of Clubhouse’s privacy shortcomings are well documented, and so are not explored in depth here beyond a brief discussion of the most commonly highlighted issues. The two most readily apparent concerns are that Clubhouse’s public-facing privacy disclosures do not appear to be directed to individuals, or account for privacy laws, outside the United States, and that its disclosures and other practices suggest it may be collecting data and building profiles about individuals who have never signed up for Clubhouse, i.e., shadow profiles.
Certain Clubhouse features may result in GDPR violations.
Uploaded contact data could be used to build shadow profiles.
The second concern, that Clubhouse is collecting data to create shadow profiles of individuals, is based in large part on the process of joining, and inviting others to join, Clubhouse. As noted above, Clubhouse is at present invitation only. If you have already joined Clubhouse, you are provided a limited number of invitations to dole out to contacts. (This limitation on available invites is part of the exclusivity that is driving demand.) Clubhouse continues to increase the number of invites allotted to members. But there’s a catch to inviting other users to the platform: in order to send an invitation, you can’t simply enter the intended recipient’s email address or phone number; rather, you first have to allow Clubhouse to access your contacts from your device before you can select a recipient. In other words, you have to provide Clubhouse with access to the personal data of all your contacts stored in your device in order to invite any of them to the platform.
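The shadow-profile concern can be made concrete with a small sketch. The following Python is purely illustrative, assuming a hypothetical backend that merges uploaded contact lists; the names, numbers, and data structures are invented and do not reflect Clubhouse’s actual implementation.

```python
from collections import defaultdict

# Purely illustrative sketch of how uploaded contact lists *could* be
# merged into profiles of non-users. All names, numbers, and data
# structures here are invented assumptions, not Clubhouse's actual code.

registered_users = {"+15551230001", "+15551230002"}  # actual members

def merge_contact_uploads(uploads):
    """uploads maps an uploader's phone number to their contact list."""
    shadow = defaultdict(lambda: {"names": set(), "known_by": set()})
    for uploader, contacts in uploads.items():
        for contact in contacts:
            if contact["phone"] in registered_users:
                continue  # already a member, so not a shadow profile
            entry = shadow[contact["phone"]]
            entry["names"].add(contact["name"])
            entry["known_by"].add(uploader)
    return shadow

uploads = {
    "+15551230001": [{"name": "Dana R.", "phone": "+15551239999"}],
    "+15551230002": [{"name": "Dana Roe", "phone": "+15551239999"}],
}
profiles = merge_contact_uploads(uploads)
# The non-user at +15551239999 now has a profile linking two name
# variants and the two members who know them, without ever opting in.
```

Even this trivial merge shows how quickly a non-user’s social graph takes shape once multiple members upload overlapping address books.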
Conversations on Clubhouse may be recorded, conversation content may be collected in real time, and audio from conversations may be vulnerable to government eavesdropping.
Less obvious but perhaps more dire issues include observations that conversations on Clubhouse are neither as private nor as secure as described in Clubhouse’s documentation. Specific criticisms point out that Clubhouse records audio, and therefore conversations, absent meaningful consent, and that portions of the Clubhouse app’s back-end infrastructure, which potentially stores user data and recordings, may be hosted on servers easily accessible by foreign state actors.
Conversations may be recorded without adequate consent.
The audio-only nature of the platform creates a challenging landscape for Clubhouse with regard to recording conversations. On the one hand, it is important that Clubhouse is able to investigate any abuse and other Trust and Safety violations that occur on the platform, and being able to review recorded audio may be the most effective way to do that. However, there are serious doubts as to whether users are even aware of, let alone have consented to, Clubhouse’s audio recordings, and therefore whether such recordings are made legally.
The requirements for legally recording a conversation vary from state to state and across the globe, and may depend on factors such as the medium over which the conversation takes place or is recorded, as well as other circumstances surrounding the conversation, such as whether it takes place in a public forum versus one where the participants would reasonably expect the conversation to be private.
Typically recording a conversation requires the meaningful consent of at least one of the participating parties. Meaningful consent to record a conversation is often obtained through a just-in-time notice presented in a manner that informs the individual of the recording and allows them an opportunity to refuse their consent. For example, when you hear “this call may be recorded for quality and training purposes,” you have the option to hang up if you don’t consent to the call being recorded.
All that said, Clubhouse does state that any temporary recordings are made solely for Clubhouse to investigate Trust and Safety violations, and that such temporary recordings are encrypted. Seemingly, then, even where consent for the recordings is obtained on shaky grounds, the recordings themselves are only used for a specific limited purpose, and are not disclosed or otherwise available to any third parties. Unfortunately, from a privacy perspective, there are problems that persist despite these limitations and safeguards.
Data related to conversations may be collected and used for profiling.
That said, Clubhouse may have very little reason or desire to expand its audio recording activities beyond its current practices. Additional recording activities would likely increase its exposure to consent-based and other privacy risks, and as discussed in the next section, Clubhouse is likely already capturing the bulk of the value from “content” created and shared on its platform.
Natural language processing can be used to collect real-time conversation data.
Regardless of whether Clubhouse is recording conversations temporarily, indefinitely, or not at all, it is likely still collecting data from conversations in real time through natural language processing (“NLP”) technologies. In basic terms, NLP describes the use of a computer to process human language data. When you use your voice to give commands to a connected device, such as through a smart assistant on your phone or any number of the internet-of-things (“IoT”) devices that many people have in their homes, the device’s ability to perform or respond to the request is NLP in action (among other underlying technologies).
Over the past decade or so, artificial intelligence and machine learning (collectively “AI/ML”) capabilities have exponentially advanced in terms of the efficiency with which they can process massive amounts of data. These improvements have greatly benefited other emerging technologies, including NLP, that can leverage AI/ML technologies for data processing to improve their own capabilities.
As an example, the media streaming service Netflix published a paper describing a use case for NLP to improve subtitles across its content offerings. The study describes using NLP for translation beyond recognizing words in one language and making a literal word-for-word translation to another. Rather, it describes a method to recognize accents, dialects, and expressions in order to make a contextually better and more natural sounding translation. In essence, in addition to translating the spoken dialogue, where contextually appropriate, the AI/ML-powered NLP processes the dialogue to translate the expression of an idea.
There is tremendous value in being able to derive insights from data with a high degree of accuracy. People are already aware of this on account of being followed around the internet by targeted advertising and curated social feeds based on their online content, posts, and browsing habits. Now consider if instead of posting content online, a person begins engaging in online conversations. A spoken conversation — with its back-and-forth volley and little to no “dead air” — is going to produce significantly more content versus a written thread.
Clubhouse does not have to make audio recordings of conversations to capture data produced during them. It can use NLP to recognize and process language spoken by participants in real time for the duration of every conversation. If Clubhouse is utilizing NLP in this way, it can create data sets from conversations without retaining any audio, scraping data directly from the spoken content of the participants. Such data sets could then be further processed, such as for specific keywords or topics, to derive inferences, and ultimately value, therefrom.
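To illustrate the idea, here is a minimal, purely hypothetical Python sketch of deriving structured records from transcribed speech without keeping any audio. The topic lexicon and function names are invented for illustration; a real pipeline would sit behind a speech-to-text step and far more sophisticated models.

```python
import re
from collections import Counter

# Hedged sketch: how a platform *could* turn live speech into structured
# data without retaining audio. The lexicon below is an invented toy; it
# stands in for real topic models and is not any known Clubhouse system.

TOPIC_KEYWORDS = {
    "finance": {"invest", "stocks", "crypto", "loan"},
    "health": {"doctor", "diagnosis", "insurance", "therapy"},
}

def tag_utterance(speaker, transcript):
    """Turn one transcribed utterance into a keyword/topic record."""
    words = set(re.findall(r"[a-z']+", transcript.lower()))
    topics = [t for t, kws in TOPIC_KEYWORDS.items() if words & kws]
    return {"speaker": speaker, "topics": topics, "word_count": len(words)}

# A speech-to-text step (not shown) would feed transcripts in as the
# conversation happens; only the small derived records need storing.
record = tag_utterance("user_42", "I want to invest in crypto this year")
# record["topics"] -> ["finance"]
```

Only the compact records, not the voices, would ever need to touch disk, which is precisely why “we keep no recordings” and “we collect conversation data” can both be true at once.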
Clubhouse further discloses that it also collects personal data related to the types of conversations users engage in, and the types of content they engage in and share, and that it uses the personal data it collects to create “derived” data about users, i.e., it combines personal data to make inferences about its users. Considering that engaging in conversations is the primary use of the platform, and so likely how the bulk of the content gets created thereon, in all likelihood data from conversation content is being captured and included in such “derived” data.
One instance of the potential for government eavesdropping on conversations was already identified.
That the temporary recordings of conversations on Clubhouse are encrypted also may not provide the degree of security that users probably expect. In particular, in an entry posted to its blog on February 12, 2021, the Stanford Internet Observatory (“SIO”) detailed how a provider of back-end infrastructure to Clubhouse likely could access users’ raw audio data, even where encrypted by Clubhouse.
As noted by SIO, the vendor is an entity jointly based in the United States and China, and so is subject to Chinese cybersecurity law with regard to any data stored on servers in mainland China. Entities subject to such law are required to assist the Chinese government with locating and storing any data the government determines jeopardizes national security. Considering China’s willingness to prosecute individuals for speech critical of the government, it is not far-fetched to assume the Chinese government would take a strong interest in an increasingly popular social media platform whose entire service offering is based on enabling real-time conversations. (Clubhouse was blocked in China on February 8, 2021.) In response to the SIO’s report, Clubhouse stated that it would be “rolling out changes to add additional encryption and blocks to prevent Clubhouse clients from ever transmitting pings to Chinese servers … [and that it planned] to engage an external data security firm to review and validate these changes.”
The failure to implement meaningful privacy controls by design presents a real risk of harm to both individuals and businesses, even those who do not use Clubhouse.
As suggested above, that Clubhouse’s popularity is surging despite obvious privacy flaws may say more about the state of global social media consumers than it does about Clubhouse. That is not a reason to excuse Clubhouse’s failure to account for and implement stronger privacy controls from inception. The developers and investors behind Clubhouse are too smart and experienced to have been ignorant of their responsibility in this regard, or of the harm that could ensue if the app were to reach its tipping point without meaningful privacy controls in place, a point it appears to be fast approaching.
Then again, what harm will ensue on account of its privacy failures if Clubhouse continues on its current trajectory? And, outside the privacy community, who will notice, or even care? These questions are rhetorical, but the reality is that the answers will depend at least to some degree on the inputs. That is, different people, age groups, social classes, and cultures are going to weigh the risks and rewards of using Clubhouse differently.
As noted above, we have been conditioned to expect, and in many cases even accept, that the trade-off for getting to participate on a “free” social media platform is the collection and use of our personal data. Typically we assume, and even expect, that this data will be used for the purpose of serving targeted ads. This is true for Clubhouse, too, which states that it does not sell user data, but may disclose it to “social media platforms and other advertising partners that will use that information to serve you targeted advertisements on social media platforms and other third party websites.”
In the United States, where privacy rights have historically been treated as something akin to property rights, collecting user data to create profiles for the purpose of serving more effective advertising may seem innocent enough, and even enterprising (albeit annoying when you’re on the receiving end). The U.S. view is in stark contrast with much of the rest of the world, including Europe, which regards privacy as a fundamental human right. But even cultures where mass data collection and shadow profiling are viewed as a much greater infringement have large swaths of their populations that eagerly continue to jump on the latest social media bandwagon. They know the trade-off, and they make it anyway.
We have also seen that even nefarious uses of shadow profiles do not steer users away. The Cambridge Analytica scandal shone a spotlight on and confirmed some of our worst suspicions about the darker potential of social media’s mass data collection and profiling practices, in particular where users are asked to share their contacts as admission to the party. From fewer than 300,000 Facebook users who downloaded a particular app, Cambridge Analytica and its affiliates were able to gain unauthorized access to the Facebook information of nearly 90 million users. The information was used to create profiles for serving targeted political advertisements in an effort to support certain candidates in the 2016 U.S. presidential election.
It worked. People were angry and felt deceived that Facebook had so reduced its privacy controls as to create an environment that fostered this exploitation of their data. There were hearings on Capitol Hill, which required Facebook’s CEO to testify before Congress for days. Facebook has been lambasted for its privacy practices since, and most recently has engaged in a privacy-related war of words with Apple. And yet, through it all, Facebook’s number of monthly active users continues to grow.
Development and use of profiles of individuals without their knowledge, input, or ability to influence the data collected about them can perpetuate and institutionalize historical prejudices.
While it may be an extreme example, the Cambridge Analytica scandal is illustrative of the potential harms that the exploitation of personal data can result in. Individual consumers — the users of the technologies that collect their data — should not bear the burden of having to understand at a granular level how their personal data may be harvested, compiled, and subsequently used or shared by the platforms they participate on.
Rather, the consumer-corporate relationship must have trust built in. That is, as consumers and users of technology, we are willing to entrust the developers of amazing and innovative products and services with our personal data, but in doing so we trust our data will be used responsibly, and not in a manner or for a purpose that we would find unexpected, misleading, or deceptive. We understand that in many cases the product and service innovations we enjoy are built on insights gleaned from the data we share, or advertising revenue derived therefrom, and that our benefit of the bargain is that we get access to these products and services at a reduced or even zero monetary cost.
By prioritizing adoption and growth over establishing meaningful individual privacy protections, Clubhouse has set the stage to erode the trust of its users from the outset, and potentially harm them as well. This may not slow Clubhouse’s growth, but neither would leaning in to privacy-by-design principles and embracing the opportunity to be a privacy leader in a space wanting for the same.
It is easy to point to the result of the Cambridge Analytica scandal as an outlier. After all, mass collection of personal data and use of it for profile building are hardly unique practices among firms, and certainly not limited to the tech sector. While true, that these are widely accepted practices underscores the need for strong privacy controls to be incorporated therein. As technologies continue to improve, especially those that leverage machine learning to process huge data sets, it will be easier to profile and draw inferences about individuals with increasingly less human involvement or input. This can result in significant harm to entire groups or classes of people if there are not meaningful privacy controls in place to enable checks and balances against automated profiling and decision-making.
The harms that can result from unrestricted data collection and processing can be easy to miss because they are so deeply ingrained into societal processes as to be systemic, and rarely are obvious at the individual level. For the sake of example, let us assume that Clubhouse is developing shadow profiles of all the individuals whose personal information is uploaded through Clubhouse users’ contacts, but who are not themselves Clubhouse users. If this were in fact occurring, the number of shadow profiles of non-users would far exceed the number of profiles of actual Clubhouse users.
Combining data about an individual from multiple sources can result in a remarkably detailed profile of that person. Consider how much of a person’s daily routine involves online touchpoints just through their internet browsing activity and social media. For many people, collecting data from just those two sources would likely reveal significant insights into their lifestyle, such as social habits, transactional and shopping habits, behavioral insights, age, race, sex, marital or dating status, health issues, education level, political leanings, and religion.
Another common use is by firms with an interest in creating risk profiles, such as insurers, lenders, and even employers. Such use presents a legitimate risk of harm to traditionally underrepresented and underserved communities.
Consider fundamental institutions, such as housing, finance, and healthcare, where previously ingrained prejudices, such as those based on race, sex, or religion, created a significant barrier to equal participation in the marketplace for certain groups or communities. Where entire generations within a community are impacted by prejudicial policy, the resulting data is likely to reflect the prejudicial impact across the entire data set of individuals within the community. When profiles used for automated decision-making, such as decisions related to evaluating the risk of granting a home loan or issuing health insurance, are based on inferences derived from data tainted by historical prejudice, it logically follows that such prejudices will be reflected in those profiles and in any analysis based thereon. Garbage in, garbage out.
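A toy example makes the “garbage in, garbage out” dynamic concrete. The data and the naive “model” below are invented purely for illustration:

```python
# Minimal illustration of "garbage in, garbage out": a naive model that
# learns approval rates from historical decisions will reproduce any
# prejudice embedded in those decisions. All data here is invented.

historical_loans = [
    # (neighborhood, approved) -- decades of decisions shaped by redlining
    ("district_a", True), ("district_a", True), ("district_a", False),
    ("district_b", False), ("district_b", False), ("district_b", True),
]

def learn_approval_rates(records):
    """Score each group by its historical approval rate."""
    rates = {}
    for group in {g for g, _ in records}:
        outcomes = [approved for g, approved in records if g == group]
        rates[group] = sum(outcomes) / len(outcomes)
    return rates

rates = learn_approval_rates(historical_loans)
# district_a scores ~0.67 while district_b scores ~0.33: the "model"
# rates applicants by neighborhood, so the historical disparity becomes
# tomorrow's policy, with no overtly prejudicial rule anywhere in code.
```

Nothing in the code mentions race, sex, or religion, yet the output faithfully reproduces the prejudice baked into the inputs; that is exactly the mechanism the paragraph above describes.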
The detrimental effect is that, despite policy changes intended to remove historical prejudices, huge amounts of generational data tainted by such prejudices end up shaping inference-based profiles that are used to assess individuals’ credit-worthiness, insurability, and employability, thereby continuing to harm individuals from the communities subject to the historical prejudice. Shadow profiles compound this harm precisely because the profiling is developed without any input or knowledge from the individual data subjects. If an individual does not know who is developing a profile about them (or even that they are being profiled), for what purpose, or how or by whom it will be used, then they cannot correct any inaccuracies therein, and so will be vulnerable to any detrimental impacts from the same.
This is not to say that Clubhouse is developing shadow profiles that will be used in the manners described above, or otherwise using the data it collects irresponsibly. But the lack of strong and meaningful privacy controls across the platform combined with the unchecked ability to collect mass amounts of personal data for any purpose or no purpose — even from non-users — presents a persistent risk that Clubhouse, or any other party with access to this data, could engage in these reckless or similarly harmful activities at any time.
Reduced privacy protections coupled with explosive growth will harm businesses and individuals, and probably even Clubhouse.
The harms described above are far from the only risks presented by Clubhouse’s data and privacy practices. As noted earlier, European privacy regulators have already voiced concerns about Clubhouse’s practices, and may well pursue regulatory enforcement against Clubhouse, which could result in massive fines under the GDPR.
The risk of harm from a data breach, for both Clubhouse and individuals, is also significant, and grows with every increase in data collection. For Clubhouse, a data breach would potentially expose it to costs and liabilities related to remediation and investigation of the breach, notification to individuals and regulatory authorities, potential litigation damages and regulatory penalties, as well as reputational harm. For individuals, the impact of a data breach can be direct, such as use of their data to perpetrate identity theft, social engineering, and similar crimes sounding in fraud or resulting in financial harm. But individuals can also be impacted more indirectly, such as if the unauthorized acquisition of data results in the use of such data by an unknown third party for developing profiles to target or evaluate individuals, as described above.
Clubhouse’s audio-only format and moderator controls present an exciting platform for businesses to use to engage with customers, prospects, and employees. But businesses and professionals should also consider Clubhouse’s privacy and data practices, and the potential risks they present. For instance, before encouraging employees or customers to engage on Clubhouse, the business should consider whether the use it is encouraging would cause the other party to violate an applicable data protection law, such as the GDPR for certain uses by EU-based entities or individuals. The business should also consider other potential impacts to the individual, such as whether and how much of their personal information will be collected through use of the platform, as well as the types and topics of conversation the business is encouraging participation in.
Businesses should also be mindful of the content of the discussions they participate in or foster on Clubhouse. Unlike text-based platforms, where posts may be subject to significantly more curation and review prior to publication, Clubhouse engagement involves real-time conversations, which are inherently harder to control for content creators and moderators alike. Participants in a conversation need to be aware of any confidentiality obligations they may have, or otherwise risk disclosing trade secrets, breaching a confidentiality obligation, or engaging in slanderous, defamatory, or other discussion that could expose the individual or business to liability.
Clubhouse should have been better about privacy; it’s not too late for it to improve.
Pointing out the privacy shortcomings of Clubhouse and other technology platforms and services does not have to be an indictment, but rather can serve as a call-to-action to the developers of such technologies. There is both utility in and demand for innovative platforms for human engagement. But trust and accountability need to be present too. We do not need to look far back in our history to see the harm that can ensue as a result of reducing privacy protections. As such, it is fair for us as consumers to expect emerging technologies such as Clubhouse to incorporate strong privacy controls and responsible data practices by design and default, and to express our disappointment when they fall short.
For an overview of the role that fear-of-missing-out, or “FOMO,” is playing in Clubhouse’s growth despite its obvious privacy flaws, see: Turrecha, Lourdes. When FOMO Trumps Privacy: The Clubhouse Edition. February 19, 2021. https://medium.com/privacy-technology/when-fomo-trumps-privacy-the-clubhouse-edition-82526c6cd702.
Mehta, Sneha, et al. Department of Computer Science, Virginia Tech; Netflix, Inc. Simplify-then-Translate: Automatic Preprocessing for Black-Box Translation. May 27, 2020. https://arxiv.org/pdf/2005.11197.pdf.
Cable, Jack, et al. Stanford Internet Observatory. Clubhouse in China: Is the data safe? February 12, 2021. https://cyber.fsi.stanford.edu/io/news/clubhouse-china.