how to ask gender in a linguistics study

Kirby Conrod
10 min read · Sep 13, 2021

I get asked this a lot! This will be a short, informal version of work I’m in the process of writing up for formal publication in linguistics — it’s not yet peer reviewed, I’m not putting a ton of time into citations, and I’m not going to go deeply into technical details. I am, however, writing with an audience in mind of people who do linguistics research, a group which includes students at all levels as well as faculty and professionals in industry. My advice should be basically applicable across that wide range of folks, but if you have specific questions about this, feel free to ask me on twitter!

How do I ask for my participants’ gender and/or sex in a linguistics study?

The implicit question, which people usually don’t say outright, is: how do I ask for gender/sex without (getting accused of) being transphobic or cissexist?

Short answer: that depends on your research design. There’s lots of ways to be transphobic, and also lots of ways not to be.

Longer answer: in order to ask about gender and/or sex without being shitty, you have to know these things very clearly:

  • What exactly are you asking for?
  • Why are you asking for this information?
  • What are the unwritten implications of the way you’re asking?
  • How does this serve the people who are giving their time to your research?

Image: A simple MS Paint drawing of Kirby (in a yellow coat, with lab safety goggles and purple spiky hair) placing a little green stick figure in a beaker. There are three beakers on the table, containing green, red, and blue stick figures.

What information exactly are you asking for?

I have a previous post that goes into this in more detail, but the basic gist here is that you need to know whether you’re asking about gender, sex, or both. This is not an easy either/or answer! For one thing, both gender and sex are complex, multi-faceted collections of traits, experiences, and life trajectories that people may experience variably over the course of their lives, or in different contexts.

For example, are you asking for “gender identity” because you think that someone’s identification with ideals of masculinity or femininity might influence their linguistic behaviors?

Or are you asking for “assigned sex at birth” because you think someone’s early gendered childhood socialization might have a lasting impact on their adult linguistic behaviors?

Or are you asking for “biological sex” because you think someone’s anatomy might have an effect on their voice?

Or are you asking for “gender and sex” because you think someone’s identification with transness might be a factor in their linguistic behaviors?

Those are all sort of sideways questions, and have the opportunity to get you skewed or obscured answers. Here’s my recommendation instead: ask the thing you actually want to know! For the above examples, respectively, I would recommend collecting information on:

  • your participants’ identification with ideals of masculinity and femininity
  • your participants’ experiences with gender in early childhood
  • a closer proxy for the anatomy of the vocal apparatus — such as height — or even actually measuring the vocal tract, if you can!
  • whether your participants have a sense of themselves as trans, or within the trans umbrella in some way

Shoot, isn’t that easier to just directly ask the thing you actually want to know? This is pretty much always my strong recommendation: ask the question you’re actually trying to ask, rather than coming at it from a roundabout way.

Why are you asking for this information?

Equally crucial to the research design is not just what you’re asking (about someone’s gender or sex) but why you’re asking it. To take an example from above, are you asking for someone’s “biological sex” (an imprecise and unhelpful term!) because you think it might correlate with creaky voice?

Okay, sure, but be ready to explain why you think those are correlated. If your answer is about the shape and size of the actual vocal tract, then you should ask about the actual vocal tract. If your hypothesis is that there is some relationship between creaky voice and genital configuration, I’d ask what the hell you’d been smoking: there is NO reason to think that these things would have a direct causal relationship.

Being explicit and completely honest about your research question, hypothesis, and proposal for why those things might be correlated will help you to collect the information you actually need.

The other thing that often comes up, however, is some version of “Well, I don’t have a specific hypothesis about sex/gender, but I just need it for basic demographic information, just in case there IS some obvious difference that I didn’t think to look for.”

That’s fair! Here are my recommendations for “IDK, Just Checking” gender/sex questions:

  • Engage in good research practices! Make a stats plan ahead of time! Do Bonferroni corrections! Consider pre-registering or at least publishing a pre-data collection report on an archive site! Don’t p-hack or HARK!
  • Keep the questions as short and unintrusive as possible. Below I have recommendations about how to tell your participants what you’re asking and why.
  • Consider, rather than asking for “sex,” just asking whether someone would like to be identified as trans (or within the trans umbrella) for the purposes of your study. In my next section I have some suggestions on how to word this in a way that doesn’t suck. One reason to consider including this question in basic demographic surveys is, basically: would you want to know if all your participants were cis? I would.
  • Consider including a write-in option. Yes, I know, this makes certain quantitative analysis harder. Incredibly, I totally also have ideas below for how to deal with that. You’re welcome :)
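
If the Bonferroni bit is unfamiliar, here is a minimal sketch of what it does: you divide your significance threshold by the number of comparisons you planned ahead of time. The variable names and p-values below are made up for illustration, and your actual stats plan may call for a different correction.

```python
def bonferroni(p_values, alpha=0.05):
    """Return the corrected threshold and a pass/fail flag per p-value."""
    m = len(p_values)
    if m == 0:
        return alpha, []
    threshold = alpha / m  # stricter cutoff when more tests are run
    return threshold, [p < threshold for p in p_values]


# Hypothetical p-values from three comparisons written into the stats plan
planned = {"gender": 0.030, "trans_status": 0.004, "age_group": 0.200}

threshold, flags = bonferroni(list(planned.values()))
significant = [name for name, ok in zip(planned, flags) if ok]
print(f"corrected threshold: {threshold:.4f}")
print("still significant after correction:", significant)
```

Notice that a p-value of 0.030 would have passed an uncorrected 0.05 cutoff but does not pass here, which is exactly the "just checking" false positive the correction is meant to catch.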

If you’re NOT collecting other demographic information, and your hypothesis doesn’t relate to gender or sex at all, don’t ask.

What are the unwritten implications of how you’re asking?

It is awfully tempting to make this section into a “Wall of Shame” section, showing really bad examples of how NOT to ask about sex/gender. I have so many screenshots. People send them to me. However, in the interest of actually being constructive, instead I’ll just speak to some very common mistakes:

  • “Man / woman / transgender”: this option sucks because it implies that “transgender” is a third gender that is distinct from men and women. That’s not how trans people understand ourselves, or how we operate in the world. Trans men are men and almost always move through the world as men; any differences between trans men and cis men are not due to their not-really-man-ness, but due to structural transphobia. Ditto for trans women, who are women and move through the world as women. Furthermore, not all nonbinary people (whom you might be generously trying to “include” with this phrasing) are trans or understand themselves as trans.
  • “Man/woman/prefer not to say” — this option sucks because it lumps in people who don’t want to share data with you (e.g. a cis woman who chooses prefer not to say because she worries about sexist bias in your design) with people who DO want to share data with you, but it’s data other than man or woman. You’re just, like, throwing free data in the garbage can! This is both unwise of you, and insulting to the people who are sharing their time with you.
  • “Male/female/other” — this option sucks because 1.) using male and female when you’re asking about social identity ends up feeling confusing for participants. I’ve heard from friends that they can’t tell if you’re asking their ASAB or gender — that means you’re getting messy, conflated, or confounded data; 2.) again, the implication of the other category (without the opportunity to provide further detail) ends up collapsing nonbinary and other trans people in a way that is really confusing for participants as well as sort of nonsensical in how you’re forced to interpret the data.
  • “Biological sex” — don’t ask this. If you’re trying to ask about a possible sex-based trait, ask about that trait.
  • “What do you identify as?” — this type of phrasing ends up implying that people aren’t really the gender they say they are. Don’t do that.
  • “Transgender man/man/transgender woman/woman” — do not do this shit. As above, this implies that transgender men aren’t men, ditto trans women. Any variation of this set of options is a no-go. Split it into two questions if you really want to know if someone’s trans!

Basically, the general rule you should follow is to arrange your questions (and array of possible answers, if that’s the format of your survey) with the assumptions that 1.) not everyone is a man or a woman, 2.) transgender people who say they are men and women are men and women, and 3.) people declining to answer is not the same as people trying to give you an answer you didn’t account for.
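
To make that rule concrete, here is a rough sketch of how the answers might be stored so that all three assumptions hold: gender options that go beyond man and woman, trans status asked as its own question, and "declined" kept apart from "wrote in something unlisted." Every option label and field name here is a hypothetical placeholder, not a recommended wording.

```python
from dataclasses import dataclass
from typing import Optional

# All option labels below are hypothetical placeholders, not a recommended list.
WRITE_IN = "let me write it in"
DECLINED = "prefer not to say"
GENDER_OPTIONS = ["woman", "man", "nonbinary", WRITE_IN, DECLINED]
TRANS_OPTIONS = ["yes", "no", DECLINED]


@dataclass
class Response:
    gender: str                      # one of GENDER_OPTIONS
    gender_write_in: Optional[str]   # filled only when gender == WRITE_IN
    count_as_trans: str              # one of TRANS_OPTIONS, asked as its own question


def answer_status(r: Response) -> str:
    """Keep 'declined' separate from 'gave an answer we did not list'."""
    if r.gender == DECLINED:
        return "declined"
    if r.gender == WRITE_IN:
        return "write-in"
    return "listed"


examples = [
    Response("man", None, "yes"),            # a trans man is recorded as a man
    Response(WRITE_IN, "genderfluid", "no"),
    Response(DECLINED, None, DECLINED),
]
statuses = [answer_status(r) for r in examples]
print(statuses)
```

The point of the separate `count_as_trans` field is that a trans man lands in the same gender category as any other man, and you can still find out whether all your participants were cis.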

Anyways just for fun, here is ONE funny screenshot of a wacky gender option, so we can all feel buoyed by its absurdity or whatnot.

(I presume that Unk was a shortened form of unknown, but also hi my gender is now unk.)

How does this serve the people who are giving their time to your research?

I intentionally am not phrasing this with words like “the community” or “participants” for this section, because I really want you to think of everyone who takes part in your study as a person who is generously donating their time and sometimes personal information to help you, out of the sheer goodness of their heart. You OWE them the following:

  • appropriate compensation for their time
  • transparency about how you are using this information
  • respectful communications that never misgender or otherwise harm them, during or after the study, implicitly or explicitly
  • responsible and thoughtful use of the information in your analysis, writeups, and public communications about the research.

In order to appropriately compensate people for their time, you should try to pay people. It’s hard to get funding, though, I know this firsthand! If you aren’t paying people for spending their time explaining stuff to you, your study had better be quick and painless — I’m talking 5 or so minutes, max.

In regards to transparency about how you’re using this information, I find that linguists in particular can sometimes be a little bit cagey about this one. I find that instinct in my own research design! The thing is, we’re often trying to get at people’s unconscious thoughts or reactions, and trying to evade the observer’s paradox. However, transgender status and sex-related traits are pretty personal information, so it is necessary to be clear about why you’re asking the question. Here are my recommendations:

  • Put the demographics survey last. This is standard practice in some subfields but not others. I am a syntactician, so I’m often doing grammaticality judgment surveys. I put the demographics survey last to try and avoid priming people to think overly much about their gender identity when I mostly want them to be thinking about whether a relative clause sounds weird or whatever.
  • Include information pop-ups on demographic questions. You know how on many websites, a little question mark icon can show an information pop-up giving further info about something? This is good UI design because it’s common and people know about it — we may as well use it. Your (?) pop-up on a gender multiple choice question might say something like, “Why are we asking this? We ask demographics questions, including your gender, because different people sometimes use language differently. We are interested in whether gender influences the way you parse sentences of this kind. This question is asking for your current gender identity, not your assigned sex at birth or legal sex.”
  • Say it directly in the question. Rather than ask “what gender are you?” it can sometimes be more honest (and more useful!) to ask, “what gender category should we include your data in?” This is great when paired with an open-response question. (It would also be great to include a pop-up bubble saying “We are asking how to categorize you in addition to the answer you provided above, because we analyze data two ways: the unique answer you gave above will help us to understand your responses on an individual level, while your answer to this question will help us understand how populations act on a larger scale.”)
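
Here is a small sketch of how the "two ways" analysis from that last recommendation might look once the data is in: each person keeps their own-words answer, and the category they picked is what you group on. The participant IDs, answers, and category labels are all invented for illustration.

```python
from collections import defaultdict

# Hypothetical rows: (participant id, own-words answer, category they chose)
responses = [
    ("p01", "genderqueer woman", "woman"),
    ("p02", "man", "man"),
    ("p03", "agender", "nonbinary"),
    ("p04", "trans guy", "man"),
]


def group_by_category(rows):
    """Group participants by self-chosen category, keeping their own words."""
    groups = defaultdict(list)
    for pid, own_words, category in rows:
        groups[category].append((pid, own_words))
    return dict(groups)


groups = group_by_category(responses)
counts = {category: len(members) for category, members in groups.items()}

print(counts)          # population-level view for the quantitative analysis
print(groups["man"])   # individual-level view, write-ins preserved
```

Because the participant chose the category themselves, you are not the one deciding which box a write-in answer belongs in, and nothing they told you gets thrown away.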

About communications that are respectful and appropriate, I hope this is self-explanatory: it is unethical to misgender people in your study. If people might get incidentally misgendered over the course of your study (in, say, a group setting where friends are talking) this needs to be part of your consent process. Honestly, email me if you want to talk more about this — I have had to rewrite a couple IRB applications around it, and would rather talk more extensively one-on-one.

In addition, it is unethical and harmful to implicitly misgender anyone who is taking part in your study. This includes a lot of the poorly-worded questions I discussed above.

The final point I gave is that you have an ethical obligation to the people who have helped you to not say or imply transphobic stuff or misgender them in print, in talks, or anywhere else. You are in fact responsible for the output of your scientific inquiry, and this means that you need to avoid speculating about causation, avoid over-hyping correlations, avoid reporting generalizations you don’t understand, and avoid talking about your research to the lay public in a way that is too easily misunderstood as supporting transphobic rhetoric.

OKAY. This got long (don’t they all?) and I want to just reiterate a few important points here!

TL;DR:

The way to avoid being transphobic or cissexist in your research design depends on what you’re doing! But there are a few general things you can do to think it through:

  • You need to be clear about what information you are actually asking for. Think carefully about whether your research question is about gender, or sex traits, or both. Think about whether you need to know whether or not your participants are trans.
  • Your research question and hypothesis should have a clear reason why you think this data is relevant. You should have done your homework on what causal relationships might exist between (linguistic behaviors) and (gender stuff).
  • You should design your questions in a way that doesn’t implicitly misgender people; you should work under the assumption that people are the gender they say they are.
  • You should plan your research in a way that 1.) does not harm the people who are giving their time and expertise to your study, either during or after they take part in it, and 2.) is appropriate and respectful of their time and expertise.

That’s it! As I said, I’m happy to answer specific questions on twitter — but often the answers will depend on what you’re trying to do, and how, and why.

This work is supported by my ko-fi tips. You can also follow me on twitter. This work is licensed under CC BY-SA 4.0.

Dr. Conrod is a linguist and scholar sort of at large. They write about transgender stuff, the linguistics of pronouns, and ways to work with your brain.