
How to Read Your 23andMe Raw Data

Your 23andMe raw data file contains hundreds of thousands of genetic variants — far more information than the standard reports show you. Here’s how to actually read it and what you can do with it.

Updated March 29, 2026 · 10 min read

If you’ve taken a 23andMe test, you probably explored your ancestry composition and maybe a handful of health reports. But sitting behind that polished interface is a plain text file containing your actual genotype data — roughly 600,000 to 700,000 individual data points about your DNA.

This raw data file is genuinely yours. You paid for it, you can download it, and you can use it far beyond what 23andMe’s own reports cover. The catch is that the file looks like gibberish if you don’t know what you’re looking at.

This guide walks you through exactly what that file contains, what every column means, and how to turn it into genuinely useful health information.

What 23andMe Raw Data Actually Is

When 23andMe processes your saliva sample, they don’t sequence your entire genome. Instead, they use a genotyping chip — a microarray that reads specific known positions across your DNA. Each position is a place in the human genome where people commonly differ from one another. These are called single nucleotide polymorphisms, or SNPs (pronounced “snips”).

Your raw data file is a tab-separated text file that lists every SNP the chip tested. Depending on which version of 23andMe’s chip was used for your sample (v3, v4, or v5), your file will contain somewhere between 550,000 and 700,000 lines, each one representing a single position in your genome.

The file starts with a few lines of comments (marked with #), then jumps straight into the data. Here’s what a few lines actually look like:

# rsid	chromosome	position	genotype
rs4477212	1	82154	AA
rs3094315	1	752566	AG
rs3131972	1	752721	GG
rs12124819	1	776546	AG
rs11240777	1	798959	AG
rs6681049	1	800007	CC

That’s it. Four columns, tab-separated, repeated hundreds of thousands of times. Every line is one genetic variant, and together they form a partial map of your DNA.
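If you'd rather explore the file in code than in a spreadsheet, here is a minimal Python sketch of a parser. It assumes the 23andMe layout shown above: comment lines starting with #, then four tab-separated columns. `Variant` and `parse_raw_data` are hypothetical names chosen for this example, not part of any 23andMe tool.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass(frozen=True)
class Variant:
    """One row of a 23andMe-style raw data file (hypothetical record type)."""
    rsid: str
    chromosome: str
    position: int
    genotype: str


def parse_raw_data(lines: Iterable[str]) -> Iterator[Variant]:
    """Yield a Variant for each data row, skipping '#' comments and blank lines."""
    for line in lines:
        line = line.rstrip("\r\n")
        if not line or line.startswith("#"):
            continue
        # Assumes exactly four tab-separated columns, as in the sample above.
        rsid, chromosome, position, genotype = line.split("\t")
        yield Variant(rsid, chromosome, int(position), genotype)


sample = """# rsid\tchromosome\tposition\tgenotype
rs4477212\t1\t82154\tAA
rs3094315\t1\t752566\tAG
rs3131972\t1\t752721\tGG
"""

for v in parse_raw_data(sample.splitlines()):
    print(v.rsid, v.chromosome, v.position, v.genotype)
```

To parse your real file, pass an open file handle, for example `list(parse_raw_data(open("genome.txt", encoding="utf-8")))`. The filename "genome.txt" is a placeholder. Because the parser yields one row at a time, it never needs to hold all of the file's text in memory at once.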

How to Download Your 23andMe Raw Data

23andMe lets you export your raw data at any time. The process is straightforward, though it includes a security verification step:

01

Log in to your 23andMe account

Go to 23andme.com and sign in with your email and password.

02

Open Settings

Click your name in the top-right corner and navigate to Settings.

03

Find the data download section

Scroll to "23andMe Data" and select "Download Raw Data."

04

Verify your identity

Re-enter your password and confirm via the email verification prompt. This is a security measure to protect your genetic data.

05

Submit and wait

Click "Submit Request." 23andMe will prepare your file and email you when it's ready, usually within a few minutes.

06

Download your file

Download the .txt or .zip file. This is your complete raw genotype data.

File size: The uncompressed text file is typically around 15–25 MB, and the zip download is considerably smaller. You can open it in any text editor, though a spreadsheet application like Excel or Google Sheets makes it much easier to explore.
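Because the download may arrive either zipped or as plain text, a script that reads it should handle both. The Python sketch below does this with the standard `zipfile` module. It assumes the zip holds exactly one .txt file with the genotype data; 23andMe's actual archive layout may differ. `read_raw_lines` is a hypothetical helper name.

```python
from __future__ import annotations

import io
import zipfile
from pathlib import Path
from typing import Iterator


def read_raw_lines(path: str | Path) -> Iterator[str]:
    """Yield text lines from a raw data download, whether zipped or plain text.

    Assumption: a .zip download contains a single .txt member with the data.
    """
    path = Path(path)
    if zipfile.is_zipfile(path):
        with zipfile.ZipFile(path) as archive:
            members = [name for name in archive.namelist() if name.endswith(".txt")]
            if len(members) != 1:
                raise ValueError(f"expected one .txt member, found {members}")
            # ZipFile.open returns a binary stream; wrap it to decode UTF-8 text.
            with io.TextIOWrapper(archive.open(members[0]), encoding="utf-8") as text:
                yield from text
    else:
        with path.open(encoding="utf-8") as handle:
            yield from handle
```

Reading the archive directly means the uncompressed file never has to be written to disk.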

What Each Column Means

Your raw data file has exactly four columns. Understanding what each one represents is the key to making sense of everything else.

rsID — The Variant Identifier

The first column is the rsID (Reference SNP ID), like rs4477212. This is a unique label assigned by dbSNP, a public database maintained by the National Center for Biotechnology Information (NCBI). Think of it as a universal serial number for a specific position in the human genome.

Every rsID maps to one specific spot in your DNA. Researchers worldwide use these same identifiers, so when a study finds that rs1801133 is associated with folate metabolism, you can look up that exact rsID in your own raw data to see which variant you carry.
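Looking up a single rsID is a simple scan of the file. The sketch below assumes the four-column tab-separated 23andMe layout. It returns `None` when the chip didn't test that variant. Note that some rows in 23andMe files use internal identifiers starting with "i" instead of an rsID, and those won't match a dbSNP lookup. `find_genotype` is a hypothetical helper name.

```python
from typing import Iterable, Optional


def find_genotype(lines: Iterable[str], rsid: str) -> Optional[str]:
    """Return the genotype recorded for `rsid`, or None if it isn't in the file."""
    for line in lines:
        if line.startswith("#"):
            continue  # skip header comments
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) == 4 and fields[0] == rsid:
            return fields[3]
    return None
```

For example, `find_genotype(open("genome.txt", encoding="utf-8"), "rs1801133")` would report your MTHFR variant. "genome.txt" is a placeholder path. If you plan many lookups, it is faster to load the file once into a dictionary keyed by rsID.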

Chromosome — Where in Your Genome

The second column tells you which chromosome the variant sits on. Humans have 22 pairs of numbered chromosomes (1–22), called autosomes, plus the sex chromosomes (X and Y). You’ll also occasionally see MT for mitochondrial DNA, which is inherited exclusively from your mother.

This column is mainly useful for filtering. If you’re interested in a gene on chromosome 9, for example, you can filter the file down to just those rows. For most people, this column is context rather than something you act on directly.
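The same filtering is easy in code. The sketch below assumes the 23andMe tab-separated layout, in which the chromosome column holds values such as "1", "9", "X" or "MT". `data_rows`, `rows_on_chromosome` and `count_per_chromosome` are hypothetical helper names.

```python
from __future__ import annotations

from collections import Counter
from typing import Iterable, Iterator


def data_rows(lines: Iterable[str]) -> Iterator[list[str]]:
    """Yield the tab-separated fields of every non-comment, non-blank row."""
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        yield line.rstrip("\r\n").split("\t")


def rows_on_chromosome(lines: Iterable[str], chromosome: str) -> list[list[str]]:
    """Keep only rows whose chromosome column equals `chromosome`."""
    return [fields for fields in data_rows(lines) if fields[1] == chromosome]


def count_per_chromosome(lines: Iterable[str]) -> Counter[str]:
    """Count how many tested variants fall on each chromosome."""
    return Counter(fields[1] for fields in data_rows(lines))
```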

Position — The Exact Coordinate

The third column is the base pair position: a number indicating the exact location on that chromosome, counting from the start. For example, position 752566 means 752,566 bases from the beginning of chromosome 1.

One important note: positions are relative to a reference genome build. 23andMe uses GRCh37 (also called hg19) for most of its data. If you compare your raw data to a database that uses the newer GRCh38 build, the position numbers may differ even though they refer to the same spot. The rsID stays consistent across builds, which is why it’s the more reliable identifier for looking things up.

Genotype — Your Actual DNA

The fourth column is the one that matters most. It contains your genotype at that position, typically two letters, because you inherit one copy of each chromosome from each parent.

The letters represent the four DNA bases: A (adenine), T (thymine), C (cytosine), and G (guanine). Here’s what the common genotype patterns mean:

Genotype          Name                   What it means
AA, TT, CC, GG    Homozygous             Both parents gave you the same base at this position.
AG, CT, AC, etc.  Heterozygous           Each parent contributed a different base. You carry one copy of each variant.
--                No-call                The chip couldn't determine a result at this position.
II, DD, or DI     Insertion / deletion   At indel sites, I marks the allele carrying the extra bases and D the allele lacking them. II and DD are homozygous, DI is heterozygous, and a single I or D appears on haploid chromosomes.

On the sex chromosomes, males usually see single letters at X and Y positions because they carry one X and one Y. Females see pairs on X (two copies) and no-calls (--) on Y. Mitochondrial DNA also appears as single letters because it is haploid: you inherit one version, from your mother, rather than a pair.
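The genotype patterns above can be turned into a small classifier. The Python sketch below assumes the conventions 23andMe files use: "--" for a no-call, the letters A/C/G/T for ordinary SNPs, I/D for insertion/deletion sites, and single letters on haploid chromosomes. `classify_genotype` is a hypothetical helper, and the labels it returns are this example's own.

```python
def classify_genotype(genotype: str) -> str:
    """Label a raw-data genotype string as a no-call, SNP, or indel call."""
    if genotype == "--":
        return "no-call"
    if set(genotype) <= {"I", "D"}:
        kind = "indel"  # II/DD homozygous, DI heterozygous, I/D single copy
    elif set(genotype) <= {"A", "C", "G", "T"}:
        kind = "snp"
    else:
        return "unknown"
    if len(genotype) == 1:
        return f"{kind}, haploid"  # e.g. male X/Y or mitochondrial positions
    if len(genotype) == 2:
        if genotype[0] == genotype[1]:
            return f"{kind}, homozygous"
        return f"{kind}, heterozygous"
    return "unknown"
```

Running this over every row, for example by counting the labels with `collections.Counter`, gives a quick picture of how many no-calls and heterozygous sites your file contains.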

What You Can Learn Beyond the Standard Reports

23andMe’s consumer reports cover a curated selection of traits, carrier statuses, and ancestry markers. But your raw data contains far more information than those reports expose. Here are the major categories of insight that become available when you analyze the full file:

Polygenic Risk Scores

Most common diseases are influenced by hundreds or thousands of variants, each contributing a small amount of risk. A polygenic risk score aggregates these effects into a single number. Your raw data, usually after imputation fills in variants the chip didn’t test, provides the inputs for published scores covering coronary artery disease, breast cancer, type 2 diabetes, and many other conditions.
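At its core, a polygenic score is a weighted sum: for each variant in a model, multiply the model's weight by the number of copies of the effect allele you carry (0, 1 or 2), then add everything up. The Python sketch below shows only that arithmetic. The rsIDs and weights in the usage example are made up.

The sketch also assumes that effect alleles are reported on the same DNA strand as your file (for 23andMe, the GRCh37 plus strand). Real tools also harmonize strands, take extra care with ambiguous A/T and C/G variants, and handle missing variants more carefully than simply skipping them.

```python
from typing import Dict, Iterable, Tuple


def dosage(genotype: str, effect_allele: str) -> int:
    """Count copies (0, 1 or 2) of the effect allele in a genotype string."""
    return genotype.count(effect_allele)


def polygenic_score(
    genotypes: Dict[str, str],
    weights: Iterable[Tuple[str, str, float]],
) -> Tuple[float, int]:
    """Sum weight x effect-allele dosage over the model's variants.

    `weights` holds (rsid, effect_allele, weight) tuples. Returns the raw score
    and how many model variants were missing or no-calls (these are skipped).
    """
    score = 0.0
    missing = 0
    for rsid, effect_allele, weight in weights:
        genotype = genotypes.get(rsid)
        if genotype is None or genotype == "--":
            missing += 1
            continue
        score += weight * dosage(genotype, effect_allele)
    return score, missing


# Hypothetical genotypes and model weights, for illustration only.
my_genotypes = {"rs0001": "AG", "rs0002": "CC", "rs0003": "--"}
model = [("rs0001", "G", 0.12), ("rs0002", "T", 0.05),
         ("rs0003", "A", 0.30), ("rs0004", "C", -0.08)]
score, missing = polygenic_score(my_genotypes, model)
```

The raw score on its own has no absolute meaning. It only becomes interpretable once it is compared against scores from a reference population.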

Pharmacogenomics

Variants in genes like CYP2D6, CYP2C19, and SLCO1B1 influence how you metabolize medications. A CYP2C19 poor metabolizer may not get adequate blood clot protection from clopidogrel. An ultrarapid CYP2D6 metabolizer can experience dangerous side effects from a standard dose of codeine. Your raw data contains many of these variants, although some important changes, such as CYP2D6 gene duplications and deletions, are not reliably detected by genotyping chips.

Carrier Status

23andMe tests for carrier status on roughly 40–50 conditions, but your raw data often contains variants relevant to additional recessive conditions. Carrier screening is especially valuable for family planning: if both partners carry a variant for the same autosomal recessive condition, each child has a 25% chance of being affected.
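The 25% figure comes from a Punnett square. Each parent passes on one of their two alleles with equal probability, so two carriers produce four equally likely combinations. The Python sketch below enumerates them. It assumes a single-gene autosomal recessive condition and uses "N" for the typical allele and "v" for the variant; these labels are arbitrary.

```python
from __future__ import annotations

from collections import Counter
from fractions import Fraction
from itertools import product


def offspring_probabilities(parent1: str, parent2: str) -> dict[str, Fraction]:
    """Enumerate equally likely allele combinations from two parents' genotypes."""
    # Sort each pair so "Nv" and "vN" count as the same genotype.
    outcomes = Counter("".join(sorted(pair)) for pair in product(parent1, parent2))
    total = sum(outcomes.values())
    return {genotype: Fraction(count, total) for genotype, count in outcomes.items()}


two_carriers = offspring_probabilities("Nv", "Nv")
# NN (unaffected, non-carrier): 1/4, Nv (carrier): 1/2, vv (affected): 1/4
```

If only one parent is a carrier (`offspring_probabilities("NN", "Nv")`), no child can inherit two variant copies. Half of the children are expected to be carriers.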

Nutrigenomics

Your file includes variants related to vitamin metabolism (MTHFR for folate processing), caffeine sensitivity (CYP1A2), lactose tolerance (MCM6), and dozens of other traits. While individual variants should be interpreted cautiously, they offer useful starting points for conversations with your doctor.

23andMe provides a few PRS-based reports and some pharmacogenomics data, but the published research covers thousands of additional traits and drug interactions. The information is already sitting in your file — you just need the right tools to extract it.

Going Further: Imputation and Large-Scale Analysis

Here’s where things get interesting. Your 23andMe chip tested roughly 600,000 positions, but the human genome has over 3 billion base pairs. The positions that were tested, however, are carefully chosen. Because nearby genetic variants tend to be inherited together in blocks (called linkage disequilibrium), the tested positions can be used to statistically infer millions of untested variants.

This process is called imputation, and it’s the same technique used in large-scale research studies like the UK Biobank. Tools like Beagle 5.5 compare your genotyped positions against reference panels of fully sequenced genomes and fill in the gaps with high statistical confidence.

Think of it this way: your genotyped positions are landmarks on a map. If you know someone is at the corner of 5th Avenue and 42nd Street, you can infer with high confidence which city block they’re on. Imputation works the same way: known variants serve as landmarks that let statistical models fill in the surrounding genetic landscape. Accuracy for common variants typically exceeds 99%, though it is lower for rare variants and depends on how well the reference panel represents your ancestry.
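To make the landmark idea concrete, here is a deliberately simplified Python toy. It picks the single reference haplotype that best matches your typed sites and borrows its alleles for the untyped ones. This is a teaching sketch, not how Beagle works. Real imputation tools use hidden Markov models that combine many reference haplotypes, work from unphased genotypes, and report probabilities rather than hard calls. `impute_by_best_match` and the tiny reference panel are hypothetical.

```python
from typing import Dict, Sequence


def impute_by_best_match(
    observed: Dict[int, str],
    reference_haplotypes: Sequence[str],
) -> str:
    """Fill untyped sites from the reference haplotype that best matches typed sites.

    `observed` maps site index -> allele for the positions the chip read.
    Each reference haplotype is a string with one allele per site.
    """
    def mismatches(haplotype: str) -> int:
        return sum(haplotype[i] != allele for i, allele in observed.items())

    best = min(reference_haplotypes, key=mismatches)
    # Keep the alleles we actually observed; borrow the rest from the best match.
    return "".join(observed.get(i, allele) for i, allele in enumerate(best))


# Hypothetical six-site reference panel; the chip "typed" sites 0, 1 and 4.
panel = ["ACGTAC", "ATGCAG", "GCTTCC"]
filled = impute_by_best_match({0: "A", 1: "T", 4: "A"}, panel)
```

Here the typed alleles match the second haplotype exactly, so sites 2, 3 and 5 are filled in from it. Linkage disequilibrium is what makes this kind of borrowing informative.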

At Helix Sequencing, we use this approach to expand your raw data from roughly 600,000 variants to over 28 million, an increase of more than 40-fold in data density. This imputed dataset then feeds into 3,550+ polygenic risk score models drawn from the PGS Catalog, the largest peer-reviewed repository of published polygenic scores.

The result is a comprehensive risk profile covering cardiovascular disease, cancers, metabolic conditions, neurological disorders, autoimmune conditions, and much more — all from the same file you downloaded from 23andMe.

At a glance: ~600K variants on the chip, 28M+ after imputation, 3,550+ PRS models scored.

Step by Step: From Raw Data to a Full Report

If you want to go beyond what 23andMe shows you, here’s the process for getting a comprehensive genomic analysis with Helix Sequencing:

01

Download your raw data

Export your file from 23andMe using the steps above. Keep the .txt file somewhere safe.

02

Upload your file

Upload directly to Helix Sequencing. We accept the raw text file as-is — no format conversion needed.

03

Imputation runs automatically

Your 600K variants are expanded to 28M+ using Beagle 5.5 against the latest reference panels.

04

3,550+ risk scores are calculated

Each polygenic risk score is computed against peer-reviewed models from the PGS Catalog and placed in a population percentile.

05

You receive your full report

Conditions are categorized by risk level. High-risk findings are surfaced first so you can focus on what matters most.

06

Your DNA file is deleted

After analysis completes, your uploaded file is permanently removed. You receive a SHA-256 deletion certificate as proof.

The entire process takes about two hours from upload to report delivery. No appointments, no new saliva sample, no waiting weeks for lab work.
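Step 04 places each score in a population percentile. Helix's exact method isn't described here. One common approach, sketched below under that assumption, is to compare your raw score with scores computed the same way for a reference population, ideally of similar ancestry. `percentile_rank` is a hypothetical helper, and the reference scores in the example are made up.

```python
from bisect import bisect_left
from typing import Sequence


def percentile_rank(score: float, reference_scores: Sequence[float]) -> float:
    """Percentage of reference scores strictly below `score` (0 to 100)."""
    if not reference_scores:
        raise ValueError("need at least one reference score")
    ordered = sorted(reference_scores)
    return 100.0 * bisect_left(ordered, score) / len(ordered)


# Made-up reference distribution of ten scores.
reference = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
rank = percentile_rank(0.75, reference)
```

With this reference, a score of 0.75 sits above seven of the ten reference scores, so it lands at the 70th percentile.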

Not Just 23andMe: AncestryDNA, MyHeritage, and Others

While this guide focuses on 23andMe, the raw data format is similar across major consumer DNA testing services. If you tested with AncestryDNA, MyHeritage, Living DNA, or FamilyTreeDNA, your raw data file will contain the same types of information: rsIDs, chromosomes, positions, and genotypes.

The main differences between providers are which SNPs they test (the specific chip used) and minor formatting variations. Some files use commas instead of tabs, some include extra header information, and the number of variants tested ranges from about 600,000 to 750,000 depending on the chip version.
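As an illustration of handling these differences, the Python sketch below normalizes a few plausible layouts into the same (rsid, chromosome, position, genotype) tuple. It assumes three layouts, so check your own file's header before relying on it:

- tab-separated with a single genotype column, as in 23andMe;
- tab-separated with separate allele1/allele2 columns;
- comma-separated with quoted fields and a header row starting with "RSID".

It also assumes any header row starts with "rsid" in some capitalization. `normalize_rows` is a hypothetical helper, not Helix's pipeline.

```python
import csv
from typing import Iterable, Iterator, Tuple

Row = Tuple[str, str, int, str]  # rsid, chromosome, position, genotype


def normalize_rows(lines: Iterable[str]) -> Iterator[Row]:
    """Yield (rsid, chromosome, position, genotype) from tab- or comma-separated exports."""
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank line or comment
        delimiter = "\t" if "\t" in line else ","
        # csv.reader strips quotes such as "rs3131972","1",...
        fields = next(csv.reader([line], delimiter=delimiter))
        if fields[0].lower() == "rsid":
            continue  # column-header row
        if len(fields) == 5:  # separate allele1 / allele2 columns
            rsid, chromosome, position, allele1, allele2 = fields
            genotype = allele1 + allele2
        elif len(fields) == 4:  # single genotype column
            rsid, chromosome, position, genotype = fields
        else:
            raise ValueError(f"unexpected column count: {line!r}")
        yield rsid, chromosome, int(position), genotype
```

Providers may also code chromosomes or no-calls differently, so a real converter needs more checks than this sketch performs.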

Helix Sequencing accepts raw data files from all major providers. Our pipeline auto-detects the file format and adjusts accordingly, so you can upload whichever test you’ve already taken without any manual conversion.

Tips for Exploring Your Raw Data Yourself

If you want to poke around in your raw data file before uploading it anywhere, here are a few practical tips:

  • Open it in a spreadsheet. Import the file into Excel or Google Sheets as tab-delimited. This lets you search, sort, and filter by chromosome or rsID far more easily than a text editor.
  • Search by rsID. If you read about a specific variant in a study or article, search your file for that rsID to see your genotype. For example, search for rs1801133 to check your MTHFR C677T status.
  • Don’t panic over a single variant. Most conditions are influenced by many variants, your environment, and your lifestyle. Individual SNP lookups are interesting but should always be interpreted in context, ideally with a healthcare provider or through a comprehensive polygenic risk score analysis that considers the full picture.
  • Keep your file secure. Your raw DNA data is sensitive personal information. Store it encrypted if possible, and be selective about which services you upload it to. Look for services that offer zero-retention policies and deletion certificates.
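One practical habit before uploading anywhere is to record a SHA-256 fingerprint of your file. A hash identifies the exact file contents without revealing them. That fingerprint could be checked against a deletion certificate, assuming the certificate lists the hash of the file the service received; what such certificates actually contain isn't specified here. The sketch uses Python's standard `hashlib`; `sha256_of_file` is a hypothetical helper name.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # iter() with a sentinel keeps reading until read() returns b"" at EOF.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Reading in chunks keeps memory use flat even for larger files. Any change to the file, even a single character, produces a completely different digest.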

Why Any of This Matters

The promise of consumer genetics has always been slightly ahead of what the consumer reports actually deliver. 23andMe, Ancestry, and others do valuable work making DNA testing accessible, but their reports are necessarily conservative — limited by regulatory constraints and the need to present results that are easy for everyone to understand.

Your raw data has no such constraints. The same file that tells you whether you’re likely to sneeze in sunlight also contains the variants that peer-reviewed studies use to assess cardiovascular risk, cancer predisposition, and drug metabolism. The information is already there. You just need the right tools to read it.

Whether you analyze it yourself with public tools, use a service like Helix Sequencing for a comprehensive report, or bring it to a genetic counselor — downloading your raw data is always worth doing. It’s your DNA. You should have access to all of it.

Ready to see what your raw data really contains?

Upload your 23andMe, AncestryDNA, or MyHeritage raw data file. Deep imputation expands it to 28M+ variants, then 3,550+ peer-reviewed PRS models score your genetic risk across thousands of conditions.


No account required. Zero data retention. Your file is deleted after analysis.

Key Takeaways

Your 23andMe raw data is a tab-separated text file with four columns: rsID, chromosome, position, and genotype. Each line represents one genetic variant.

The rsID is a universal identifier for each SNP. The genotype (AA, AG, GG, etc.) tells you which two bases you carry at that position, one inherited from each parent.

The file contains 600,000-700,000 variants — far more data than what 23andMe's consumer reports cover.

With imputation, those variants can be expanded to 28M+, enabling comprehensive polygenic risk scoring across thousands of conditions.

Your raw data works with analysis services beyond 23andMe’s own. Files from AncestryDNA, MyHeritage, and other providers follow the same general structure.

Always keep your raw data file secure. It contains uniquely identifying genetic information.


Get Your Full Genetic Analysis

Upload your existing DNA file from 23andMe, AncestryDNA, or MyHeritage. Get 3,550+ polygenic risk scores, pharmacogenomics for 34 genes, and an AI-generated longevity protocol. Connect your genome to Claude or ChatGPT.
