Before you start: setup
This page gets your computer ready for your first GWAS QC command. By the end you will have:
- Set up a portable project folder
- Downloaded the demo dataset
- Installed the required tools (currently PLINK2, PLINK1.9, METAL, REGENIE, and R)
- Run a test script that confirms everything works
You will need:
- A computer running Windows (with WSL2), macOS, or Linux
- 30-45 minutes to complete all setup steps
- Internet access to download tools and data
- Optional: administrator rights to install software (or ask your IT team for help)
You do not need any programming experience. Every command is provided in full with alternatives and troubleshooting information.
A note on the command line
Most steps in this guide are typed into a shell (also called a "terminal", "command line", or "console"): a window where you type a command, press Enter, and the program runs.
- Windows: You must use the WSL2 terminal (see Step 0). WSL2 gives you Linux inside Windows.
- macOS: Open the Terminal app (in Applications → Utilities).
- Linux: Open your Terminal.
- Inside RStudio: there is a Terminal tab next to the Console tab — you can use that too.
Let’s practice using the terminal before we dive into the pipeline.
Throughout the guide, lines you type into the terminal look like the block below. Try this small practice block: it creates a tiny throwaway folder and file, looks at the file in a few different ways, then removes both.
Click the copy button on the right side of the block to copy all the commands, paste them into your terminal, and press Enter to run them. Don't worry if you don't understand every command yet; just get a feel for how the terminal works. To paste into the terminal, you can usually right-click, or press Ctrl + Shift + V on Windows/Linux or Cmd + V on macOS.
echo "Welcome to the GWAS tutorial."
echo "1) Where am I?"
pwd
echo "2) What is in this folder?"
ls
echo "3) Make a practice folder."
mkdir -p gwas_practice
echo "4) Make a tiny practice file."
echo "GWAS practice line 1: samples" > gwas_practice/hello.txt
echo "GWAS practice line 2: variants" >> gwas_practice/hello.txt
echo "GWAS practice line 3: quality control" >> gwas_practice/hello.txt
echo "5) Print the whole file with cat."
cat gwas_practice/hello.txt
echo "6) Print the first two lines with head."
head -n 2 gwas_practice/hello.txt
echo "7) Print the last two lines with tail."
tail -n 2 gwas_practice/hello.txt
echo "8) Show the file with less."
less -F -X gwas_practice/hello.txt
echo "9) Remove the practice file and folder."
rm gwas_practice/hello.txt
rmdir gwas_practice
echo "Done. You just used pwd, ls, mkdir, cat, head, tail, less, rm, and rmdir."
The commands you just ran do the following:
- echo prints text to the terminal. You can also use it to write text into files with > (create or overwrite the file) and >> (append to the end of the file).
- pwd prints your current folder.
- ls lists the files in the current folder.
- mkdir creates folders. The -p option also creates any missing parent folders and does not complain if the folder already exists.
- cat prints a whole file.
- head and tail show the beginning and the end of a file.
- less opens a file for reading. Press q to exit and use the up and down arrow keys to scroll. Here we use less -F -X, so this tiny file is printed and the command returns automatically.
- rm removes a file.
- rmdir removes an empty folder.
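To see the difference between > and >> in isolation, here is a small sketch. It works inside a temporary folder made with mktemp (a command not used elsewhere in this guide), so nothing in your current folder is touched:

```shell
# Create a temporary folder so this demo never touches your real files.
demo_dir=$(mktemp -d)
demo_file="$demo_dir/redirect_demo.txt"

# '>' creates the file, or overwrites it if it already exists.
echo "first line" > "$demo_file"
echo "replaced line" > "$demo_file"
echo "After two '>' writes:"
cat "$demo_file"            # only "replaced line" is left

# '>>' appends to the end of the file instead of overwriting it.
echo "appended line" >> "$demo_file"
echo "After one '>>' write:"
cat "$demo_file"            # "replaced line" followed by "appended line"

# Count the lines: 1 left by '>' plus 1 added by '>>'.
# tr removes the leading spaces that macOS wc prints.
line_count=$(wc -l < "$demo_file" | tr -d ' ')
echo "Line count: $line_count"

# Clean up the temporary folder.
rm -r "$demo_dir"
```

You should see one line after the two > writes and two lines after the >> write.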
In this exercise, rm only removes the practice file you just created, and rmdir removes the empty practice folder, so you can practice removing files without risking important data. You may also come across #rm -rf, which removes folders and everything inside them recursively, without asking for confirmation. The # prefix marks it as a commented-out command, so it does nothing if pasted. Be very careful with this command: it permanently deletes data.
If you want to learn more about these commands, you can check their manual pages by typing man <command> (for example, man ls) in the terminal.
If you have any issues with these commands, don’t worry. This is just a practice block to get you familiar with the terminal. Basic Bash knowledge is not required, but it will help you feel more comfortable as you move through the pipeline. There are many online materials available; here is one recommendation.
If you do not know how to open the terminal, follow the instructions in Step 0 carefully and ask for help if you get stuck.
Step 0 — Prepare your operating system
Before you begin, set up your OS environment.
Windows: WSL2 (Windows Subsystem for Linux) gives you a Linux environment inside Windows. The GWAS pipeline uses bash scripts, so WSL2 is required.
Part A: Install WSL2
Open PowerShell as Administrator (right-click and select “Run as administrator”)
Run:
wsl --install
Restart your computer when prompted.
Part B: Open WSL2 Terminal
- After restart, open Windows Terminal (search for it in the Start menu)
- Click the dropdown arrow and select Ubuntu
- You should see a bash prompt like
user@computer:~$
Part C: Install Basic Tools
In the WSL2 bash terminal, run:
sudo apt-get update
sudo apt-get install -y git curl wget unzip bzip2
✅ WSL2 is now ready. From here on, all commands use this bash terminal.
For help troubleshooting WSL2, see WSL Setup.
macOS and Linux: your system already has bash, so no setup is needed for Step 0.
Step 1 — Create your project folder
Choose any location on your computer and create a folder named gwas_tutorial. This will be your working folder for the entire pipeline.
DIR="$HOME/gwas_tutorial"
mkdir -p "$DIR"
cd "$DIR"
You can place this folder anywhere (Desktop, Documents, or a custom path). DIR is a variable that holds the path to your project folder, and $HOME means your home directory, so the commands above create the folder in your home directory. To use another drive or directory, change the DIR variable, for example DIR="/your/desired/directory". The project is portable because we use relative paths, but some commands in this guide assume the default location (for example cd ~/gwas_tutorial and $HOME/gwas_tutorial/tools/bin), so replace that path if you chose a different one.
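DIR only exists in the terminal window where you set it, so a new terminal (or a later session) starts without it. A minimal sketch for restoring it, assuming you kept the default location; if you used a different folder, replace $HOME/gwas_tutorial with your path:

```shell
# DIR is a plain shell variable, so it is lost when the terminal closes.
# "${DIR:-...}" keeps the current value if DIR is set and non-empty,
# and otherwise falls back to the default project folder from Step 1.
DIR="${DIR:-$HOME/gwas_tutorial}"
echo "Project folder: $DIR"

# Only change into the folder if it already exists.
if [ -d "$DIR" ]; then
  cd "$DIR" && pwd
else
  echo "Folder not found yet: $DIR (run the mkdir -p command above first)"
fi
```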
Step 2 — Create the directory structure
Create the folder structure that keeps scripts, data, and tools organized.
First, check that the project folder was created successfully:
# If you opened a new terminal since Step 1, set DIR again first, for example: DIR="$HOME/gwas_tutorial"
if [ -d "$DIR" ]; then
  printf "\n✓ Project folder created successfully: %s\n\n" "$DIR"
  printf "%s\n" "[NEXT] Stay in this folder for the next commands."
  printf "%s\n\n" "Scripts, data, tools, and results will be stored here."
else
  # No 'exit' here: in an interactive terminal, exit would close the window.
  printf "\n✗ Failed to create project folder: %s\n\n" "$DIR"
  printf "%s\n" "[Go to previous step] and make sure the command to create the folder ran successfully."
fi
If the folder was created successfully, you should see a message like:
✓ Project folder created successfully: /home/user/gwas_tutorial
[NEXT] Stay in this folder for the next commands.
Scripts, data, tools, and results will be stored here.
If you see an error message instead, please go back to the previous step and make sure the command to create the folder ran successfully.
Next, create the subfolders for scripts, data, tools, and results. You can do this automatically with one script or create them manually.
Automated: download and run the setup script:
# Download the automated setup script
# -f makes curl fail on HTTP errors (such as 404) instead of saving the error page as the script
curl -fL https://raw.githubusercontent.com/mgentiluomo/how-to-gwas-pdac/main/scripts/dev/init_project.sh -o init_project.sh
# Make it executable
chmod +x init_project.sh
# Run it
bash init_project.sh
This creates all folders automatically.
If curl is not available on your system, use wget instead:
wget https://raw.githubusercontent.com/mgentiluomo/how-to-gwas-pdac/main/scripts/dev/init_project.sh -O init_project.sh
chmod +x init_project.sh
bash init_project.sh
Manual: create the folders yourself:
# Create all needed folders
mkdir -p \
scripts \
scripts/dev \
demo_data \
tools/bin \
data_processed \
results/{qc,pop_structure,imputation,association,finemapping,meta_analysis}
After Step 3, your folder will look like this:
gwas_tutorial/
├── scripts/
│ ├── dev/ # Utility scripts
│ │ ├── download_demo_data.sh
│ │ ├── tools_setup.sh
│ │ ├── test.sh
│ │ ├── init_project.sh
│ │ ├── section_manifest.txt
│ │ └── script_manifest.txt
│ ├── 01B_genotyping_qc/ # QC scripts
│ │ ├── 01_initial_qc_stats.sh
│ │ ├── 02_sample_callrate.sh
│ │ └── ...09_qc_summary.sh
│ ├── 02_population_stratification/
│ ├── 03_imputation/
│ └── ...other sections...
├── demo_data/ # Demo dataset (Step 4)
├── tools/bin/ # Tools (eg PLINK, Step 5)
├── data_processed/ # Processed output files
└── results/ # Results organized by workflow
└── qc/ # QC output files
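If you created the folders manually, you can confirm they all exist with a short loop. This is a sketch, not part of the pipeline scripts; check_project_dirs is a hypothetical helper name, and the folder list is copied from the mkdir -p command above:

```shell
# check_project_dirs is a hypothetical helper name used only in this guide.
# It reports any expected folder that is missing under the current directory.
check_project_dirs() {
  local missing=0
  local d
  for d in scripts scripts/dev demo_data tools/bin data_processed \
           results/qc results/pop_structure results/imputation \
           results/association results/finemapping results/meta_analysis; do
    if [ -d "$d" ]; then
      echo "✓ $d"
    else
      echo "✗ missing: $d"
      missing=$((missing + 1))
    fi
  done
  # Succeed (return 0) only when every folder exists.
  [ "$missing" -eq 0 ]
}
```

Run check_project_dirs from inside your project folder. Section folders such as scripts/01B_genotyping_qc are created in Step 3, so they are not in this list.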
For a compact project-folder overview, see Setup Project Structure. If you are on Windows/WSL2, keep WSL Setup nearby for path and terminal troubleshooting.
Step 3 — Download the GWAS scripts
Clone the GitHub repo to get all the GWAS pipeline scripts organized by section:
# Clone the repo
git clone https://github.com/mgentiluomo/how-to-gwas-pdac.git
# Copy utility/dev scripts
mkdir -p scripts/dev
find how-to-gwas-pdac/scripts/dev -maxdepth 1 -type f -exec cp {} scripts/dev/ \;
# Start fresh manifest files for the setup test
: > scripts/dev/section_manifest.txt
: > scripts/dev/script_manifest.txt
# Copy section scripts directly into section folders
for section_dir in how-to-gwas-pdac/sections/*/; do
section=$(basename "$section_dir")
if [ -d "$section_dir/scripts" ]; then
mkdir -p "scripts/$section"
echo "scripts/$section" >> scripts/dev/section_manifest.txt
find "$section_dir/scripts" -maxdepth 1 -type f -exec cp {} "scripts/$section/" \;
find "scripts/$section" -maxdepth 1 -type f | sort >> scripts/dev/script_manifest.txt
fi
done
# Add utility/dev scripts to the script manifest
find scripts/dev -maxdepth 1 -type f -name "*.sh" | sort >> scripts/dev/script_manifest.txt
sort -u scripts/dev/section_manifest.txt -o scripts/dev/section_manifest.txt
sort -u scripts/dev/script_manifest.txt -o scripts/dev/script_manifest.txt
# Optional cleanup: delete only the temporary cloned copy after scripts are copied
rm -r how-to-gwas-pdac
If you see rm: remove write-protected regular file 'how-to-gwas-pdac/.git/objects/pack/pack-XXX.rev'?, type y and press Enter to confirm. This is normal: Git stores some of its internal files as read-only, so rm asks before deleting them, possibly more than once.
Verify the scripts downloaded:
# Check which section folders were copied
cat scripts/dev/section_manifest.txt
# Check the currently available QC scripts
ls scripts/01B_genotyping_qc/
At this stage, the section list should include scripts/01B_genotyping_qc, and you should see 01_initial_qc_stats.sh through 09_qc_summary.sh.
Then check the utility scripts:
ls scripts/dev/
You should see: download_demo_data.sh, tools_setup.sh, test.sh, init_project.sh, plus section_manifest.txt and script_manifest.txt.
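test.sh checks the manifest for you later, but you can also check it directly. A minimal sketch, assuming each manifest line holds one path relative to the project folder (which is what the find commands in Step 3 write); check_manifest is a hypothetical helper name:

```shell
# check_manifest is a hypothetical helper name used only in this guide.
# It reads a manifest (one path per line) and reports every path that is not a regular file.
check_manifest() {
  local manifest="$1"
  local missing=0
  local entry
  if [ ! -f "$manifest" ]; then
    echo "Manifest not found: $manifest"
    return 1
  fi
  while IFS= read -r entry; do
    if [ -z "$entry" ]; then
      continue                      # skip blank lines
    fi
    if [ ! -f "$entry" ]; then
      echo "✗ missing: $entry"
      missing=$((missing + 1))
    fi
  done < "$manifest"
  echo "$missing missing file(s) listed in $manifest"
  [ "$missing" -eq 0 ]
}
```

For example, from your project folder: check_manifest scripts/dev/script_manifest.txt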
Step 4 — Download the demo dataset
Download the 7 demo dataset files (~164 MB total) to your demo_data/ folder.
Automated: run the download script from your project folder:
bash scripts/dev/download_demo_data.sh
This downloads all 7 files and verifies them automatically. It uses curl, which is already available on most macOS and Linux systems.
Manual: download the files individually with wget. If your system has curl but not wget, use the automated script instead.
cd ~/gwas_tutorial # Make sure you're in your project folder
# Download all 7 files
wget https://github.com/mgentiluomo/how-to-gwas-pdac/releases/download/v0.1-data/pdac_demo.bed -O demo_data/pdac_demo.bed
wget https://github.com/mgentiluomo/how-to-gwas-pdac/releases/download/v0.1-data/pdac_demo.bim -O demo_data/pdac_demo.bim
wget https://github.com/mgentiluomo/how-to-gwas-pdac/releases/download/v0.1-data/pdac_demo.fam -O demo_data/pdac_demo.fam
wget https://github.com/mgentiluomo/how-to-gwas-pdac/releases/download/v0.1-data/phenotype.txt -O demo_data/phenotype.txt
wget https://github.com/mgentiluomo/how-to-gwas-pdac/releases/download/v0.1-data/covariates.txt -O demo_data/covariates.txt
wget https://github.com/mgentiluomo/how-to-gwas-pdac/releases/download/v0.1-data/survival.txt -O demo_data/survival.txt
wget https://github.com/mgentiluomo/how-to-gwas-pdac/releases/download/v0.1-data/sample_ancestry.tsv -O demo_data/sample_ancestry.tsv
Verify the files downloaded:
ls -lh demo_data/
You should see all 7 files.
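The seven wget commands differ only in the file name, so they can also be written as a loop. A sketch under these assumptions: the base URL and file list are copied from the commands above, and demo_url and download_all_demo_files are hypothetical helper names. Defining the functions downloads nothing; run download_all_demo_files from your project folder to start the downloads. Unlike the automated script, this loop does not verify hashes.

```shell
# Base URL of the v0.1-data release, copied from the wget commands above.
BASE_URL="https://github.com/mgentiluomo/how-to-gwas-pdac/releases/download/v0.1-data"

# The 7 demo files listed in Step 4.
DEMO_FILES=(pdac_demo.bed pdac_demo.bim pdac_demo.fam phenotype.txt
            covariates.txt survival.txt sample_ancestry.tsv)

# demo_url and download_all_demo_files are hypothetical helper names for this guide.
demo_url() {
  printf '%s/%s\n' "$BASE_URL" "$1"
}

download_all_demo_files() {
  mkdir -p demo_data
  local f
  for f in "${DEMO_FILES[@]}"; do
    echo "Downloading $f"
    # Report a failure but keep going with the remaining files.
    wget "$(demo_url "$f")" -O "demo_data/$f" || echo "✗ failed: $f"
  done
}
```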
The download script automatically verifies the SHA256 hash of each file to ensure they were downloaded correctly. You should see messages like:
✓ OK: pdac_demo.bed
✓ OK: pdac_demo.bim
...
✓ All files verified successfully! (7/7)
If any file fails verification, the script tells you which one and suggests re-downloading it. This check confirms that each file is complete and unchanged before you start the analysis.
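The manual wget route skips this check. You can still print SHA256 hashes yourself: Linux and WSL provide sha256sum, and macOS provides shasum -a 256. The expected hashes are not listed on this page, so compare the output against the values checked by the download script. sha256_of is a hypothetical helper name:

```shell
# sha256_of is a hypothetical helper name used only in this guide.
# Linux/WSL ship sha256sum; macOS ships shasum, which needs -a 256 for SHA256.
sha256_of() {
  if command -v sha256sum >/dev/null 2>&1; then
    sha256sum "$1" | awk '{print $1}'
  elif command -v shasum >/dev/null 2>&1; then
    shasum -a 256 "$1" | awk '{print $1}'
  else
    echo "No SHA256 tool found" >&2
    return 1
  fi
}

# Print "hash  path" for every file in demo_data/ (run from your project folder).
if [ -d demo_data ]; then
  find demo_data -maxdepth 1 -type f | sort | while IFS= read -r f; do
    printf '%s  %s\n' "$(sha256_of "$f")" "$f"
  done
fi
```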
Step 5 — Install dependencies
Install the required tools for the current tutorial sections. At this stage that means PLINK2, PLINK1.9, METAL, REGENIE, and R.
Before starting the install, check that you are in the project folder and that the basic download tools are visible:
cd "$HOME/gwas_tutorial"
pwd
ls scripts/dev/tools_setup.sh
command -v curl || echo "curl is missing"
command -v wget || echo "wget is missing"
command -v git || echo "git is missing"
command -v unzip || echo "unzip is missing"
command -v tar || echo "tar is missing"
command -v bzip2 || echo "bzip2 is missing"
R --version || echo "R is not installed yet"
curl -I https://github.com || echo "Internet connection check failed"
If the internet check fails inside WSL but works in Windows, see WSL Setup for managed-network and proxy troubleshooting.
Run the setup script from your project folder:
bash scripts/dev/tools_setup.sh
This script will:
- Detect your OS and CPU architecture automatically (Linux x86_64/i686, macOS Intel/Apple Silicon)
- Check whether R is available and try to install it if it is missing
- Download the correct PLINK2 and PLINK1.9 binaries from official sources
- Download the official precompiled METAL binary for Linux/WSL or macOS
- Install micromamba into tools/micromamba/
- Install REGENIE into a micromamba environment named regenie_env
- Extract and organize command-line tools in tools/plink2/, tools/plink1.9/, tools/metal/, tools/micromamba/, and tools/regenie/
- Create symlinks in tools/bin/ for easy access
- Update your PATH so you can run plink2, plink, metal, and regenie from anywhere
- Write scripts/dev/tool_manifest.tsv, which lists the tools checked by test.sh
- Check for basic helper tools such as wget, unzip, tar, and bzip2
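The command -v lines above can also be written as one loop that summarises what is missing. A sketch; check_tools is a hypothetical helper name, and its return value is the number of missing tools:

```shell
# check_tools is a hypothetical helper name used only in this guide.
# It prints one line per tool and returns the number of missing tools (0 = all found).
check_tools() {
  local missing=0
  local tool
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "✓ $tool: $(command -v "$tool")"
    else
      echo "✗ $tool is missing"
      missing=$((missing + 1))
    fi
  done
  return "$missing"
}
```

For example: check_tools curl wget git unzip tar bzip2 R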
When finished, you’ll see:
Detected: OS=Linux, Architecture=x86_64_avx2
✓ R found
✓ PLINK2: PLINK v2.00a5 (64-bit build)
✓ PLINK1.9: PLINK v1.90b7.2
✓ METAL
✓ REGENIE: regenie v...
REGENIE is installed with project-local micromamba, following the official REGENIE conda install route. The environment files are stored under tools/micromamba-root/, so users do not need an existing conda installation.
Manual: the automated setup is recommended. Use this manual route only if you cannot use the setup script or need to inspect each tool installation step.
1. Install R
R is used later for QC plots and summary tables.
On WSL/Linux:
sudo apt-get update
sudo apt-get install -y r-base r-base-dev
R --version
On macOS with Homebrew:
brew install r
R --version
2. Download PLINK2
Visit: https://www.cog-genomics.org/plink/2.0/
Choose the build for your system:
- Linux (Intel, AVX2): plink2_linux_avx2_*.zip
- Linux (AMD, AVX2): plink2_linux_amd_avx2_*.zip
- Linux (64-bit, no AVX2): plink2_linux_x86_64_*.zip
- macOS (Intel, AVX2): plink2_mac_avx2_*.zip
- macOS (Intel, no AVX2): plink2_mac_*.zip
- macOS (Apple Silicon): plink2_mac_arm64_*.zip
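To pick between the AVX2 and non-AVX2 builds, you need to know whether your CPU supports AVX2. A minimal sketch, assuming a Linux/WSL or macOS terminal; has_avx2 is a hypothetical helper name, and the macOS sysctl key it reads is an assumption that only applies to Intel Macs. The automated setup script does its own detection, which may work differently:

```shell
# has_avx2 is a hypothetical helper name used only in this guide.
# Linux/WSL: CPU feature flags are listed in /proc/cpuinfo.
# Intel Macs: assumption that sysctl lists AVX2 in machdep.cpu.leaf7_features;
# on Apple Silicon this key is absent, so the check reports "no AVX2".
has_avx2() {
  case "$(uname -s)" in
    Linux)  grep -qw avx2 /proc/cpuinfo 2>/dev/null ;;
    Darwin) sysctl -n machdep.cpu.leaf7_features 2>/dev/null | grep -qw AVX2 ;;
    *)      return 1 ;;
  esac
}

echo "OS: $(uname -s), CPU architecture: $(uname -m)"
# On Linux, vendor_id shows GenuineIntel or AuthenticAMD (relevant for the AMD build).
if [ -r /proc/cpuinfo ]; then
  grep -m 1 vendor_id /proc/cpuinfo || true
fi
if has_avx2; then
  echo "AVX2 detected: an AVX2 build should work."
else
  echo "AVX2 not detected: choose a build without AVX2 (or the arm64 build on Apple Silicon)."
fi
```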
Extract and move to your tools folder:
mkdir -p tools/plink2
cd tools/plink2
# Download your platform's ZIP file here
unzip plink2_*.zip
cd ../..
3. Download PLINK1.9
Visit: https://www.cog-genomics.org/plink/1.9/
Choose the build for your system:
- Linux (64-bit): plink_linux_x86_64_*.zip
- macOS: plink_mac_*.zip
Extract and move to your tools folder:
mkdir -p tools/plink1.9
cd tools/plink1.9
# Download your platform's ZIP file here
unzip plink_*.zip
cd ../..
4. Download METAL
Visit: https://csg.sph.umich.edu/abecasis/Metal/download/
Choose the precompiled binary for your system:
- Linux/WSL: Linux-metal.tar.gz
- macOS: Darwin-metal.tar.gz
Extract it in your tools folder:
mkdir -p tools/metal
cd tools/metal
# Download the Linux/WSL or macOS archive here
tar -xzf *-metal.tar.gz
find . -type f \( -name "metal" -o -name "METAL" \) -exec chmod +x {} \;
cd ../..
5. Install micromamba and REGENIE
The automated setup installs micromamba for you. If you are doing this manually, first download micromamba.
For Linux/WSL x86_64:
mkdir -p tools/micromamba
cd tools/micromamba
curl -Ls https://micro.mamba.pm/api/micromamba/linux-64/latest | tar -xvj bin/micromamba
cd ../..
For macOS Apple Silicon:
mkdir -p tools/micromamba
cd tools/micromamba
curl -Ls https://micro.mamba.pm/api/micromamba/osx-arm64/latest | tar -xvj bin/micromamba
cd ../..
For macOS Intel, replace osx-arm64 with osx-64.
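To avoid picking the wrong platform name by hand, you can derive it from uname. A sketch covering only the three platforms listed above; micromamba_platform is a hypothetical helper name, and other systems (for example Linux on ARM) are reported as unsupported rather than guessed:

```shell
# micromamba_platform is a hypothetical helper name used only in this guide.
# It maps uname output to the platform names used in the micro.mamba.pm URLs above.
# Optional arguments (OS, architecture) override uname, which is handy for testing.
micromamba_platform() {
  local os="${1:-$(uname -s)}"
  local arch="${2:-$(uname -m)}"
  case "$os/$arch" in
    Linux/x86_64)  echo "linux-64" ;;
    Darwin/arm64)  echo "osx-arm64" ;;
    Darwin/x86_64) echo "osx-64" ;;
    *)
      echo "unsupported: $os/$arch" >&2
      return 1 ;;
  esac
}

if PLATFORM=$(micromamba_platform); then
  echo "Download URL: https://micro.mamba.pm/api/micromamba/$PLATFORM/latest"
fi
```

The printed URL is the same one used in the curl | tar commands above, with the platform filled in.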
Then create the REGENIE environment:
export MAMBA_ROOT_PREFIX="$(pwd)/tools/micromamba-root"
tools/micromamba/bin/micromamba create -y -n regenie_env -c conda-forge -c bioconda regenie
Then create a small wrapper so REGENIE works like the other tutorial tools:
mkdir -p tools/regenie
cat > tools/regenie/regenie <<'EOF'
#!/usr/bin/env bash
export MAMBA_ROOT_PREFIX="$(cd "$(dirname "${BASH_SOURCE[0]}")/../micromamba-root" && pwd)"
MICROMAMBA="$(cd "$(dirname "${BASH_SOURCE[0]}")/../micromamba/bin" && pwd)/micromamba"
exec "$MICROMAMBA" run -n regenie_env regenie "$@"
EOF
chmod +x tools/regenie/regenie
6. Create symbolic links
Create convenient shortcuts in tools/bin/:
mkdir -p tools/bin
# Link targets are relative to tools/bin; -f replaces an existing link if you run this again
ln -sf ../plink2/plink2 tools/bin/plink2
ln -sf ../plink1.9/plink tools/bin/plink
METAL_BIN=$(find tools/metal -type f \( -name "metal" -o -name "METAL" \) | head -n 1)
ln -sf "../../$METAL_BIN" tools/bin/metal
ln -sf ../regenie/regenie tools/bin/regenie
7. Update PATH
Add the tools folder to your PATH:
export PATH="$(cd tools/bin && pwd):$PATH"
To make this permanent on Linux/WSL, add it to ~/.bashrc (if your project is not in $HOME/gwas_tutorial, change the path in this line):
echo 'export PATH="$HOME/gwas_tutorial/tools/bin:$PATH"' >> ~/.bashrc
On macOS, the default shell is usually zsh, so use ~/.zshrc instead:
echo 'export PATH="$HOME/gwas_tutorial/tools/bin:$PATH"' >> ~/.zshrc
Then reload your shell, using the file you edited:
source ~/.bashrc
For macOS zsh:
source ~/.zshrc
The tool list used by the setup test is saved here:
cat scripts/dev/tool_manifest.tsv
You can also check the project-local tool links directly. These commands use ./tools/bin/..., so they work even if your PATH has not reloaded yet:
ls -l tools/bin/
./tools/bin/plink2 --version
./tools/bin/plink --version
test -x ./tools/bin/metal && echo "METAL executable found: ./tools/bin/metal"
./tools/bin/regenie --version
R --version
Most installation problems are caused by the shell not being in the right folder, internet or proxy access problems, an unfinished apt-get process, or R not being available yet. Start with the message printed in your terminal and match it to one of the cases below.
| Error or symptom | What it usually means | What to try |
|---|---|---|
| scripts/dev/tools_setup.sh: No such file or directory | You are not in the project folder, or Step 3 did not copy the scripts | Run cd "$HOME/gwas_tutorial" and ls scripts/dev/ |
| curl, wget, or micromamba cannot connect | WSL or your terminal cannot reach the internet | Test curl -I https://github.com; WSL users on managed networks should check WSL Setup |
| Could not get lock /var/lib/apt/lists/lock | Another apt-get process is running | Wait for it to finish, or inspect it with ps -fp <PID>, using the PID shown in the error |
| Unable to locate package r-base | Package lists are stale, the universe repository is disabled, or the Ubuntu release is unusual | See the R commands in the Manual tab; for persistent problems, use an Ubuntu LTS WSL release |
| 404: command not found after running a downloaded script | A GitHub 404 page was saved instead of a script | Delete the file and repeat the current download command from this guide |
| plink, plink2, metal, or regenie: command not found | Tools were not installed, or tools/bin is not on PATH | Run bash scripts/dev/tools_setup.sh again, then check ls tools/bin/ |
| REGENIE or micromamba fails during environment creation | Usually a network, proxy, or conda-channel access problem | Confirm curl -I https://github.com works and retry bash scripts/dev/tools_setup.sh |
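For the command not found row, a broken link in tools/bin (one whose target file is missing) looks the same as a missing tool. A sketch that lists broken links; check_tool_links is a hypothetical helper name, and it defaults to tools/bin relative to the current folder:

```shell
# check_tool_links is a hypothetical helper name used only in this guide.
# [ -L ] is true for any symbolic link; [ -e ] follows the link and is false
# when the link's target does not exist, so -L without -e means "broken link".
check_tool_links() {
  local bin_dir="${1:-tools/bin}"
  local broken=0
  local link
  while IFS= read -r link; do
    if [ -L "$link" ] && [ ! -e "$link" ]; then
      echo "✗ broken link: $link -> $(readlink "$link")"
      broken=$((broken + 1))
    else
      echo "✓ $link"
    fi
  done < <(find "$bin_dir" -mindepth 1 -maxdepth 1 2>/dev/null | sort)
  [ "$broken" -eq 0 ]
}
```

Run check_tool_links from your project folder. A broken link usually means the matching tool folder (for example tools/metal) is empty, so re-run the setup script.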
If automatic R installation fails because Ubuntu cannot find r-base, enable the universe repository and refresh the package list:
sudo apt-get install -y software-properties-common
sudo add-apt-repository -y universe
sudo apt-get update
sudo apt-get install -y r-base r-base-dev
Then run the tool setup again:
bash scripts/dev/tools_setup.sh
Step 6 — Test tools and scripts
Run the test script to verify everything is installed and working:
bash scripts/dev/test.sh
This script will:
1. Check the project folders
2. Check the demo dataset files
3. Check that every copied script listed in scripts/dev/script_manifest.txt exists
4. Check every required tool listed in scripts/dev/tool_manifest.tsv
5. Print the next command to run
When it finishes, you should see output like:
✓ Initial test done! Folders, scripts, data, and required tools are working.
Next command to run the QC pipeline:
bash scripts/01B_genotyping_qc/01_initial_qc_stats.sh
If you see this message, everything is working! You can now go to Quality Control (Section 1B) and run your first QC command.
Troubleshooting
git: command not found
Git is not installed on your system.
Solution (Windows WSL2):
sudo apt-get update
sudo apt-get install git
Solution (macOS):
brew install git
Solution (Linux):
sudo apt-get install git
wget: command not found
wget is not installed.
Solution (Windows WSL2 / Linux):
sudo apt-get install wget
Solution (macOS):
brew install wget
plink, plink2, metal, or regenie: command not found
One of the installed tools is not found in your PATH.
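Before changing anything, you can check whether tools/bin is already on your PATH. A sketch, assuming the default project location $HOME/gwas_tutorial; on_path is a hypothetical helper name:

```shell
# on_path is a hypothetical helper name used only in this guide.
# Wrapping PATH in colons lets one pattern match the first, a middle, or the last entry.
on_path() {
  case ":$PATH:" in
    *":$1:"*) return 0 ;;
    *)        return 1 ;;
  esac
}

TOOLS_BIN="$HOME/gwas_tutorial/tools/bin"   # change this if your project is elsewhere
if on_path "$TOOLS_BIN"; then
  echo "✓ $TOOLS_BIN is on PATH"
else
  echo "✗ $TOOLS_BIN is not on PATH yet"
fi
```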
Solution 1: Run the setup script again
The automated script should have configured PATH:
bash scripts/dev/tools_setup.sh
Solution 2: Manually add to PATH
If the script completed but PATH isn’t set, manually add it:
# Check if tools exist
ls tools/bin/plink
ls tools/bin/plink2
ls tools/bin/metal
ls tools/bin/regenie
# Add to current session
export PATH="$(cd tools/bin && pwd):$PATH"
# Verify
plink --version
plink2 --version
command -v metal && echo "metal is on PATH"
regenie --version
To make this permanent on Linux/WSL:
echo 'export PATH="$HOME/gwas_tutorial/tools/bin:$PATH"' >> ~/.bashrc
source ~/.bashrc
On macOS zsh, use ~/.zshrc instead:
echo 'export PATH="$HOME/gwas_tutorial/tools/bin:$PATH"' >> ~/.zshrc
source ~/.zshrc
Solution 3: Use full path
If PATH setup is tricky, use the full path directly:
./tools/bin/plink2 --version
./tools/bin/plink --version
test -x ./tools/bin/metal && echo "metal is available"
./tools/bin/regenie --version
Error: Failed to open pdac_demo.bed
PLINK cannot find the demo dataset.
Solution: Make sure all demo files were downloaded:
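ls demo_data/pdac_demo.*
You should see .bed, .bim, and .fam files. If any are missing, re-run Step 4:
bash scripts/dev/download_demo_data.sh
Permission denied when running a script
The script file does not have execute permission. This only matters when you run a script directly (for example ./scripts/01B_genotyping_qc/01_initial_qc_stats.sh); running it with bash, as this guide does, does not need it.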
Solution:
find scripts -type f -name "*.sh" -exec chmod +x {} \;
bash scripts/01B_genotyping_qc/01_initial_qc_stats.sh
Script or data file not found
You're in the wrong directory or files are missing.
Solution:
# Make sure you're in your project folder
cd ~/gwas_tutorial
# Check files exist
ls scripts/01B_genotyping_qc/
ls demo_data/
# Then run the first QC script
bash scripts/01B_genotyping_qc/01_initial_qc_stats.sh
Additional Resources
- Setup Project Structure — Detailed setup guide
- WSL Setup — WSL2 troubleshooting
What’s next
You now have a working GWAS pipeline setup and have confirmed that the tools run successfully. Continue with QC and the following steps:
- Quality control (Section 1B) — cleaning the raw genotypes before any analysis.
- Population stratification (Section 2) — detecting and correcting for ancestry.