AlphaFold outputs


After submitting a job script, sit back and relax. Your job is put in a queue on the HPC, and will automatically run when it is your turn to have some resources allocated. After the run is finished, several outputs are returned.

In the alphafold/ directory, an output and an error file are created, named <job_name>.o<job_id> and <job_name>.e<job_id> (for instance, example_run.o1502930 and example_run.e1502930). These contain information about the run, such as the progress made so far and any warnings or errors. If no problems occurred during the job, you can safely remove these files, as the actual AlphaFold output files are located elsewhere.
In the alphafold/runs/ directory, a new directory is created, named after the FASTA file and the job id of the experiment. The predicted structures and extra information are located here.

alphafold/runs/

In the runs directory, a new directory should be created for the job. Here, a copy of the FASTA file is found, as well as another subdirectory with AlphaFold outputs. Below, a summary is given of their contents. The different file extensions are as follows:

.pdb – Protein Data Bank format. These files contain the actual structures, including the pLDDT (local confidence) in the B-factor column
.pkl – pickle data format, used in programming; it is a binary format and thus not readable from the command line
.json – human-readable data format, often used in programming languages; contents can be shown with cat and other commands
MSA files have .sto or .a3m file extensions, and are readable with cat and other commands
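
Since the pLDDT is stored in the B-factor column of the .pdb files, it can be read back without any special tooling. The sketch below is a minimal illustration, not part of the AlphaFold output or of the provided scripts: it relies on the fixed-column layout of ATOM records in the PDB format and reads one value per residue from the CA atom, on the assumption that every atom of a residue carries the same pLDDT. The function names are hypothetical.

```python
def plddt_per_residue(pdb_text):
    """Return {(chain_id, residue_number): pLDDT} taken from CA atoms.

    Uses the fixed columns of PDB ATOM records:
    atom name 13-16, chain 22, residue number 23-26, B-factor 61-66.
    """
    scores = {}
    for line in pdb_text.splitlines():
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            chain_id = line[21]
            residue_number = int(line[22:26])
            scores[(chain_id, residue_number)] = float(line[60:66])
    return scores


def mean_plddt(pdb_text):
    """Average pLDDT over all residues found in the PDB text."""
    scores = plddt_per_residue(pdb_text)
    if not scores:
        raise ValueError("no CA atoms found")
    return sum(scores.values()) / len(scores)
```

To use it, read one of the .pdb files from the output directory as text (for example with open(path).read()) and pass the result to either function.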

Note that by default, there are no image files attached in the output, for instance visualizing the pairwise PTM scores (global confidence).

The files that take up the most space are usually the MSAs (depending on the actual MSA size, these can take up several GB) and the .pkl data files (their size grows quadratically with sequence length, exceeding 1 GB for sequences longer than ~800 amino acids).
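
To see which file types actually dominate the disk usage of a given run, the sizes can be summed per file extension. The sketch below is a hedged illustration using only the Python standard library; the function names are hypothetical and the path passed to them is whatever run directory you want to inspect.

```python
from collections import defaultdict
from pathlib import Path


def size_by_extension(root):
    """Sum file sizes in bytes under root, grouped by lowercase extension."""
    totals = defaultdict(int)
    for path in Path(root).rglob("*"):
        if path.is_file():
            totals[path.suffix.lower() or "(none)"] += path.stat().st_size
    return dict(totals)


def print_report(root):
    """Print one line per extension, largest total first."""
    totals = size_by_extension(root)
    for suffix, size in sorted(totals.items(), key=lambda kv: kv[1], reverse=True):
        print(f"{suffix:>8}  {size / 1e6:10.1f} MB")
```

Calling print_report on a directory under alphafold/runs/ lists the extensions from largest to smallest, which makes it easy to decide whether the MSA or .pkl files are worth removing once they are no longer needed.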

Visualization script

We provide a Python script (visualize_alphafold_results.py) to extract pLDDT, PAE and MSA visualizations (inspired by ColabFold code). The script uses the contents of the .pkl output files from the AlphaFold run. It takes three parameters as input:

--input_dir The location where the AlphaFold output files were stored.
--output_dir (optional) The location where the images that are generated should be stored. By default, they are stored in the input directory.
--name (optional) The prefix that will be used for the filenames of the generated files. By default, no prefix is added.
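
The way these three parameters combine into a command line can be sketched as follows. This is only an illustration: the helper function is hypothetical, the paths used with it are placeholders to replace with your own run directory, and it assumes the script is called from the directory in which it is stored. Only the script name and the three flags come from the description above.

```python
import shlex


def build_command(input_dir, output_dir=None, name=None):
    """Assemble the argument list for visualize_alphafold_results.py.

    --input_dir is always passed; the two optional flags are added
    only when a value is given, so the script's defaults apply otherwise.
    """
    cmd = ["python", "visualize_alphafold_results.py", "--input_dir", str(input_dir)]
    if output_dir is not None:
        cmd += ["--output_dir", str(output_dir)]
    if name is not None:
        cmd += ["--name", str(name)]
    return cmd


if __name__ == "__main__":
    # Placeholder path and prefix, shown as a shell-ready string.
    print(shlex.join(build_command("alphafold/runs/my_run/output", name="prefix")))
```

The printed line can be pasted into the shell, or the list can be handed to subprocess.run directly.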

To run the script, you will also need to have the correct Python modules loaded. You can do this by running the following lines before running the actual script.

module load AlphaFold/2.3.1-foss-2022a
module load matplotlib/3.5.2-foss-2022a

Important: since the maintenance of the HPC cluster in December 2022, it is for the time being not possible to run this script when the joltik module is loaded (an “Illegal instruction” error is produced). To circumvent this error, run the script from a fresh ssh session, on the login node (doduo).

For AlphaFold version 2.2 and earlier, different modules need to be loaded:

module load Python/3.8.6-GCCcore-10.2.0
module load matplotlib/3.3.3-foss-2020b

A full example is shown below:

prefix_coverage_LDDT.png

prefix_PAE.png

A nice visual tutorial on how to interpret the PAE can be found on the AlphaFold Protein Structure Database website, in the section “Predicted aligned error tutorial” on the page of any predicted structure, for instance https://alphafold.ebi.ac.uk/entry/Q9Y223.
