FAQ
Please consult the PyRosetta GitHub discussions for additional answers or to ask a new question.
General Questions
1. What if I have questions?
If you cannot find the answers to your questions here, please check the PyRosetta GitHub discussions for solutions or post your own query. For more information on Rosetta's architecture, data structures, and custom file syntax, consult the user guide.
2. Is PyRosetta available for 64-bit Linux or Windows platforms?
PyRosetta now supports 64-bit Linux and Mac OS X platforms along with 32-bit Linux and Windows.
3. How do I get started?
Once you have obtained a license and downloaded (and unzipped) PyRosetta, the easiest way to get started is by going through the PyRosetta step-by-step tutorial. For a more interactive exposure to PyRosetta, please consult the sample scripts.
4. How do I cite PyRosetta?
We have recently published an Applications Note on PyRosetta in the journal Bioinformatics (advance access available):
S. Chaudhury, S. Lyskov & J. J. Gray, "PyRosetta: a script-based interface for implementing molecular modeling algorithms using Rosetta," Bioinformatics, 26(5), 689-691 (2010).
5. What is the relationship between PyRosetta, Rosetta, and Robetta?
PyRosetta is a Python-script based front-end to the Rosetta molecular modeling suite. Rosetta, a collaborative project among more than 15 labs worldwide, requires users to have substantial experience in C++ and Rosetta software development to write custom algorithms. Through Python bindings to the Rosetta C++ source code, PyRosetta gives the end user access to the same Rosetta functions available to Rosetta developers, through an easy-to-use Python script based interface. Robetta is a server available online for non-commercial use of Rosetta applications. Rosetta algorithms cited in the literature are the same as (or similar to) the code implemented by Robetta and PyRosetta.
6. What is the relationship between The PyRosetta Toolkit GUI, PyRosetta, and PyMOL?
The PyRosetta Toolkit GUI add-on to PyRosetta is written in Python using the Tkinter GUI package, which is included in the standard Python distribution. The program is a set of modules which rely on importing PyRosetta for interactive protein modelling, design, and analysis. It can be extended and customized like many of the scripts included in PyRosetta. PyMOL is a protein visualization package, independent of PyRosetta. The Toolkit GUI has functions to send information (structures, energies, hbonds, etc.) to an open PyMOL window through the PyMOL-PyRosetta server link.
The Toolkit is developed by Jared Adolf-Bryfogle from the Dunbrack Lab.
7. Does PyRosetta allow for parallel processing and high performance computing?
Yes, PyRosetta algorithms, like Rosetta, are easily scaled up to a large number of parallel processes. The JobDistributor and MPIJobDistributor objects in PyRosetta make it simple to scale up simulations across multiple processes.
8. Contact information?
If you have questions about how to install or use PyRosetta please use our PyRosetta GitHub discussions.
Basic Applications Questions
1. Are PyRosetta algorithms deterministic?
Generally, no! Rosetta is a suite of algorithms and protocols, but the underlying algorithm for many protocols is Markov Chain Monte Carlo (MCMC), which is stochastic (random). Abstractly, MCMC algorithms allow users to draw samples from a distribution without explicit knowledge of the entire distribution structure (though some knowledge about constraints on the distribution is required). Rosetta was constructed to perform structural changes and scoring efficiently so that individual trajectories are computationally inexpensive. Thus, an individual trajectory, such as a single ab initio job, is not expected to yield a very good prediction. With enough trials (adequate sampling), one or more trials will yield very good estimates (low scoring) of the unknown structure.
As a stochastic algorithm, the real output (with adequate sampling) is a distribution of structures. However, the search space for nearly all protein conformation questions is too large to extract statistical characteristics easily. Many statistical techniques can be used to analyze the set of structures (decoys) output by a Rosetta protocol but none are universally applicable. Simplistically this means that each new trajectory has the chance of producing a low scoring structure (most likely native-like) and thus, more trajectories yield a greater chance of producing a realistic structure.
Usually 800 or more trajectories are enough to produce useful results, although this depends on the application. When developing new algorithms, construct a benchmarking set of structures to compare with. As is typical with bioinformatics software, some parameters of any algorithm will be tuned on a trial set. This does not mean individual trials in PyRosetta are useless. At the very least, the outputs are indicative of the sampling performed by the algorithm.
Several tools in PyRosetta are deterministic (such as minimization). When testing new protocols or Movers, try to learn whether the application has a lot of variability or performs only small random changes.
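To make the "many stochastic trajectories, keep the lowest score" idea concrete, here is a toy sketch in plain Python. It does not call PyRosetta: toy_energy is a hypothetical stand-in for a ScoreFunction and the random perturbation stands in for a Mover. Each trajectory applies the Metropolis acceptance criterion (loosely analogous to what a Rosetta MonteCarlo object does), and the best decoy is selected over many independent trajectories.

```python
import math
import random


def toy_energy(x):
    # Hypothetical rugged 1D "score" standing in for a Rosetta ScoreFunction
    return (x - 2.0) ** 2 + math.sin(5.0 * x)


def metropolis_trajectory(seed, steps=2000, kT=1.0, step_size=0.5):
    """Run one MCMC trajectory and return its lowest-scoring (x, score)."""
    rng = random.Random(seed)          # per-trajectory random stream
    x = rng.uniform(-10.0, 10.0)       # random starting "conformation"
    e = toy_energy(x)
    best_x, best_e = x, e
    for _ in range(steps):
        trial = x + rng.gauss(0.0, step_size)   # small random perturbation (the "move")
        trial_e = toy_energy(trial)
        # Metropolis criterion: always accept downhill moves, sometimes accept uphill ones
        if trial_e <= e or rng.random() < math.exp(-(trial_e - e) / kT):
            x, e = trial, trial_e
            if e < best_e:             # remember the lowest-scoring state seen so far
                best_x, best_e = x, e
    return best_x, best_e


# Many cheap independent trajectories; keep the lowest-scoring decoy
decoys = [metropolis_trajectory(seed) for seed in range(50)]
best = min(decoys, key=lambda d: d[1])
print(f"lowest-scoring decoy: x={best[0]:.3f}, score={best[1]:.3f}")
```

Different seeds give different trajectories, which is why a single run says little on its own, while rerunning with the same seed reproduces the same result.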
2. How do I access the help system?
Help is accessed in PyRosetta with several similar looking commands:
help( <object> )
<object>?
help <object>
help( <object> ) works in any Python interpreter; <object>? works only in IPython, and help <object> works only in IPython with autocall enabled.
For example:
help( Pose )
help( MonteCarlo )
Residue?
p = Pose()
p?
These help messages are generated from comments inside the Rosetta source code. Some effort has been made to standardize and improve these help messages; however, only a small portion of the essential objects and methods have in-depth help messages. The text output summarizes an object's purpose or a method's function. IPython also supports tab-completion, which is extremely useful for exploring object methods. Help can be accessed from the object name itself or from instances of the object (as in the example above).
3. What objects work?
PyRosetta contains nearly all of Rosetta. Various restrictions prevent some templated methods from becoming part of PyRosetta, but in almost every case there is a workaround. Many Rosetta objects may appear not to work; however, they almost always work properly, though they may have nuances to their usage or make inherent assumptions which the user does not want. Unfortunately, we cannot easily provide a list of methods that are working or not working. Fortunately, it is easy to test object functionality in PyRosetta. If the object requires a lot of setup (depends on other data structures), I recommend writing a short script to test the object. Many errors in PyRosetta cause segmentation faults which end the IPython (or Python) interpreter session and hamper object testing. (This typically results from poor error-checking in the C++ code, so please feel free to submit a bug report when this happens.) From experience, typing "from rosetta import *" gets tedious fast, and you must know what an object or method does before using it.
The tutorials and sample scripts demonstrate what commonly used objects and methods work in PyRosetta and how to use them. For more information, consult the documentation.
4. How do I know if a method or object constructor is overloaded?
Please consult the documentation.
5. How do I search for new objects or methods?
The easiest way to find what is hidden in PyRosetta is to use tab-completion on the rosetta module hierarchy. Reading and searching PyRosetta is the main topic of the documentation.
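When tab-completion is not available (for example, in a plain script), a small helper built on Python's dir() can do a similar keyword search. This is a hypothetical convenience function, not part of PyRosetta; it is demonstrated on the standard math module so it runs anywhere, but the same call should work on any PyRosetta module, class, or instance.

```python
import math


def find_members(obj, pattern):
    """Hypothetical helper: public attribute names of obj containing pattern (case-insensitive)."""
    pattern = pattern.lower()
    return sorted(
        name for name in dir(obj)
        if pattern in name.lower() and not name.startswith("_")  # skip private/dunder names
    )


print(find_members(math, "sqrt"))
```

With a Pose instance, for example, find_members(pose, "residue") should list methods such as total_residue.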
6. What is a "Size" or a "Real"?
Within Rosetta, several simple objects are used for basic data structures. If these are seen within PyRosetta help, they can be replaced by their appropriate Python data type.
Size is an unsigned integer (use int in Python)
Real is a double-precision floating-point number (use float in Python)
Vector or Vector1 often serves the purpose of a Python list
7. Where are Vector objects?
Within Rosetta, Vector objects are used for various list structures. The common Vector objects and their locations are listed below. See the Vector0/Vector1 question in the Rosetta to PyRosetta Transition Questions for more information.
Vector1: rosetta.Vector1
xyzVector: rosetta.numeric.xyzVector
vector1_(data type): rosetta.utility.vector1_(data type)
8. Why are Rosetta objects 1-indexed?
Rosetta has its roots in FORTRAN, so counting is "1-indexed" (the first element is numbered 1). Python, on the other hand, is "0-indexed" (the first element is numbered 0). The documentation discusses this in a little more depth.
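The off-by-one shift usually shows up when mixing Rosetta residue numbers with plain Python strings or lists. As a sketch, assuming a simple single-chain protein where pose.sequence() returns one letter per residue, residue number i corresponds to string position i - 1. The helper residue_letter is hypothetical and uses only plain Python.

```python
def residue_letter(sequence, resnum):
    """Hypothetical helper: one-letter code of Rosetta residue number `resnum` (1-indexed)."""
    if not 1 <= resnum <= len(sequence):
        raise IndexError(f"residue numbers run from 1 to {len(sequence)}, got {resnum}")
    return sequence[resnum - 1]  # shift the 1-indexed residue number to a 0-indexed position


seq = "THANKSEVAN"  # e.g. the string returned by pose.sequence()
for resnum in range(1, len(seq) + 1):  # mirrors range( 1 , pose.total_residue() + 1 )
    print(resnum, residue_letter(seq, resnum))
```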
9. Do other biological tools interact nicely with PyRosetta?
Yes! Very well! One advantage of having Rosetta accessible in Python is the ease of using other Bioinformatics software. Biological software is usually made public as a commandline executable or even within Python. Python serves as a wonderful language to "glue" other programs and processes together. Combining PyRosetta with other tools can greatly enhance analysis and only requires one (okay two, you should be able to get around in bash) language. Some Python programs or libraries frequently used with PyRosetta are:
matplotlib (and pylab)
10. Can I make a system call within the Python interpreter?
Yes, and there are many ways to do this. Many biological tools are separate programs or executables, and Python is a perfect place to stitch together programs that share Python or require system calls from the command line. Biopython even has a set of Python objects for handling these system calls. The most basic method uses the os module (os.system), which simply executes an input string in a subshell. The Python subprocess module has better tools for managing system calls. If you are interested in combining system calls with PyRosetta methods, I suggest becoming intimately familiar with the os and subprocess modules.
Example system call:
import os
os.system( 'mkdir pdb' )
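A sketch of the same idea with the subprocess module, which reports the exit status and captures output instead of only running a string. The child command here is a placeholder (it just runs the current Python interpreter); substitute the external tool you want to call. The directory is created inside a temporary folder so the example leaves no trace.

```python
import os
import subprocess
import sys
import tempfile

# Portable equivalent of 'mkdir pdb' that tolerates an existing directory
workdir = tempfile.mkdtemp()
pdb_dir = os.path.join(workdir, "pdb")
os.makedirs(pdb_dir, exist_ok=True)

# Run an external command; check=True raises CalledProcessError on a non-zero exit code
result = subprocess.run(
    [sys.executable, "-c", "print('hello from a child process')"],
    capture_output=True,  # collect stdout/stderr instead of printing them
    text=True,            # decode output as str
    check=True,
)
print(result.stdout.strip())
```

Passing the command as a list avoids shell quoting problems with file names.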
Structure, PDB file, and Pose Questions
1. How do I create a molecule (Pose)?
The Pose object represents a single molecule within PyRosetta. These molecular objects can be created (see the questions below) or constructed from a PDB file. PyRosetta is intended to work with proteins but can successfully load other compounds (with additional work). Rosetta has numerous naming preferences but is capable of loading many (if not all) protein structures. The simplest method for creating a Pose object is to load it from a PDB file using the method pose_from_pdb.
pose = Pose()
pose_from_pdb( pose , 'your_favorite_protein.pdb' )
pose2 = pose_from_pdb( 'my_favorite_protein.pdb' ) # this method is overloaded in newer versions of Rosetta and returns a Pose
2. What is a Pose and why does it own so many other objects?
The Pose is a complex data structure composed of various objects. An abstract summary of the Pose structure is provided in the documentation. A more detailed and accurate description of the Pose data structure is found in the paper A. Leaver-Fay et al., "ROSETTA3: An object-oriented software suite for the simulation and design of macromolecules," Methods in Enzymology 487, 548-574 (2011).
3. How do I make a crystal structure suitable for PyRosetta?
Missing atoms and other small errors are gracefully handled by PyRosetta. However, small clashes or other discrepancies can cause problems with some Rosetta protocols. Methods for converting a raw crystal structure into a Rosetta-solution-state structure are varied. For more information, please consult the documentation and the Rosetta user guide.
4. When I create a Pose, what does all the output mean?
Rosetta is currently very verbose. This is handy when debugging problems but can be jarring at first due to the sheer volume of seemingly useless output. When loading a pose from a PDB file, the output will indicate which atoms are missing (and thus idealized) and notify you of other problems. Generally, you can ignore this output even if it says there is a problem. If the PDB file is successfully read, you may want to check its sequence or view it using the PyMOL_Mover, but the protein has most likely been loaded without any trouble.
5. Can I split a Pose, delete residues, or insert residues?
Yes. The Pose object has methods for deleting and appending (inserting) residues. In addition, there are more functions and classes in the grafting namespace that are not imported by default. To use these, look at rosetta.protocols.grafting; these will need to be imported. Functions include delete_region, replace_region, return_region, insert_pose_into_pose, and the AnchoredGraftMover class. All have descriptions that can be accessed through IPython. Note that, for many of these functions and for the Pose's own methods, the Pose's PDBInfo object will go out of date once insertions and deletions are made. This means that you will lose access to stored PDB information (pdb_info().pdb2pose, etc.), and when you dump the PDB it will have numbering starting from 1.
The information, however, is not lost. It is simply tagged by Rosetta as obsolete. There are two ways to approach this if your original PDB information is needed. For deletions, simply use the function pose.pdb_info().obsolete(False). For insertions, you will want to fix the PDBInfo object directly. There are a few ways to do this, and you can use IPython tab completion and documentation to get some ideas. The function pose.pdb_info().copy will work if your insertion has the numbering you need. If not, you can use the function pose.pdb_info().set_resinfo to manually set a particular residue's PDB information. Before accessing the PDBInfo object or dumping a pose, you will still want to un-obsolete the PDBInfo: pose.pdb_info().obsolete(False). In addition to PyRosetta, you can perform many of these tasks with simple text editing software or many molecular visualization tools. PyMOL and Biopython have numerous tools for editing PDB files that are very useful for aiding PyRosetta.
6. How do I create a Pose from a novel sequence?
The method pose_from_sequence can fill Pose objects with a single protein chain constructed from an input sequence of single letter amino acids.
pose = pose_from_sequence( 'THANKSEVAN' , 'fa_standard' )
If using this method to produce a pose with non-protein residues (molecules), you can access these ResidueTypes using a single letter with its other identifier code in brackets. The single letter code and other identifiers can be found in the .params files within PyRosetta rosetta_database/chemical/residue_type_sets/. DNA residues are named this way (adenine is "A[ADE]", guanine is "G[GUA]", cytosine is "C[CYT]", thymine is "T[THY]"). Metals and other compounds are exposed by default.
pose = pose_from_sequence( 'A[ADE]G[GUA]C[CYT]T[THY]Z[MN]' , 'fa_standard' ) # Z[MN] is manganese
ResidueTypeSets other than full-atom ("fa_standard"), such as centroid, are also available as long as the sequence is appropriate.
7. How do I change amino acids in a Pose?
There are several ways to change a protein at the sequence level using PyRosetta. You can (a) setup a PackerTask to redesign the protein with the desired sequence changes, (b) use the method mutate_residue, or (c) the MutateResidue Mover class.
a) Please consult the sample script packer_task.py for syntax on setting up a PackerTask to perform design manually or using a resfile. Since this option uses a Mover and allows multiple changes, it is the most efficient method of changing a protein's sequence.
b) PyRosetta 1.0 and 2.0 have an exposed method mutate_residue which accepts a pose, a residue number, and a single letter representation of the mutant amino acid. Please consult the sample script ala_scan.py or the tool script mutants.py for an updated version of the mutate_residue method. In current releases, this method is exposed by importing from the toolbox:
from toolbox import mutate_residue
This method is easy to use and best suited for investigations using the interpreter or sequence changes performed outside of a protocol. Additionally, a packing shell can be created around the residue to mutate.
c) The Mover MutateResidue performs the same change as the old mutate_residue method, and thus does not allow direct repacking of the sidechains near the mutant. Since this Mover's target residue and mutant identity must be set each time, it is the least efficient option.
8. How do I load in a PDB file containing DNA?
PyRosetta knows the typical deoxyribonucleotides as the ResidueTypes ADE, GUA, CYT, and THY and can infer these from the single letters A, G, C, and T, respectively, if they are in the PDB resName column (characters 18-20). Please edit the PDB file to ensure that the nucleic acids are not represented with DA, DG, DC, or DT in the PDB resName column. Use PyMOL, grep, awk, Python, Biopython, or whatever technique you prefer. Soon we will provide a tool script for performing this edit.
Remember, for docking applications the downstream partner (later chain or chains) is docked to the upstream partner (first chain or chains) which is in a fixed position. Thus, for any DNA-protein docking application, it is most intuitive to dock the DNA to the protein requiring the DNA chains to be after the protein chains in the PDB file.
The sample script dna_interface.py also outlines and explains this process.
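The resName edit described above can be sketched in plain Python. This assumes a standard fixed-column PDB file in which resName occupies characters 18-20 and is right-justified (so " DA" becomes "  A"); rename_dna_resnames is a hypothetical helper, not a PyRosetta tool script.

```python
DNA_RENAMES = {"DA": "A", "DG": "G", "DC": "C", "DT": "T"}


def rename_dna_resnames(pdb_lines):
    """Hypothetical helper: replace DA/DG/DC/DT resNames with single letters A/G/C/T."""
    out = []
    for line in pdb_lines:
        if line.startswith(("ATOM", "HETATM")) and len(line) >= 20:
            resname = line[17:20].strip()  # resName = PDB characters 18-20 (0-indexed 17:20)
            if resname in DNA_RENAMES:
                # keep the column width fixed: right-justify the new name in 3 characters
                line = line[:17] + DNA_RENAMES[resname].rjust(3) + line[20:]
        out.append(line)
    return out
```

To apply it to a file, read the lines with open("in.pdb").read().splitlines(), pass them through the function, and write "\n".join(...) to a new file.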
9. How do I load in a PDB file containing water?
Typically, you simply want to remove the water, since PyRosetta does not use it for any application and it can cause problems. If you want to load water molecules into PyRosetta, you must activate them and edit the PDB file. PyRosetta knows water as the ResidueTypes TP3 and TP5; however, these are "turned off" by default. To configure PyRosetta so that it will always recognize water HETATM lines, edit the file /minirosetta_database/chemical/residue_type_sets/fa_standard/residue_types.txt in the main PyRosetta directory by uncommenting (removing the "#" character from) the lines near line 75, which should look something like:
## Water Types
residue_types/water/TP3.params
residue_types/water/TP5.params
This file is the master list of fullatom ResidueTypes. As you probably guessed, the water .params files can be found in /minirosetta_database/chemical/residue_type_sets/fa_standard/residue_types/water.
To properly load the PDB file, you must also edit its water HETATM lines to have TP3 (or TP5) in the PDB resName column (characters 18-20). Usually a PDB file will have "HOH", "WAT", or something else crazy here.
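The water resName edit can be sketched the same way in plain Python. This assumes fixed-column PDB records with resName in characters 18-20; rename_waters is a hypothetical helper, and WATER_RESNAMES only covers the two common names mentioned above, so extend it with whatever your file uses.

```python
WATER_RESNAMES = {"HOH", "WAT"}  # extend with the water names your PDB file uses


def rename_waters(pdb_lines, new_resname="TP3"):
    """Hypothetical helper: rename water residues so PyRosetta maps them to TP3 (or TP5)."""
    out = []
    for line in pdb_lines:
        # resName = PDB characters 18-20 (0-indexed slice 17:20)
        if line.startswith(("ATOM", "HETATM")) and line[17:20].strip() in WATER_RESNAMES:
            line = line[:17] + new_resname.rjust(3) + line[20:]
        out.append(line)
    return out
```

Pass new_resname="TP5" to target the other water model instead.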
10. How do I load in a PDB file containing other molecules?
11. How do I change Pose coordinates?
The Pose object has a .xyz method for extracting coordinates as xyzVector objects given an AtomID (built from an atom number and a residue number). Residue objects also contain a .xyz method for extracting coordinates as xyzVector objects given an atom number. To change coordinates, use the Pose's .set_xyz method, which takes an AtomID and an xyzVector. This example extracts all of a pose's coordinates as a per-residue list of lists of atom xyzVector objects:
coords = [ [ pose.residue( r ).xyz( a ) for a in range( 1 , pose.residue( r ).natoms() + 1 ) ] for r in range( 1 , pose.total_residue() + 1 ) ]
Similarly, for setting a pose's coordinates from the same list structure:
from rosetta.core.id import AtomID
for r in range( 1 , len( coords ) + 1 ):
    for a in range( 1 , len( coords[r - 1] ) + 1 ):
        # AtomID takes (atom number, residue number); coords itself is a 0-indexed Python list
        pose.set_xyz( AtomID( a , r ) , coords[r - 1][a - 1] )
There are many ways of extracting coordinate information. Please consult Workshop #2, sample script ala_scan.py, or tool script extract_coords_pose.py for more information.
12. Can I calculate a protein's Radius of Gyration from a Pose?
The method varies depending on the accuracy you desire. Rosetta is equipped with an rg ScoreType which rapidly approximates the Radius of Gyration using only the neighbor atoms (one representative atom per residue).
scorefxn = ScoreFunction()
scorefxn.set_weight( rg , 1 )
rad_g = scorefxn( pose )
For more accurate inquiries, PyRosetta is an ideal tool for extracting atomic coordinates (in the example below, a Python list is produced from the Pose xyzVector objects). Remember that Pose objects contain hydrogens by default, and scanning over the Pose will extract these coordinates too. Full code for the calculation is not shown.
coordinates = []
for i in range( 1 , pose.total_residue() + 1 ):
    res = pose.residue( i )
    for a in range( 1 , res.natoms() + 1 ):  # atom numbers run from 1 to natoms() inclusive
        xyz = res.xyz( a )
        coordinates.append( [ xyz[0] , xyz[1] , xyz[2] ] )
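Once the coordinates are in a plain Python list, the radius of gyration can be computed without PyRosetta. The sketch below computes the unweighted (geometric) radius of gyration, the root-mean-square distance of the points from their centroid; a mass-weighted Rg would additionally weight each atom by its mass. The function name is illustrative.

```python
import math


def radius_of_gyration(coordinates):
    """Unweighted radius of gyration of a list of [x, y, z] coordinates."""
    n = len(coordinates)
    if n == 0:
        raise ValueError("need at least one coordinate")
    # centroid (geometric center) of all points
    cx = sum(c[0] for c in coordinates) / n
    cy = sum(c[1] for c in coordinates) / n
    cz = sum(c[2] for c in coordinates) / n
    # mean squared distance from the centroid
    msd = sum((x - cx) ** 2 + (y - cy) ** 2 + (z - cz) ** 2 for x, y, z in coordinates) / n
    return math.sqrt(msd)


print(radius_of_gyration([[1.0, 0.0, 0.0], [-1.0, 0.0, 0.0]]))
```

Because the list includes hydrogens, filter them out first if you want a heavy-atom Rg.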
13. How do I align two Pose objects?
14. How do I calculate the RMSD between two Pose objects?
15. How do I extract secondary structure information?
16. How do I write PDB files?
17. Can I send structure directly to PyMOL without writing to files?
18. What is a ResidueTypeSet?
19. What is a FoldTree (or for that matter an Edge or a Jump object) ?
20. What is an "AtomID" object?
Scoring Questions
1. How do I create a ScoreFunction?
2. What are the scoring units?
3. How do I extract individual residue scores?
4. How do I extract individual atom scores?
5. What are the different ScoreFunctions?
6. How do I investigate ONLY the score between two residues?
7. How do I extract hydrogen bond information?
8. How do I know what a score term calculates?
9. How do I find new score terms?
10. Can I see scores in PyMOL?
Mover Questions
1. What is a Mover?
2. What Mover classes are available in PyRosetta?
3. What is the best way to create a sequence of Movers?
4. How do I know what structural changes are actually performed by a Mover?
5. Minimization increased the score (or moved docking partners far apart)?
6. Packing doesn't always yield the same score (or rotamers)?
Protocol Questions
1. Which Rosetta protocols for modeling membrane proteins are available in PyRosetta?
- RosettaMP is available in PyRosetta
- the ddG application is also accessible from PyRosetta, and there is a sample Python script that you can adapt.
- RosettaMPDock is currently only available in Rosetta C++
2. Is the Rosetta ab initio protocol available in PyRosetta?
3. Is Rosetta fragment selection/generation available in PyRosetta?
4. What is docking and what protocols best match my problem?
5. What is wrong with my docking?
6. How many trajectories should I run?
Rosetta to PyRosetta Transition Questions
1. How do I interact with the Rosetta Options system using PyRosetta?
The Rosetta command-line options are set during initialization of PyRosetta (init()). PyRosetta has default settings, some necessary and others recommended.
To set PyRosetta initialization options:
1. import rosetta (from rosetta import *)
2. While calling rosetta.init(), pass all of the options as a string to the init() function.
For example:
rosetta.init( "-ex1 -ex2 -include_sugars -write_pdb_link_records" )
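If your options live in a Python data structure (for example, parsed from a config file), a small helper can assemble the string passed to init(). This is a hypothetical helper that only formats text; it assumes the "-name value" syntax shown in the example above and simply omits options set to False or None so they keep their defaults.

```python
def build_init_options(options):
    """Hypothetical helper: turn {option_name: value} into a Rosetta-style flag string."""
    parts = []
    for name, value in options.items():
        if value is True:
            parts.append(f"-{name}")          # boolean flag switched on
        elif value is False or value is None:
            continue                          # leave the option at its default
        elif isinstance(value, (list, tuple)):
            parts.append(f"-{name} " + " ".join(str(v) for v in value))  # vector option
        else:
            parts.append(f"-{name} {value}")  # scalar option
    return " ".join(parts)


flags = build_init_options(
    {"ex1": True, "ex2": True, "include_sugars": True, "write_pdb_link_records": True}
)
print(flags)  # -ex1 -ex2 -include_sugars -write_pdb_link_records
```

The resulting string can then be passed as rosetta.init( flags ).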
If you want to get or set global Rosetta options within PyRosetta after initialization, you can either call init again or use getter and setter functions for setting options.
Generally, the getter syntax is:
<value> = rosetta.core.get_(data type)_option( <option name> )
The full list of getters (in newer versions of PyRosetta) is:
get_boolean_option
get_boolean_vector_option
get_file_option
get_file_option_option
get_integer_option
get_integer_vector_option
get_real_option
get_real_vector_option
get_string_option
get_string_vector_option
Generally, the setter syntax is:
rosetta.core.set_(data type)_option( <option name> , <value> )
The full list of setters (in newer versions of PyRosetta) is:
set_boolean_option
set_boolean_vector_option
set_file_option
set_file_option_option
set_integer_option
set_integer_vector_option
set_real_option
set_real_vector_option
set_string_option
set_string_vector_option
If these methods do not work, you may need to import them (sorry, the code has changed versions and this is not the place to explain these decisions). Try:
rosetta.basic.options.get_(data type)_option( <option name> )
rosetta.basic.options.set_(data type)_option( <option name> , <value> )
or
rosetta.core.options.get_(data type)_option( <option name> )
rosetta.core.options.set_(data type)_option( <option name> , <value> )
or
import rosetta.basic.options
For example:
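from rosetta import *
init()
# replace the placeholder with the path to your 3-mer fragment file
rosetta.core.set_string_option( 'in:file:frag3' , <3-mer fragment filename> )
print( rosetta.core.get_string_option( 'in:file:frag3' ) )
# vector-valued options take a Rosetta vector and use the vector setter/getter
rosetta.core.set_string_vector_option( 'in:file:s' , rosetta.Vector1( [ 'a' , 'b' ] ) )
print( rosetta.core.get_string_vector_option( 'in:file:s' ) )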
This may cause errors if you have altered your environment path variables. For more information, see rosetta/__init__.py in the main PyRosetta directory.
2. How do I construct Rosetta Vector0/Vector1 objects?
Vector0/1 is exposed in newer versions of PyRosetta and lives in pyrosetta.rosetta.utility.vector{0/1}_*. Specific Vector1 objects live in rosetta.utility.vector1_type.
There is also a pyrosetta.Vector1 helper function that constructs the most common types from a Python list. For example:
print( rosetta.Vector1( [ 1 , 2 , 3 ] ) )
print( rosetta.Vector1( [ 1.0 , 2.0 , 3.0 ] ) )
print( rosetta.Vector1( [ True , False , True ] ) )
print( rosetta.Vector1( [ 'a' , 'b' , 'c' ] ) )
v = rosetta.utility.vector1_SSize()
v.append( 1 )
print( v )
3. How do I construct various C++ std objects, like std::map?
C++ std:: types are exposed in PyRosetta in the pyrosetta.rosetta.std module. For example, all map types can be accessed as pyrosetta.rosetta.std.map_type1_type2.
For example:
m = pyrosetta.rosetta.std.map_string_Real()
m['aaa'] = 1.0; m['bb'] = 3.0
print( m )
4. How do I construct std::set objects?
std::set templates are exposed in PyRosetta and live in pyrosetta.rosetta.std_*. There is also a helper function, Set, in the pyrosetta namespace that converts a Python list or set object into its PyRosetta equivalent:
print( pyrosetta.Set( [ 1 , 2 , 3 ] ) )
print( pyrosetta.Set( [ 1.0 , 2.0 , 3.0 ] ) )
print( pyrosetta.Set( [ 'a' , 'b' , 'c' ] ) )
s = pyrosetta.utility.Set_SSize()
s.add(1); s.add(2); s.add(1); s.erase(2)
print( s )
5. How do I convert "AP" or "CAP" objects to regular class objects?
Use the "get" function (rosetta.utility.utility___getCAP in older releases). This is an involved issue which does not come up in common usage of PyRosetta.
For example, to create an "ALA" residue:
(new way)
chm = rosetta.core.chemical.ChemicalManager.get_instance()
rts = chm.residue_type_set( 'fa_standard' ).get()
ala = rosetta.core.conformation.ResidueFactory.create_residue( rts.name_map( 'ALA' ) )
print( ala )
(old way)
chm = rosetta.core.chemical.ChemicalManager.get_instance()
rts_AP = chm.residue_type_set( 'fa_standard' )
rts = rosetta.utility.utility___getCAP( rts_AP ) # converts a CAP object to a ResidueTypeSet object
ala = rosetta.core.conformation.ResidueFactory.create_residue( rts.name_map( 'ALA' ) )
print( ala )
6. How do I use std::ostream or std::istream for methods that require it?
The objects std::ostream and std::istream are bound in PyRosetta as pyrosetta.rosetta.std.ostream and pyrosetta.rosetta.std.istream, respectively. Methods that require these objects will generally also accept pyrosetta.rosetta.std.ostringstream and pyrosetta.rosetta.std.istringstream objects, respectively; use these instead.