Valve manual

Valve application is a driver that uses aquaduct module to perform analysis of trajectories of selected residues in Molecular Dynamics simulation.

Valve invocation

Once aquaduct module is installed (see Aqua-Duct installation guide) properly on the machine, Valve is available as valve.py command line tool.

Usage

Basic help of Valve usage can be displayed by following command:

valve.py --help

It should display following information:

usage: valve.py [-h] [--debug] [--debug-file DEBUG_FILE]
                [--dump-template-config] [-t THREADS] [-c CONFIG_FILE] [--sps]
                [--max-frame MAX_FRAME] [--min-frame MIN_FRAME]
                [--step-frame STEP_FRAME] [--sandwich] [--cache-dir CACHEDIR]
                [--cache-mem] [--version] [--license]

Valve, Aquaduct driver

optional arguments:
  -h, --help            show this help message and exit
  --debug               Prints debug info. (default: False)
  --debug-file DEBUG_FILE
                        Debug log file. (default: None)
  --dump-template-config
                        Dumps template config file. Suppress all other output
                        or actions. (default: False)
  -t THREADS            Limit Aqua-Duct calculations to given number of
                        threads. (default: None)
  -c CONFIG_FILE        Config file filename. (default: None)
  --sps                 Use single precision to store data. (default: False)
  --max-frame MAX_FRAME
                        Maximal number of frame. (default: None)
  --min-frame MIN_FRAME
                        Minimal number of frame. (default: None)
  --step-frame STEP_FRAME
                        Frames step. (default: None)
  --sandwich            Sandwich mode for multiple trajectories. (default:
                        False)
  --cache-dir CACHEDIR  Directory for coordinates caching. (default: None)
  --cache-mem           Switch on memory caching. (default: False)
  --version             Prints versions and exits. (default: False)
  --license             Prints short license info and exits. (default: False)

Configuration file template

Configuration file used by Valve is of moderate length and complexity. It can be easily prepared with a template file that can be printed by Valve. Use following command to print configuration file template on the screen:

valve.py --dump-template-config

Configuration file template can also be easily saved in to a file with:

valve.py --dump-template-config > config.txt

Where config.txt is a configuration file template.

For detailed description of configuration file and available options see Configuration file options.

Valve calculation run

Once configuration file is ready Valve calculations can be run with a following simple command:

valve.py -c config.txt

Some of Valve calculations can be run in parallel. By default all available CPU cores are used. This is not always desired - limitation of used CPU cores can be done with -t option which limits number of concurrent threads used by Valve. If it equals 1 no parallelism is used.

Note

Specifying number of threads greater then available CPU cores is generally not optimal.

However, in order to maximize usage of available CPU power it is recommended to set it as number of cores + 1. The reason is that Valve uses one thread for the main process and the excess over one for processes for parallel calculations. When parallel calculations are executed the main thread waits for results.

Note

Options --min-frame, --max-frame, and --step-frame can be used to limit calculations to specific part of trajectory. For example, to run calculations for 1000 frames starting from frame 5000 use following options: --min-frame 4999 --max-frame 5999; to run calculations for every 5th frame use: --step-frame 5.

Single precision storage

Most of the calculation is Valve is performed by NumPy. By default, NumPy uses double precision floats. Valve does not change this behavior but has special option --sps which forces to store all data (both internal data stored in RAM and on the disk) in single precision. This spare a lot of RAM and is recommended what you perform calculation for long trajectories and you have limited amount of RAM.

Cache

Storage of coordinates for all paths for very long MD trajectories requires huge amount of RAM. User can decide whether aquaduct should store coordinates in memory or in separated directory. Option --cache-mem instruct Valve to store coordinates in RAM; --cache-dir stores coordinates in selected directory. If neither of both options is selected, coordinates are calculated on demand.

Note

If no cache is used (memory or dir) Master paths cannot be calculated.

Sandwich

Trajectory data can be provided as several files. By default these files are processed in sequential manner making one long trajectory. If option --sandwich is used trajectory files are read as layers. For each layer, search of traceable residues is done separately (stage I and II) but processing and analysis (stage III, IV, V, and VI) are done for all paths simultaneously. Usage of --sandwich option is further referenced as sandwich mode.

Debugging

Valve can output some debug information. Use --debug to see all debug information on the screen or use --debug-file with some file name to dump all debug messages to the given file. Beside debug messages standard messages will be saved in the file as well.

How does Valve work

Application starts with parsing input options. If --help or --dump-template-config options are used appropriate messages are printed on the screen and Valve quits with signal 0.

Note

In current version Valve does not check the validity of the config file.

If config file is provided (option -c) Valve parse it quickly and regular calculations starts according to its content. Calculations performed by Valve are done in six stages described in the next sections.

Traceable residues

In the first stage of calculation Valve finds all residues that should be traced and appends them to the list of traceable residues. It is done in a loop over all frames. In each frame residues of interest are searched and appended to the list but only if they are not already present on the list. In sandwich_option mode this is repeated for each layer.

The search of traceable residues is done according to user provided specifications. Two requirements have to be met to append residue to the list:

  1. The residue has to be found according to the object definition.
  2. The residue has to be within the scope of interest.

The object definition encompasses usually the active site of the protein (or other region of interest of macromolecule in question). The scope of interest defines, on the other hand, the boundaries in which residues are traced and is usually defined as protein.

Since aquaduct in its current version uses MDAnalysis Python module for reading, parsing and searching of MD trajectory data, definitions of object and scope have to be given as its Selection Commands.

Object definition

Object definition has to comprise of two elements:

  1. It has to define residues to trace.
  2. It has to define spatial boundaries of the object site.

For example, proper object definition could be following:

(resname WAT) and (sphzone 6.0 (resnum 99 or resnum 147))

It defines WAT as residues that should be traced and defines spatial constrains of the object site as spherical zone within 6 Angstroms of the center of masses of residues with number 99 and 147.

Scope definition

Scope can be defined in two ways: as object but with broader boundaries or as the convex hull of selected molecular object.

In the first case definition is very similar to object and it has to follow the same limitations. For example, proper scope definition could be following:

resname WAT and around 2.0 protein

It consequently has to define WAT as residues of interest and defines spatial constrains: all WAT residues that are within 2 Angstroms of the protein.

If the scope is defined as the convex hull of selected molecular object (which is recommended), the definition itself have to comprise of this molecular object only, for example protein. In that case the scope is interpreted as the interior of the convex hull of atoms from the definition. Therefore, traceable residues would be in the scope only if they are within the convex hull of atoms of protein.

Convex hulls of macromolecule atoms

AQ uses quickhull algorithm for convex hulls calculations (via SciPy class scipy.spatial.ConvexHull, see also http://www.qhull.org/ and original publication The quickhull algorithm for convex hulls).

Convex hull concept is used to check if traced molecules are inside of the macromolecule. Convex hull can be considered as rough approximation of molecular surface. Following picture shows schematic comparison of convex hull and solvent excluded surface (SES):

../_images/ch_vs_ses.png

Convex hull (red shape) of atoms (blue dots with VdW spheres) and SES (blue line): a) convex hull and SES cover roughly the same area, convex hull approximates SES; b) movement of one atom dramatically changes SES, however, interior of the molecule as approximated by convex hull remains stable.

No doubts, convex hull is a very rough approximation of SES. It has, however, one very important property when it is used to approximate interior of molecules: its interior does not considerably depend on the molecular conformation of a molecule (or molecular entity) in question.

Raw paths

The second stage of calculations uses the list of all traceable residues from the first stage and for each residue in each frame two checks are performed:

  1. Is the residue in the scope (this is always calculated according to the scope definition).
  2. Is the residue in the object. This information is partially calculated in the first stage and can be reused in the second. However, it is also possible to recalculate this data according to the new object definition.

For each of the traceable residues a special Path object is created which stores frames in which a residue is in scope or in object.

Note

Residue is in object only if it is also in scope.

Separate paths

The third stage uses collection of Path objects to create Separate Path objects. Each Path comprise data for one residue. It may happen that the residue enters and leaves the scope and the object many times over the entire MD. Each such event is considered by Valve as a separate path.

There are two types of Separate Paths:

  • Object Paths
  • Passing Paths

Object Paths are traces of molecules that visited Object area. Passing Paths are traces of molecules that entered Scope but did not entered Object area.

Passing paths comprises of one part only. Each object path comprises of three parts:

  1. Incoming - Defined as a path that leads from the point in which residue enters the scope and enters the object for the first time.
  2. Object - Defined as a path that leads from the point in which residue enters the object for the first time and leaves it for the last time.
  3. Outgoing - Defined as a path that leads from the point in which residue leaves the object for the last time and leaves the scope.

It is also possible that incoming and/or outgoing part of the separate path is empty.

Note

Generation of Passing paths is optional and can be switched off.

Warning

Generation of Passing paths without redefinition of Object area in stage I and II may lead to false results.

Auto Barber

After the initial search of Separate Path objects it is possible to run Auto Barber procedure which trims paths down to the approximated surface of the macromolecule or other molecular entity defined by the user. This trimming is done by creating collection of spheres that have centers at the ends of paths and radii equal to the distance for the center to the nearest atom of user defined molecular entity. Next, parts of raw paths that are inside these spheres are removed and separate paths are recreated.

Auto Barber procedure has several options, for example:

  • auto_barber allows to define molecular entity which is used to calculate radii of spheres used for trimming raw paths.
  • auto_barber_mincut allows to define minimal radius of spheres. Spheres of radius smaller then this value are not used in trimming.
  • auto_barber_maxcut allows to define maximal radius of spheres. Spheres of radius greater then this value are not used in trimming.
  • auto_barber_tovdw if set to True radii of spheres are corrected (decreased) by Van der Waals radius of the closest atom.

See also options of separate_paths stage.

Smoothing

Separate paths can be optionally smoothed. Current aquaduct version allows perform soft smoothing only, i.e. smoothing is used only for visualization purposes. Raw paths cannot be replaced by the smoothed.

Available methods

Aqua-Duct implements several smoothing methods:

  1. Savitzky-Golay filter - SavgolSmooth - see also original publication Smoothing and Differentiation of Data by Simplified Least Squares Procedures (doi:10.1021/ac60214a047).
  2. Window smoothing - WindowSmooth
  3. Distance Window smoothing - DistanceWindowSmooth
  4. Active Window smoothing - ActiveWindowSmooth
  5. Max Step smoothing - MaxStepSmooth
  6. Window over Max Step smoothing - WindowOverMaxStepSmooth
  7. Distance Window over Max Step smoothing - DistanceWindowOverMaxStepSmooth
  8. Active Window over Max Step smoothing - ActiveWindowOverMaxStepSmooth

For detailed information on available configuration options see configuration file smooth section description.

Clusterization of inlets

Each of the separate paths has beginning and end. If they are at the boundaries of the scope they are considered as Inlets, i.e. points that mark where the traceable residues enters or leaves the scope. Clusters of inlets, on the other hand, mark endings of tunnels or ways in the system which was simulated in the MD.

Clusterization of inlets is performed in following steps:

  1. Initial clusterization: All inlets are submitted to selected clusterization method and depending on the method and settings, some of the inlets might not be arranged to any cluster and are considered as outliers.

  2. [Optional] Outliers detection: Arrangement of inlets to clusters is sometimes far from optimal. In this step, inlets that do not fit to cluster are detected and annotated as outliers. This step can be executed in two modes:

    1. Automatic mode: Inlet is considered to be an outlier if its distance from the centroid is greater than mean distance + 4 * standard deviation of all distances within the cluster.
    2. Defined threshold: Inlet is considered to be an outlier if its minimal distance from any other point in the cluster is greater than the threshold.
  3. [Optional] Reclusterization of outliers: It may happen that the outliers form actually clusters but it was not recognized in initial clusterization. In this step clusterization is executed for outliers only and found clusters are appended to the clusters identified in the first step. Rest of the inlets are marked as outliers.

Potentially recursive clusterization

Both Initial clusterization and Reclustarization can be run in a recursive manner. If in the appropriate sections defining clusterization methods option recursive_clusterization is used appropriate method is run for each cluster separately. Clusters of specific size can be excluded from recursive clusterization (option recursive_threshold). It is also possible to limit maximal number of recursive levels - option max_level.

For additional information see clusterization sections options.

Available methods

Aqua-Duct implements several clustering methods. The recommended method is barber method which bases on Auto Barber procedure. Rest of the methods are implemented with sklearn.cluster module:

  1. aquaduct.geom.cluster.BarberCluster - default for Initial clusterization. It gives excellent results. For more information see barber clusterization method description.
  2. MeanShift - see also original publication Mean shift: a robust approach toward feature space analysis (doi:10.1109/34.1000236).
  3. DBSCAN - default for Reclusterization of outliers, see also original publication A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise
  4. AffinityPropagation - see also original publication Clustering by Passing Messages Between Data Points (doi:10.1126/science.1136800)
  5. KMeans - see also k-means++: The advantages of careful seeding, Arthur, David, and Sergei Vassilvitskii in Proceedings of the eighteenth annual ACM-SIAM symposium on Discrete algorithms, Society for Industrial and Applied Mathematics (2007), pages 1027-1035.
  6. Birch - see also Tian Zhang, Raghu Ramakrishnan, Maron Livny BIRCH: An efficient data clustering method for large databases and Roberto Perdisci JBirch - Java implementation of BIRCH clustering algorithm.

For additional information see clusterization sections options.

Master paths

At the end of clusterization stage it is possible to run procedure for master path generation. First, separate paths are grouped according to clusters. Paths that begin and end in particular clusters are grouped together. Next, for each group a master path (i.e., average path) is generated in following steps:

  1. First, length of master path is determined. Lengths of each parts (incoming, object, outgoing) for each separate paths are normalized with bias towards longest paths. These normalized lengths are then used for as weights in averaging not normalized lengths. Values for all parts are summed and resulting value is the desired length of master path.

  2. All separate paths are divided into chunks. Number of chunks is equal to the desired length of master path calculated in the previous step. Lengths of separate paths can be quite diverse, therefore, for different paths chunks are of different lengths.

  3. For each chunk averaging procedure is run:

    1. Coordinates for all separate paths for given chunk are collected.
    2. Normalized lengths with bias toward longest paths for all separate paths for given chunk are collected.
    3. New coordinates are calculated as weighted average of collected coordinates. As weights collected normalized lengths are used.
    4. In addition width of chunk is calculated as a mean value of collected coordinates mutual distances.
    5. Type of chunk is calculated as probability (frequency) of being in the scope.
  4. Results for all chunks are collected, types probability are changed to types. All data is then used to create Master Path. If this fails no path is created.

More technical details on master path generation can be found in aquaduct.geom.master.CTypeSpathsCollection.get_master_path() method documentation.

Passing paths

If Passing paths are allowed (see allow_passing_paths option in separate_paths configuration) they will be generated using list of traceable residues from the first stage of calculations. In usual settings, where Object and Scope definitions are the same in both I and II stage, this will result in relatively low number of passing paths. In particular this will not show the real number of traced molecules that enter Scope during the simulation.

To get correct picture following options and settings have to be considered:

  • Stage traceable_residues
    • object should be broad enough to encompass all molecules that should be traced. For example, if water is traced, object definition could be following: resname WAT.
  • Stage raw_paths
    • In order to retain default Aqua-Duct behavior of tracing molecules that flow through Object area, it have to be redefined to encompass the active site only - see Object definition discussion.
    • clear_in_object_info should be set to True. Otherwise, traceable molecules will be limited according to current object definition but Object boundaries from traceable_residues stage will be used.
  • Stage separate_paths
    • allow_passing_paths should be set to True. This allows generation of passing paths.

Additionally, in stage inlets_clusterization following options could also be adjusted:

  • exclude_passing_in_clusterization could be set to True. This will exclude passing paths inlets from clusterization.
  • If passing paths are not clustered they will be added as outliers. Option add_passing_to_clusters allows to add some of passing paths inlets to already existing clusters. This is done by Auto Barber method and therefore this option should define molecular entity used in Auto Barber procedure, for example protein.

Analysis

Fifth stage of Valve calculations analyses results calculated in stages 1 to 4. Results of the analysis are displayed on the screen or can be saved to text file and comprise of several parts.

General summary

Results starts with general summary.

  • Title and data stamp.

  • [Optional] Dump of configuration options.

  • Frames window.

  • Names of traced molecules.

    Note

    If more than one name is on the list all consecutive sections of Analysis results are provided for each name separately and, as well as, for all names.

  • Number of traceable residues.

  • Number of separate paths.

  • Number of inlets.

  • Number of clusters.
    • Outliers flag, yes if they are present.
  • Clustering history - a tree summarizing calculated clusters.

Clusters statistics

  • Clusters summary - inlets.
    • Summary of inlets clusters. Table with 4 columns:
      1. Cluster: ID of the cluster. Outliers have 0.
      2. Size: Size of the cluster.
      3. INCOMING: Number of inlets corresponding to separate paths that enter the scope.
      4. OUTGOING: Number of inlets corresponding to separate paths that leave the scope.
  • Cluster statistics.
    • Probabilities of transfers. Table with 7 columns:
      1. Cluster: ID of the cluster. Outliers have 0.

      2. IN-OUT: Number of separate paths that both enter and leave the scope by this cluster.

      3. diff: Number of separate paths that:
        • Enter the scope by this cluster but leave the scope by another cluster, or
        • Enter the scope by another cluster but leave the scope by this cluster.
      4. N: Number of separate paths that:
        • Enter the scope by this cluster and stays in the object, or
        • Leaves the scope by this cluster after staying in the object.
      5. IN-OUT_prob: Probability of IN-OUT.

      6. diff_prob: Probability of diff.

      7. N_prob: Probability of N.

    • Mean lengths of transfers. Table with 8 columns:
      1. Cluster: ID of the cluster. Outliers have 0.
      2. X->Obj: Mean length of separate paths leading from this cluster to the object.
      3. Obj->X: Mean length of separate paths leading from the object to this cluster.
      4. p-value: p-value of ttest of comparing X->Obj and Obj->X.
      5. X->ObjMin: Minimal value of length of separate paths leading from this cluster to the object.
      6. X->ObjMinID: ID of separate path for which X->ObjMin was calculated.
      7. Obj->XMin: Minimal value of length of separate paths leading from the object to this cluster.
      8. Obj->XMinID: ID of separate path for which Obj->XMin was calculated.
    • Mean frames numbers of transfers. Table with 8 columns:
      1. Cluster: ID of the cluster. Outliers have 0.
      2. X->Obj: Mean number of frames of separate paths leading from this cluster to the object.
      3. Obj->X: Mean number of frames of separate paths leading from the object to this cluster.
      4. p-value: p-value of ttest of comparing X->Obj and Obj->X.
      5. X->ObjMin: Minimal value of number of frames of separate paths leading from this cluster to the object.
      6. X->ObjMinID: ID of separate path for which X->ObjMin was calculated.
      7. Obj->XMin: Minimal value of number of frames of separate paths leading from the object to this cluster.
      8. Obj->XMinID: ID of separate path for which Obj->XMin was calculated.

Note

Distributions of X->Obj and Obj->X might be not normal, ttest may result unrealistic values. This test will be changed in the future releases.

Clusters types statistics

  • Separate paths clusters types summary. Tables with 11 columns.
    • Mean length of paths:
      1. CType: Separate path Cluster Type.
      2. Size: Number of separate paths belonging to Cluster type.
      3. Size%: Percentage of Size relative to the total number of separate paths.
      4. Tot: Average total length of paths.
      5. TotStd: Standard deviation of Tot.
      6. Inp: Average length of incoming part of paths. If no incoming parts are available, NaN is printed (not a number).
      7. InpStd: Standard deviation of Inp.
      8. Obj: Average length of object part of paths. If no incoming parts are available, NaN is printed.
      9. ObjStd: Standard deviation of Inp.
      10. Out: Average length of outgoing part of paths. If no incoming parts are available, NaN is printed.
      11. OutStd: Standard deviation of Inp.
    • Mean number of frames:
      1. CType: Separate path Cluster Type.
      2. Size: Number of separate paths belonging to Cluster type.
      3. Size%: Percentage of Size relative to the total number of separate paths.
      4. Tot: Average total number of frames of paths.
      5. TotStd: Standard deviation of Tot.
      6. Inp: Average total number of incoming part of paths. If no incoming parts are available, NaN is printed (not a number).
      7. InpStd: Standard deviation of Inp.
      8. Obj: Average total number of object part of paths. If no incoming parts are available, NaN is printed.
      9. ObjStd: Standard deviation of Inp.
      10. Out: Average total number of outgoing part of paths. If no incoming parts are available, NaN is printed.
      11. OutStd: Standard deviation of Inp.
Cluster Type of separate path

Clusters types (or CType) is a mnemonic for separate paths that leads from one cluster to another, including paths that start/end in the same cluster or start/end in the Object area.

Each separate path has two ends: beginning and end. Both of them either belong to one of the clusters of inlets, or are among outliers, or are inside the scope. If an end belongs to one of the clusters (including outliers) it has ID of the cluster. If it is inside the scope it has special ID of N. Cluster type is an ID composed of IDs of both ends of separate path separated by colon charter.

All separate paths data

  • List of separate paths and their properties. Table with 20 columns.
    1. ID: - Separate path ID.
    2. RES: - Residue name.
    3. BeginF: Number of frame in which the path begins.
    4. InpF: Number of frame in which path begins Incoming part.
    5. ObjF: Number of frame in which path begins Object part.
    6. OutF: Number of frame in which path begins Outgoing part.
    7. EndF: Number of frame in which the path ends.
    8. TotL: Total length of path.
    9. InpL: Length of Incoming part. If no incoming part NaN is given.
    10. ObjL: Length of Object part.
    11. OutL: Length of Outgoing part. If no outgoing part NaN is given.
    12. TotS: Average step of full path.
    13. TotStdS: Standard deviation of TotS.
    14. InpS: Average step of Incoming part. If no incoming part NaN is given.
    15. InpStdS: Standard deviation of InpS.
    16. ObjS: Average step of Object part.
    17. ObjStdS: Standard deviation of ObjS.
    18. OutS: Average step of Outgoing part. If no outgoing part NaN is given.
    19. OutStdS: Standard deviation of OutS.
    20. CType: Cluster type of separate path.
Separate path ID

Separate Path IDs are composed of three numbers separated by colon. First number is the layer number, if no sandwich option is used it is set to 0. The second number is residue number. Third number is consecutive number of the separate path made by the residue. Numeration starts with 0.

Frames dependent analysis

In addition to general summary Aqua-Duct calculates frames dependent parameters. Two types of values are calculated: number of traced paths, and Object and Scope sizes. Results are saved in the additional CSV file or are printed on the screen.

Calculated numbers of traced paths can be used to visualize behavior of the system in question. For example, one can analyze number of paths is two different clusters:

../_images/figure_16_6.png

The above plot shows number of water molecules (or paths) in cluster 16 and 6 throughout the simulation. One can observe that number of molecules in cluster 6 diminishes approximately in the middle. This kind of plot can be easily generated with additional CSV data.

Number of traced paths

For each frame, numbers of traced paths are calculated for following categories:

  1. Name of traced molecules - amol is used for all possible names.
  2. Paths types (object for standard paths and passing for passing paths) - apaths is used for all possible paths types.
  3. Clusters and cluster types - aclusts is used for all possible clusters and actypes is used for all possible cluster types.
  4. Part of paths. Possible values are: walk, in, object, out, and in_out. Where walk corresponds to any part of path and in case of passing paths only this category is used; in, object, and out correspond to incoming, object, and outgoing parts; in_out corresponds to sum of incoming and outgoing parts.

All the above listed categories are combined together, and the final number of calculated categories may be quite big.

Size of Object and Scope

If option calculate_scope_object_size is set True and values of scope_chull and object_chull correspond to appropriate molecular entities, Aqua-Duct calculates area and volume of Scope and Object. Calculated sizes are estimates as resulting from convex hull approximations.

Visualization

Sixth stage of Valve calculations visualizes results calculated in stages 1 to 4. Visualization is done with PyMOL. Valve creates visualizations in two modes:

  1. Two files are created: special Python script and archive with data. Python script can be simply started with python, it automatically opens PyMol and loads all data from the archive. Optionally it can automatically save PyMol session.
  2. PyMol is automatically started and all data is loaded directly to PyMol workspace.

Molecule is loaded as PDB file. Other objects like Inlets clusters or paths are loaded as CGO objects.

Visualization script

By default Valve creates Python visualization script and archive with data files. This script is a regular Python script. It does not depends on AQUA-DUCT. To run it, python2.7 and PyMol is required. If no save option is used Valve saves visualization script as 6_visualize_results.py. To load full visualization call:

python 6_visualize_results.py --help

usage: 6_visualize_results.py [-h] [--save-session SESSION]
                              [--discard DISCARD] [--keep KEEP]
                              [--force-color FC] [--fast]

Aqua-Duct visualization script

optional arguments:
  -h, --help            show this help message and exit
  --save-session SESSION
                        Pymol session file name.
  --discard DISCARD     Objects to discard.
  --keep KEEP           Objects to keep.
  --force-color FC      Force specific color.
  --fast                Disable all objects while loading.

Option --save-session allows to save PyMol session file. Once visualization is loaded session is saved and PyMol closes. Option --fast increases slightly loading of objects.

Option --force-color allows to change default color of objects. It accepts list of specifications comprised of pairs ‘object name’ and ‘color name’. For example: 'scope_shape0 yellow cluster_1 blue'. This will color scope_shape0 object in yellow and cluster_1 in blue:

python 6_visualize_results.py --force-color 'scope_shape0 yellow cluster_1 blue'

Note

List of specifications has to be given in parentheses.

Note

List of specifications has to comprise of full objects’ names.

Note

Currently, --force-color does not allow to change color of molecules. It can be done in PyMol.

Options --keep and --discard allows to select specific objects for visualization. Both accept list of names comprising of full or partial object names. Option --keep instructs script to load only specified objects, whereas, --discard instructs to skip specific objects. For example to keep shapes of object and scope, molecule and clusters only one can call following:

python 6_visualize_results.py --keep 'shape molecule cluster'

To discard all raw paths:

python 6_visualize_results.py --discard 'raw'

Options can be used simultaneously, order does matter:

  1. If --keep is used first, objects are not displayed if they are not on the keep list. If they are on the list, visualization script checks if they are on the discard list. If yes, objects are not displayed.
  2. If --discard is used first, objects are not displayed if they are on the discard list and are not on the keep list.

For example, in order to display molecule, clusters, and only raw master paths, one can use following command:

python 6_visualize_results.py --keep 'molecule cluster master' --discard 'smooth'

Note

Options --keep and --discard accepts both full and partial object names.

Note

List of names has to be given in parentheses.

Visualization objects

Following is a list of objects created in PyMOL (all of them are optional). PyMOL object names given in bold text or short explanation is given.

  • Selected frame of the simulated system. Object name: molecule plus number of layer, if no sandwich option is used it becomes, by default, molecule0.
  • Approximate shapes of object and scope. Objects names object_shape and scope_shape plus number of layer, if no sandwich option is used 0 is added by default.
  • Inlets clusters, each cluster is a separate object. Object name: cluster_ followed by cluster annotation: outliers are annotated as out; regular clusters by ID.
  • List of cluster types, raw paths. Each cluster type is a separate object. Object name composed of cluster type (colon replaced by underline) plus _raw.
  • List of cluster types, smooth paths. Each cluster type is a separate object. Object name composed of cluster type (colon replaced by underline) plus _smooth.
  • All raw paths. They can be displayed as one object or separated in to Incoming, Object and Outgoing part. Object name: all_raw, or all_raw_in, all_raw_obj, and all_raw_out.
  • All raw paths inlets arrows. Object name: all_raw_paths_io.
  • All smooth paths. They can be displayed as one object or separated in to Incoming, Object and Outgoing part. Object name: all_smooth, or all_smooth_in, all_smooth_obj, and all_smooth_out.
  • All raw paths inlets arrows. Object name: all_raw_paths_io.
  • Raw paths displayed as separate objects or as one object with several states. Object name: raw_paths_ plus number of path or raw_paths if displayed as one object.
  • Smooth paths displayed as separate objects or as one object with several states. Object name: smooth_paths_ plus number of path or smooth_paths if displayed as one object.
  • Raw paths arrows displayed as separate objects or as one object with several states. Object name: raw_paths_io_ plus number of path or raw_paths_io if displayed as one object.
  • Smooth paths arrows displayed as separate objects or as one object with several states. Object name: smooth_paths_io_ plus number of path or smooth_paths_io if displayed as one object.

Color schemes

Inlets clusters are colored automatically. Outliers are gray.

Incoming parts of paths are red, Outgoing parts are blue. Object parts in case of smooth paths are green and in case of raw paths are green if residue is precisely in the object area or yellow if it leaved object area but it is not in the Outgoing part yet. Passing paths are displayed in grey.

Arrows are colored in accordance to the colors of paths.