                           PDBPARSE documentation



CONTENTS

   1.0 SUMMARY
   2.0 INPUTS & OUTPUTS
   3.0 INPUT FILE FORMAT
   4.0 OUTPUT FILE FORMAT
   5.0 DATA FILES
   6.0 USAGE
   7.0 KNOWN BUGS & WARNINGS
   8.0 NOTES
   9.0 DESCRIPTION
   10.0 ALGORITHM
   11.0 RELATED APPLICATIONS
   12.0 DIAGNOSTIC ERROR MESSAGES
   13.0 AUTHORS
   14.0 REFERENCES

1.0 SUMMARY

   Parses PDB files and writes protein CCF files (clean coordinate
   files).

2.0 INPUTS & OUTPUTS

   PDBPARSE parses every PDB file in a directory and writes a protein CCF
   file (clean coordinate file) for each one. The paths and extensions of
   the PDB files (input) and protein CCF files (output) are specified by
   the user. The user also specifies whether the output files have the
   same names as the input files or whether the PDB identifier codes
   (from the PDB files) are used to name the files.

   The parser generates a log file containing diagnostic messages for
   various types of inconsistency, error and other features of a PDB file
   that justify manual inspection of the file to verify its contents (see
   Section 12.0 below).

   PDBPARSE implements the parsing methodology described under
   'ALGORITHM' below. One output file is written for each PDB file
   parsed, excluding entries that lack any chain with at least the
   user-specified minimum number (typically 5) of known amino acids or
   that lack any SEQRES or ATOM records. The data (described in Section
   4.0 and Figure 2) include the amino acid sequence for each chain
   (given in the SQ record of a CCF file) and coordinate and derived data
   for each residue and atom (RE and AT records).

   Optionally the parser can be configured to mask (disregard) atoms in
   protein chains as follows:

   (1) Mask non-amino acid groups that do not contain a C-alpha atom.
       Masked groups will not appear in the RE, AT or SQ records.
   (2) Mask amino acids that do not contain a C-alpha atom.
   (3) Mask amino acids with a single atom only.

   For (2) and (3) the residue will not appear in the RE or AT records
   but will be present in the SQ record.
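
   The effect of the three masking options can be sketched as follows.
   This is an illustration only and not the PDBPARSE source: the Group
   class, the function name and the option names are hypothetical, and
   the sketch assumes that a group that is not masked appears in all
   three record types.

```python
from dataclasses import dataclass


@dataclass
class Group:
    """Hypothetical stand-in for one residue or group read from a chain."""
    is_amino_acid: bool
    atom_names: list


def record_visibility(group, mask_non_aa=True, mask_aa_no_ca=True,
                      mask_single_atom=True):
    """Return the set of CCF record types in which the group would appear."""
    has_ca = "CA" in group.atom_names
    if not group.is_amino_acid:
        # Option (1): a non-amino acid group without C-alpha is dropped
        # from the RE, AT and SQ records alike.
        if mask_non_aa and not has_ca:
            return set()
        return {"SQ", "RE", "AT"}
    # Options (2) and (3): a masked amino acid keeps its place in the
    # sequence (SQ) but has no RE or AT records.
    no_ca = mask_aa_no_ca and not has_ca
    single = mask_single_atom and len(group.atom_names) == 1
    if no_ca or single:
        return {"SQ"}
    return {"SQ", "RE", "AT"}


phosphate = Group(is_amino_acid=False, atom_names=["P", "O1", "O2"])
ca_only = Group(is_amino_acid=True, atom_names=["CA"])
complete = Group(is_amino_acid=True, atom_names=["N", "CA", "C", "O"])
```

   With all three options enabled, the phosphate group is absent from
   every record, the C-alpha-only residue is kept in SQ alone, and the
   complete residue appears in SQ, RE and AT.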

3.0 INPUT FILE FORMAT

   An excerpt of a PDB file is shown below (Figure 1). A detailed
   explanation of the PDB file format is available on the PDB web site:
   http://www.rcsb.org/pdb/info.html#File_Formats_and_Standards

4.0 OUTPUT FILE FORMAT

   An excerpt from a protein CCF file is shown in Figure 2. The data are
   as follows (record names are given in parentheses):

  4.1 Bibliographic data

   These include the 4-character PDB identifier code or the 7-character
   domain identifier code taken from SCOP (ID), text from the COMPND (DE)
   and SOURCE (OS) records of the PDB file and experimental data (EX).
   Tokens delimiting items of experimental data are as follows. (1)
   METHOD: The text 'nmr_or_model' for structures determined by nuclear
   magnetic resonance or modelling, or 'xray' for X-ray crystallography.
   (2) RESO: The resolution of X-ray structures, or '0' otherwise. (3, 4)
   NMOD and NCHN: The number of models and of polypeptide chains
   respectively; for domain coordinate files a 1 is always given. NCHN
   counts only chains that have at least the user-specified minimum
   number (5) of known amino acids. (5) NGRP: Number of non-covalently
   associated groups ('heterogens') that could not be assigned to a
   specific chain. Spacing lines (XX) are used to improve the clarity of
   the file and the end of the file is marked by '//'.
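
   As an illustration of the token layout, the sketch below splits an EX
   line (taken from the 1cs4.ccf example in this document) into its
   token/value pairs. The function name parse_tokens is hypothetical and
   the sketch assumes every item has the form 'TOKEN value;'.

```python
def parse_tokens(record_line):
    """Split a 'TOKEN value; TOKEN value;' record into a dict of strings."""
    body = record_line[2:].strip()  # drop the two-letter record code
    fields = {}
    for item in body.split(";"):
        item = item.strip()
        if not item:
            continue  # skip the empty piece after the final ';'
        token, value = item.split(None, 1)
        fields[token] = value
    return fields


ex = parse_tokens("EX   METHOD xray; RESO 2.50; NMOD 1; NCHN 1; NGRP 0;")
resolution = float(ex["RESO"])
is_xray = ex["METHOD"] == "xray"
```

   Values are kept as strings so that the caller decides how to treat a
   RESO of '0', which marks a structure without a resolution rather than
   a measured value.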

  4.2 Chain-specific data

   Following the EX record the file has a section for each chain (with at
   least the user-specified minimum number (5) of known amino acids),
   containing the chain number (CN), chain-specific data (IN) and the
   chain sequence (SQ). Tokens delimiting items of chain-specific data are
   as follows. (1) ID: The PDB chain identifier or a '.' if one was not
   specified in the PDB file or if a domain is composed of segments from
   more than one chain. (2) NR: The number of residues in the chain or
   domain. (3) NL: The number of heterogens that are associated with the
   chain. Domain coordinate files do not include coordinates for these
   groups so a value of 0 is always given. (4, 5) NH and NE: The number of
   helices and beta-strands in the chain or domain (see Section 11.2).
   Values for NH and NE are added by using PDBPLUS and a 0 will be given
   if PDBPLUS is not used.
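
   The chain sections can be collected with a short reader such as the
   one sketched below. It is a minimal example under stated assumptions:
   the function name read_chains is hypothetical, and sequence lines are
   assumed to be the indented lines that follow an SQ line, as in the
   example output in this document.

```python
def read_chains(lines):
    """Collect chain number, IN tokens and sequence for each chain section."""
    chains = []
    in_sequence = False
    for line in lines:
        code = line[:2]
        if code == "CN":
            # 'CN   [1]' opens a new chain section.
            number = int(line[2:].strip().strip("[]"))
            chains.append({"number": number, "info": {}, "sequence": ""})
            in_sequence = False
        elif code == "IN" and chains:
            for item in line[2:].split(";"):
                if item.strip():
                    token, value = item.split()
                    chains[-1]["info"][token] = value
        elif code == "SQ" and chains:
            in_sequence = True  # header line; residues follow on indented lines
        elif in_sequence and line.startswith(" "):
            # Remove the blanks that separate blocks of ten residues.
            chains[-1]["sequence"] += "".join(line.split())
        else:
            in_sequence = False
    return chains


sample = [
    "CN   [1]",
    "XX",
    "IN   ID A; NR 52; NL 7; NH 0; NE 0;",
    "XX",
    "SQ   SEQUENCE    52 AA;   5817 MW;  D8CCAE0E1FC0849A CRC64;",
    "     ADIEGFTSLA SQCTAQELVM TLNELFARFD KLAAENHCLR IKILGDCYYC VS",
    "XX",
]
chains = read_chains(sample)
```

   For this sample the length of the collected sequence agrees with the
   NR value of the IN record, which is a convenient consistency check
   when reading a file.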

  4.3 Residue data

   Each RE record contains data for a single residue. The data are in 26
   columns in the RE record (column numbers are given in parentheses):

   (1)     RE is always given.
   (2-3)   Model and chain number (always 1 for domains).
   (4)     Residue number: the position of the residue in the sequence
           given in the SQ record (for protein atoms) or '.' (for
           heterogens and water).
   (5)     Original PDB residue number.
   (6)     SSE type from the PDB file: either 'C' (coil), 'H' (helix),
           'E' (beta-strand) or 'T' (turn).
   (7)     SSE serial number from columns 8 - 10 in a HELIX, SHEET or
           TURN record of a PDB file. A '.' is given for atoms not in a
           helix or sheet.
   (8)     SSE identifier code from columns 12 - 14 in a HELIX, SHEET or
           TURN record, or '.' for atoms not in a helix or sheet.
   (9)     The class of helix, which is an integer from 1 to 10:
           1 - right-handed alpha, 2 - right-handed omega,
           3 - right-handed pi, 4 - right-handed gamma,
           5 - right-handed 3-10, 6 - left-handed alpha,
           7 - left-handed omega, 8 - left-handed gamma,
           9 - 27 ribbon/helix, 10 - polyproline; see
           http://www.rcsb.org/pdb/docs/format/pdbguide2.2/guide2.2_frame.html.
   (10)    Secondary structure assignment according to STRIDE (see
           Section 11.2).
   (11)    SSE number: the position of the SSE (see Section 11.2) from
           the N-terminus. A '.' is given if the atom is not in an
           element.
   (12)    Single character amino acid code or a '.' (for heterogens and
           water).
   (13)    3-character residue identifier code.
   (14-16) Phi and Psi angles and solvent accessible surface area of the
           residue as calculated by STRIDE.
   (17-26) Accessible surface area according to NACCESS, as absolute and
           relative measures of accessibility: (17-18) for all atoms,
           (19-20) all side-chain atoms, (21-22) all main-chain atoms,
           (23-24) all non-polar side-chain atoms, (25-26) all polar
           side-chain atoms.

   Values for columns 10-11 and 17-26 are added by using PDBPLUS and a
   '.' will be given if a value is not available.
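
   A minimal sketch of reading one RE record is given below. Two points
   are assumptions rather than statements about PDBPARSE. First, the
   field order follows the RE lines in the example output in this
   document (amino acid codes before the SSE fields), which appears to
   differ from the column order listed above; the names chosen for the
   fields are hypothetical. Second, the example output is taken to be
   hard-wrapped at 80 characters, so the pieces of a record are rejoined
   without a separator.

```python
RE_TEXT_FIELDS = [
    "record", "model", "chain", "residue_number", "pdb_residue_number",
    "aa_code", "residue_code", "sse_serial", "sse_id", "sse_type",
    "helix_class", "stride_ss", "sse_number",
]


def parse_re(line):
    """Split one unwrapped RE record into named text fields and numbers."""
    tokens = line.split()
    if len(tokens) != 26 or tokens[0] != "RE":
        raise ValueError("expected a 26-column RE record")
    # Assumed layout: 13 text fields, then 13 numeric fields
    # (phi, psi, STRIDE accessibility and ten NACCESS values).
    record = dict(zip(RE_TEXT_FIELDS, tokens[:13]))
    record["values"] = [float(t) for t in tokens[13:]]
    return record


wrapped = [
    "RE   1    1    5    399   G GLY   1    1    H    1    .    .        0.00    0.00",
    "    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00",
    "    0.00",
]
residue = parse_re("".join(wrapped))
```

   Text fields are left as strings because any of them may hold '.' when
   no value is available.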

  4.4 Atom data

   Each AT record contains data for a single atom. The data are in 14
   columns in the AT record (column numbers are given in parentheses):

   (1)     AT is always given.
   (2-3)   Model and chain number (always 1 for domains).
   (4)     Group number of heterogens or '.'.
   (5)     Either 'P' (a protein atom), 'H' (heterogen) or 'W' (water).
   (6)     Residue number: the position of the residue in the sequence
           given in the SQ record (for protein atoms) or '.' (for
           heterogens and water).
   (7)     Single character amino acid code or a '.' (for heterogens and
           water).
   (8)     3-character residue identifier code.
   (9)     Atom type.
   (10-12) The x, y and z orthogonal coordinates.
   (13)    Occupancy.
   (14)    Temperature factor.
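
   The sketch below reads one AT record into a named tuple. It is an
   illustration under assumptions: the field order and the count of 15
   whitespace-separated items are taken from the AT lines in the example
   output in this document (which include the original PDB residue
   number), not from the column list above, and the field names are
   hypothetical. The example output is taken to be hard-wrapped at 80
   characters, so the two pieces of a record are rejoined without a
   separator.

```python
from typing import NamedTuple


class Atom(NamedTuple):
    model: str
    chain: str
    group_number: str
    residue_number: str
    pdb_residue_number: str
    aa_code: str
    residue_code: str
    atom_class: str  # 'P' (protein), 'H' (heterogen) or 'W' (water)
    atom_name: str
    x: float
    y: float
    z: float
    occupancy: float
    temperature_factor: float


def parse_at(line):
    """Split one unwrapped AT record into an Atom."""
    tokens = line.split()
    if len(tokens) != 15 or tokens[0] != "AT":
        raise ValueError("expected a 15-item AT record")
    text, numbers = tokens[1:10], tokens[10:]
    return Atom(*text, *(float(n) for n in numbers))


wrapped = [
    "AT   1    1    5    .    1002  . FOK   H C9       42.200  -11.309   50.489    1.",
    "00   41.39",
]
atom = parse_at("".join(wrapped))
```

   Selecting atoms by atom_class then gives a simple way to separate
   protein atoms from heterogens and water.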

  Output files for usage example

  File: pdbparse.log

/homes/user/test/data/structure/1cs4.ent
SEQRESLENDIF   1 (A)
ATOMCOL12      429
BADINDEX       1 (A)
GAPPEDOK       1 (A)
SECSTART       1 1 ILE 384
SECSTART       1 1 ILE 384
//
/homes/user/test/data/structure/1ii7.ent
SEQRESLENDIF   1 (A)
ATOMCOL12      390
SECBOTH        1 1 SER 57 GLU 73
SECBOTH        1 1 VAL 78 ILE 81
SECBOTH        1 1 LYS 2 LEU 6
//
/homes/user/test/data/structure/2hhb.ent
ATOMCOL12      1277
//
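
   The log listing above has one block per PDB file: the file path,
   then one line per diagnostic message, then '//'. A minimal sketch for
   grouping the messages by file is given below. The function name
   parse_log is hypothetical, and the sketch makes no attempt to
   interpret the message codes (see Section 12.0).

```python
def parse_log(lines):
    """Group diagnostic lines by the PDB file path that precedes them."""
    entries = {}
    current = None
    for line in lines:
        line = line.rstrip()
        if not line:
            continue
        if line == "//":
            current = None  # end of the block for this file
        elif current is None:
            current = line  # first line of a block is the file path
            entries[current] = []
        else:
            code, _, detail = line.partition(" ")
            entries[current].append((code, detail.strip()))
    return entries


log_lines = [
    "/homes/user/test/data/structure/1cs4.ent",
    "SEQRESLENDIF   1 (A)",
    "ATOMCOL12      429",
    "//",
    "/homes/user/test/data/structure/2hhb.ent",
    "ATOMCOL12      1277",
    "//",
]
entries = parse_log(log_lines)
```

   Files with many messages, or with particular codes, can then be
   picked out for manual inspection.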

  File: 1cs4.ccf

ID   1cs4
XX
DE   MOL_ID: 1; MOLECULE: TYPE V ADENYLATE CYCLASE;
XX
OS   MOL_ID: 1; ORGANISM_SCIENTIFIC: CANIS FAMILIARIS;
XX
EX   METHOD xray; RESO 2.50; NMOD 1; NCHN 1; NGRP 0;
XX
CN   [1]
XX
IN   ID A; NR 52; NL 7; NH 0; NE 0;
XX
SQ   SEQUENCE    52 AA;   5817 MW;  D8CCAE0E1FC0849A CRC64;
     ADIEGFTSLA SQCTAQELVM TLNELFARFD KLAAENHCLR IKILGDCYYC VS
XX
RE   1    1    2    396   D ASP   .    .    .    .    .    .        0.00    0.00
    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00
    0.00
RE   1    1    3    397   I ILE   .    .    .    .    .    .        0.00    0.00
    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00
    0.00
RE   1    1    4    398   E GLU   .    .    .    .    .    .        0.00    0.00
    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00
    0.00
RE   1    1    5    399   G GLY   1    1    H    1    .    .        0.00    0.00
    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00
    0.00
RE   1    1    6    400   F PHE   1    1    H    1    .    .        0.00    0.00
    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00
    0.00
RE   1    1    7    401   T THR   1    1    H    1    .    .        0.00    0.00
    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00
    0.00
RE   1    1    8    402   S SER   1    1    H    1    .    .        0.00    0.00
    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00
    0.00
RE   1    1    9    403   L LEU   1    1    H    1    .    .        0.00    0.00
    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00
    0.00
RE   1    1    10   404   A ALA   1    1    H    1    .    .        0.00    0.00
    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00
    0.00
RE   1    1    11   405   S SER   1    1    H    1    .    .        0.00    0.00
    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00
    0.00
RE   1    1    12   406   Q GLN   .    .    .    .    .    .        0.00    0.00
    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00
    0.00
RE   1    1    13   407   C CYS   .    .    .    .    .    .        0.00    0.00
    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00
    0.00
RE   1    1    14   408   T THR   2    2    H    1    .    .        0.00    0.00
    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00
    0.00
RE   1    1    15   409   A ALA   2    2    H    1    .    .        0.00    0.00
    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00
    0.00
RE   1    1    16   410   Q GLN   2    2    H    1    .    .        0.00    0.00
    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00
    0.00
RE   1    1    17   411   E GLU   2    2    H    1    .    .        0.00    0.00
    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00
    0.00
RE   1    1    18   412   L LEU   2    2    H    1    .    .        0.00    0.00
    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00
    0.00
RE   1    1    19   413   V VAL   2    2    H    1    .    .        0.00    0.00
    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00
    0.00
RE   1    1    20   414   M MET   2    2    H    1    .    .        0.00    0.00
    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00
    0.00
RE   1    1    21   415   T THR   2    2    H    1    .    .        0.00    0.00
    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00
    0.00
RE   1    1    22   416   L LEU   2    2    H    1    .    .        0.00    0.00
    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00
    0.00
RE   1    1    23   417   N ASN   2    2    H    1    .    .        0.00    0.00
    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00
    0.00
RE   1    1    24   418   E GLU   2    2    H    1    .    .        0.00    0.00
    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00
    0.00
RE   1    1    25   419   L LEU   2    2    H    1    .    .        0.00    0.00
    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00
    0.00
RE   1    1    26   420   F PHE   2    2    H    1    .    .        0.00    0.00
    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00
    0.00
RE   1    1    27   421   A ALA   2    2    H    1    .    .        0.00    0.00
    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00
    0.00
RE   1    1    28   422   R ARG   2    2    H    1    .    .        0.00    0.00
    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00
    0.00
RE   1    1    29   423   F PHE   2    2    H    1    .    .        0.00    0.00
    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00
    0.00
RE   1    1    30   424   D ASP   2    2    H    1    .    .        0.00    0.00
    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00
    0.00
RE   1    1    31   425   K LYS   2    2    H    1    .    .        0.00    0.00
    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00
    0.00
RE   1    1    32   426   L LEU   2    2    H    1    .    .        0.00    0.00
    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00
    0.00
RE   1    1    33   427   A ALA   2    2    H    1    .    .        0.00    0.00
    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00
    0.00
RE   1    1    34   428   A ALA   2    2    H    1    .    .        0.00    0.00
    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00
    0.00
RE   1    1    35   429   E GLU   2    2    H    1    .    .        0.00    0.00
    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00
    0.00
RE   1    1    36   430   N ASN   2    2    H    1    .    .        0.00    0.00
    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00
    0.00


  [Part of this file has been deleted for brevity]

AT   1    1    5    .    1002  . FOK   H C9       42.200  -11.309   50.489    1.
00   41.39
AT   1    1    5    .    1002  . FOK   H O6       42.275  -12.455   49.593    1.
00   43.23
AT   1    1    5    .    1002  . FOK   H C10      43.008  -11.601   51.811    1.
00   39.11
AT   1    1    5    .    1002  . FOK   H C11      40.680  -11.078   50.616    1.
00   44.36
AT   1    1    5    .    1002  . FOK   H O7       40.106  -10.945   51.688    1.
00   48.77
AT   1    1    5    .    1002  . FOK   H C12      39.943  -11.046   49.301    1.
00   40.67
AT   1    1    5    .    1002  . FOK   H C13      40.595  -10.085   48.292    1.
00   41.47
AT   1    1    5    .    1002  . FOK   H C14      40.276  -10.620   46.930    1.
00   46.69
AT   1    1    5    .    1002  . FOK   H C15      39.971  -11.751   46.590    1.
00   53.22
AT   1    1    5    .    1002  . FOK   H C16      40.047   -8.685   48.426    1.
00   42.42
AT   1    1    5    .    1002  . FOK   H C17      42.671   -8.737   50.253    1.
00   39.67
AT   1    1    5    .    1002  . FOK   H C18      46.732  -13.026   51.827    1.
00   35.74
AT   1    1    5    .    1002  . FOK   H C19      45.859  -11.483   53.586    1.
00   34.48
AT   1    1    5    .    1002  . FOK   H C20      42.913  -10.426   52.807    1.
00   39.44
AT   1    1    5    .    1002  . FOK   H C21      45.883   -9.553   47.821    1.
00   42.15
AT   1    1    5    .    1002  . FOK   H O5       46.157  -10.520   47.166    1.
00   40.91
AT   1    1    5    .    1002  . FOK   H C22      46.769   -8.315   48.006    1.
00   37.08
AT   1    1    6    .    1003  . MES   H O1       45.676    7.326   49.092    1.
00   77.86
AT   1    1    6    .    1003  . MES   H C2       44.367    6.816   48.900    1.
00   75.17
AT   1    1    6    .    1003  . MES   H C3       44.349    5.317   48.923    1.
00   74.42
AT   1    1    6    .    1003  . MES   H N4       44.832    4.804   50.196    1.
00   72.45
AT   1    1    6    .    1003  . MES   H C5       46.234    5.425   50.473    1.
00   73.23
AT   1    1    6    .    1003  . MES   H C6       46.176    6.914   50.355    1.
00   75.06
AT   1    1    6    .    1003  . MES   H C7       44.806    3.336   50.302    1.
00   73.39
AT   1    1    6    .    1003  . MES   H C8       44.672    2.791   51.713    1.
00   76.85
AT   1    1    6    .    1003  . MES   H S        45.724    1.379   51.967    1.
00   78.26
AT   1    1    6    .    1003  . MES   H O1S      47.062    1.828   51.737    1.
00   79.39
AT   1    1    6    .    1003  . MES   H O2S      45.303    0.380   51.016    1.
00   81.58
AT   1    1    6    .    1003  . MES   H O3S      45.523    0.961   53.326    1.
00   80.59
AT   1    1    6    .    1004  . MES   H O1       59.246   -5.152   27.381    1.
00   99.99
AT   1    1    6    .    1004  . MES   H C2       60.067   -4.021   27.127    1.
00   99.99
AT   1    1    6    .    1004  . MES   H C3       60.447   -3.301   28.378    1.
00   99.78
AT   1    1    6    .    1004  . MES   H N4       61.180   -4.156   29.270    1.
00   96.33
AT   1    1    6    .    1004  . MES   H C5       60.358   -5.461   29.506    1.
00   97.90
AT   1    1    6    .    1004  . MES   H C6       59.965   -6.072   28.203    1.
00   99.68
AT   1    1    6    .    1004  . MES   H C7       61.596   -3.484   30.507    1.
00   93.33
AT   1    1    6    .    1004  . MES   H C8       61.931   -2.010   30.442    1.
00   90.74
AT   1    1    6    .    1004  . MES   H S        60.763   -0.978   31.301    0.
50   90.72
AT   1    1    6    .    1004  . MES   H O1S      59.476   -1.170   30.680    0.
50   91.60
AT   1    1    6    .    1004  . MES   H O2S      61.249    0.383   31.164    0.
50   91.20
AT   1    1    6    .    1004  . MES   H O3S      60.776   -1.430   32.647    0.
50   90.05
AT   1    1    7    .    1005  . POP   H P1       58.812   -7.766   57.091    1.
00   57.40
AT   1    1    7    .    1005  . POP   H O1       60.254   -7.589   56.745    1.
00   54.93
AT   1    1    7    .    1005  . POP   H O2       58.618   -8.839   58.095    1.
00   55.36
AT   1    1    7    .    1005  . POP   H O3       57.949   -8.024   55.908    1.
00   55.10
AT   1    1    7    .    1005  . POP   H O        58.295   -6.370   57.759    1.
00   57.30
AT   1    1    7    .    1005  . POP   H P2       56.998   -5.955   58.661    1.
00   59.66
AT   1    1    7    .    1005  . POP   H O4       57.491   -5.746   60.070    1.
00   54.95
AT   1    1    7    .    1005  . POP   H O5       56.004   -7.075   58.550    1.
00   56.24
AT   1    1    7    .    1005  . POP   H O6       56.427   -4.710   58.044    1.
00   56.50
//

  File: 1ii7.ccf

ID   1ii7
XX
DE   MOL_ID: 1; MOLECULE: MRE11 NUCLEASE;
XX
OS   MOL_ID: 1; ORGANISM_SCIENTIFIC: PYROCOCCUS FURIOSUS;
XX
EX   METHOD xray; RESO 2.20; NMOD 1; NCHN 1; NGRP 0;
XX
CN   [1]
XX
IN   ID A; NR 65; NL 6; NH 0; NE 0;
XX
SQ   SEQUENCE    65 AA;   7395 MW;  75FBE75B22FD3678 CRC64;
     MKFAHLADIH LGYEQFHKPQ REEEFAEAFK NALEIAVQEN VDFILIAGDL FHSSRPSPGT
     LKKAI
XX
RE   1    1    8    8     D ASP   .    .    .    .    .    .        0.00    0.00
    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00
    0.00
RE   1    1    9    9     I ILE   .    .    .    .    .    .        0.00    0.00
    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00
    0.00
RE   1    1    10   10    H HIS   .    .    .    .    .    .        0.00    0.00
    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00
    0.00
RE   1    1    11   11    L LEU   .    .    .    .    .    .        0.00    0.00
    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00
    0.00
RE   1    1    12   12    G GLY   .    .    .    .    .    .        0.00    0.00
    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00
    0.00
RE   1    1    13   13    Y TYR   .    .    .    .    .    .        0.00    0.00
    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00
    0.00
RE   1    1    14   14    E GLU   1    1    H    5    .    .        0.00    0.00
    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00
    0.00
RE   1    1    15   15    Q GLN   1    1    H    5    .    .        0.00    0.00
    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00
    0.00
RE   1    1    16   16    F PHE   1    1    H    5    .    .        0.00    0.00
    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00
    0.00
RE   1    1    17   17    H HIS   1    1    H    5    .    .        0.00    0.00
    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00
    0.00
RE   1    1    18   18    K LYS   2    A    E    .    .    .        0.00    0.00
    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00
    0.00
RE   1    1    19   19    P PRO   2    A    E    .    .    .        0.00    0.00
    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00
    0.00
RE   1    1    20   20    Q GLN   2    A    E    .    .    .        0.00    0.00
    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00
    0.00
RE   1    1    21   21    R ARG   2    A    E    .    .    .        0.00    0.00
    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00
    0.00
RE   1    1    22   22    E GLU   2    A    E    .    .    .        0.00    0.00
    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00
    0.00
RE   1    1    23   23    E GLU   2    A    E    .    .    .        0.00    0.00
    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00
    0.00
RE   1    1    24   24    E GLU   2    A    E    .    .    .        0.00    0.00
    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00
    0.00
RE   1    1    25   25    F PHE   2    A    E    .    .    .        0.00    0.00
    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00
    0.00
RE   1    1    26   26    A ALA   2    A    E    .    .    .        0.00    0.00
    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00
    0.00
RE   1    1    27   27    E GLU   2    A    E    .    .    .        0.00    0.00
    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00
    0.00
RE   1    1    28   28    A ALA   2    A    E    .    .    .        0.00    0.00
    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00
    0.00
RE   1    1    29   29    F PHE   2    A    E    .    .    .        0.00    0.00
    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00
    0.00
RE   1    1    30   30    K LYS   2    A    E    .    .    .        0.00    0.00
    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00
    0.00
RE   1    1    31   31    N ASN   2    A    E    .    .    .        0.00    0.00
    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00
    0.00
RE   1    1    32   32    A ALA   2    A    E    .    .    .        0.00    0.00
    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00
    0.00
RE   1    1    33   33    L LEU   2    A    E    .    .    .        0.00    0.00
    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00
    0.00
RE   1    1    34   34    E GLU   2    A    E    .    .    .        0.00    0.00
    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00
    0.00
RE   1    1    35   35    I ILE   2    A    E    .    .    .        0.00    0.00
    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00
    0.00
RE   1    1    36   36    A ALA   2    A    E    .    .    .        0.00    0.00
    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00
    0.00
RE   1    1    37   37    V VAL   2    A    E    .    .    .        0.00    0.00
    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00
    0.00
RE   1    1    38   38    Q GLN   2    A    E    .    .    .        0.00    0.00
    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00
    0.00
RE   1    1    39   39    E GLU   2    A    E    .    .    .        0.00    0.00
    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00
    0.00
RE   1    1    40   40    N ASN   .    .    .    .    .    .        0.00    0.00
    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00
    0.00
RE   1    1    41   41    V VAL   .    .    .    .    .    .        0.00    0.00
    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00
    0.00


  [Part of this file has been deleted for brevity]

AT   1    1    .    50   50    L LEU   P CD2      12.425   39.035   22.798    1.
00   23.77
AT   1    1    1    .    402   . PO4   H P        34.178   32.996   46.387    1.
00   60.84
AT   1    1    1    .    402   . PO4   H O1       35.146   33.243   45.291    1.
00   57.95
AT   1    1    1    .    402   . PO4   H O2       34.912   32.751   47.670    1.
00   59.15
AT   1    1    1    .    402   . PO4   H O3       33.291   34.184   46.538    1.
00   58.92
AT   1    1    1    .    402   . PO4   H O4       33.352   31.796   46.060    1.
00   61.86
AT   1    1    2    .    403   . MN    H MN        8.130   27.788   21.899    1.
00   36.09
AT   1    1    2    .    404   . MN    H MN        5.801   27.935   24.271    1.
00   39.57
AT   1    1    3    .    405   . MN    H MN       36.023   34.916   44.253    1.
00   39.52
AT   1    1    3    .    406   . MN    H MN       33.658   36.365   46.296    1.
00   33.69
AT   1    1    5    .    501   . SO4   H S        17.175   28.112   32.476    1.
00  100.80
AT   1    1    5    .    501   . SO4   H O1       18.136   28.230   31.357    1.
00  100.18
AT   1    1    5    .    501   . SO4   H O2       17.097   26.692   32.887    1.
00  100.80
AT   1    1    5    .    501   . SO4   H O3       17.633   28.926   33.626    1.
00  100.14
AT   1    1    5    .    501   . SO4   H O4       15.834   28.575   32.045    1.
00  100.56
AT   1    1    5    .    502   . SO4   H S         0.566   29.512   36.007    1.
00   86.73
AT   1    1    5    .    502   . SO4   H O1        1.690   28.556   35.971    1.
00   87.27
AT   1    1    5    .    502   . SO4   H O2       -0.620   28.803   36.523    1.
00   87.87
AT   1    1    5    .    502   . SO4   H O3        0.896   30.642   36.905    1.
00   86.58
AT   1    1    5    .    502   . SO4   H O4        0.287   30.037   34.658    1.
00   86.51
AT   1    1    5    .    503   . SO4   H S       -13.586   39.644   36.031    1.
00  100.28
AT   1    1    5    .    503   . SO4   H O1      -12.340   39.512   35.250    1.
00  100.72
AT   1    1    5    .    503   . SO4   H O2      -14.638   38.811   35.421    1.
00  100.46
AT   1    1    5    .    503   . SO4   H O3      -13.347   39.201   37.420    1.
00   99.66
AT   1    1    5    .    503   . SO4   H O4      -14.020   41.056   36.015    1.
00   99.97
AT   1    1    6    .    401   . 101   H P         7.599   25.305   23.994    1.
00   56.33
AT   1    1    6    .    401   . 101   H O1P       8.249   24.467   25.030    1.
00   56.70
AT   1    1    6    .    401   . 101   H O2P       6.700   26.285   24.649    1.
00   54.49
AT   1    1    6    .    401   . 101   H O3P       8.637   26.026   23.216    1.
00   53.97
AT   1    1    6    .    401   . 101   H O5*       7.095   23.970   23.128    1.
00   59.20
AT   1    1    6    .    401   . 101   H C5*       7.073   23.961   21.762    1.
00   66.74
AT   1    1    6    .    401   . 101   H C4*       6.041   23.013   21.296    1.
00   71.22
AT   1    1    6    .    401   . 101   H O4*       6.029   21.855   22.189    1.
00   73.78
AT   1    1    6    .    401   . 101   H C3*       4.736   23.676   21.350    1.
00   73.80
AT   1    1    6    .    401   . 101   H O3*       4.355   23.874   19.995    1.
00   76.51
AT   1    1    6    .    401   . 101   H C2*       3.864   22.749   22.165    1.
00   74.04
AT   1    1    6    .    401   . 101   H C1*       4.682   21.474   22.506    1.
00   74.70
AT   1    1    6    .    401   . 101   H N9        4.578   21.123   23.969    1.
00   76.71
AT   1    1    6    .    401   . 101   H C8        3.630   21.533   24.876    1.
00   76.87
AT   1    1    6    .    401   . 101   H N7        3.758   21.069   26.081    1.
00   77.50
AT   1    1    6    .    401   . 101   H C5        4.896   20.300   25.989    1.
00   77.78
AT   1    1    6    .    401   . 101   H C6        5.570   19.479   26.941    1.
00   78.16
AT   1    1    6    .    401   . 101   H N6        5.155   19.409   28.200    1.
00   78.77
AT   1    1    6    .    401   . 101   H N1        6.682   18.805   26.554    1.
00   78.32
AT   1    1    6    .    401   . 101   H C2        7.090   18.888   25.277    1.
00   78.14
AT   1    1    6    .    401   . 101   H N3        6.541   19.611   24.271    1.
00   78.05
AT   1    1    6    .    401   . 101   H C4        5.403   20.288   24.700    1.
00   78.10
AT   1    .    .    .    407   . HOH   W O         5.997   27.242   22.189    1.
00   38.84
AT   1    .    .    .    408   . HOH   W O        35.697   35.756   46.350    1.
00   41.39
AT   1    .    .    .    600   . HOH   W O        20.825   31.690   27.031    1.
00   20.90
//

  File: 2hhb.ccf

ID   2hhb
XX
DE   HEMOGLOBIN (DEOXY)
XX
OS   HUMAN (HOMO SAPIENS)
XX
EX   METHOD xray; RESO 1.74; NMOD 1; NCHN 4; NGRP 0;
XX
CN   [1]
XX
IN   ID A; NR 141; NL 1; NH 0; NE 0;
XX
SQ   SEQUENCE   141 AA;  15126 MW;  34D13618E62A33C1 CRC64;
     VLSPADKTNV KAAWGKVGAH AGEYGAEALE RMFLSFPTTK TYFPHFDLSH GSAQVKGHGK
     KVADALTNAV AHVDDMPNAL SALSDLHAHK LRVDPVNFKL LSHCLLVTLA AHLPAEFTPA
     VHASLDKFLA SVSTVLTSKY R
XX
CN   [2]
XX
IN   ID B; NR 146; NL 1; NH 0; NE 0;
XX
SQ   SEQUENCE   146 AA;  15867 MW;  EACBC707CFD466A1 CRC64;
     VHLTPEEKSA VTALWGKVNV DEVGGEALGR LLVVYPWTQR FFESFGDLST PDAVMGNPKV
     KAHGKKVLGA FSDGLAHLDN LKGTFATLSE LHCDKLHVDP ENFRLLGNVL VCVLAHHFGK
     EFTPPVQAAY QKVVAGVANA LAHKYH
XX
CN   [3]
XX
IN   ID C; NR 141; NL 1; NH 0; NE 0;
XX
SQ   SEQUENCE   141 AA;  15126 MW;  34D13618E62A33C1 CRC64;
     VLSPADKTNV KAAWGKVGAH AGEYGAEALE RMFLSFPTTK TYFPHFDLSH GSAQVKGHGK
     KVADALTNAV AHVDDMPNAL SALSDLHAHK LRVDPVNFKL LSHCLLVTLA AHLPAEFTPA
     VHASLDKFLA SVSTVLTSKY R
XX
CN   [4]
XX
IN   ID D; NR 146; NL 2; NH 0; NE 0;
XX
SQ   SEQUENCE   146 AA;  15867 MW;  EACBC707CFD466A1 CRC64;
     VHLTPEEKSA VTALWGKVNV DEVGGEALGR LLVVYPWTQR FFESFGDLST PDAVMGNPKV
     KAHGKKVLGA FSDGLAHLDN LKGTFATLSE LHCDKLHVDP ENFRLLGNVL VCVLAHHFGK
     EFTPPVQAAY QKVVAGVANA LAHKYH
XX
RE   1    1    1    1     V VAL   .    .    .    .    .    .        0.00    0.00
    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00
    0.00
RE   1    1    2    2     L LEU   .    .    .    .    .    .        0.00    0.00
    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00
    0.00
RE   1    1    3    3     S SER   1    AA   H    1    .    .        0.00    0.00
    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00
    0.00
RE   1    1    4    4     P PRO   1    AA   H    1    .    .        0.00    0.00
    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00
    0.00
RE   1    1    5    5     A ALA   1    AA   H    1    .    .        0.00    0.00
    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00
    0.00
RE   1    1    6    6     D ASP   1    AA   H    1    .    .        0.00    0.00
    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00
    0.00


  [Part of this file has been deleted for brevity]

AT   1    .    .    .    174   . HOH   W O        -4.764   -6.228    5.515    8.00   40.89
AT   1    .    .    .    175   . HOH   W O        23.809   19.925    1.758    8.00   39.37
AT   1    .    .    .    176   . HOH   W O        -7.871   -9.078    2.406    8.00   43.37
AT   1    .    .    .    177   . HOH   W O         4.693   12.083    7.558    8.00   40.24
AT   1    .    .    .    178   . HOH   W O         8.775  -23.438   16.055    8.00   42.33
AT   1    .    .    .    179   . HOH   W O        -7.480  -10.898   17.998    8.00   38.06
AT   1    .    .    .    180   . HOH   W O        -4.731   16.453    2.295    8.00   36.37
AT   1    .    .    .    181   . HOH   W O        -1.055   11.866   -0.448    8.00   43.19
AT   1    .    .    .    182   . HOH   W O       -27.610  -10.991    5.353    8.00   43.46
AT   1    .    .    .    183   . HOH   W O        26.015   11.766    5.159    8.00   40.95
AT   1    .    .    .    184   . HOH   W O       -18.517   -8.355   15.267    8.00   35.55
AT   1    .    .    .    185   . HOH   W O       -14.034    2.806  -30.367    8.00   41.77
AT   1    .    .    .    186   . HOH   W O       -32.905   -9.033    0.480    8.00   43.68
AT   1    .    .    .    187   . HOH   W O       -28.749  -13.315    1.938    8.00   45.36
AT   1    .    .    .    188   . HOH   W O         0.516   -8.074  -26.354    8.00   41.53
AT   1    .    .    .    189   . HOH   W O       -20.080   -9.873  -22.862    8.00   36.25
AT   1    .    .    .    190   . HOH   W O       -13.442    9.778  -13.572    8.00   39.70
AT   1    .    .    .    191   . HOH   W O       -24.804   -2.608  -15.488    8.00   37.79
AT   1    .    .    .    192   . HOH   W O         6.547    9.706   16.296    8.00   41.86
AT   1    .    .    .    193   . HOH   W O         0.029   22.606   14.164    8.00   43.02
AT   1    .    .    .    194   . HOH   W O       -11.367    0.306   28.463    8.00   44.30
AT   1    .    .    .    195   . HOH   W O       -19.950  -10.635   14.301    8.00   40.17
AT   1    .    .    .    196   . HOH   W O        -7.047   -6.324   20.098    8.00   36.98
AT   1    .    .    .    197   . HOH   W O       -23.876    1.108   14.102    8.00   33.31
AT   1    .    .    .    198   . HOH   W O       -34.199    8.033   11.037    8.00   40.72
AT   1    .    .    .    199   . HOH   W O       -14.173   13.393   -8.778    8.00   43.21
AT   1    .    .    .    200   . HOH   W O        11.388  -11.044   24.763    8.00   39.34
AT   1    .    .    .    201   . HOH   W O         3.735   -3.643    2.734    8.00   42.17
AT   1    .    .    .    202   . HOH   W O         3.149   -0.692    2.083    8.00   41.40
AT   1    .    .    .    203   . HOH   W O         4.511  -25.886   13.006    8.00   39.83
AT   1    .    .    .    204   . HOH   W O         8.712  -21.655    3.577    8.00   43.08
AT   1    .    .    .    205   . HOH   W O        22.926   -4.304   24.079    8.00   38.10
AT   1    .    .    .    206   . HOH   W O        11.435    9.654   20.618    8.00   40.23
AT   1    .    .    .    207   . HOH   W O        18.099    5.542   27.744    8.00   39.03
AT   1    .    .    .    208   . HOH   W O        12.174    9.951    9.804    8.00   44.34
AT   1    .    .    .    209   . HOH   W O        24.745   -2.501   15.270    8.00   39.78
AT   1    .    .    .    210   . HOH   W O        24.231    0.100   14.764    8.00   42.94
AT   1    .    .    .    211   . HOH   W O        23.324  -18.136   10.981    8.00   53.60
AT   1    .    .    .    212   . HOH   W O        25.576  -22.211    6.309    8.00   45.18
AT   1    .    .    .    213   . HOH   W O        14.639   24.823   -4.300    8.00   41.35
AT   1    .    .    .    214   . HOH   W O        14.903    5.393  -23.047    8.00   37.45
AT   1    .    .    .    215   . HOH   W O        16.650   -5.137  -16.717    8.00   39.12
AT   1    .    .    .    216   . HOH   W O         7.424   -6.700  -20.085    8.00   38.62
AT   1    .    .    .    217   . HOH   W O        -1.263   -2.837  -21.251    8.00   45.10
AT   1    .    .    .    218   . HOH   W O        23.120   -3.118  -12.992    8.00   37.05
AT   1    .    .    .    219   . HOH   W O        23.664    0.968  -14.389    8.00   36.25
AT   1    .    .    .    220   . HOH   W O        25.698    7.981  -15.362    8.00   35.85
AT   1    .    .    .    221   . HOH   W O        30.009   16.347   -6.794    8.00   37.62
AT   1    .    .    .    222   . HOH   W O        27.728   16.677   -1.376    8.00   42.54
AT   1    .    .    .    223   . HOH   W O         8.142   18.836    1.041    8.00   39.90
//
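
   Records such as the IN and AT lines in the example above can be read
   with simple whitespace splitting. The Python sketch below is
   illustrative only and is not part of PDBPARSE. It assumes that each
   record occupies one physical line in the CCF file, that the three
   numbers after the atom name are x, y and z coordinates, and that the
   last two numbers are occupancy and B-factor. The names Atom,
   parse_in_record and parse_at_record are hypothetical.

```python
from typing import Dict, NamedTuple


class Atom(NamedTuple):
    """Subset of an AT record; field meanings are assumptions (see above)."""
    pdb_number: str   # residue/group number column, e.g. "174"
    group: str        # three-letter group code, e.g. "HOH"
    atom_name: str    # atom name, e.g. "O"
    x: float
    y: float
    z: float
    occupancy: float  # assumed meaning of the second-to-last column
    b_factor: float   # assumed meaning of the last column


def parse_in_record(line: str) -> Dict[str, str]:
    """Turn 'IN   ID B; NR 146; ...' into {'ID': 'B', 'NR': '146', ...}."""
    if not line.startswith("IN"):
        raise ValueError("not an IN record: %r" % line)
    result = {}
    for item in line[2:].split(";"):
        parts = item.split(None, 1)   # "NR 146" -> ["NR", "146"]
        if len(parts) == 2:
            result[parts[0]] = parts[1].strip()
    return result


def parse_at_record(line: str) -> Atom:
    """Parse a single-line AT record by splitting on whitespace."""
    fields = line.split()
    if len(fields) != 15 or fields[0] != "AT":
        raise ValueError("unexpected AT record layout: %r" % line)
    x, y, z, occupancy, b_factor = (float(v) for v in fields[10:15])
    return Atom(fields[5], fields[7], fields[9], x, y, z, occupancy, b_factor)


chain = parse_in_record("IN   ID B; NR 146; NL 1; NH 0; NE 0;")
water = parse_at_record(
    "AT   1    .    .    .    174   . HOH   W O        "
    "-4.764   -6.228    5.515    8.00   40.89"
)
```

   Values from the IN record are kept as strings because this section
   does not define their types. Whitespace splitting would fail if two
   numeric columns ever ran together without a separating space, so a
   production reader might prefer fixed column offsets.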

5.0 DATA FILES

   PDBPARSE does not use a data file.

6.0 USAGE

Parses PDB files and writes protein CCF files.
Version: EMBOSS:6.6.0.0

   Standard (Mandatory) qualifiers:
  [-pdbpath]           dirlist    [./] This option specifies the location of
                                  PDB files (input). A PDB file contains
                                  protein coordinate and other data. A
                                  detailed explanation of the PDB file format
                                  is available on the PDB web site
                                  http://www.rcsb.org/pdb/info.html.
   -camasknon          boolean    [N] This option specifies whether to mask
                                  non-amino acid groups in protein chains
                                  that do not contain a C-alpha atom. If
                                  masked, the group will not appear in either
                                  the CO or SQ records of the clean coordinate
                                  file.
   -camaskamino        boolean    [N] This option specifies whether to mask
                                  amino acids in protein chains that do not
                                  contain a C-alpha atom. If masked, the amino
                                  acid will not appear in the CO record but
                                  will still be present in the SQ record of
                                  the clean coordinate file.
   -atommask           boolean    [N] This option specifies whether to mask
                                  amino acid residues in protein chains with a
                                  single atom only. If masked, the amino acid
                                  will not appear in the CO record but
                                  will still be present in the SQ record of
                                  the clean coordinate file.
  [-ccfoutdir]         outdir     [./] This option specifies the location of
                                  CCF files (clean coordinate files) (output).
                                  A 'protein clean coordinate file' contains
                                  protein coordinate and other data for a
                                  single PDB file. The files, generated by
                                  using PDBPARSE, are in CCF format
                                  (EMBL-like) and contain 'cleaned-up' data
                                  that is self-consistent and error-corrected.
                                  Records for residue solvent accessibility
                                  and secondary structure are added to the
                                  file by using PDBPLUS.
   -logfile            outfile    [pdbparse.log] This option specifies the name
                                  of the log file for the build. The log file
                                  may contain messages about inconsistencies
                                  or errors in the PDB files that were parsed.

   Additional (Optional) qualifiers:
   -[no]ccfnaming      boolean    [Y] This option specifies whether to use
                                  pdbid code to name the output files. If set,
                                  the PDB identifier code (from the PDB file)
                                  is used to name the file. Otherwise, the
                                  output files have the same names as the
                                  input files.
   -chnsiz             integer    [5] Minimum number of amino acid residues in
                                  a chain for it to be parsed. (Any integer
                                  value)
   -maxmis             integer    [3] Maximum number of permissible mismatches
                                  between the ATOM and SEQRES sequences. (Any
                                  integer value)
   -maxtrim            integer    [10] Maximum number of residues to trim when
                                  checking for missing C-terminal SEQRES
                                  sequences. (Any integer value)

   Advanced (Unprompted) qualifiers: (none)
   Associated qualifiers:

   "-pdbpath" associated qualifiers
   -extension1         string     Default file extension

   "-ccfoutdir" associated qualifiers
   -extension2         string     Default file extension

   "-logfile" associated qualifiers
   -odirectory         string     Output directory

   General qualifiers:
   -auto               boolean    Turn off prompts
   -stdout             boolean    Write first file to standard output
   -filter             boolean    Read first file from standard input, write
                                  first file to standard output
   -options            boolean    Prompt for standard and additional values
   -debug              boolean    Write debug output to program.dbg
   -verbose            boolean    Report some/full command line options
   -help               boolean    Report command line options and exit. More
                                  information on associated and general
                                  qualifiers can be found with -help -verbose
   -warning            boolean    Report warnings
   -error              boolean    Report errors
   -fatal              boolean    Report fatal errors
   -die                boolean    Report dying program messages
   -version            boolean    Report version number and exit
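
   The qualifiers listed above can be assembled into a command line from
   a script. The Python sketch below only builds the argument list and
   does not run the program. The function name build_pdbparse_command
   and the example directory names are hypothetical; the qualifier names
   and defaults are taken from the listing above.

```python
from typing import List


def build_pdbparse_command(
    pdbpath: str,
    ccfoutdir: str,
    logfile: str = "pdbparse.log",
    chnsiz: int = 5,
    maxmis: int = 3,
    maxtrim: int = 10,
    camasknon: bool = False,
    camaskamino: bool = False,
    atommask: bool = False,
    ccfnaming: bool = True,
) -> List[str]:
    """Return an argv list for pdbparse using the documented qualifiers."""
    argv = [
        "pdbparse",
        "-pdbpath", pdbpath,
        "-ccfoutdir", ccfoutdir,
        "-logfile", logfile,
        "-chnsiz", str(chnsiz),
        "-maxmis", str(maxmis),
        "-maxtrim", str(maxtrim),
    ]
    # Masking qualifiers default to N, so add them only when requested.
    for name, enabled in (
        ("camasknon", camasknon),
        ("camaskamino", camaskamino),
        ("atommask", atommask),
    ):
        if enabled:
            argv.append("-" + name)
    # -[no]ccfnaming defaults to Y; the "no" prefix switches it off.
    if not ccfnaming:
        argv.append("-noccfnaming")
    argv.append("-auto")  # turn off interactive prompts
    return argv


# Hypothetical directories, for illustration only.
command = build_pdbparse_command("./pdb/", "./ccf/", atommask=True)
```

   If PDBPARSE is installed, the resulting list can be passed to
   subprocess.run(command, check=True). Building the list separately
   makes it easy to log or inspect the exact invocation first.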


  6.1 COMMAND LINE ARGUMENTS

