The Early Days of CCP4, c.1977

Dr. Talapady N. Bhat

In 1977 I joined David Blow at Imperial College, London, to work on the structure determination of tyrosyl-tRNA synthetase. David and Alan Wonacott had just moved from the MRC in Cambridge to London and had yet to establish a computing facility for protein crystallography at Imperial College. Their initial proposal was that I explore the possibility of using the computers at Cambridge for my work. A few days after I arrived at Imperial College, David introduced me at the MRC to Max Perutz, Bob Diamond and Gerald Bricogne, who promised me their support and help in getting started. Max offered me shared table space in a room full of 3-D ball-and-stick models of proteins, along with a computer graphics screen that could be used for real-space modeling of proteins with Bob's graphics program. I was very excited to be working in the world's most famous protein crystallography laboratory, with an opportunity to learn from world-famous scientists. For the next few weeks I traveled every day from London to Cambridge for my computing needs. Some weeks later David realized that commuting from London to Cambridge every day was not a practical way to get work done, and that it was also expensive. He therefore suggested that I look into alternative ways of getting my computing done. Remote dial-up technology was then in its infancy in the UK, and he suggested that I use a dial-up facility from London to reach the Cambridge computer. A week later, the Imperial College telephone authorities realized that we were using the telephone lines for data transfer and asked us to stop. David then suggested that I use the dial-up facility at University College London to connect to Cambridge.
This way of using the Cambridge computer also turned out to be impractical, since the dial-up facility was very unreliable and the dial-up card reader frequently dropped the line while reading cards. Furthermore, the very limited disk space allocated for my use at Cambridge (50 to 100 blocks of 512 bytes) did not make my job any easier.

Around the same time, Daresbury Laboratory (DL) happened to be looking for opportunities to support biological research. With this goal in mind, Dr. Sherman from DL visited our group at Imperial one day to discuss our computing needs and to explore the possibility of providing computing support for us. During his visit I explained our frustration with using the Cambridge computer. He described the excellent computing resources then available at Daresbury and proposed to provide practically unlimited resources (both disk space and computing time). He also suggested that we could use the remote terminals in the high energy physics laboratory at Imperial College to log in to Daresbury. These proposals were too good for us to ignore, particularly given the difficulties we had had with the computers at Cambridge. However, we realized that Daresbury did not yet have any protein crystallographic programs installed for our use. Although this was a major issue, we accepted DL's proposal to support our computing needs. David thought that this would also be a good opportunity to access SERC funds for our work, and suggested that SERC-DL provide funds for my salary and related expenses, such as travel to DL whenever needed. Dr. Sherman accepted our request on the condition that he would pay my salary only if I established a state-of-the-art macromolecular software suite at DL for general use and also helped DL attract users from other major laboratories such as Birkbeck College, Oxford, the MRC, Sheffield and York. We agreed to work towards this goal and talked to Tom Blundell at Birkbeck College about DL's proposal. Tom showed interest in and support for the idea, though he acknowledged that his computing needs were far less urgent than ours. By that time Tom already had a well-established computing facility at Birkbeck College.
Daresbury formalized this support by establishing a newly funded project called CCP4.

Following the award of this project to us by SERC-DL, I started installing protein crystallographic programs at DL. Some of the first programs I installed there were: a) the FFT program developed by Lynn Ten Eyck; b) the phase combination program with Gerald Bricogne's modifications; c) the density modification program and a program to refine partially fitted structures, both written by me; and d) the refinement program PROLS by Wayne Hendrickson.

Subsequently, we organized a meeting at Birkbeck College to discuss how to foster greater participation by other protein crystallographic laboratories in the use of the DL computing facility. Leading program developers from several laboratories attended, including Ian Tickle from Birkbeck College, Phil Evans from Oxford, Eleanor Dodson from York, Phil Bourne from Sheffield, Carl Brändén from Sweden, Johann Deisenhofer and W. Steigemann from Munich, and Alan Wonacott and myself from Imperial College. The Munich group suggested that the best way to establish a complete protein crystallographic program package at DL would be to adopt their package, called "Protein". They said they had a 1600 BPI tape with them containing all the programs and could hand it over right there. In response, Phil Evans said that the scaling and phasing program from Oxford was superb and had to be part of the package at DL as well. Eleanor said that the FFT-based refinement by R.C. Agarwal had to be part of the DL package. Alan Wonacott added that Gerald Bricogne's electron density map skewing and phase combination program was a must for the DL package, and so within a few minutes the list of "must-have programs at DL" started piling up. However, each of these programs worked only with its own file format, and no one knew of a method for exchanging data among them. These discussions led everyone to realize that they had a really serious problem in integrating these must-have programs. Then several people started telling horror stories of how they had mistakenly computed electron density maps from diffraction intensities, refined heavy atoms against the wrong data, and so on. To solve these problems of data exchange between programs, we considered several models.
For instance, we considered the 9A2 format from Cambridge and the predefined file formats used by the Protein package, which had reserved data columns for each type of value. However, everyone agreed that neither 9A2 nor the Munich file format system had the features we were looking for. The meeting then adjourned, with plans to meet again in a month's time.

Following that meeting, Alan Wonacott and I started to work on the design of a new file format that would meet our needs. Alan felt that the rapid sortability of the 9A2 format was an essential feature for the new format. I felt that the column names needed to be amenable to machine reasoning, so that a user would never have to worry about calculating electron density maps from intensity values, mistaking phases for amplitudes, and so on. I also felt it was important that a general application program developer should not have to worry about input and output to the data files. These requirements led us to develop the Labeled Column Format (LCF). I consider some of the concepts used in LCF to be early attempts to provide transparent data management for program developers, similar to some of the features now available in modern SQL-based databases. The LCF routines effectively mask out data in unwanted columns from a user program, and they also make the order in which the data columns are stored in the file irrelevant to the user. LCF data can also be sorted and edited on request, and columns can be added to or deleted from an LCF file as needed.
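The LCF ideas described above (access to columns by label rather than position, protection against using the wrong kind of data, and free sorting, addition and deletion of columns) can be illustrated with a small modern sketch. This is not the original FORTRAN/PL/I implementation. The class `LabeledColumnFile`, its methods, and the single-letter type codes are all hypothetical, the "machine reasoning" on column names is approximated here by an explicit type code per column, and everything is held in memory rather than on disk or tape.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical column type codes, chosen only for this illustration.
INDEX, AMPLITUDE, INTENSITY, PHASE = "H", "F", "J", "P"


@dataclass
class LabeledColumnFile:
    """Toy in-memory table whose columns are addressed by label, not position."""

    labels: list[str]
    types: dict[str, str]
    rows: list[list[float]] = field(default_factory=list)

    def column(self, label: str, expected_type: str | None = None) -> list[float]:
        # Look up by label, so the physical column order never matters to the caller,
        # and columns the caller does not ask for are simply never returned.
        if label not in self.types:
            raise KeyError(f"no column labelled {label!r}")
        # Refuse, for example, intensities where a program expects amplitudes.
        if expected_type is not None and self.types[label] != expected_type:
            raise TypeError(
                f"column {label!r} has type {self.types[label]!r}, "
                f"expected {expected_type!r}"
            )
        idx = self.labels.index(label)
        return [row[idx] for row in self.rows]

    def add_column(self, label: str, ctype: str, values: list[float]) -> None:
        if label in self.types:
            raise ValueError(f"column {label!r} already exists")
        if len(values) != len(self.rows):
            raise ValueError("new column must have one value per row")
        self.labels.append(label)
        self.types[label] = ctype
        for row, value in zip(self.rows, values):
            row.append(value)

    def delete_column(self, label: str) -> None:
        idx = self.labels.index(label)
        del self.labels[idx]
        del self.types[label]
        for row in self.rows:
            del row[idx]

    def sort_by_hkl(self) -> None:
        # Sort reflections on (h, k, l) wherever those columns happen to be stored.
        idx = [self.labels.index(name) for name in ("H", "K", "L")]
        self.rows.sort(key=lambda row: tuple(row[i] for i in idx))


# Columns deliberately stored in a scrambled order: L, FOBS, K, H.
lcf = LabeledColumnFile(
    labels=["L", "FOBS", "K", "H"],
    types={"L": INDEX, "FOBS": AMPLITUDE, "K": INDEX, "H": INDEX},
    rows=[[3, 120.0, 0, 1], [1, 85.5, 2, 0]],
)
lcf.sort_by_hkl()
lcf.add_column("IOBS", INTENSITY, [7310.25, 14400.0])
print(lcf.column("FOBS", expected_type=AMPLITUDE))  # [85.5, 120.0]
try:
    lcf.column("IOBS", expected_type=AMPLITUDE)  # intensities are not amplitudes
except TypeError as err:
    print("refused:", err)
```

Because callers ask for columns by label, neither the scrambled storage order nor the added IOBS column affects code that only reads FOBS, and the type check turns the "map computed from intensities" mistake into an immediate error rather than a silently wrong result.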

In a subsequent meeting at Birkbeck College, Alan and I presented these LCF concepts to all members of the team and discussed its features: the minimal set of columns, their internal storage method, the size in bytes of each column, direct versus sequential access files, the base to use for HKL (256 or not), and so on. The idea was quickly taken up by the others at the meeting. I strongly favored using common blocks as the means of sharing data between a user program and the LCF APIs, but Phil Evans strongly favored passing parameters in calls to the API. We argued over these differences for almost a full morning, and then someone, probably Eleanor, suggested a compromise: since I was going to develop the LCF routines, I could use common blocks to share data among them; however, since Phil and others would be developing the macromolecular crystallographic programs that used my LCF routines, I needed to make provisions in those routines to exchange data with external programs by passing it through parameters. That is why the initial documentation of the LCF routines describes sharing data both through common blocks and through passed parameters. Since all the programs were expected to be in FORTRAN, it was suggested that all the LCF routines also be written in FORTRAN. However, FORTRAN IV did not allow two-character variables to be used for storing data, so I was granted an exception to use a PL/I routine for I/O to the LCF files. Direct access storage was ruled out because only limited disk space was available at the time, and the LCF routines were therefore expected to work from magnetic tape as well.

Soon afterwards, Daresbury expanded its support for protein crystallography and provided funds for an additional staff member, John Campbell, and a coordinator, Pella Machin, both stationed at DL. With this additional support, the CCP4 project matured fully, taking on additional roles such as providing user support and arranging workshops.

Rob Allan 2022-11-17