About this project
The aim of this project is to connect existing Rett syndrome databases and to create a unified repository following an "adaptor approach". The Rett database network will allow the collection of standardized, easily comparable clinical and genetic data from a large number of Rett patients. The data will be accessible to the participants and to the scientific community according to rules that ensure transparency and equity (see access rules). In addition, the application will provide data storage for users who do not have a local computerized data management system. This international effort will be of great value for performing genotype-phenotype correlations, studying modifier genes, and selecting subgroups of patients for clinical trials.
Phase 1. To connect the existing databases and create a unified repository in Europe.
Funded by E-rare (EuroRett grant; coordinator: L Villard; WP#1; 2008-2010)
So far, four national databases on Rett syndrome have been set up in Europe: "SYRENE" in France, the "Italian Rett database and biobank" in Italy (1), the "Barcelona Rett database" in Spain and "BIRSS" (British Isles Rett Syndrome Survey, funded by RSAUK) in the UK. The "adaptor approach" will preserve the original data while integrating them into a comprehensive, permissive and flexible unified structure. An information interchange system (IIS) will run on the local national databases and perform periodic data updates. This system will use a standardized information schema to send data from the pre-existing databases to the new central database, which will thus serve as a unified repository.
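The adaptor approach can be illustrated with a minimal Python sketch, assuming that each national database can export its patients as rows of field/value pairs. One adaptor per source maps local field names and value codes onto common-schema items, and a periodic synchronization step upserts the translated records into the central store while leaving the source data untouched. All class, function and field names here (`Adaptor`, `CommonRecord`, `sync`, `codice`, `cammina`, `mutated_gene`, `independent_walking`) are hypothetical and do not describe the actual IIS.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Tuple


@dataclass
class CommonRecord:
    """One patient expressed in the shared (central) schema."""
    source: str    # which national database the record came from
    local_id: str  # the patient's identifier in that database
    items: Dict[str, Any] = field(default_factory=dict)  # common item name -> value


@dataclass
class Adaptor:
    """Translates rows of one pre-existing database into the common schema.

    Hypothetical sketch: source rows are only read, never modified."""
    source: str
    id_field: str
    field_map: Dict[str, str]  # local field name -> common item name
    value_maps: Dict[str, Callable[[Any], Any]] = field(default_factory=dict)  # per-item recoding

    def to_common(self, row: Dict[str, Any]) -> CommonRecord:
        rec = CommonRecord(self.source, str(row[self.id_field]))
        for local_name, item in self.field_map.items():
            value = row.get(local_name)
            if value is None:  # unfilled local field: leave the item absent
                continue
            recode = self.value_maps.get(item)
            rec.items[item] = recode(value) if recode else value
        return rec


def sync(central: Dict[Tuple[str, str], CommonRecord],
         adaptor: Adaptor, rows: List[Dict[str, Any]]) -> int:
    """One periodic update: upsert every source row, keyed by (source, local id).

    Re-running with the same rows leaves the central store unchanged."""
    for row in rows:
        rec = adaptor.to_common(row)
        central[(rec.source, rec.local_id)] = rec
    return len(central)


# Hypothetical Italian-style source: "codice" = code, "cammina" = walks, coded "si"/"no".
italian = Adaptor(
    source="IT",
    id_field="codice",
    field_map={"gene": "mutated_gene", "cammina": "independent_walking"},
    value_maps={"independent_walking": lambda v: v == "si"},
)
central: Dict[Tuple[str, str], CommonRecord] = {}
sync(central, italian, [{"codice": 17, "gene": "MECP2", "cammina": "si"}])
print(central[("IT", "17")].items)  # {'mutated_gene': 'MECP2', 'independent_walking': True}
```

Keying the central store by source and local identifier makes each periodic update idempotent, so an adaptor can simply resend its whole export without creating duplicates.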
Preliminary results of phase 1. From each existing database we have already received either full access (from Nadia Bahi-Buisson on April 9th) or a pilot cohort of patients (from Angus Clarke on May 14th, from Merce Pineda on June 4th, and from Bruria Ben-Zeev on July 15th). To harmonize the data, hundreds of items were analyzed and a common database schema was defined. The new database has been structured: following a comprehensive criterion, it contains 300 clinical items grouped into 27 clinical domains, plus 16 genetic items. Data import procedures have been developed for the Italian, French, Spanish and British databases, and we are currently working on the Israeli data. Data have been imported for all Italian and French patients in the source databases and for the Spanish and British pilot cohorts, so the new database currently contains data on 562 patients: 310 Italian, 232 French, 10 Spanish and 10 British. A server has been prepared to host the new archive, a preliminary layout of the website hosting the application has been designed, and some pages of the application have been developed. The application is published at the following address:
Phase 2. To extend the connection to other existing databases worldwide and to give access to local and national contributors without pre-existing databases.
Funded by RettSearch (Microgrant 2010 to A Renieri)
Access to the network will be open to any other pre-existing database worldwide that wishes to join. In addition, the new archive will serve as a unified repository into which additional national or local patient cohorts can be entered. Those who do not have a pre-existing database will have two options: i) enter data directly into the main archive (geographical and institutional provenance will be displayed); ii) build a local or national archive connected to the main one. Requests to join the network have been received from Germany, Sweden, Finland, Croatia, Serbia, Portugal, Poland, Denmark, Hungary and the USA. The archives will be permissive, so that patients for whom only some items are filled in can still be entered.
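A permissive archive can be sketched as follows, assuming that the only hard constraint on a submission is that every item it contains belongs to the common schema. Entries with any subset of items are accepted, geographical and institutional provenance are stored with each entry, and a completeness score records how much of the schema has been filled in. `SCHEMA_ITEMS`, `PatientEntry` and `insert`, as well as the item names, are hypothetical; the real schema is far larger.

```python
from dataclasses import dataclass, field
from typing import Dict, Set

# Hypothetical subset of common-schema items.
SCHEMA_ITEMS: Set[str] = {"mutated_gene", "mutation", "independent_walking",
                          "epilepsy", "age_at_diagnosis"}


@dataclass
class PatientEntry:
    patient_id: str
    country: str      # geographical provenance, displayed alongside the data
    institution: str  # institutional provenance, displayed alongside the data
    items: Dict[str, object] = field(default_factory=dict)  # only the items actually filled in

    def completeness(self) -> float:
        """Fraction of schema items filled in for this patient."""
        return len(self.items) / len(SCHEMA_ITEMS)


def insert(archive: Dict[str, PatientEntry], entry: PatientEntry) -> None:
    """Permissive insert: missing items are allowed; only items outside the schema are rejected."""
    unknown = set(entry.items) - SCHEMA_ITEMS
    if unknown:
        raise ValueError(f"items not in the common schema: {sorted(unknown)}")
    archive[entry.patient_id] = entry


archive: Dict[str, PatientEntry] = {}
insert(archive, PatientEntry("p1", "DE", "Example Clinic",
                             {"mutated_gene": "MECP2", "epilepsy": True}))
print(archive["p1"].completeness())  # 0.4
```

Storing only the filled items, rather than placeholders for every schema item, keeps "not recorded" distinct from any real value, which matters when partially filled records are later analyzed.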
Phase 3. To develop a data mining system able to handle large scientific databases.
Our overall approach will be to identify basic data mining operations that cut across applications and to develop scalable algorithms for executing them. Our algorithms should: i) discover patterns in large databases with a completeness property guaranteeing that all patterns of certain types are found; ii) offer high performance and near-linear scaling; iii) be specifically tailored to mining genetic data, in order to uncover previously unknown genotype-phenotype correlations, study modifier genes, select subgroups of patients for clinical trials, etc. With particular regard to the last point, both supervised and unsupervised machine learning techniques will be employed to extract significant relationships, possibly allowing the data to self-organize. However, although data mining has been successful in many applications, it raises particular concerns for private data. Therefore, an integrated architecture must be devised that takes a systemic view of the problem and implements established protocols for data collection, inference control and information sharing (in accordance with the European Union Privacy Directive on privacy protection in data management and analysis systems).
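Point i) asks for a completeness guarantee. The project does not name a specific algorithm; as a swapped-in illustration, the classic Apriori frequent-itemset algorithm has exactly this property, returning every combination of features shared by at least `min_support` patients. The minimal Python sketch below treats each patient as a set of genotype and phenotype labels (the labels and records are invented toy data) and derives the confidence of a genotype-to-phenotype rule. It makes no claim to the near-linear scaling of point ii), which Apriori does not achieve in general.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set

Itemset = FrozenSet[str]


def apriori(patients: List[Set[str]], min_support: int) -> Dict[Itemset, int]:
    """Return every itemset shared by at least `min_support` patients, with its support count."""
    # Level 1: support of single labels.
    counts: Dict[Itemset, int] = {}
    for labels in patients:
        for label in labels:
            key = frozenset([label])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(frequent)
    k = 2
    while frequent:
        # Join step: unions of two frequent (k-1)-itemsets that have exactly k labels.
        prev = list(frequent)
        candidates = set()
        for i in range(len(prev)):
            for j in range(i + 1, len(prev)):
                union = prev[i] | prev[j]
                # Prune step: every (k-1)-subset of a frequent itemset must itself be frequent.
                if len(union) == k and all(
                        frozenset(sub) in frequent for sub in combinations(union, k - 1)):
                    candidates.add(union)
        # Count the surviving candidates with one pass over the patients.
        frequent = {}
        for cand in candidates:
            support = sum(1 for labels in patients if cand <= labels)
            if support >= min_support:
                frequent[cand] = support
        result.update(frequent)
        k += 1
    return result


def confidence(itemsets: Dict[Itemset, int], antecedent: Set[str], consequent: Set[str]) -> float:
    """Fraction of patients with `antecedent` who also have `consequent` (union must be frequent)."""
    return itemsets[frozenset(antecedent | consequent)] / itemsets[frozenset(antecedent)]


# Invented toy records: "geno:*" and "pheno:*" are placeholder labels, not real findings.
patients = [
    {"geno:A", "pheno:x", "pheno:y"},
    {"geno:A", "pheno:x"},
    {"geno:B", "pheno:y"},
    {"geno:A", "pheno:x", "pheno:y"},
    {"geno:B", "pheno:x"},
]
found = apriori(patients, min_support=3)
print(confidence(found, {"geno:A"}, {"pheno:x"}))  # 1.0
```

A candidate is discarded only when one of its subsets is already known to be infrequent, so no frequent itemset is ever skipped; this is what gives the completeness property, at the cost of one pass over the data per itemset size.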
Phase 4. To implement and maintain the database.
Depending on the specific use of the database (e.g. genotype-phenotype correlations, modifier gene studies or the selection of patient subgroups for clinical trials) and on new scientific evidence, the new database will need to be restructured, normalized and further developed. To this end, the database structure and the published schema will be revised periodically, and the web interface updated accordingly.
Annual maintenance. An appropriate IT infrastructure is needed that guarantees maintenance and qualified support 24 hours a day, 7 days a week. This infrastructure must be robust enough to ensure that any problem is identified and remedied before it can degrade the service or make it inefficient. Electricity and connectivity supplies, vital hardware components and technical support should be redundant. Physical security with a surveillance system, together with backup policies, should guarantee data protection in compliance with ISO standards for health information privacy, preventing unauthorised access and data loss.
1. Sampieri K, Meloni I, Scala E, Ariani F, Caselli R, Pescucci C, Longo I, Artuso R, Bruttini M, Mencarelli MA, Speciale C, Causarano V, Hayek G, Zappella M, Renieri A, Mari F. Italian Rett database and biobank. Hum Mutat. 2007.