About PhD in Data Science 2023
Program Description
The field of data science is emerging as a critical discipline with high relevance to economic growth and development. This doctoral training program established by AIMS will provide emerging African scientists the opportunity to conduct research at the forefront of data science, and work towards a PhD degree within a high-quality training program in Africa, in cooperation with institutions internationally.
The program will focus on theoretical foundations of data science as well as applications of data science to improve the daily lives of Africans. It is built on the understanding that modern approaches in data science require a combination of expertise spanning the areas of mathematics, statistics, computer science, and the applied sciences.
AIMS will be offering up to seven fully-funded PhD positions in this prestigious new doctoral program. The recruited students will be based in Rwanda at AIMS Rwanda, or any of the other AIMS centers, in partnership with universities and research institutions across Africa and globally. The program aims to train future change-makers, who will have an impact across academia, industry, education, and government.
Candidates can choose from a list of proposed research topics, and AIMS will assist in building a supervision team around these topics. Alternatively, candidates can suggest their own research topics, together with a proposed supervision team. Selected students will start in October 2023.
Eligibility criteria
- Master’s degree (completed by Sept 2023) in mathematics, statistics, computer science, engineering, physics or other relevant fields;
- Sufficient theoretical foundations evidenced by prior work (courses/thesis/other training);
- Qualification for pursuing research on the chosen topic, including relevant programming expertise;
- Research potential evidenced by academic performance and involvement in relevant academic activities;
- Motivation for pursuing a PhD by research in the suggested topic;
- Being an African national.
Summary
- Length of program: 4 years
- Fully funded (stipend, equipment, health insurance, relocation costs, conference attendance, and direct costs at the graduating institution such as tuition and registration fees)
- International supervision teams from well-known research institutions
- Research topics that push the boundaries of data science (focused on AI/ML and/or health)
- Program start: Oct 2023
PROGRAM INFORMATION
The new Doctoral Training Program in Data Science (DTP-DS) is established by Quantum Leap Africa (QLA) at the African Institute for Mathematical Sciences (AIMS) Rwanda in collaboration with top researchers from across the globe.
We have several PhD positions available for topics in mathematics, statistics, computer science, and the applied sciences that broadly fit under the umbrella of data science. These QLA positions are funded by the African Institute for Mathematical Sciences.
Enrollment & Graduation
Candidates who are accepted into the program will be enrolled in two institutions:
- One of the five AIMS Centers of Excellence (Rwanda, Ghana, Cameroon, Senegal, South Africa)
- A higher education institution (generally in Africa) partnering with AIMS
Candidates will need to be ordinarily resident in an African country and satisfy both the degree requirements for a PhD by research at their graduating institution and the program requirements of the AIMS DTP-DS. The PhD degree will be conferred by the partnering institution upon successful completion. QLA facilitates the creation of international co-supervision partnerships, provides funding through global partnerships, and offers additional training in research skills and transferable skills to the PhD candidates.
Supervision
Candidates are mentored by a supervision team of 2-4 supervisors, forming a partnership between AIMS and higher education institutions in Africa and internationally. Each supervision team should consist of at least one supervisor affiliated with AIMS, and one supervisor affiliated with the graduating institution. These rules are flexible and details can be discussed and adjusted on a case-by-case basis.
The supervision team will be formed during Phase 2 of the application process in communication with shortlisted candidates, the DTP-DS management board, and potential supervisors. Candidates have the possibility to suggest their own supervision team.
Research topics
Applicants can select from a list of research topics suggested by leading researchers in their field. Alternatively, applicants are welcome to suggest their own research topics. In Phase 2 of the application process, shortlisted candidates will be put in touch with the supervision teams that proposed their selected topics to discuss more concrete research ideas. For more details on the selection of research topics, see Research topics.
Training Components
All candidates are invited to participate in an intensive training school in the first year of the program, organized by QLA. Here, candidates will acquire skills relevant to their research and broaden their subject knowledge in data science through a small number of intensive core courses taught by top international researchers.
The program plans to provide continuous training opportunities virtually and/or in person. Additional training components may include (but are not limited to):
- Guided seminars and reading groups
- Participation in transferable skills courses (academic writing, presentation skills, research methodology)
- Group projects / mini dissertations
- Designing and delivering a mini-course (senior PhD students)
- 3 Minute Thesis Competition (senior PhD students)
- Tutoring in the AIMS structured master's program (senior PhD students)
PhD candidates are encouraged to pursue internships in industry or external institutions towards the end of their PhD in a field related to their research topic, depending on sufficient progress on their dissertation.
CALL FOR RESEARCH TOPICS SUBMISSION 2023
Program Overview
Quantum Leap Africa (QLA), owned by the African Institute for Mathematical Sciences (AIMS), invites submission of research topics for the PhD program. We encourage, but do not limit submissions to, topics in Artificial Intelligence/Machine Learning, Data and Quantum Sciences, preferably with practical applications that address the challenges of the African continent. We welcome research topics submitted by academic researchers and by industry.
The submission of a research topic can be individual or in teams. Only one application per supervision team is required. Please provide as much information as possible about potential co-supervisors. QLA can help identify additional co-supervisors if needed.
Team Composition
The supervision team usually comprises 3-4 supervisors per student. The team should include at least one supervisor affiliated with the student’s graduating institution, and one supervisor affiliated with AIMS (Ghana, Senegal, Rwanda, Cameroon or South Africa). If no member of the supervision team is affiliated with AIMS, an affiliation can be arranged in some cases. It is important to have a main supervisor who takes the lead in mentorship on the research project.
Student(s) Selection
After the pre-screening stage by QLA staff, the supervision team will conduct an interview to select the suitable candidate(s) for their research topic and will support them in writing their proposal. Final selections are made based on the strength – in combination – of the student, the research plan, the supervision team and the supervision set-up.
Financial Support
Following the evaluation of the supervisor-selected candidate(s) by QLA board members and external reviewers, the successful candidate(s) for the topic(s) will be funded for a period of up to four (4) years.
Location
The successful candidate(s) must be enrolled in an African institution or university. However, under certain circumstances, candidate(s) may be enrolled in an institution outside the African continent with which the primary supervisor is affiliated (note that the stipends received by QLA PhD candidates are reserved for students graduating from an African institution).
Research Topic Submissions
Research topics must be submitted through the online submission form by midnight (Central African Time) on February 1st. The proposal form can be accessed HERE.
Information required to submit the proposal form:
- Topic Title
- One paragraph of topic description
- List of all supervisors, their affiliation and contact details
- Designation of lead supervisor: this member of the supervision team is taking responsibility for adequate academic input for the student and research progress on the proposed topic. The lead supervisor should be the main subject expert.
- Designation of AIMS affiliation
- Background & Skills: Please mention which subject knowledge / academic background / skills are required for the PhD student to succeed in this topic area. Also mention skills that are not required, but beneficial to succeed at this topic. The information you provide here will be used for shortlisting candidates, so please take care to provide sufficient details and be specific.
After the submission deadline, the QLA management board will select the most relevant topics within the focus of the program to be offered to PhD candidates. The successful teams will be notified.
Join Supervision Teams
If you wish to become involved with the QLA Data Science program by taking on a supervisory role, but do not have a precise research topic or a supervision team, you are welcome to register your interest HERE.
By submitting this form, you agree that the information you submit may be shared within and outside of QLA. We may also contact you directly regarding joining a supervision team or relevant research topics based on the details you submitted.
Inquiries
For any questions about the program, please contact the Scientific Program Manager, Ms. Molly Mutesi (mmutesi@quantumleapafrica.org). Examples of past submitted research topics can be found HERE.
RESEARCH TOPICS 2022 COHORT
To become a PhD student as part of this doctoral training program, you have two options:
- Work on one of the research topics suggested by the program,
- Suggest your own research topic.
Working on a proposed topic
For the first option, please consult the research topics posted below. Each research topic already comes with a supervision team. The top five applicants for each topic will be shortlisted and then put in touch with the supervision team directly. In the next step, supervision teams will select the candidates with whom they would like to move into Phase 2. In Phase 2, the supervision team and the selected shortlisted candidates prepare a detailed research proposal. This will also give you the opportunity to discuss the precise focus of your chosen research topic with your supervision team, discuss whether you and your supervision team want to add further supervisors, and decide at which university you plan to register and graduate for the PhD.
When shortlisting candidates for the proposed topics, we will evaluate how well your academic background and skills fit the topic you are applying to. Please read the description and required background of the topic you are applying to in detail, and make sure your application matches the research direction.
Suggesting your own research topic
To suggest your own research topic, simply select ‘Other’ in the application form. A box will open where you can specify a title and a topic description as well as your supervisors. You will also have to classify in which stream your topic best fits.
When suggesting your own research topic, make sure that you have put detailed thought into the research direction you are suggesting. In the end, this is the problem you may spend 3-4 years working on full-time. Your research plans should be advanced enough that it is clear that your research can start immediately and has a high chance of success. Prepare your topic description with care; note that the application form restricts the topic description to a maximum of 2000 characters. Most importantly, the topic description should explain exactly what you will do as part of your PhD, including the methods used, the technical skills you need, the novel research contribution it provides, and, if data is needed, which data sets you are planning to use and how you will get access to them. Be precise and to the point; do not spend too many characters on general introduction and motivation. From your description, it should be possible for us to assess if the proposed research topic is
- novel and interesting to the international research community,
- realistically achievable in the context of a PhD,
- matching with your skill background and with the skill background of your proposed supervisors,
- equipping you with the expertise to proceed into a successful future career,
- fitting to the scope of our PhD program.
When proposing your own topic, you will also have to build your own supervision team. Make sure your supervisors are aware you are planning to work with them and agree to supervise you before mentioning their names. If you are shortlisted, we may also be able to help you find additional supervisors if needed. At the end of Phase 2, your application will be evaluated as a package, including how well your supervisors and your background are suited to advance the research you propose overall.
Streams
For the 2022 cohort, we will have two streams:
- General Data Science (G)
- Data Science for Health in Rwanda (H)
Some topics fit in both streams. Proposed topics are listed below; the streams are indicated in brackets after each title.
Topics:
- Accelerating multitask reinforcement learning with attention mechanisms (G)
- Quantum Topological Data Analysis (G)
- Physics Informed Learning Machines with application in climate science (G)
- Federated learning handling EEG data spread in Africa and Europe (G,H)
- How the human embryo develops: combining mathematical modelling and data science (G,H)
- Data-Driven Modelling for Risk Evaluation and Early Detection of vector-borne disease outbreaks in Rwanda (H)
- Mathematical model for predicting and analyzing the impact of air pollution on the cardiovascular-respiratory system in Rwanda (H)
- Identification of new cervical cancer gene signatures and predictors of clinical outcome using machine learning techniques (G,H)
Topic description and required background:
- Accelerating multitask reinforcement learning with attention mechanisms (G)
Reinforcement learning has recently been successful in behaviour learning in a variety of high-profile, complex tasks. Unfortunately, it is generally very sample inefficient, which limits how widely it can be used in real-world problems. Attention mechanisms, such as those used in transformers, have recently provided significant benefits in other classes of temporal domains. We propose to leverage recent advances in attention to investigate whether a learning agent can learn to focus only on relevant features of a problem, thus greatly accelerating learning. In addition, we will explore the opportunities this provides for learning invariant properties and objects in an environment: knowledge which can be exploited to solve new problem instances.
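To make the idea of "focusing only on relevant features" concrete, the following is a minimal sketch of scaled dot-product attention for a single query, written in plain Python. It is an illustration only, not the method the supervision team proposes; the toy keys, values and query are hypothetical and chosen so that only the first feature matches the query.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(query, keys, values):
    """Scaled dot-product attention for one query vector.

    Returns the attention weights and the weighted sum of the values.
    """
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    weights = softmax(scores)
    output = [sum(w * v[i] for w, v in zip(weights, values))
              for i in range(len(values[0]))]
    return weights, output

# Hypothetical toy input: three features, only the first is aligned with the query.
keys = [[4.0, 0.0], [0.0, 1.0], [0.0, -1.0]]
values = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
query = [4.0, 0.0]

weights, output = attention(query, keys, values)
print(weights)  # almost all of the weight falls on the first feature
```

Because the weights sum to one and concentrate on the feature that matches the query, the output is dominated by that feature's value; in a learning agent the keys, values and queries would be learned rather than fixed by hand.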
Required Background
Multivariate calculus, linear algebra, optimization, basic familiarity with concepts in machine learning, strong Python programming experience, familiarity with reinforcement learning or attention mechanisms.
- Quantum Topological Data Analysis (G)
Boundary operators are a key connectivity concept in mathematics, physics and data science. Calculations involving the operator on arbitrary data points generate high-dimensional, interpretable summaries of the full data set related to its local and global “shape”. However, classical computational costs are exorbitant due to the underlying combinatorics. Recent work has shown that the restricted boundary operator may be implemented on a quantum computer, in linear depth, exponentially faster than on classical computers. This PhD topic proposes to explore various algorithms and applications based on the boundary operator in a search for exponentially accelerated quantum advantage on real-world problems.
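For readers unfamiliar with the boundary operator, here is a small classical sketch in plain Python. It builds the boundary matrices of a single filled triangle and checks the defining property that applying the boundary twice gives zero. This is the textbook classical construction on a hypothetical three-point example, not the quantum implementation referred to above; the helper names are illustrative.

```python
from itertools import combinations

def faces(simplex):
    """Signed (k-1)-faces of a sorted k-simplex: drop vertex i with sign (-1)**i."""
    return [((-1) ** i, simplex[:i] + simplex[i + 1:]) for i in range(len(simplex))]

def boundary_matrix(k_simplices, km1_simplices):
    """Boundary operator as a matrix: rows are (k-1)-simplices, columns are k-simplices."""
    row = {s: r for r, s in enumerate(km1_simplices)}
    matrix = [[0] * len(k_simplices) for _ in km1_simplices]
    for col, simplex in enumerate(k_simplices):
        for sign, face in faces(simplex):
            matrix[row[face]][col] = sign
    return matrix

def matmul(a, b):
    """Plain integer matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

# Hypothetical data set of three points forming one filled triangle.
vertices = [(0,), (1,), (2,)]
edges = list(combinations(range(3), 2))   # (0, 1), (0, 2), (1, 2)
triangles = [(0, 1, 2)]

d1 = boundary_matrix(edges, vertices)     # edges -> vertices
d2 = boundary_matrix(triangles, edges)    # triangles -> edges
composition = matmul(d1, d2)
print(composition)  # every entry is zero: the boundary of a boundary vanishes
```

The combinatorial cost mentioned in the description shows up in the size of these matrices: on n data points the number of possible k-simplices grows as the binomial coefficient C(n, k+1).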
Required Background
MSc in a mathematical topic. Preferable: Quantum physics/computing, Topology, Bayesian Inference, Strong programming skills.
- Physics Informed Learning Machines with application in climate science (G)
Physics Informed Learning Machines (PILM) are increasingly used to solve problems resulting from natural or engineering processes formulated as mathematical models, e.g. partial/ordinary differential equations. Due to the recent development of low cost sensors that produce high spatial and temporal resolution data, source term estimation in differential equations has attracted more attention, especially within the air pollution modelling community. In this work, we will propose a new PILM approach for modelling the source function as a Gaussian process or neural networks, and approximate the solution of the associated differential equations with neural networks. Our approach will be applied to data from a network of low-cost sensors deployed in Rwanda.
For related references, see https://www.sciencedirect.com/science/article/pii/S0021999118307125 and https://arxiv.org/abs/2202.04589
Required Background
A Master’s degree in mathematical sciences; strong background in numerical analysis of PDEs, optimization, deep learning and inverse problems; strong programming skills in Python, especially TensorFlow.
- Federated learning handling EEG data spread in Africa and Europe (G,H)
Our group focuses on neuroimaging, ranging from microscopy to MRI and inexpensive solutions such as EEG. The analysis of these data has always been conducted with machine learning or other data science tools (https://bam.sano.science/). In 2014, before the deep learning boom, the main supervisor conducted several projects with novel technology in rural areas of Ghana while teaching at AIMS Ghana. The most popular of these is the prenatal care project http://www.docmeup.org/. Now, we want to investigate brain disease on a large scale in both Europe and Africa, taking advantage of inexpensive EEG devices and cloud computing, with the goal of achieving a federated learning approach spanning the Global South and Global North. Federated learning (also known as collaborative learning) is a machine learning technique that trains an algorithm across multiple decentralized edge devices or servers holding local data samples, without exchanging them.
The awareness and study of brain disorders have not been fully embraced in Rwanda, Uganda and Africa at large, owing to factors such as the stigma associated with these disorders, the lack of adequate biomedical data on them, and the small number of research experts, among others. It is reported that the prevalence of neurological disorders in sub-Saharan Africa is increasing. This project will allow a researcher who applies data and network science tools to have an impact in the medical domain, particularly on brain disorders. The ripple effect will be the passing on of knowledge and expertise to the next generation, thus strengthening a network of researchers in this interdisciplinary domain.
The data are by nature “big”: we expect gigabytes of streamed data daily. Moreover, they are represented as time series, so a blend of machine/deep learning with Fourier, wavelet, and other spectral analyses is expected. Functional temporal connectivity will lead to graph representations; therefore, complex network approaches can also be considered.
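As a rough illustration of the federated learning idea described above, the sketch below implements a federated-averaging loop in plain Python for a one-parameter linear model. It is a toy under stated assumptions, not the project's pipeline: the two "sites", their data, the learning rate and the number of rounds are all hypothetical, and real EEG models would be far larger. The point it shows is that only model weights travel to the server, while each site's samples stay local.

```python
def local_update(w, data, lr=0.1, steps=5):
    """Run a few gradient-descent steps on one site's private data (model: y = w * x)."""
    for _ in range(steps):
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w

def federated_average(w_global, clients, rounds=20):
    """Each round: sites train locally, then the server averages their weights.

    The average is weighted by the number of samples held at each site.
    Only weights are exchanged; raw samples never leave a site.
    """
    total = sum(len(data) for data in clients)
    for _ in range(rounds):
        local_weights = [local_update(w_global, data) for data in clients]
        w_global = sum(w * len(data) for w, data in zip(local_weights, clients)) / total
    return w_global

# Hypothetical sites holding noiseless samples of y = 3 * x.
site_a = [(0.5, 1.5), (1.0, 3.0), (1.5, 4.5)]
site_b = [(0.2, 0.6), (0.8, 2.4)]

w = federated_average(0.0, [site_a, site_b])
print(w)  # close to the true slope of 3
```

Weighting by sample count is one common design choice; it lets sites with more recordings contribute proportionally more to the shared model.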
Required Background
Strong Python skills, good machine learning background (with practical experience, particularly with recurrent neural networks), knowledge of signal theory (frequency domain analysis, independent component analysis…), passion for hands-on, real-world circuits and data acquisition through devices such as EEG.
- How the human embryo develops: combining mathematical modelling and data science (G,H)
How the early human embryo develops from a single fertilised egg to a multicellular structure is still poorly understood. Studying this has traditionally been difficult due to ethical and logistical issues working with human embryos. However, in 2021, a cellular model system was developed at the University of Exeter that makes investigating embryogenesis much easier. Based on data from this model system, this project will study mathematical models of embryo development. This will involve a combination of computer simulation, model design, mathematical analysis, parameter fitting and image analysis. Excitingly, this project has potential applications to improving IVF treatment, by suggesting improved ways that the best embryos can be selected for implantation.
Required Background
This PhD requires strong mathematical and computational skills, including mathematical modelling, data analysis, simulation (in a language like MATLAB, Python or C++) and analysis. No prior knowledge of biology or image analysis is required; training in these will be provided during the project.
- Data-Driven Modelling for Risk Evaluation and Early Detection of vector-borne disease outbreaks in Rwanda (H)
Traditional public health surveillance relies heavily on statistical techniques. Recent years have seen tremendous growth of AI-enabled methods, including but not limited to deep learning-based models, complementing statistical approaches. Many disease-causing organisms are strongly influenced by environmental factors such as temperature, rainfall, and humidity, which are in turn influenced by the prevailing climate. This project will integrate multiple types of data, such as environmental, epidemiological, news report, and search data, and develop novel mathematical, statistical, and big data techniques to a) evaluate the risk that the most common vector-borne diseases, such as Rift Valley Fever and Foot-and-mouth disease, pose to Rwanda; b) detect and give early warning of domestic spread from cases imported from neighboring countries; c) evaluate the risk of cases spreading from these to other districts in Rwanda; d) identify mitigation strategies; e) address the numerous remaining gaps in data for such systems; and f) develop an AI-based application for the identification of mosquitoes associated with vector-borne diseases.
We will use data from Meteo Rwanda Institute, Rwanda Biomedical Center (RBC), Rwanda Agriculture and Animal Resources Development Board (RAB), and data from our ongoing cohort studies. Our main partners will be AIMS, University of Rwanda and Dalhousie University.
Required Background
Statistical analysis and computing, Machine Learning, Deep Learning, processing large data sets, Data Visualization, Data Wrangling, Mathematics and Programming skills.
- Mathematical model for predicting and analysing the impact of air pollution on the cardiovascular-respiratory system in Rwanda (H)
The topic focuses mainly on developing a mathematical model for predicting and analysing the impact of air pollution on the cardiovascular-respiratory system in Rwanda. It involves different datasets: air pollution data, epidemiological data and data related to non-communicable diseases (NCDs). The Rwanda Environment Management Authority (REMA) and Rwanda Biomedical Center (RBC) are expected to provide these data, from which model parameters will be estimated using optimization methods. With the developed model, different mathematical methods, including statistical, deterministic or stochastic approaches, can be used to analyse the impact of air pollution on the cardiovascular-respiratory system. Bayesian approaches such as Markov chain Monte Carlo algorithms, and deterministic approaches focusing on the basic reproduction number, can then be used to predict NCDs.
Required Background
The student should have a background in Mathematics (Applied Mathematics or Statistics).
- Identification of new cervical cancer gene signatures and predictors of clinical outcome using machine learning techniques (G,H)
Cervical cancer is a highly preventable disease; early screening therefore represents the most effective strategy to minimize its global burden. Genomic profile analysis has been successfully applied to deconvoluting the molecular profile of various cancers and to gene biomarker discovery. Genomic profiling is also highly compatible with machine learning methods such as support vector machines, random forests and convolutional neural networks. Further, gene datasets for most cancers are publicly accessible via the US National Center for Biotechnology Information Gene Expression Omnibus database (https://www.ncbi.nlm.nih.gov/geo/) and the European Bioinformatics Institute ArrayExpress portal (https://www.ebi.ac.uk/arrayexpress/). We hypothesize that the integrative use of novel machine learning methods on cervical cancer gene datasets could lead to new gene signatures with high prognostic accuracy. Thus, our aims are:
- To define new gene signatures of cervical cancer using published datasets accessible from the NCBI GEO and ArrayExpress, by applying machine learning approaches.
- To develop a predictive model that will suggest the possibility of cervical cancer in patients.
Required Background
Experience in biomedical science and an aptitude for computer/data science, especially machine learning (in particular, knowledge of how to use CNNs to develop a model from image datasets will be useful). Applicants should also be able to use bioinformatics software such as R, Python or MATLAB and should have strong troubleshooting abilities. A specialization in cancer is not required, but would be an advantage.