Expertise: Data Science/ML/Predictive modeling

Softarex’s engineers can transform ideas into AI cloud solutions

By now almost everyone has heard the terms Artificial Intelligence, Data Science, Machine Learning, Neural Networks, and Predictive modeling. What do they mean? Where can they be used, and how? Let’s try to answer these questions.

Data Science

Data Science is one of the newest sciences, having taken shape over the last 20-30 years. It is based on applied mathematics and information technology, and it combines tools, methods, algorithms, and disciplines focused on understanding data in order to manage real-life objects or information systems for different purposes. Data Science draws on mathematical analysis, differential equations, probability theory, functional analysis, computer vision, machine learning, neural networks, predictive modeling, optimization methods, and other algorithms, approaches, and theories.

Computer systems that use AI-based algorithms are capable of solving a wide range of tasks: text translation; Natural Language Processing, including speech understanding and voice-to-text conversion (or vice versa); production process management; equipment monitoring and maintenance prediction; analysis of patient data to predict the need for surgeries, medicines, and hospitalizations; care pathway optimization; and many other applications.

Engineers at Softarex Technologies have vast knowledge and expertise in integrating AI, Machine Learning, Neural Network, and Predictive modeling algorithms into new or existing systems for different industries.

The development of any system with AI capabilities starts with research into the domain where it will be applied.

Next, we define exactly how AI will be implemented: what data will be used for the training data sets, what outcome is expected, and which data analysis algorithms will be used.

When this analysis and design stage is finished, we move on to implementing the system with the selected technologies and algorithms.

Creating a High-Level Design for a Cloud-Based AI Clinical Trials Analysis System: A Step-by-Step Example

Step 1. Task Definition

The most difficult part of bringing a new medication to market is the evaluation and review of its clinical trials by the authorities that grant marketing approval.

During this review step, pharma companies lose on average €1M per day.

Reducing these costs and streamlining and formalizing data submission, clinical trial review, and approval are major tasks for national certification organizations. Work in this area has included enhanced benefit-risk guidance for clinical assessors and the EMA’s Benefit-Risk Methodology Project, whose main objective was to develop and test tools and processes for balancing multiple benefits and risks.

Four tools have been proposed within the scope of this project: a generic decision-making approach entitled the PrOACT-URL Framework, an Effects Table summary of benefit and risk outcomes, MultiCriteria Decision Analysis (MCDA) modeling, and graphical displays.
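
As a rough illustration of the MCDA idea mentioned above, the sketch below maps each benefit and risk outcome onto a 0-1 value and combines the values with weights (a simple additive model). The criterion names, ranges, weights, and outcome numbers are hypothetical and are not taken from the EMA project; real MCDA models rely on value functions and weights elicited from experts.

```python
from dataclasses import dataclass


@dataclass
class Criterion:
    name: str
    worst: float   # outcome value that maps to a score of 0
    best: float    # outcome value that maps to a score of 1
    weight: float  # relative importance; weights are normalized below


def linear_value(c: Criterion, outcome: float) -> float:
    """Map an outcome onto [0, 1] with a linear value function."""
    score = (outcome - c.worst) / (c.best - c.worst)
    return min(1.0, max(0.0, score))


def mcda_score(criteria, outcomes):
    """Weighted sum of per-criterion values (simple additive MCDA model)."""
    total_weight = sum(c.weight for c in criteria)
    return sum(
        c.weight / total_weight * linear_value(c, outcomes[c.name])
        for c in criteria
    )


# Hypothetical effects table: one benefit criterion and one risk criterion.
criteria = [
    Criterion("response_rate_pct", worst=0.0, best=100.0, weight=3.0),
    # For a risk, fewer events is better, so "best" is the lower value.
    Criterion("serious_adverse_events_pct", worst=20.0, best=0.0, weight=1.0),
]
drug = {"response_rate_pct": 60.0, "serious_adverse_events_pct": 5.0}
placebo = {"response_rate_pct": 20.0, "serious_adverse_events_pct": 2.0}

drug_score = mcda_score(criteria, drug)
placebo_score = mcda_score(criteria, placebo)
print(f"drug={drug_score:.4f} placebo={placebo_score:.4f}")
```

With these made-up inputs the drug scores higher than the placebo because the benefit criterion carries three times the weight of the risk criterion; changing the weights can reverse the ranking, which is why weight elicitation matters in practice.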

The European Union Clinical Trials Register allows searching for protocol and results information on:

  • interventional clinical trials that are conducted in the European Union (EU) and the European Economic Area (EEA);
  • clinical trials conducted outside the EU / EEA that are linked to European pediatric-medicine development.

The EU Clinical Trials Register currently displays 35002 clinical trials with a EudraCT protocol, of which 5706 are clinical trials conducted with subjects less than 18 years old.

The register also displays information on 18700 older pediatric trials (in the scope of Article 45 of the Paediatric Regulation (EC) No 1901/2006).

  1. Clinical trials data:
    • 10,000 pages
    • 100 of them are text
    • 9,900 are tables with numbers
    • takes 1-6 months to compile
    • compiled by 1-2 medical writers and reviewed by 10-20 people, plus a number of data scientists to collect the numbers
    • regulators usually need 30 days to review the reports

Step 2. Identify Workflow of the Clinical Trials approval procedure

Clinical Trials Review and Approval Process. Submission to BfArM (the German competent authority) is, in general, a two-step process.

  1. After receiving the application documents, the competent authority (CA) checks whether they meet the formal requirements.
  2. The content is inspected within 4-8 weeks.

The scheme below shows a general process for trial review and approval.

Step 3. Define functionality for the Cloud-based AI Clinical Trials Analysis system

Based on the domain analysis and task definition, we decided that the project should be designed as an application providing appropriate AI-based clinical trial analysis functions. Developing this application involves, in particular, the following steps (only a few are shown):

  1. Build a vector description of the Clinical Trial: parse the text, extract numeric parameters, extract categorical variables, encode text descriptions using distributed word/text representations (word/text embeddings such as Word2Vec / Sent2Vec, or DSSM, the Deep Semantic Similarity Model), and convert the collected data into a vector describing the report (the descriptor).
  2. The constructed descriptor is transformed by a model that forms a distributed representation of the report (the Clinical Trial embedding).
  3. To determine how a new Clinical Trial differs from existing ones, and to show where it is positioned among all existing clinical trials in terms of Benefit-Risk Balance, we will use classification algorithms that measure the proximity of objects. The kNN (k-nearest neighbors) algorithm is generally suitable for this purpose. In the Clinical Trial embedding, each report is associated with a vector of real numbers, an element of the Euclidean space R^d for some d (usually several hundred). These vectors then serve as inputs to subsequent models, and the basic assumption is that geometric relations in the R^d space correspond to relations between the reports.
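
The steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the production design: the field names (`enrolled`, `duration_months`, `phase`, `summary`), the scaling constants, and the sample trials are hypothetical; a hashed bag of words stands in for a trained Word2Vec / Sent2Vec / DSSM encoder; and the learned embedding model of step 2 is omitted, so the raw descriptor is used directly as the vector for the kNN search.

```python
import math
import zlib

PHASES = ["I", "II", "III", "IV"]  # hypothetical categorical field
TEXT_DIM = 8  # tiny for illustration; real embeddings use hundreds of dimensions


def encode_text(text, dim=TEXT_DIM):
    """Stand-in for a trained text encoder: L2-normalized hashed bag of words."""
    vec = [0.0] * dim
    for token in text.lower().split():
        vec[zlib.crc32(token.encode("utf-8")) % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def build_descriptor(trial):
    """Concatenate numeric, one-hot categorical, and text features."""
    numeric = [trial["enrolled"] / 1000.0, trial["duration_months"] / 12.0]
    one_hot = [1.0 if trial["phase"] == p else 0.0 for p in PHASES]
    return numeric + one_hot + encode_text(trial["summary"])


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def k_nearest(query, known, k=2):
    """Return the ids of the k known trials closest to the query trial."""
    q = build_descriptor(query)
    ranked = sorted(known, key=lambda t: euclidean(q, build_descriptor(t)))
    return [t["id"] for t in ranked[:k]]


# Hypothetical, hand-written records standing in for parsed trial reports.
known_trials = [
    {"id": "A", "enrolled": 400, "duration_months": 12, "phase": "III",
     "summary": "randomized double-blind placebo-controlled trial in adults with hypertension"},
    {"id": "B", "enrolled": 60, "duration_months": 6, "phase": "I",
     "summary": "open-label dose escalation study in healthy volunteers"},
    {"id": "C", "enrolled": 250, "duration_months": 24, "phase": "II",
     "summary": "pediatric asthma inhaler efficacy study"},
]
new_trial = {"id": "NEW", "enrolled": 420, "duration_months": 12, "phase": "III",
             "summary": "randomized double-blind placebo-controlled trial in adults with hypertension"}

nearest = k_nearest(new_trial, known_trials, k=1)
print(nearest)
```

In a fuller system, the Benefit-Risk assessments already known for the nearest neighbors would be used to position the new trial, and the descriptor would first pass through the trained embedding model.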

Step 4. Design system workflow

At this stage, we need to create a workflow for the AI system. Typically, the workflow starts with uploading the data to be tested or checked; trained neural networks then evaluate the data, and the results are returned. This is usually done on the server side of a cloud solution hosted in AWS or Azure. At this point we started implementing the designed system using Python or Java in the AWS or Azure cloud. First, we developed a system prototype to prove that the designed models and algorithms had been selected properly. After that, we grew the system step by step, adding necessary features, improving models, training new models, and expanding the data sets.
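
A minimal sketch of such a workflow is shown below: uploaded records are validated, turned into features, scored by a trained model, and returned as results. Everything here is an assumption made for illustration: the record fields, the `featurize` and `model` callables (simple stand-ins for a real feature pipeline and a trained neural network), the threshold, and the result labels. Cloud-specific parts such as storage, queues, and authentication are left out.

```python
from dataclasses import dataclass


@dataclass
class Result:
    trial_id: str
    score: float
    label: str


def run_workflow(uploads, featurize, model, threshold=0.5):
    """Upload -> validate -> featurize -> evaluate with a model -> results."""
    results = []
    for record in uploads:
        if "id" not in record or "text" not in record:
            raise ValueError(f"malformed upload: {record!r}")
        score = model(featurize(record))
        # Hypothetical labels: low scores are flagged for a human reviewer.
        label = "needs_review" if score < threshold else "consistent"
        results.append(Result(record["id"], score, label))
    return results


# Stand-ins for a real feature pipeline and a trained neural network.
def word_count_features(record):
    return [len(record["text"].split())]


def placeholder_model(features):
    return min(1.0, features[0] / 10.0)


uploads = [
    {"id": "T1", "text": "short report text"},
    {"id": "T2", "text": "a longer report with eight words in total"},
]
results = run_workflow(uploads, word_count_features, placeholder_model)
for r in results:
    print(r.trial_id, r.score, r.label)
```

Keeping the feature pipeline and the model as injected callables fits the prototype-first approach described above: a placeholder can be replaced by a trained model later without changing the workflow code.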

Algorithms and Technologies

Softarex uses existing algorithms in its research and creates its own when solving unique challenges. Our data scientists also work with Amazon Machine Learning, a service that simplifies machine learning techniques for developers of all levels.

Approaches we are using:
  • Decision Tree Learning and Association Rule Learning;
  • Artificial Neural Networks and Deep Learning;
  • Support Vector Machine and Ensembles;
  • Clustering, Bayesian Networks, and more.

Algorithms we use:
  • Decision Tree, Naive Bayes Classifier, Linear Regression;
  • Logistic Regression, Support Vector Machine, Random Forests;
  • Gradient Boosted Trees, K-means, C-means;
  • DBSCAN, EM-algorithm, Kohonen Neural Network;
  • Singular Value Decomposition, Independent Component Analysis, and others.
