Anila Butt
[Image appears of a split circle with photos in each half of the circle flashing through various CSIRO activities, and the circle then morphs into the CSIRO logo]
[Image changes to show a slide on the screen showing a flow chart, Anila Sahar Butt can be seen inset talking in the top right, and text appears: An Information model for scientific workflows, Anila Sahar Butt, 22nd March 2021]
Anila Butt: I am presenting my work on an “Information Model for Scientific Workflows”, and for that, let’s start with science and scientific applications.
[Image changes to show a new slide with inset boxes showing a digital map, various graphs, a cow photo, and a “Welcome to LOOC-C” webpage, Anila can be seen inset talking, and a text heading appears: Science]
As we all know, with increasing trust in the potential of data-driven approaches, modern applications are being designed to provide real-time information to Australian farmers and land managers, and this information helps them make informed decisions that increase their profit and reduce their risk.
[Image changes to show a new slide showing a smartphone in the centre of the screen, Anila inset talking in the top right, and text appears: Trust in Science, Which FullCAM version sits behind the ACCU calculator?, Are the indicative carbon yields based on a full 100 year permanence period?, Do you make any additional assumptions about fires, losses, reversal or non-eligible areas?, Have you already discounted the ACCU rate?]
However, there is a common observation about these applications: users are curious to know a bit more about the information that we are providing them. For example, on the carbon project my team receives queries similar to these, where users want to know which version of the model is used to calculate ACCUs (Australian Carbon Credit Units) for a farmer’s land or for a particular project, and whether the calculation relies on any assumptions. Similar queries arise most of the time. These types of queries reflect users’ curiosity to know more about the information so that they can trust it.
[Image changes to show a new slide showing inset data on the right and Anila can be seen inset in the top right talking and text appears: What we’re doing?, Building trust through transparency, A solution towards Explainable Research, Data source and how data is manipulated, Who did what and when?, Challenge, Information – What to capture?, Representation – how to represent in a common format?]
So, what we are doing is trying to build user trust in scientific applications, and I’m working on techniques that can be used to establish trust through transparency. By transparency, I mean that we want to give users some feedback about the information that we are providing them. For example, on any [1:52] seed project, if we tell users which projects can be available for their farms and the amount [2:03] that are taken on using these farms, the additional feedback they would like is how we calculated these estimates: which data sources we used and which models we used to generate these statistics. So, to design such a solution, one challenge is knowing what information we should capture to provide this type of feedback for scientific applications, and another is how we can represent this information in a format that does not depend on a particular application or on the research behind that application.
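To make the “data source, who did what and when” questions concrete, here is a minimal sketch of a per-step provenance record, assuming a hypothetical layout loosely modelled on the entity/activity/agent split of the W3C PROV model. The names `ProvenanceRecord`, `explain` and every field are illustrative assumptions, not the information model presented in this talk, and the values are made up.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class ProvenanceRecord:
    """Hypothetical record of one processing step behind a result shown to a user."""

    activity: str  # what was done, e.g. a model run
    agent: str  # who (person or service) did it
    ended_at: datetime  # when it finished
    inputs: list[str] = field(default_factory=list)  # data sources used
    outputs: list[str] = field(default_factory=list)  # results produced
    model_version: str | None = None  # model version, if a model was run
    assumptions: list[str] = field(default_factory=list)  # stated assumptions


def explain(record: ProvenanceRecord) -> str:
    """Render a record as plain-language feedback for the user."""
    parts = [
        f"{', '.join(record.outputs)} was produced by '{record.activity}'",
        f"run by {record.agent} at {record.ended_at.isoformat()}",
        f"using {', '.join(record.inputs) or 'no recorded inputs'}",
    ]
    if record.model_version:
        parts.append(f"with model version {record.model_version}")
    if record.assumptions:
        parts.append("assuming " + "; ".join(record.assumptions))
    return ", ".join(parts)


# Illustrative values only; they are not taken from any real project.
record = ProvenanceRecord(
    activity="carbon yield estimate",
    agent="estimation-service",
    ended_at=datetime(2021, 3, 22, tzinfo=timezone.utc),
    inputs=["climate dataset", "soil map"],
    outputs=["indicative carbon yield"],
    model_version="example-1.0",
    assumptions=["100-year permanence period"],
)
print(explain(record))
```

A record like this answers the slide’s questions (model version, assumptions) for a single step. The talk’s point is that a whole workflow and its traces need capturing, which the following sketches build towards.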
[Image changes to show a new slide showing a workflow diagram and Anila can be seen inset in the top right talking to the camera and text appears: Approach, Information – What to capture?, A workflow & its traces, Representation – How to represent?, An information model, Common structure of a workflow & traces]
So, to answer the question of what to capture: we realised that, with identifiably based platforms, more and more applications are being designed as scientific workflows. Scientists are composing their experiments in the form of workflows. Workflows basically capture a scientist’s analysis of data and any models they use, arranged systematically to provide solutions to scientific problems.
So, considering that applications are being designed in the form of workflows, we decided to target applications that compose their research as workflows. Instead of capturing information specific to a particular science or application, we capture information about the structure of the workflow that is used to compose the experiments.
The next question is how we are going to represent this information. Representation is an issue because different platforms are available for composing these workflows. For example, this workflow is composed using Synapse, but other platforms are available that provide different tools for designing these experiments, and because the tools differ, the same workflow can have a different structure on different platforms. So, we want to give these workflows a formal representation, and for that we propose an information model that provides a formal structure for workflows and their execution traces, independent of the scientific application.
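The representation problem (one workflow, differently structured on different platforms) can be illustrated with a minimal sketch. Both input formats (“platform A” with successor lists, “platform B” with an edge list) and the `CommonWorkflow` structure are hypothetical assumptions; a real information model would capture far more than steps and links.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class CommonWorkflow:
    """Hypothetical platform-independent structure: named steps plus directed links."""

    steps: frozenset[str]
    links: frozenset[tuple[str, str]]


def from_platform_a(spec: dict) -> CommonWorkflow:
    # Assumed format A: {"nodes": {name: {"next": [successor, ...]}}}
    nodes = spec["nodes"]
    links = {(src, dst) for src, node in nodes.items() for dst in node.get("next", [])}
    return CommonWorkflow(frozenset(nodes), frozenset(links))


def from_platform_b(spec: dict) -> CommonWorkflow:
    # Assumed format B: {"tasks": [{"id": name}, ...], "edges": [{"from": a, "to": b}, ...]}
    steps = {task["id"] for task in spec["tasks"]}
    links = {(edge["from"], edge["to"]) for edge in spec["edges"]}
    return CommonWorkflow(frozenset(steps), frozenset(links))


# The same three-step workflow, described in two differently structured ways.
wf_a = {"nodes": {"load": {"next": ["model"]}, "model": {"next": ["report"]}, "report": {}}}
wf_b = {
    "tasks": [{"id": "load"}, {"id": "model"}, {"id": "report"}],
    "edges": [{"from": "load", "to": "model"}, {"from": "model", "to": "report"}],
}

# Once normalised, the two descriptions compare equal.
print(from_platform_a(wf_a) == from_platform_b(wf_b))
```

Keeping one small adapter per platform means downstream feedback logic only ever sees the common structure.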
[Image changes to show a new slide showing logos for workflow platforms DataONE, Workflow4Ever, OPMW, and a workflow diagram beneath and Anila can be seen talking inset in the top right and a text heading appears: Core requirement]
The community has already proposed some models to represent scientific workflows and their traces, and these are a few of the most widely used. However, these models are designed for data-driven workflows. By data-driven workflows, I mean workflows in which a model is executed as soon as its input data becomes available. But scientific workflows can also contain control flow. For example, in this part of a workflow we can see that a model is executed based on that information, which means that control-flow constructs can be involved in a scientific workflow. If we try to represent these types of workflows using the existing models, they are represented like this, which is clearly an under-representation of the agflow workflow. So, we have come up with a model that can represent scientific workflows that involve control flows. To build it, we first identified all the sections that can be part of any scientific workflow.
[Image shows red boxes appearing around three sections of the workflow on the screen and text appears above the boxes: Exclusive Choice, Simple Merge, MI with prior runtime knowledge]
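For example, this workflow has three such sections in it: an exclusive choice, a simple merge, and multiple instances with prior runtime knowledge.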
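The three section types on the slide share their names with patterns from the well-known workflow control-flow patterns catalogue. The following is a minimal sketch of how each behaves at run time, assuming plain Python callables stand in for workflow steps; the function names and the rainfall/paddock example are hypothetical. It also shows why a purely data-driven description falls short: both branches of the choice receive the same input data, yet only one of them runs.

```python
from __future__ import annotations

from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def exclusive_choice(condition: bool, branch_a: Callable[[], T],
                     branch_b: Callable[[], T]) -> tuple[Optional[T], Optional[T]]:
    """Exclusive choice: exactly one branch runs; the other yields None."""
    if condition:
        return branch_a(), None
    return None, branch_b()


def simple_merge(*incoming: Optional[T]) -> T:
    """Simple merge: continue with the single incoming branch that completed."""
    completed = [result for result in incoming if result is not None]
    if len(completed) != 1:
        raise ValueError("simple merge expects exactly one completed branch")
    return completed[0]


def multiple_instances(task: Callable[[str], T], items: list[str]) -> list[T]:
    """Multiple instances with a priori run-time knowledge: the number of
    instances (len(items)) is only known at run time, but it is fixed before
    any instance starts. Instances run sequentially here for simplicity."""
    return [task(item) for item in items]


# Hypothetical example: choose a model depending on whether rainfall data exists.
has_rainfall_data = True
chosen = simple_merge(*exclusive_choice(
    has_rainfall_data,
    lambda: "rainfall-driven model",
    lambda: "default model",
))
paddocks = ["north", "south", "east"]  # discovered at run time
runs = multiple_instances(lambda p: f"{chosen} on {p}", paddocks)
print(chosen, runs)
```

In a real engine the multiple instances could run concurrently and be synchronised on completion; the sequential list comprehension only keeps the sketch short.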
[Image changes to show a new slide showing a model of the control flows in a workflow and Anila can be seen in the top right talking and text appears: SWCf Core Structure, This model – Represents control flows in workflows, Complements existing models, It has been published]
Next, we proposed a model. This is the proposed structure: it has a set of classes and relationships between those classes. The model represents control flow in scientific workflows. It complements existing models, because we have provided a means to integrate it with them, and it has been published.
[Image changes to show a new slide showing different platform logos, a Scientific Workflow Control-flow Model, and two workflow diagrams, and Anila can be seen in the top right talking and text appears: Usage Scenario, Different, Common]
As I mentioned, there are different platforms that can be used to design or compose scientific experiments, and because they have different tools, their workflows have different structures. These are two workflows with different structures. To represent these workflows formally, we have proposed a model. Through this model we can give these workflows a formal representation, and this representation, stored in a knowledge repository, can then be used to extract the information required to provide feedback about scientific outcomes.
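As a rough illustration of “extracting information from a knowledge repository”, here is a minimal sketch that stores a workflow trace as (subject, predicate, object) triples and walks back from a result to answer the kinds of user questions raised earlier. Every `ex:` term is a hypothetical placeholder, not the published model’s vocabulary. A production system would more likely use an RDF store queried with SPARQL, rather than Python sets.

```python
from __future__ import annotations

# A tiny in-memory "knowledge repository" of (subject, predicate, object) triples.
# All "ex:" names are hypothetical placeholders, not the published model's terms.
STORE = {
    ("ex:run1", "ex:executed", "ex:carbonModel"),
    ("ex:carbonModel", "ex:version", "example-1.0"),
    ("ex:run1", "ex:used", "ex:soilMap"),
    ("ex:run1", "ex:used", "ex:climateData"),
    ("ex:run1", "ex:generated", "ex:yieldEstimate"),
    ("ex:run1", "ex:assumption", "100-year permanence period"),
}


def objects(store: set[tuple[str, str, str]], subject: str, predicate: str) -> list[str]:
    """Every o such that (subject, predicate, o) is in the store, sorted."""
    return sorted(o for s, p, o in store if s == subject and p == predicate)


def subjects(store: set[tuple[str, str, str]], predicate: str, obj: str) -> list[str]:
    """Every s such that (s, predicate, obj) is in the store, sorted."""
    return sorted(s for s, p, o in store if p == predicate and o == obj)


def explain_result(store: set[tuple[str, str, str]], result: str) -> dict[str, list[str]]:
    """Walk back from a result to the runs, models, inputs and assumptions behind it."""
    runs = subjects(store, "ex:generated", result)
    models = [m for r in runs for m in objects(store, r, "ex:executed")]
    return {
        "runs": runs,
        "models": models,
        "model_versions": [v for m in models for v in objects(store, m, "ex:version")],
        "inputs": [i for r in runs for i in objects(store, r, "ex:used")],
        "assumptions": [a for r in runs for a in objects(store, r, "ex:assumption")],
    }


print(explain_result(STORE, "ex:yieldEstimate"))
```

Because the query only depends on the common vocabulary, the same lookup works no matter which platform originally produced the workflow.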
[Image changes to show a new slide showing a bar graph displaying model complexity and a line graph showing model performance and Anila can be seen inset in the top right talking and text appears: Evaluation, Comparative performance against the state-of-the-art models, Baseline – ProvONE, Wfdesc, OPMW, Dataset – MyExperiment ~ 570 Workflows, Complexity & Expressiveness]
So, we evaluated our model in three different ways. Here I’m presenting one of them: a comparative performance evaluation against the state-of-the-art models. We have three baseline models, and I took the dataset from an online repository, myExperiment, which contains many workflows; we selected around 570 workflows from it. We represented these workflows according to each baseline model and our model, and on these representations we did a complexity and expressiveness analysis. We measure complexity in terms of the size of the data, and the graph shows that the scientific workflow [8:05] model we propose in this work is more complex than the existing approaches, but it is also more expressive. This expressiveness definitely comes at the cost of complexity, but with graph databases, handling this complexity is not a difficult task.
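The complexity/expressiveness trade-off can be illustrated with a toy example, assuming complexity is approximated by triple count (“size of the data”) and expressiveness by whether an exclusive choice can be recognised at all. The two representations and all `ex:` terms are hypothetical. This sketch does not reproduce the myExperiment evaluation or its numbers.

```python
from __future__ import annotations

# Toy representations of one workflow fragment: a check step followed by an
# exclusive choice between two models, whose branches merge into a report.
# The "ex:" vocabulary is hypothetical and for illustration only.
DATA_DRIVEN = {
    ("ex:check", "ex:feeds", "ex:modelA"),
    ("ex:check", "ex:feeds", "ex:modelB"),
    ("ex:modelA", "ex:feeds", "ex:report"),
    ("ex:modelB", "ex:feeds", "ex:report"),
}

# Same data links, plus explicit control-flow constructs.
CONTROL_FLOW_AWARE = DATA_DRIVEN | {
    ("ex:choice1", "rdf:type", "ex:ExclusiveChoice"),
    ("ex:choice1", "ex:branch", "ex:modelA"),
    ("ex:choice1", "ex:branch", "ex:modelB"),
    ("ex:merge1", "rdf:type", "ex:SimpleMerge"),
    ("ex:merge1", "ex:incoming", "ex:modelA"),
    ("ex:merge1", "ex:incoming", "ex:modelB"),
}


def size(triples: set[tuple[str, str, str]]) -> int:
    """Complexity proxy: how many triples the representation needs."""
    return len(triples)


def expresses_exclusive_choice(triples: set[tuple[str, str, str]]) -> bool:
    """Expressiveness probe: can we tell that only one of the branches runs?"""
    return any(p == "rdf:type" and o == "ex:ExclusiveChoice" for _, p, o in triples)


for name, rep in [("data-driven", DATA_DRIVEN), ("control-flow aware", CONTROL_FLOW_AWARE)]:
    print(name, size(rep), expresses_exclusive_choice(rep))
```

In the data-driven version, both models appear to consume the check’s output. Only the larger, control-flow-aware version records that exactly one of them runs.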
[Image changes to show a new slide and Anila can be seen in the top right talking and a text heading and text appears: Trust and transparency is critical for scientific applications, We have built our model to establish trust in scientific applications, Evaluated the model, it’s complex but Expressive]
So, to summarise this talk: trust and transparency are critical for scientific applications. We have built our model to establish trust in scientific applications, and we evaluated it. It works well, and although it is more complex, it is also more expressive.
[Image changes to show a new slide and Anila can be seen inset talking in the top right and text appears: Thank you, CSIRO Land & Water, Anila Butt, Postdoctoral Fellow, anila.butt@csiro.au]
So, I have presented part of the work. If you want to know more about it, or how you can use my work to solve your problem, please come and talk to me after the session, or contact me at anila.butt@csiro.au. Thank you.
[Image changes to show the CSIRO logo on the screen and text appears: CSIRO, Australia’s National Science Agency]