

# "A methodology to ensure safety (certification) of complex software in safety critical automotive systems"

Francesco.Brancati@ResilTech.com

# Scope of the Talk

This talk presents an approach to Safety Analysis and Dependent Failure Analysis according to the automotive standard ISO26262 for complex software, with focus on the following features:

- Embedded SW
- Library/component based
  - Suitable for SEooC (Safety Element out of Context) integration
- Multi-criticality software systems



- **1. Short Company Introduction**
- 2. SW Safety Analysis and DFA in Automotive
- 3. ResilTech Methodology
- 4. Feedback from application and future directions



# **1. Short Company Introduction**


# ResilTech s.r.l.



### Mission


To provide engineering consulting and design services to companies and public bodies mainly for, but not limited to, the field of resilient systems and infrastructures










Via Colonnello Archimede Costadura 2C, 73100 - Lecce, Italy

#### **Branch Office 2**

Via La Boccetta 7, 89134 – Reggio Calabria, Italy

#### **Branch Office 3**

Salerno, just opened in Dec. 2018

# Core services

#### **Resilient Systems Design**

Architecting and Implementation of Dependable Systems

### Verification & Validation & Safety

Full V&V&S cycle activities according to the latest standards for SW-intensive systems

### Support to Certification Bodies

Cooperation with National & International Certification Agencies

### **Cyber security**

Security solution design and assessment

### **Advanced Training**


On Safety Standards, system modeling, Life Cycle Cost Analysis, Verification and Validation

#### 75° Meeting of IFIP Working Group 10.4 Champery, Switzerland. 24-28 January 2019 – Company Confidential










# Creating Innovation

### Strong Research Attitude:





#### **Ongoing Projects:**

- **SISTER** - POR Toscana 2014
- **STORM** - H2020-DRS11-2015
- **PROTECT ID** - PON - MISE 2016
- **Net2DG** - H2020-LCE-2017
- **YACHT4.0** - POR Toscana 2017
- **Good4you** - Innonetwork (Puglia)

#### **Starting Projects:**

- **MAIA** - PON-MIUR-2018
- **ADVANCE** - H2020-RISE-2018

- Cost-Effective V&V Methodologies and tools
- Integration of AI components in Safety Critical Systems
- Online Failure/Intrusion prediction and Detection
- Monitoring and Analysis (ML & AI)
- Continuous Transparent Biometric authentication
- Safety Platforms SW/system for Embedded Systems
- Intelligent and smart monitoring of SoCs
- Methods for Resilient time distribution


### **Patents:**

METHOD AND APPARATUS FOR A **RESILIENT SIGNALING OF TIME** Italian Office N. 102015000072477

# Standardization Activities in Safety



### ISO TC22/SC32/WG8

- for ISO26262 ("Road vehicles - Functional safety")
- for ISO21448 SOTIF ("Safety of the Intended Functionality")

**OpenGL SC 2.0** is a safety critical subset of the Open Graphics Library for safety critical markets






- streamlined APIs can significantly reduce certification costs
- includes avionics and automotive displays
- OpenGL SC 2.0 Full Specification: April 2016.
  - https://www.khronos.org/registry/OpenGL/specs/sc/sc_spec_2.0.pdf



# **2. SW Safety Analysis and DFA in Automotive**

# Automotive market trend

### Complexity drivers

- Increasing complexity of functions
- More and more distributed development
- Rising liability risks, such as security and safety

| 1975                      | 1985                      | 1995                         | 2005                         | 2015                         | 2025               |
|---------------------------|---------------------------|------------------------------|------------------------------|------------------------------|--------------------|
| Electronic fuel injection | Electronic fuel injection | •••                          | •••                          | •••                          | Mobility services  |
| Antilock brakes           | Antilock brakes           | CAN bus                      | Active body control          | AUTOSAR                      | Brake-by-wire      |
|                           | CAN                       | Traction control             | Electronic stability control | Remote diagnostics           | Steer-by-wire      |
|                           | Traction control          | Gearbox control              | Hybrid powertrain            | Electronic braking control   | Autonomous driving |
|                           | Gearbox control           | FlexRay                      | AUTOSAR                      | Head-up display              |                    |
|                           |                           | Electric power steering      | Online software updates      | Emergency braking assistance |                    |
|                           |                           | Emergency calling            | Remote diagnostics           | Automatic stop and start     |                    |
|                           |                           | Active body control          | Electronic brake control     | Lane assistant               |                    |
|                           |                           | Electronic stability control | Head-up display              | Adaptive cruise control      |                    |
|                           |                           | Hybrid powertrain            | Emergency braking assistance | Electric powertrain          |                    |
|                           |                           |                              | Automatic stop and start     | Ethernet/IP backbone         |                    |
|                           |                           |                              | Lane assistant               | Gesture HMI                  |                    |
|                           |                           |                              | Adaptive cruise control      | 3D displays                  |                    |
|                           |                           |                              | Electric powertrain          | Laser-sourced lighting       |                    |
|                           |                           |                              |                              | Fuel-cell technology         |                    |
|                           |                           |                              |                              | 5G mobile communication      |                    |
|                           |                           |                              |                              | Cloud computing              |                    |
|                           |                           |                              |                              | Connectivity, Vehicle2X      |                    |

- "Migration" of technology (and SW) from non-safety-relevant applications.
- Increasing need for components with some degree of built-in error-detection capability
  - To ease the integration and acceptance of SW not developed in full compliance with the safety lifecycle (e.g. library porting from consumer applications).

# Normative requirements - intro



**Road Vehicles - Functional Safety**

- Part 6, 7.4.10 to 7.4.13
- Additional information in Part 9, sections 7 and 8, and Annex C, but not SW specific

ISO26262 supports this industrial need by asking to enhance the safety architecture of the SW, even at component level (**SEooC concept**) when this applies, by means of:

- Safety Analysis (SA)
- Dependent Failure Analysis (DFA)

### main goals:

- 1. to **support the safety concept verification** when it is based on the independence/diversity of software functions/components
- 2. to **verify the coexistence criteria** among the software components
- 3. to **support the specification of safety mechanisms** at software architecture level, in order to mitigate the SW failures identified in the analysis

### Main techniques: **SW-FMEA** (goal 3) and **DFA** (goals 1-2)

Such requirements were already present in Edition 1 (2011), but the lack of experience in applying them pushed the committee to provide a **full informative annex (E) to guide industry** in the second edition (2018).


# Normative requirements - lifecycle



The main aim within the lifecycle is to support the **specification of safety mechanisms at software architecture level**.

Output of the activity:

- A modified architecture that accommodates error detection and error recovery mechanisms (and proper reactions of the SW, in line with the original safety concept), and/or
- Evidence that the existing architecture is completely or partially fine as it is.
- Additional Assumptions of Use at system level.


# **Challenges and Opportunities**

### Challenges:

- 1. The inclusion of mechanisms such as deadline or control-flow monitoring in SW architectures is not new in the safety industry, but it is mostly done based on experience, **without a complete formal modelling of the architecture and of the SW faults.**
- 2. This inclusion is **often done when dealing with the entire system architecture**, while it may also be beneficial when applied to parts of it (e.g. OS + middleware or complex libraries).
- 3. The new annex in **ISO26262** provides some (example-based) guidance but, in line with the aim of an informative text, still **leaves the definition of a clear methodology** to the user.
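The two mechanisms named in item 1 can be illustrated with a minimal sketch. It is not taken from the talk: the class `FlowMonitor`, the checkpoint names and the injectable `clock` parameter are hypothetical, and a production implementation would typically rely on the OS or on a hardware watchdog rather than on Python.

```python
import time


class FlowMonitor:
    """Checks that checkpoints arrive in the expected order and within a deadline.

    Hypothetical illustration of control-flow and deadline monitoring.
    """

    def __init__(self, expected, deadline_s, clock=time.monotonic):
        self.expected = list(expected)  # expected checkpoint sequence
        self.deadline_s = deadline_s    # maximum allowed job duration, in seconds
        self.clock = clock              # injectable time source (eases testing)
        self.errors = []
        self._next = 0
        self._start = None

    def start(self):
        """Begin monitoring one job execution."""
        self.errors = []
        self._next = 0
        self._start = self.clock()

    def checkpoint(self, name):
        """Report that the job reached the named checkpoint."""
        if self._next >= len(self.expected) or self.expected[self._next] != name:
            self.errors.append(f"control flow: unexpected checkpoint {name!r}")
            return
        self._next += 1

    def finish(self):
        """End monitoring; return True only if no error was detected."""
        if self._next != len(self.expected):
            self.errors.append("control flow: sequence incomplete")
        if self.clock() - self._start > self.deadline_s:
            self.errors.append("deadline: job finished too late")
        return not self.errors
```

A job calls `start()`, reports each checkpoint as it passes it, and calls `finish()` at the end. What the SW does when `finish()` returns `False` is a reaction that has to be defined in line with the safety concept.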





## **Opportunities**:

- 1. Proposal of a **clear methodology** to perform such activities.
- 2. This is **particularly important for SW**, as fault modelling and FMEA approaches are better understood and more widely applied in industry at HW and system level than at SW level.
- 3. In addition, an important aspect, generally not fully considered when defining SW Safety Mechanisms, is **how the effectiveness of V&V activities affects the "likelihood" of some SW faults**.
  - Here the point is the trade-off between architectural changes and fault-removal techniques.




# **3. ResilTech Methodology**

# ResilTech Methodology – Overall View





# ResilTech Methodology – Input 1/4

**Input: ISO26262** (the other inputs are the Functional Safety Concept, the Technical Safety Concept and the SW Architecture Design)

Data from ISO26262-6 Annex D (Freedom from interference between software elements) constitute the reference set of guidewords to define **failure modes** in Safety Analysis and DFA.

- **Timing and execution**
  - blocking of execution
  - deadlocks
  - livelocks
  - incorrect allocation of execution time
  - incorrect synchronization between software elements
- **Memory**
  - corruption of content
  - inconsistent data (e.g. due to update during data fetch)
  - stack overflow or underflow
  - read or write access to memory allocated to another software element
- **Exchange of information**
  - repetition of information
  - loss of information
  - delay of information
  - insertion of information
  - masquerade or incorrect addressing of information
  - incorrect sequence of information
  - corruption of information
  - asymmetric information sent from a sender to multiple receivers
  - information from a sender received by only a subset of the receivers
  - blocking access to a communication channel
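As a minimal sketch of how this guideword set could be turned into an analysis input, the snippet below stores the Annex D guidewords by category and expands them into one candidate failure mode per guideword for a given SW component. The names `Category`, `GUIDEWORDS` and `candidate_failure_modes`, as well as the ID scheme, are assumptions made for illustration and are not the ones used in the talk.

```python
from enum import Enum


class Category(Enum):
    TIMING = "Timing and execution"
    MEMORY = "Memory"
    INFORMATION = "Exchange of information"


# Guidewords from ISO26262-6 Annex D, as listed on the slide.
GUIDEWORDS = {
    Category.TIMING: [
        "blocking of execution",
        "deadlocks",
        "livelocks",
        "incorrect allocation of execution time",
        "incorrect synchronization between software elements",
    ],
    Category.MEMORY: [
        "corruption of content",
        "inconsistent data",
        "stack overflow or underflow",
        "read or write access to memory allocated to another software element",
    ],
    Category.INFORMATION: [
        "repetition of information",
        "loss of information",
        "delay of information",
        "insertion of information",
        "masquerade or incorrect addressing of information",
        "incorrect sequence of information",
        "corruption of information",
        "asymmetric information sent from a sender to multiple receivers",
        "information from a sender received by only a subset of the receivers",
        "blocking access to a communication channel",
    ],
}


def candidate_failure_modes(component, categories=tuple(Category)):
    """Return one candidate failure mode per guideword for the component."""
    rows = []
    for category in categories:
        for index, guideword in enumerate(GUIDEWORDS[category], start=1):
            rows.append({
                # Hypothetical ID scheme, chosen only for this sketch.
                "id": f"{category.name}-{index:02d}",
                "component": component,
                "category": category.value,
                "guideword": guideword,
            })
    return rows
```

Each returned row is only a starting point: the analyst still has to decide whether the guideword is applicable to the component and, if so, describe the concrete failure.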

# ResilTech Methodology – Input 2/4





# ResilTech Methodology – Input 3/4





# ResilTech Methodology – Input 4/4



**Functional safety concept (ISO26262-2):** specification of the *functional safety requirements,* with associated information, their allocation to architectural *elements,* and their interaction necessary to achieve the *safety goals* 


**Technical safety concept (ISO26262-2):** specification of the *technical safety requirements* and their *allocation* to *system elements* for implementation by the *system* design

#### Software Architecture Design:

- **(Software) Architecture (ISO26262-2):** representation of the structure of the *item* or *systems* or *elements* that allows identification of building blocks, their boundaries and interfaces, and includes the allocation of requirements to hardware and software *elements*
- **Design (FP7 AMADEOS):** the process of defining an architecture, components, modules and interfaces of a system to satisfy specified requirements.


# SW FMEA: Steps




# SW FMEA: Granularity




# SW FMEA: Failure Modes



# SW FMEA: Likelihood



# SW FMEA: Likelihood (example)




# SW FMEA: Severity




# SW FMEA: Detectability



# SW FMEA: Risk Probability Number



# SW FMEA: Status Classification




# SW FMEA: resulting table (examples)

| ID    | SW component          | Component failure mode                        | Component failure description                                                                                                           | Likelihood | System failure mode (effect)                      | Severity | Existing mitigation and impact                                                 | Detectability | RPN        | Intended mitigations                                             | Status      |
|-------|-----------------------|-----------------------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------|------------|---------------------------------------------------|----------|--------------------------------------------------------------------------------|---------------|------------|------------------------------------------------------------------|-------------|
| CPP-2 | C/C++ Runtime library | Crash                                         | No computations performed/application crash. It happens when the result has NULL input iterators.                                       | High       | Application crashes                               | Yes      | Application fails to contact the safety monitor. Safety monitor reports to MCU | High          | Acceptable | AoU - Applications must report their state to the Health Monitor | Transferred |
| CPP-4 | C/C++ Runtime library | Error in implementation of exception handling | Possible issues: stack not correctly unwound; exception not thrown, wrong exception thrown; memory not available for exception handling | Low        | Wrong execution flow, memory leaks                | Yes      | Full validation and code developed according to ISO 26262 part 6.              | High          | Acceptable |                                                                  | Mitigated   |
| CPP-5 | C/C++ Runtime library | Input not accepted                            | Dynamic memory allocation fails. This can happen e.g. if dynamic memory fails within the RT                                             | High       | Computation not performed and error code returned | Yes      | [...] returned to the application that can take corrective measures.           | High          | Acceptable | AoU - Applications must handle error status                      | Transferred |
| CPP-8 | C/C++ Runtime library | IEEE exception                                | Executes incompletely and returns error code                                                                                            | High       | Computation not performed and error code returned | Yes      | [...] returned to the application that can take corrective measures.           | High          | Acceptable |                                                                  | Mitigated   |
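The rows above can also be kept as structured records, which makes it easy to extract the Assumptions of Use (AoU) that an SEooC supplier hands over to the integrator. The sketch below is only an illustration: the `FmeaRow` type, the reduced set of fields and the rule "status Transferred plus an AoU text" are assumptions read off the example table, not a format defined in the talk.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FmeaRow:
    row_id: str
    component: str
    failure_mode: str
    likelihood: str
    intended_mitigation: str
    status: str


LIB = "C/C++ Runtime library"

# Values copied from the example table; only a subset of the columns is kept.
ROWS = [
    FmeaRow("CPP-2", LIB, "Crash", "High",
            "AoU - Applications must report their state to the Health Monitor",
            "Transferred"),
    FmeaRow("CPP-4", LIB, "Error in implementation of exception handling", "Low",
            "", "Mitigated"),
    FmeaRow("CPP-5", LIB, "Input not accepted", "High",
            "AoU - Applications must handle error status", "Transferred"),
    FmeaRow("CPP-8", LIB, "IEEE exception", "High", "", "Mitigated"),
]

AOU_PREFIX = "AoU - "


def assumptions_of_use(rows):
    """Map row ID to AoU text for rows whose risk is transferred to the user."""
    result = {}
    for row in rows:
        if row.status == "Transferred" and row.intended_mitigation:
            text = row.intended_mitigation
            if text.startswith(AOU_PREFIX):
                text = text[len(AOU_PREFIX):]
            result[row.row_id] = text
    return result
```

Calling `assumptions_of_use(ROWS)` collects the two AoU entries of rows CPP-2 and CPP-5, which would then go into the safety manual of the component.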


# ResilTech Methodology – Overall View





# SW Dependent Failure Analysis: Introduction

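[Figure: DFA process flow: cascading failure analysis, then common cause failure analysis, then SW component coexistence analysis if the components have different ASILs]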



## The goal is

- to identify and analyze the possible common cause and cascading failures between <u>supposedly independent</u> software elements,
- to assess their risk of violating a safety goal (or derived safety requirements)
- to define new safety measures to mitigate such risk if necessary.

### Steps:

- Software component independence analysis
  - cascading failures analysis
  - common cause failures analysis
- In case of sub-elements with different ASILs
  - Software component coexistence analysis
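The selection of steps above can be written down as a small rule. This is a minimal sketch under the assumption that each pair of SW components is described only by the two ASILs involved; the function name `required_analyses` and the string labels are hypothetical.

```python
ASILS = ("QM", "A", "B", "C", "D")


def required_analyses(asil_a, asil_b):
    """List the DFA steps to run for a pair of SW components."""
    for asil in (asil_a, asil_b):
        if asil not in ASILS:
            raise ValueError(f"unknown ASIL: {asil!r}")
    # The independence analysis always consists of these two steps.
    steps = ["cascading failures analysis", "common cause failures analysis"]
    # The coexistence analysis is added only when the ASILs differ.
    if asil_a != asil_b:
        steps.append("software component coexistence analysis")
    return steps
```

For example, a QM library integrated next to an ASIL D function gets all three steps, while two ASIL B components get only the independence analysis.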


# SW DFA: Cascading Failure Analysis

### Process

It refers **exclusively to software** and it is organized in the following steps:




**Step 1.** Define a checklist of failures that may propagate through a failure chain. Each element is numbered with an ID.

**Step 2.** Identify the pairs of SW components to be checked for independence, based on the independence requirements (e.g. parallel processing with diverse algorithms). Each pair is numbered with an ID.

**Step 3.** Apply a guideword-based analysis to each pair, to understand the impact of such failures from a system-level point of view.
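The three steps amount to building a worksheet with one row per (component pair, checklist element) combination, which the analyst then fills in. The sketch below shows this under stated assumptions: the checklist entries are a subset of the example checklist of this talk, while the pair ID `P1`, the dictionary layout and the function `build_worksheet` are hypothetical.

```python
from itertools import product

# Step 1: checklist of failures that may propagate (subset, for illustration).
CHECKLIST = {
    "Timing_1": "Block of execution",
    "Memory_1": "Corruption of content",
    "Information_2": "Loss of data",
}

# Step 2: pairs of SW components to be checked, as (source, destination).
PAIRS = {
    "P1": ("application", "C/C++ Runtime library"),
}


def build_worksheet(checklist, pairs):
    """Step 3 preparation: one open row per pair and checklist element."""
    rows = []
    for (pair_id, (source, destination)), (item_id, element) in product(
        pairs.items(), checklist.items()
    ):
        rows.append({
            "id": f"{pair_id}-{item_id}",
            "source": source,
            "destination": destination,
            "element": element,
            "system_impact": None,  # to be filled in by the analyst
            "status": "Open",
        })
    return rows
```

The number of rows grows as pairs times checklist elements, which is one reason why the checklist is tailored for each project.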


# SW DFA: Cascading Failure Analysis: checklist

**Step 1:** checklist (to be refined and tailored for each project). Some examples:

| ID            | Category                | Element                  | Interpretation                                                                                                                                      |
|---------------|-------------------------|--------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------|
| Timing_1      | Timing and execution    | Block of execution       | Does a block of execution of the SW component impact the destination SW components?                                                                 |
| Timing_3      | Timing and execution    | Deadlocks                | Are there potential situations where the SW component experiences deadlocks (e.g. locking mutexes, waiting for the return value of a function)?     |
| Timing_5      | Timing and execution    | Execution time           | Is the SW component taking too much time to execute? Or is it too fast? Or does it start at the wrong instant?                                      |
| Memory_1      | Memory                  | Corruption of content    | Check the possible propagation of corrupted data from the source to the destination SW components.                                                  |
| Memory_3      | Memory                  | Stack overflow/underflow | Check for possible stack overflow/underflow during memory usage.                                                                                    |
| Information_1 | Exchange of information | Information repetition   | Does a potential information repetition create a cascading failure in the destination SW components?                                                |
| Information_2 | Exchange of information | Loss of data             | Does a potential loss of data create a cascading failure in the destination SW components?                                                          |
| Information_6 | Exchange of information | Incorrect sequence       | Does a potential incorrect sequence of information create a cascading failure in the destination SW components?                                     |
| Information_7 | Exchange of information | Corruption               | Does a potential corruption of information create a cascading failure in the destination SW components?                                             |


# SW DFA: Cascading Failure Analysis: resulting table (example)

| ID | SW components under analysis | Checklist category | Failure description | Likelihood | System failure mode (effect) | Severity | Existing mitigation and impact | Detectability | RPN | Intended mitigations | Status | RPN post mitigation |
|----|------------------------------|--------------------|---------------------|------------|------------------------------|----------|--------------------------------|---------------|-----|----------------------|--------|---------------------|
|    | (1 application using C/C++ Runtime, with certain requirements on independence) | Timing | A completion or failure indication is received too late: a job hasn't completed on time |  |  | YES | Watchdog timer expired to indicate to the applications that the job hasn't completed |  | Acceptable |  | Closed |  |


# SW DFA: Common Cause Failure Analysis

### Process

It refers **exclusively to software** and it is organized in the following steps:

**Step 1.** Identify a set of guidewords for events or root causes that may cause common failures of software elements. Each guideword is numbered with an ID.

**Step 2.** Identify pairs of SW components. Typically, these are elements that:

- Realize safety-critical functionalities through software diversity.
- Are replicated software, running on the same hardware.
- Implement redundant functionalities.
  - This item includes the redundancy of a safety mechanism with respect to a target element.

**Step 3.** Apply a guideword-based analysis to each pair, to understand the impact of such failures from a system-level point of view.
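One simple way to find candidate pairs for Step 2 is to look for resources shared by components that are supposed to be independent, since a shared resource (same hardware, same library, same OS) is a potential common cause. The sketch below is an assumption-laden illustration: the component names, the resource names and the function `shared_resources` are invented for the example and do not come from the talk.

```python
from itertools import combinations

# Hypothetical components mapped to the resources they depend on.
COMPONENTS = {
    "channel_a": {"cpu0", "rtos", "math_lib_v1"},
    "channel_b": {"cpu0", "rtos", "math_lib_v2"},
    "monitor": {"cpu1", "monitor_os"},
}


def shared_resources(components):
    """Return {(name_a, name_b): common resources} for pairs that share any."""
    result = {}
    for (name_a, res_a), (name_b, res_b) in combinations(
        sorted(components.items()), 2
    ):
        common = res_a & res_b
        if common:
            result[(name_a, name_b)] = common
    return result
```

Here the two diverse channels still share `cpu0` and `rtos`, so the pair would be listed for the guideword-based analysis of Step 3, while the monitor shares nothing with either channel.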

# SW DFA: Common Cause Failure Analysis: checklist

**Step 1:** checklist (to be refined and tailored for each project). Some examples:




## SW DFA: Common Cause Analysis: resulting table (example)




# SW DFA: SW Component Coexistence Analysis

### Process

- The goal is to check that failures of lower-ASIL components do not impact higher-ASIL components
  - The pair of components is expected to have already been investigated by the cascading failures analysis
  - In this case, the checklist for cascading failures is re-used
  - The output may be a new ASIL for the SW component. The status may take the values:
    - No impact: a failure of the lower-ASIL or QM component has no effect on the higher-ASIL element
    - New ASIL: a failure of the lower-ASIL or QM component propagates to the higher-ASIL element, and a new evaluation of the assigned ASIL is required
    - Architecture review: the assigned ASILs are not changed, but the architecture is reviewed.
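The status values above can be sketched as a small classification function. The choice between "New ASIL" and "Architecture review" is a project decision that the talk does not formalize, so it is modelled here as a plain flag; the function name, the flag `raise_asil` and the extra value "Not applicable" are assumptions made for this example.

```python
ASIL_RANK = {"QM": 0, "A": 1, "B": 2, "C": 3, "D": 4}


def coexistence_status(asil_source, asil_destination, propagates, raise_asil=True):
    """Classify one coexistence check between two SW components.

    propagates: True if a failure of the source reaches the destination.
    raise_asil: project decision taken when the failure propagates.
    """
    if ASIL_RANK[asil_source] >= ASIL_RANK[asil_destination]:
        # The analysis targets lower-ASIL (or QM) failures affecting
        # higher-ASIL elements; other pairs are out of its scope.
        return "Not applicable"
    if not propagates:
        return "No impact"
    return "New ASIL" if raise_asil else "Architecture review"
```

The `propagates` input is the result of re-using the cascading failure checklist on the pair, as described above.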


## SW DFA: Common Cause Analysis: resulting table (example)






# **4. Feedback from application and future directions**

# Feedback from application 1/2

- Positive
  - Having a clear method to follow
    - <u>To verify the completeness of the Safety Requirements and Mechanisms, and also of the Assumptions of Use, in particular in case of SEooC</u>
    - <u>To standardize requirements for suppliers: most important for long supply chains, as in automotive</u>
  - Having a guideline on selective application of Safety Mechanisms (run-time)
  - Good acceptance from Quality Departments




# Feedback from application 2/2

- Negative
  - The effectiveness of the analysis highly depends on a detailed SW architecture design
    - Typically not available when it should be
  - Once a potential safety impact is found, it is not always straightforward to justify the use of on-line error detection and mitigation techniques versus process-oriented solutions (e.g. "improve" SW testing)



# Future directions

- Developing low-complexity modelling facilities
- Running model execution to evaluate "severity" prior to SW development
- Connection with fault injection campaigns to validate "detectability" post development
- A formalized / semi-formalized SW architecture could serve as input for semi-automatic analysis
  - Despite the number of tools and methodologies made available in the last decades, adoption by industry is still far from being common practice




# Questions and (hopefully) Answers

