NGI Incident Response Procedure

Good coordination of incident response needs to involve several teams: local site administrators, Grid service administrators (if different from the former), the people responsible for the security of the national network infrastructure (IRIS-CERT), and Grid security coordination (the NGI Security Officer).

The security procedures described here have been reviewed to improve communication between network and site administrators and NGI computing resource administrators.

This review involved representatives of the NGI management and of IRIS-CERT.

Actors

Figure: schema of the security structure in the NGI and EGI environments (NGI-ES CSIRT: structure of the CSIRT).

Network Infrastructure Level

IRIS-CERT: Computer Emergency Response Team for RedIRIS (Spanish Research & Educational Network)

Spanish NGI Level

Spanish NGI CSIRT: This role is played by IRIS-CERT, reachable at cert@rediris.es, which is responsible for security coordination inside the NGI. When contacting IRIS-CERT by mail, add [NGI-INCIDENT] to the subject line; this tag lets IRIS-CERT quickly identify the kind of incident to be handled.
Spanish NGI Security Officer (NSO): A single person responsible for security coordination inside the NGI. The NSO MUST be a member of the NGI CSIRT. In our case the role is played by Carlos Fuentes, a member of IRIS-CERT and of EGI CSIRT.
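A site that reports incidents often may want to prepare the tagged mail to IRIS-CERT programmatically. The following is a minimal Python sketch using only the standard library's `email.message.EmailMessage`; it builds the message but does not send it. The function name, the sender address and the summary/body texts are hypothetical placeholders, not part of the procedure.

```python
from email.message import EmailMessage

# Contact address and subject tag as stated in the role description above.
NGI_CSIRT_ADDRESS = "cert@rediris.es"
SUBJECT_TAG = "[NGI-INCIDENT]"


def build_incident_mail(sender: str, summary: str, body: str,
                        tag: str = SUBJECT_TAG) -> EmailMessage:
    """Build (but do not send) an incident notification for the NGI CSIRT.

    The subject starts with the tag so IRIS-CERT can classify the mail
    quickly. `sender`, `summary` and `body` are supplied by the reporting site.
    """
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = NGI_CSIRT_ADDRESS
    # Prepend the tag only once, even if the caller already included it.
    msg["Subject"] = summary if summary.startswith(tag) else f"{tag} {summary}"
    msg.set_content(body)
    return msg


if __name__ == "__main__":
    mail = build_incident_mail(
        sender="grid-sec@example.org",  # hypothetical site address
        summary="Suspicious activity on a worker node",
        body="Incident report following the EGI template.",
    )
    print(mail["Subject"])
```

Sending is left to the site: with a local mail relay (an assumption about the site's setup), `smtplib.SMTP(relay_host).send_message(mail)` would deliver it. The tag is a parameter so the same helper can be reused if a different subject tag is requested.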

Site Level

Local Site Managers (LSM): People responsible for the management of the networking, security and computing resources at a site.
Grid Site Managers (GSM): People responsible for the management of the grid services and resources at a site.
grid Site CSIRT (gCSIRT): A group of people (at least 2), reachable through a mailing list (grid-sec@…site….; such a list is recommended so that someone at the site can be reached easily during a security incident), responsible for security incident support on grid resources. It MUST exist at every site and MUST include at least one Grid Site Manager.
grid Site Security Officer (gSSO): A single person responsible for security coordination on grid resources. It MUST exist at every site and MUST be a member of the grid Site CSIRT.
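The MUST rules for the site-level roles can be checked mechanically, for example before registering contacts. The Python sketch below assumes a hypothetical record type `SiteSecurityContacts` holding people's names; it checks only the three MUST rules stated above (the mailing list is a recommendation and is not checked).

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class SiteSecurityContacts:
    """Hypothetical record of a site's security roles (names are illustrative)."""
    gcsirt_members: set[str]
    grid_site_managers: set[str]
    gsso: Optional[str] = None


def check_site_roles(site: SiteSecurityContacts) -> list[str]:
    """Return the violated MUST rules; an empty list means the site complies."""
    problems = []
    if len(site.gcsirt_members) < 2:
        problems.append("gCSIRT must have at least 2 members")
    # At least one Grid Site Manager must also be a gCSIRT member.
    if not site.gcsirt_members & site.grid_site_managers:
        problems.append("gCSIRT must include at least one Grid Site Manager")
    if site.gsso is None:
        problems.append("a gSSO must exist")
    elif site.gsso not in site.gcsirt_members:
        problems.append("the gSSO must be a member of the gCSIRT")
    return problems


if __name__ == "__main__":
    site = SiteSecurityContacts({"ana", "luis"}, {"luis"}, gsso="ana")
    print(check_site_roles(site) or "site roles comply")
```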

Contact details for grid site CSIRTs and grid site security officers can be found in the GOC database (GOCDB) portal.

EGI Incident Response Procedure

The EGI Incident Response Procedure, available at the following link, describes the steps a grid site must follow when one of the Grid hosts at its institution is involved in a security incident. We strongly recommend reading the document.

Below we briefly summarize the steps a grid site should take when an incident is discovered; this summary has been adapted to the SWE situation.

1. Immediately inform your local security team (LSM) and the NGI CSIRT by sending a mail:
- to cert@rediris.es,
- with [SWE-INCIDENT] added to the subject line.
- This step MUST be completed within 4 hours after the incident has been detected.
- The EGI Incident Response Procedure document contains templates for reporting the incident to your LSM and the NGI CSIRT; these templates are also available at http://osct.web.cern.ch/osct/incident-reporting.html.
2. If no support is available shortly, try to contain the incident whenever feasible, provided your local security procedure allows it and you are sufficiently familiar with the host/service to take responsibility for this action; for instance, unplug the network cable connected to the host. Do NOT reboot or power off the host.
3. Assist your local security team (LSM) and the NSO in confirming the incident. If needed, announce the incident to all sites via site-security-contacts@mailman.egi.eu. This step MUST be completed within 4 hours after the incident has been detected.
4. If appropriate:
- Report a downtime for the affected hosts in the GOCDB (https://goc.gridops.org/).
5. Perform appropriate forensics and take the necessary corrective actions:
- Identify and kill suspicious process(es) as appropriate, but aim to preserve the information they may have generated, both in memory and on disk if possible.
- If grid credentials are suspected of having been abused or compromised, you MUST ensure that the relevant accounts have been suspended.
- If grid credentials are suspected of having been abused, you MUST ensure that the relevant VO manager(s) have been informed. VO contacts are available from: https://cic.gridops.org/index.php?section=vo
- If grid credentials are suspected of having been compromised, you MUST ensure that the relevant CA has been informed. CA contacts are available from: https://www.eugridpma.org/showca
- If needed, seek help from your local security team, from the NGI Security Contact (cert@rediris.es), or from abuse@egi.eu.
- If relevant, additional reports containing suspicious patterns, IP addresses, files or other evidence that may be of use to other Grid participants SHOULD be sent to site-security-contacts@mailman.egi.eu. Never send potentially sensitive information (e.g. IP addresses, usernames) without clearance from your local security team and/or your NGI Security Contact.
- Throughout step 5, requests from the Operational Security Coordination Team MUST be followed up within 4 hours.
6. Coordinate with your local security team and the NSO to send an incident closure report, including lessons learnt and the resolution, to all sites via site-security-contacts@mailman.egi.eu within 1 month of the incident.
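The time limits in the steps above can be tracked from the moment of detection. The Python sketch below computes the due times and lists missed deadlines; the deadline names are hypothetical labels, and approximating "1 month" as 30 days (and counting it from detection) is an assumption, not part of the procedure. The 4-hour follow-up of OSCT requests in step 5 is counted per request and is not modelled here.

```python
from __future__ import annotations

from datetime import datetime, timedelta, timezone

# Time limits from the procedure, counted from incident detection.
# "1 month" for the closure report is approximated as 30 days (assumption).
DEADLINES = {
    "notify_lsm_and_ngi_csirt": timedelta(hours=4),  # step 1
    "confirm_and_announce": timedelta(hours=4),      # step 3
    "closure_report": timedelta(days=30),            # step 6
}


def due_times(detected_at: datetime) -> dict[str, datetime]:
    """Absolute due time of every deadline for an incident detected at detected_at."""
    return {name: detected_at + limit for name, limit in DEADLINES.items()}


def overdue(detected_at: datetime, now: datetime, completed: set[str]) -> list[str]:
    """Names of deadlines that have passed without the step being completed."""
    return [name for name, due in due_times(detected_at).items()
            if name not in completed and now > due]


if __name__ == "__main__":
    detected = datetime(2024, 3, 1, 9, 30, tzinfo=timezone.utc)  # example only
    for name, due in due_times(detected).items():
        print(f"{name}: due {due.isoformat()}")
```

Using timezone-aware datetimes avoids mistakes when the site and the NGI CSIRT log times in different zones.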

When contacted, all recipients of site-security-contacts@mailman.egi.eu are expected to take appropriate action, including processing the available information (e.g. suspicious log entries, DNs or IP addresses), checking locally for signs of compromise, and reporting suspicious findings.
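Checking locally for signs of compromise often starts with searching logs for the IP addresses or DNs circulated in an alert. The Python sketch below does this with the standard `re` module; the function names, the sample log lines and any log path a site would use are assumptions. Indicators are matched as whole tokens so that, for example, 192.0.2.10 does not match inside 192.0.2.100.

```python
from __future__ import annotations

import re
from typing import Iterable


def compile_indicators(indicators: Iterable[str]) -> re.Pattern:
    """Build one regex matching any indicator as a whole token."""
    alternatives = "|".join(re.escape(i) for i in indicators)
    # Reject matches embedded in a longer token (e.g. an IP followed by more digits).
    return re.compile(rf"(?<![\w.])(?:{alternatives})(?!\w|\.\w)")


def scan_log(lines: Iterable[str], indicators: Iterable[str]) -> list[tuple[int, str]]:
    """Return (1-based line number, line) for each log line mentioning an indicator."""
    indicators = [i for i in indicators if i]
    if not indicators:
        return []
    pattern = compile_indicators(indicators)
    return [(n, line.rstrip("\n")) for n, line in enumerate(lines, start=1)
            if pattern.search(line)]


if __name__ == "__main__":
    sample = [
        "Accepted publickey for alice from 192.0.2.10 port 22\n",
        "Connection from 192.0.2.100\n",
        "job submitted by /DC=org/DC=example/CN=Mallory\n",
    ]
    for n, line in scan_log(sample, ["192.0.2.10", "/DC=org/DC=example/CN=Mallory"]):
        print(n, line)
```

On a real host, an open log file can be passed directly as `lines`; which logs to scan depends on the services the site runs.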

Use Cases & Scenarios

Given the actors defined above, we considered the following use cases:

1. An incident is discovered by a Local Site Manager
- If the LSM identifies it as a Grid incident, the LSM escalates the alert to the gCSIRT and starts following both the EGI incident response procedure and the local one. If the LSM identifies the incident as not Grid-related, it should follow its normal procedure.
- The gCSIRT, together with the NSO, evaluates the incident and decides how to report it to site-security-contacts@mailman.egi.eu.
- The gCSIRT takes the appropriate actions to contain the problem; if help is required, the gCSIRT can obtain it from the NSO and from abuse@egi.eu.
- As soon as the incident is resolved, the gCSIRT notifies the NSO and site-security-contacts@mailman.egi.eu of the incident closure.
2. An incident is discovered by a Grid Site Manager
- The GSM immediately alerts the site gCSIRT (the gSSO receives the alert as a member of the gCSIRT).
- The gCSIRT escalates the alert to IRIS-CERT (cert@rediris.es), so the NSO, as a member of IRIS-CERT, receives it, and to site-security-contacts@mailman.egi.eu, and starts following the EGI incident response procedure.
- If the incident may affect the integrity of the local computing infrastructure, the gCSIRT alerts the LSM.
- The LSM escalates the alert to IRIS-CERT and starts following the IRIS-CERT incident response procedure in coordination with the gCSIRT.
3. An incident is discovered from outside the site
- A generic incident is reported to the LSM. The LSM receives the alert and determines whether it is related to the Grid. If it is a Grid-related security problem, the LSM informs the gCSIRT, and the flow is the same as in use case 1 above from its second point onward. If it is not Grid-related, the LSM follows its internal security procedure.
- The gCSIRT is notified about a security problem related to the site. In this case the flow is the same as in use case 2 above from its second point onward.
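The escalation paths of the use cases can be summarized as a list of alert hops. The Python sketch below is one possible reading of them, not an official model: the function name and role labels are hypothetical, and representing the joint gCSIRT/NSO evaluation in use case 1 as a "gCSIRT to NSO" hop followed by a report to the EGI list is an interpretation. Later steps (containment, closure notifications) are not modelled.

```python
from __future__ import annotations

NGI_CSIRT = "cert@rediris.es"
EGI_LIST = "site-security-contacts@mailman.egi.eu"


def escalation_hops(reported_to: str, grid_related: bool = True,
                    affects_local_infra: bool = False) -> list[tuple[str, str]]:
    """Return the (sender, recipient) alert hops described in the use cases.

    reported_to: "LSM" (use case 1, or an outside report to the LSM in use
    case 3), "GSM" (use case 2), or "gCSIRT" (outside report to the gCSIRT).
    """
    hops: list[tuple[str, str]] = []
    if reported_to == "LSM":
        if not grid_related:
            return hops  # the LSM follows its normal/internal procedure
        hops.append(("LSM", "gCSIRT"))
        # Use case 1: gCSIRT and NSO evaluate, then report to the EGI list.
        hops.append(("gCSIRT", "NSO"))
        hops.append(("gCSIRT", EGI_LIST))
        return hops
    if reported_to == "GSM":
        hops.append(("GSM", "gCSIRT"))
    elif reported_to != "gCSIRT":
        raise ValueError(f"unknown first recipient: {reported_to!r}")
    # Use case 2 from its second point onward.
    hops.append(("gCSIRT", NGI_CSIRT))
    hops.append(("gCSIRT", EGI_LIST))
    if affects_local_infra:
        hops.append(("gCSIRT", "LSM"))
        hops.append(("LSM", NGI_CSIRT))
    return hops


if __name__ == "__main__":
    for sender, recipient in escalation_hops("GSM", affects_local_infra=True):
        print(f"{sender} -> {recipient}")
```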