Design, implement, and maintain automation that improves the reliability of OCI Japan Sovereign Cloud services; own and prioritize the SRD operational-improvement backlog; drive reliability improvements and lead complex incident investigations.
Partner with development teams to improve operational readiness; translate operational and business requirements into reliability plans; develop tooling, automation, runbooks, and process changes.
Participate in a 24x7 on-call rotation and provide technical leadership during critical service events; mentor less experienced engineers; collaborate with JP/EU Sovereign Cloud teams to share practices and align reliability improvements; contribute to continuous improvement initiatives.
Tech Stack
Required Skills
Linux system administration
Python
Reliability Engineering / SRE practices
Cloud platforms, infrastructure automation, observability, monitoring, and incident response
Strong troubleshooting and root-cause analysis
Ability to work 24x7 shifts and provide on-call leadership
Native-level Japanese and business-level English
Preferred Skills (if applicable)
Java, Go, C++, or similar programming languages
Experience with Sovereign Cloud / OCI and cross-region collaboration