Site Reliability Developer 4 | Oracle

Responsibilities

Ensure availability, scalability, and operational excellence of Oracle Cloud Infrastructure (OCI) Japan Sovereign Cloud services; translate operational and business requirements into reliability plans and execute improvements via tooling, automation, runbooks, and process changes.
Design and implement automation to reduce toil and improve MTTR; own and prioritize the SRD backlog based on shift feedback, incident reviews, alert quality reviews, and business reliability needs.
Lead complex incident investigations, perform root-cause analysis, and drive preventive actions; coordinate cross-team response and communicate findings.
Collaborate with development teams to improve operational readiness and reliability of services; mentor less experienced engineers and contribute to continuous improvement initiatives.
Participate in 24x7 shift rotation, providing technical leadership during critical service events and ensuring timely incident response and documentation.
Improve alert quality, reduce noise, and maintain robust runbooks and reliability-related documentation; balance business requirements with technical feasibility and risk.

Linux system administration and performance optimization
Proficiency in one or more programming languages (Java, Python, Go, C++, or similar)
Experience with cloud platforms, infrastructure automation, observability/monitoring, and incident response practices
Troubleshooting of cross-functional production issues and root-cause analysis
Networking and storage fundamentals relevant to cloud infrastructure

Experience leading 24x7 on-call rotations and incident management
Technical mentorship and cross-team collaboration across JP/EU Sovereign Cloud teams
Familiarity with alerting improvements, runbook automation, and documentation standards
Ability to translate business needs into reliable, scalable solutions; strong communication in bilingual environments (Japanese/English)

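An opportunity to play a central role in large-scale cloud operations and to lead reliability design and implementation for the Japan Sovereign Cloud.
Strengthen your technical leadership and cross-organizational influence by owning and prioritizing the SRD backlog and collaborating across multiple teams.
Gain experience in 24x7 operations and incident response, and drive team growth through mentoring. Contribute to a global reliability culture by sharing best practices between the JP and EU teams.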