Bayside Solutions

DevOps (Cloud-AI)

in Cupertino, California

  • Req ID

    25328_1778702042

  • Job Category

    IT

  • Job Type

    Contract

  • Job Location

    Cupertino, California
    United States

Overview

W2 Contract

Pay Rate: $55 - $65 per hour

Location: Cupertino, CA - Remote Role

Job Summary:

We are looking for a highly motivated DevOps / Site Reliability Engineer to support large-scale Kubernetes-based infrastructure and platform operations. This role is focused on building, automating, and operating highly reliable systems that power critical engineering platforms and services.

Duties and Responsibilities:

  • Design, build, automate, and support scalable Kubernetes-based platforms and services
  • Operate and troubleshoot production environments running at scale
  • Develop automation and tooling to improve operational efficiency and reliability
  • Monitor platform health, performance, and availability using observability tooling
  • Troubleshoot infrastructure, application, and networking issues across distributed systems
  • Work closely with engineering teams to improve deployment, reliability, and scalability practices
  • Participate in operational support, incident response, and root cause analysis
  • Improve CI/CD workflows and deployment automation
  • Drive operational excellence through documentation, automation, and process improvements
  • Take ownership of projects and independently drive deliverables to completion

Requirements and Qualifications:

  • Strong hands-on experience with Kubernetes platforms such as EKS, GKE, AKS, or similar
  • Experience running and supporting applications on Kubernetes at scale
  • Strong understanding of containerized infrastructure and distributed systems
  • Experience with monitoring and observability tools, preferably Grafana and Prometheus
  • Experience with CI/CD pipelines and deployment automation
  • Experience with Splunk logging, log analysis, and troubleshooting
  • Strong scripting and automation experience using Python and/or Golang
  • Experience troubleshooting production systems under pressure
  • Strong communication and collaboration skills
  • Self-starter mentality with strong ownership and accountability

Preferred Qualifications:

  • Experience operating Ray clusters/services
  • Strong networking and troubleshooting experience
  • Experience with cloud infrastructure and platform services
  • Experience with Infrastructure as Code and automation frameworks
  • Experience supporting high-scale production systems
  • Familiarity with SRE principles and operational best practices

Bayside Solutions, Inc. is not able to sponsor any candidates at this time. Additionally, candidates for this position must qualify as a W2 candidate.

Bayside Solutions, Inc. may collect your personal information during the position application process. Please reference Bayside Solutions, Inc.'s CCPA Privacy Policy at www.baysidesolutions.com.
