# Infra Support Engineer

> Fuku · Kuala Lumpur, Malaysia · Full-time · Posted 2026-04-21

**Workplace:** On-site

## Description

Infra Support Engineer – GMI Global Infrastructure Team  
  
Preferred Location:  
\- Taiwan  
\- Malaysia  
  
Responsibilities:  
\- Provide first- and second-line technical support to customers for AI infrastructure, including GPU/CPU nodes, networking, storage, orchestration, and platform services. Support is delivered via ticketing systems, email, Slack, or other messaging platforms.  
\- Support GPU cluster delivery, including system provisioning, image deployment, network validation, BIOS/firmware updates, and GPU driver/runtime installation.  
\- Monitor system health and service-level indicators using alerts and dashboards; respond to alerts as part of scheduled 24x7 shift coverage.  
\- Triage incidents by gathering context, verifying scope and impact, and following standard operating procedures and runbooks to perform immediate mitigations.  
\- Escalate incidents to global SRE engineers with clear, concise incident notes and relevant logs/traces.  
\- Maintain incident logs, update status pages, and communicate timely updates to stakeholders during incidents.  
\- Perform routine operational tasks such as log checks, health checks, capacity checks, and simple automated fixes.  
\- Participate in postmortems and contribute actionable follow-ups to reduce recurrence of incidents.  
\- Help maintain and improve standard operating procedures (SOPs), run periodic runbook validation, and document new procedures.  
\- Work collaboratively with developers and SRE teams to improve system reliability.  
  
Qualifications:  
\- Bachelor’s degree in Computer Science or a related field.  
\- Over 2 years of experience in IT operations, server administration, SRE, DevOps, or technical support.  
\- Hands-on Linux experience, including shell, kernel, and log management.  
\- Basic networking knowledge, including TCP/IP, DNS, HTTP, and VLANs.  
\- Familiarity with monitoring, alerting, and logging tools such as Prometheus, Grafana, and Alertmanager.  
\- Experience with NVIDIA GPU infrastructure and Kubernetes.  
\- Comfortable collecting diagnostics, reading logs, and interpreting traces.  
\- Strong troubleshooting mindset and ability to follow runbooks under pressure.  
\- Excellent written and verbal communication skills for customer-facing incident handling.  
\- Willingness to work shifts and participate in on-call rotations.  
\- Bilingual in English and Chinese.

## Apply

[Apply at Fuku](https://apply.workable.com/fuku/j/03C688AFC9/apply)

