Our client is a leading web3 firm whose platform and wallet combine industry-leading security features with a powerful, intuitive interface, letting you manage your cryptocurrency assets securely and easily in today's fast-paced digital economy. The platform and wallet let you store, send, and receive a wide range of digital assets effortlessly, and they are built with advanced encryption protocols to keep your assets protected, giving you peace of mind in a constantly evolving market. The firm is expanding its business and is looking for an experienced Site Reliability Engineer to join its exchange team.
About the Role
As the SRE Lead, you will build and manage the SRE team. You will also establish a unified incident response system and promote blameless reviews and systematic improvements.
Key Responsibilities
Strategy and Governance
Team and Organization
Cross-team collaboration: work with R&D, architecture, DBA, network, security, and legal/compliance teams to drive the inclusion of reliability goals in the roadmap and KPIs.
Platform and Engineering Implementation
Exchange-specific projects, such as end-to-end latency SLIs, matching confirmation and replay, sequence number consistency and idempotency, and isolation of hot trading pairs.
Multi-chain node operations: congestion and reorg handling, MPC/HSM, risk control and approval flows for coin withdrawals and deposits, and closed-loop resolution of reconciliation errors.
Security and Compliance: auditing of sensitive operations and meeting requirements such as SOC 2, ISO 27001, and PCI DSS.
Required Skills & Experience
Over 8 years of experience in back-end, platform, or operations engineering, including over 4 years of SRE or production engineering experience and over 2 years of team management/leadership experience.
A proven track record in stability governance and incident handling for high-concurrency, low-latency businesses (transactions/payments/advertising/large-scale real-time systems).
SLO/SLI and error budget practices; building observability systems (Prometheus/Grafana/ELK or similar, OpenTelemetry, tracing).
Kubernetes/Service Mesh, microservice gateway (Nginx/Envoy), CI/CD (GitHub Actions/GitLab CI, etc.), GitOps (Argo CD).
Design and implementation of progressive delivery (canary releases, batched rollouts, feature flags) and automatic rollback strategies.
Data and Storage: MySQL (sharding, replication, and failover), Redis/Kafka, backup and disaster recovery drills; a consistency and reconciliation mindset.
Performance and Capacity Engineering: stress testing, benchmarking, analysis, and tuning (flame graphs, CPU, GC, network, TCP kernel parameters, etc.).
Incident management: SEV classification, IM/IC command, cross-team collaboration and communication, writing high-quality retrospectives, and tracking action items.
Preferred Experience
Experience with exchanges, matching engines, payment clearing and settlement, or operations for securities firms, crypto wallets, and chain nodes.
Experience in implementing anti-DDoS, WAF, bot management, rate limiting, and traffic governance systems.
Experience in compliance systems (SOC2, ISO 27001, PCI-DSS, SOX-class controls), security audits, and evidence retention.
Experience in multi-region GSLB, cross-cloud/multi-cloud architecture, chaos engineering, and organizing GameDays.
Go/Java optimization experience, practical experience in messaging systems (Kafka/RocketMQ/Pulsar) and storage (TiDB/Vitess/Citus/TDSQL, etc.).
Experience in cost optimization and FinOps.
If this outstanding opportunity sounds like your next career move, please submit through "Apply Now" or send your resume in Word format to Luke Wang at resume.sg@pinpointasia.com and put SRE Lead - Top tier Crypto Exchange - J12354 in the subject header.
Data provided is for recruitment purposes only.



