We implemented Amazon EKS to manage the majority of HR-Tech company's workloads and core data flows, handling over 80% of the platform's containerized operations in the ap-south-1 region. This included the primary job search and recruitment microservices, which we designed as the backbone of user interactions, data processing, and integrations with AWS services: S3 for resume storage, DynamoDB for job profiles, Lambda for event-driven notifications, SQS for asynchronous workflows, SNS for pub/sub messaging, SES for transactional emails, RDS for reporting, CloudFront for CDN delivery, and Elastic Load Balancing for traffic distribution. We configured the EKS data plane with managed node groups on EC2 instances for persistent, compute-intensive tasks such as job-matching algorithms, and with Fargate profiles for bursty serverless workloads such as resume parsing, ensuring scalability across multi-AZ private subnets within the VPC.
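The split between EC2-backed node groups and Fargate profiles described above could be declared along these lines. This is a Terraform sketch, not the actual configuration; the cluster reference, IAM role, variable, and selector names are assumptions.

```hcl
resource "aws_eks_fargate_profile" "resume_parsing" {
  cluster_name           = aws_eks_cluster.shine.name   # hypothetical cluster resource
  fargate_profile_name   = "resume-parsing"             # hypothetical profile name
  pod_execution_role_arn = aws_iam_role.fargate.arn     # hypothetical role
  subnet_ids             = var.private_subnet_ids       # multi-AZ private subnets

  # Pods matching this namespace/label selector run on Fargate;
  # everything else lands on the EC2 managed node groups.
  selector {
    namespace = "prod-resume-parsing"                   # hypothetical namespace
    labels = {
      workload = "serverless"
    }
  }
}
```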
We managed critical workloads including job search and candidate profile services, communication services handling email/SMS/WhatsApp notifications, event tracking systems, general queuing services, email processors, and webhook handlers. All deployments maintained multiple replicas for high availability, with the Horizontal Pod Autoscaler scaling on CloudWatch CPU and memory metrics from Container Insights. Traffic flowed securely through CloudFront edge caching to Elastic Load Balancing, which distributed requests across EKS services before reaching backend storage layers such as DynamoDB, RDS, and S3.
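An HPA definition for this setup might look as follows. This is an illustrative sketch: the HPA name and thresholds are assumptions, and it uses the built-in CPU and memory resource metrics, whereas the original presumably fed CloudWatch metrics in through an adapter.

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: event-tracker-hpa            # hypothetical name
  namespace: prod-event-tracker
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: event-tracker-deployment
  minReplicas: 4                     # baseline replica count
  maxReplicas: 10                    # peak-load ceiling
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70     # assumed threshold
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80     # assumed threshold
```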
We aligned HR-Tech company's AWS environment under our AWS Organization with management account ID 1xxxxxxxxx3, where Service Control Policies enforced mandatory resource tagging by denying EC2 launches without required tags. In HR-Tech company's member account ID 4xxxxxxxxx6, we tagged all EKS data plane nodes (the EC2 instances in managed node groups) consistently with our enterprise tagging strategy. This supported cost allocation through Cost Explorer, compliance monitoring via AWS Config rules, and operational automation across all services.
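A minimal SCP of the kind described would deny ec2:RunInstances whenever a mandatory tag is absent from the request. This sketch enforces only the Project key for brevity; the actual policy presumably repeated the condition for every mandatory key.

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "DenyUntaggedEC2Launches",
      "Effect": "Deny",
      "Action": "ec2:RunInstances",
      "Resource": "arn:aws:ec2:*:*:instance/*",
      "Condition": {
        "Null": { "aws:RequestTag/Project": "true" }
      }
    }
  ]
}
```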
An Environment tag of "Production" for prod nodes or "Staging" for test environments enabled lifecycle management such as auto-termination of dev resources after 30 days. A Project tag of "Shine-EKS" facilitated Cost Explorer grouping for the recruitment platform's core infrastructure. An Owner tag of "devops@HR-Tech company" ensured accountability for audits and incident response. A CostCenter tag of "Recruitment-Platform" aligned billing with the SCP requirements. Additional tags included WorkloadType: JobMatching for service categorization, Compliance: GDPR-Compliant for data-handling standards, and AutoScaleGroup: Shine-NodeGroup-Prod, linking nodes to scaling policies.
We verified EKS node tagging using the command aws ec2 describe-instances --filters "Name=tag:Project,Values=Shine-EKS". Tags applied automatically via CloudFormation TagSpecifications in AWS::EKS::Nodegroup resources met the SCP requirement for non-empty string values on mandatory keys. We achieved 100% compliance, validated quarterly through the AWS Config rule ec2-instance-tagging-required, GuardDuty threat detection, CloudTrail audit trails, and CloudWatch Logs retention. WAF protected application endpoints from exploits, while Secrets Manager handled automatic credential rotation and KMS ensured data encryption at rest across S3, RDS, and EKS etcd.
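The SCP's non-empty-string requirement can be mirrored in a small compliance check. This is an illustrative Python sketch, not the AWS Config rule itself; the mandatory keys and sample tags follow the tagging scheme above.

```python
MANDATORY_KEYS = {"Environment", "Project", "Owner", "CostCenter"}

def non_compliant_keys(tags: dict) -> set:
    """Return mandatory keys that are missing or set to an empty string."""
    return {k for k in MANDATORY_KEYS if not tags.get(k, "").strip()}

# Example tags mirroring the enterprise scheme described above.
node_tags = {
    "Environment": "Production",
    "Project": "Shine-EKS",
    "Owner": "devops@HR-Tech company",
    "CostCenter": "Recruitment-Platform",
}
print(non_compliant_keys(node_tags))  # empty set -> fully compliant
```

In practice the tag dictionaries would come from an ec2 describe-instances call rather than a literal, but the validation logic is the same.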
We captured deployment outputs from the production EKS cluster running Kubernetes v1.29 in the shine-prod namespace after confirming all replicas were ready and HPA scaling policies were active. Key deployments included bff-deployment and candidate-backend-deployment in the prod-candidate namespace, both at 1/1 ready status; ai-search-email-consumer and email-consumer in the prod-communication-service namespace with high-availability replicas; recruiter-email-processor in the prod-email-processor namespace at 1/1 ready; event-tracker-deployment in the prod-event-tracker namespace scaled to 4/4 replicas; and generalqueingservice-deployment at 1/1 ready status.
All deployments were configured with multiple replicas for high availability, with event-tracker-deployment auto-scaling to 10 replicas during peak loads via the CloudWatch-driven HPA. Pods used consistent labels such as app: shine-recruitment and version: v1.2.3, with resource requests of 500m CPU and 1Gi memory. Fargate profiles handled serverless workloads such as authentication services, while EC2 nodes on t3.medium instances hosted stateful services with EBS volumes, accessed via Route 53 private hosted zones. No deployment failures were detected, with monitoring showing fewer than one restart per hour across the CloudWatch, GuardDuty, and WAF security layers.
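A pared-down Deployment manifest consistent with the labels and resource requests above would look roughly like this; the container image reference and the elided registry path are placeholders, not values from the source.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: event-tracker-deployment
  namespace: prod-event-tracker
  labels:
    app: shine-recruitment
    version: v1.2.3
spec:
  replicas: 4
  selector:
    matchLabels:
      app: shine-recruitment
  template:
    metadata:
      labels:
        app: shine-recruitment
        version: v1.2.3
    spec:
      containers:
        - name: event-tracker
          image: <ecr-registry>/event-tracker:v1.2.3   # placeholder image
          resources:
            requests:
              cpu: 500m      # matches the stated request
              memory: 1Gi    # matches the stated request
```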
We automated EKS infrastructure changes using Terraform for provisioning VPC configurations, EKS clusters, EC2 node groups, RDS instances, S3 buckets, Lambda functions, and IAM roles. Jenkins CI/CD pipelines orchestrated application deployments: container images were built and pushed to ECR, automatically triggering updates to Kubernetes manifests and Helm charts applied to EKS. Rollback strategies included Terraform state reversion for infrastructure and Kubernetes Deployment rollbacks to previous ReplicaSets, triggered by failed health checks. Direct AWS Management Console changes in production were prohibited, with all operations enforced through pipelines monitored by CloudWatch alarms.
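A Terraform sketch of a tagged managed node group, assuming a cluster resource, IAM role, and subnet variable defined elsewhere (all resource and variable names here are illustrative):

```hcl
resource "aws_eks_node_group" "shine_prod" {
  cluster_name    = aws_eks_cluster.shine.name   # hypothetical cluster resource
  node_group_name = "Shine-NodeGroup-Prod"
  node_role_arn   = aws_iam_role.node.arn        # hypothetical role
  subnet_ids      = var.private_subnet_ids       # multi-AZ private subnets
  instance_types  = ["t3.medium"]

  scaling_config {
    desired_size = 3
    min_size     = 2
    max_size     = 10
  }

  # Enterprise tagging strategy enforced by the SCPs.
  tags = {
    Environment = "Production"
    Project     = "Shine-EKS"
    Owner       = "devops@HR-Tech company"
    CostCenter  = "Recruitment-Platform"
  }
}
```

Note that tags here apply to the node group resource itself; per-instance tags would go through a launch template's tag specifications, which is what the CloudFormation TagSpecifications approach in the text achieves.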
For storage and database layers, we deployed DynamoDB for sub-millisecond job profile queries, S3 with KMS encryption for resume and asset storage, and RDS with multi-AZ failover for enterprise reporting. Messaging infrastructure utilized SQS for decoupled async workflows, SNS for cross-service pub/sub notifications, and SES for high-volume transactional emails to recruiters and candidates. The security stack included GuardDuty for continuous threat detection across VPC Flow Logs and CloudTrail, WAF rulesets protecting web endpoints, CloudTrail capturing all API activity, Secrets Manager for credential rotation, and KMS for envelope encryption. Networking comprised a VPC with multi-AZ private/public subnets, CloudFront global CDN caching, Route 53 DNS with health checks, and Elastic Load Balancing for intelligent traffic routing. Observability leveraged CloudWatch for metrics and logs with Container Insights, Athena for S3 log analysis, and Cost Explorer for spend optimization across all services.
This EKS-centric architecture enabled HR-Tech company to deliver job-search latency under 200ms during peak concurrent-user spikes and to achieve 99.99% uptime through multi-AZ deployments and automated failover. It realized 40% infrastructure cost savings via EC2 Spot Instances, Fargate serverless scaling, and Cost Explorer optimizations, while maintaining zero security breaches, with GuardDuty blocking over 1,200 threats monthly and full GDPR compliance through KMS-encrypted data flows and CloudTrail audit capabilities. Deployment cycles fell from weeks to under one hour, and incident response times dropped from 30 minutes to 5 minutes, enabled by CloudWatch alerting and automated remediation workflows.