Location: Columbus OH
Company Name: Express
Occupational Category: 15-1132.00,Software Developers, Applications
Date Posted: 2020-02-12
Valid Through: 2020-03-13
Employment Type: FULL_TIME
The Senior Machine Learning Data Engineer is responsible for transforming structured and unstructured data into business value by designing, building and implementing end-to-end data and machine learning products. This includes every step from data collection, cleaning, and preprocessing to training models and deploying them to production. Reporting directly to the Director of Decision Science and Analytics, this role works closely with Data Scientists and IT teams to ensure the efficient development and deployment of predictive and prescriptive analytics algorithms that rely on big data. The ideal candidate is passionate about artificial intelligence and stays up to date with the latest developments in the field.
• Build distributed, scalable, and reliable data pipelines that ingest and process data at scale and in real time to feed machine learning algorithms
• Develop, test, validate and deploy machine learning algorithms to drive customer segmentation, personalization and product recommendations, demand forecasting, planning and allocation, and omni-channel operations
• Build and automate machine learning scoring algorithms for automated decision making and other real-time applications
• Build the infrastructure required for optimal extraction, transformation, and loading of data from a wide variety of data sources using big data technologies.
• Harmonize, transform, and move data from a raw format (both structured and unstructured) to consumable, curated views
• Build analytics tools that utilize data pipelines to provide actionable insights into customer acquisition, customer retention, operational efficiency and other key business performance metrics.
• Develop front-end interface software for web applications or mobile devices to expose data and machine learning outputs for business user interactions
• Work with stakeholders including the Loyalty, Product, Data and Design teams to assist with data-related technical issues and support their data infrastructure needs.
• Collaborate with Customer Data Hub and IT teams to automate customer predictive models and integrate them into business operating systems for effective deployment
• Bachelor's degree in Computer Science, Computer Engineering, Programming, Management Information Systems, Statistics, Mathematics or related field
• 5+ years' experience with data warehousing, data lake and big data technologies, including one or more of the following or similar: the Hadoop ecosystem (e.g. HBase, Hive, MapReduce, Sqoop, Spark, Kafka), Netezza and/or DataStage (ETL).
• 3+ years of previous experience in software engineering related to quantitative models
• 3+ years of experience building and deploying machine learning methods such as Decision Trees, Random Forests, Gradient Boosting, Neural Networks, NLP and text analytics
• 3+ years of experience munging/wrangling data to create workable datasets from messy or noisy data sets.
• Familiarity with machine learning and parallel processing pipelines, including experience implementing them on low-latency real-time platforms and/or in scalable offline batch processes
• Strong hands-on experience in Spark, Scala, Python, C++, and/or Java
• Strong intuition on how to architect scalable distributed computing systems for machine learning
• Experience developing complex SQL and database views in a large data warehouse environment.
• Strong scripting experience using Python/Bash in Linux/UNIX environment to process and analyze large data sets
• Experience with relational and NoSQL databases, such as Postgres, Cassandra or MongoDB
• Experience with data processing techniques such as dimensionality reduction, normalization, imputation, and feature extraction.
• Knowledge of web-scraping and related techniques
• Experience with cloud computing
• Experience supporting and working with cross-functional teams in a dynamic environment
• Ability to professionally manage multiple priorities with minimal supervision and on schedule
• Ability to communicate technical subject matter to non-experts
• Strong desire to always be learning, and a collaborative team-player attitude
• Master's or PhD degree in Computer Science, Computer Engineering or related field
• Experience with Spark, Hadoop, Hive, Scala, Kafka, Python and R.
• Expert-level experience in machine learning methods such as Decision Trees, Random Forests, Gradient Boosting, Neural Networks, NLP and text analytics.
• Experience with Google Cloud Platform and its capabilities
• Experience working with modern tools in the Agile software development life cycle: Version Control Systems (e.g. Git, GitHub, Stash/Bitbucket), Knowledge Management (e.g. Confluence, Google Docs), Development Workflow (e.g. Jira), Continuous Integration (e.g. Bamboo, Jenkins), Real-Time Collaboration (e.g. HipChat, Slack)
• Familiarity with job scheduling technologies such as Control-M, Airflow, Jenkins, etc.