Mayank Malik

Technology Analyst (Big Data) at ZS Associates

Pune Area, India

Apache Spark, Big Data Fundamentals, Oozie, Hive, Impala, Hadoop, Python
"My objective is to learn and explore my potential while making a significant contribution to the success of my employer."

• Around 4 years of professional experience in Hadoop, Java, Python, AWS and Big Data technologies.
• Expertise in Hadoop and related projects including MapReduce, YARN, Spark, Hive, Pig, HBase, Sqoop, Oozie, Impala, Sentry, etc., in both Apache's and Cloudera's distributions.
• Expertise in Cloudera Distribution (Manager 5.x.x, Navigator, Director).
• Managed Hadoop clusters, including setup, installation, monitoring, and maintenance.
• Experience developing a real-time project and various client POCs using Hadoop on Cloudera.
• Well versed in Hadoop security, including Sentry and Kerberos.

• Implemented data warehouse solutions on Hadoop.

Technology Analyst - Big Data at ZS Associates

Oct 2014 - Present

I work on various engagements:
• Extracted data from various Oracle DB sources into HDFS on Hadoop using Sqoop.
• Automated the data migration process using Python scripts.
• Processed source data delivered as CSV and gzip-compressed CSV files.
• Performed data validations and created Hive external tables and Impala Parquet tables with Snappy compression, based on the requirement.
• Analyzed different datasets using Impala queries with various joins.
• Implemented the Hadoop data security model using Sentry, LDAP, and ACLs.
• Used Sentry to assign role-based permission levels and table-level security.
• Used Amazon Web Services (AWS), including S3, EC2, EMR, EBS, AMI, etc.
• Implemented the HDFS snapshot and restore process, the Hive metastore backup process, and archival of data into S3 and then into Glacier.
• Developed a module to make data available to business users with specific requirements. The Create/Delete/Assign Roles Project Area process was automated, and daily data backup, logging, and Sentry authorization were implemented.

Systems Engineer (Big Data) at Infosys

Jul 2012 - Aug 2014

Automation using Big Data/Hadoop framework: Automated an end-to-end process that ingested data from multiple sources into HDFS, applied business logic on top of it, and published the processed data to a MySQL table for business user review. Used Core Java with Hadoop, HDFS, and MapReduce. Scheduled actions through Oozie, used Hive to run queries and process data, and used Sqoop to push and pull data to and from MySQL and Teradata.

Hadoop cluster setup engagement: Set up an in-house 3-node cluster and installed components such as Hive, Pig, Sqoop, ZooKeeper, and Oozie.

Infosys Certified Hadoop Developer


Bachelor of Engineering (B.E.), Computer Science at Chitkara University

Dec 2007 - Dec 2011
