Founder at Nube Technologies, a startup focused on helping businesses make better decisions through better data. By linking and matching different mentions of an entity through machine learning, Apache Spark and Hadoop, we enable credit scoring, fraud analytics, compliance (AML, KYC, OFAC, etc.), data wrangling and data quality, master data management, 360 views, data governance, and catalog and product data management. This is done through our AI-based fuzzy matching, entity resolution and deduplication technology.
Founder, Nube Technologies
Jun 2010 to Present
Nube's product Reifier is an AI-based fuzzy matching engine built over Apache Spark and Hadoop to find similar entities in business data. Its uses include data quality, data governance, Master Data Management (MDM), Client Relationship Management (CRM), data warehousing, cross-selling, lead management, 360 views of data, ETL, fraud analytics such as AML, and security and compliance such as KYC. Reifier identifies near-duplicate data and links structured and semi-structured records in CRM systems, product catalogs and other sources for a consolidated 360 view. It collates existing client, customer, vendor or product lists held in different formats and systems, many of which are near duplicates. Each record typically has multiple fields, some of which may be absent in some systems, partially or poorly populated in others, and not matching exactly. Reifier's advanced proprietary machine learning and big data algorithms discover these linked records and near duplicates. See www.nubetech.co for more information.
Nube also provides niche consulting in big data adoption, strategy, data analytics, NLP and machine learning. Some past projects:
1. Crux Reporting for HBase https://github.com/sonalgoyal/crux
2. HIHO for Hadoop ETL https://github.com/sonalgoyal/hiho
3. Cascading job flows for semi-structured data and creation of a data warehouse using Hadoop.
4. Design consulting for a petabyte-scale user enrollment, authentication and reporting system.
5. Design of a cloud-based email archival and indexing system.
6. MapReduce for network analysis.
7. Data deduplication and similarity ranking using MapReduce.
8. Creation of custom AMIs for EC2 with Hadoop and Hive.
9. Implementation of UDFs for Apache Hive and deployment on Amazon Elastic Compute Cloud (EC2).
10. Advertising campaign monitoring and performance.
Hadoop, Hive, HBase, Cassandra, NoSQL, Cloud Computing, AWS consultant , Self Employed
Apr 2006 to May 2010
Technical Lead , BabyPackets
Jun 2003 to Mar 2006
Worked on the Voice VPN product, which provides specialized VoIP networks and advanced user preferences.
Sr Associate Technology , Sapient
Apr 2001 to Apr 2003
Worked on different customer projects involving EAI, Content Management Systems, Web Based Custom Software. Involved in architecture and design, implementation, performance tuning, team management, testing infrastructure and troubleshooting tasks.
Analyst , Etrade
Mar 2000 to Mar 2001
Engaged in the rollout of the ETrade Sweden and ETrade Hong Kong websites, along with the related backend.
Analyst , Webtek Software (A Dresdner Kleinwort Wasserstein subsidiary)
Dec 1997 to Dec 1999
Worked on the year-end and monthly reporting solution, GAAP adjustments, and the import and export of data for consumption by the banking division.
Bachelor's Degree, Chemical Engineering, 8.17 , Indian Institute of Technology, Delhi
Dec 1993 to Dec 1997