
The Data Quality trend: who should get into this domain now, and why

16 Dec 2021

Valentin Tarasov, a Data Quality Engineer, told us about the distinctive world of this domain at EPAM, the different views on what Data Quality is, and the skills you need if you are starting in the profession from scratch.

Tell us about the Data Practice at EPAM: what is it like, and what directions does it include?

This is a good question, because the Data Practice is a separate world within EPAM. What sets our practice apart is its focus. Other practices also have Business Analysts, for example, and they perform some of the same tasks and some different ones, but our practice deals specifically with data projects. We often work with colleagues from other practices, other countries, and other areas. If, for example, we are responsible for the backend and another EPAM team is responsible for the frontend, we work together as a cross-practice team.

There is an opinion that Data is very difficult, and newcomers often pass this direction by. What can you say about that?

I wouldn't say Data is hard. Yes, you must be inclined to work with data. As they say, you need an analytical mindset: you think in numbers, or in analytics expressed in numbers. You also have to be able to get to the essence of the data: what kind of data came to you, what is behind the numbers, and what you have to do to verify it.

If you like numbers, understand them, and understand what's behind them, then working with data is for you

What is Data Quality, then?

There are two approaches to Data Quality (DQ): a simple one and a more complex one. In the simple sense, DQ is data testing: you need to make sure that the customer or the team that will use the data can be confident it is correct, that it did not "break down" on the way and that it comes from where it should. In the more complex sense, Data Quality is about whether the data is worth having at all. If data is of poor quality, then why collect it? Here we are talking about making the data show what is needed, and making it clear why this data is collected and where it will be used.
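To illustrate the "simple" side, checking that data did not break down on the way, here is a minimal sketch of a source-to-target reconciliation. It assumes both sides can be read as lists of dictionaries, and the function names `table_fingerprint` and `reconcile` are hypothetical, not an EPAM tool. It compares row counts and an order-independent digest over chosen columns.

```python
import hashlib
from typing import Iterable, Mapping, Sequence

Row = Mapping[str, object]

def table_fingerprint(rows: Iterable[Row], key_columns: Sequence[str]) -> tuple[int, int]:
    """Return (row count, order-independent digest) over the chosen columns."""
    count = 0
    digest = 0
    for row in rows:
        count += 1
        # Serialize the chosen columns of one row in a fixed column order.
        payload = "|".join(repr(row[c]) for c in key_columns).encode("utf-8")
        row_hash = int.from_bytes(hashlib.sha256(payload).digest(), "big")
        # Summing per-row hashes (mod 2**256) ignores row order but still
        # changes when a row is duplicated, dropped, or altered.
        digest = (digest + row_hash) % (2 ** 256)
    return count, digest

def reconcile(source: Iterable[Row], target: Iterable[Row],
              key_columns: Sequence[str]) -> list[str]:
    """Hypothetical check: list the problems found between source and target."""
    src_count, src_digest = table_fingerprint(source, key_columns)
    tgt_count, tgt_digest = table_fingerprint(target, key_columns)
    problems = []
    if src_count != tgt_count:
        problems.append(f"row count mismatch: source={src_count}, target={tgt_count}")
    elif src_digest != tgt_digest:
        problems.append("content mismatch: same row count, different values")
    return problems

source = [{"id": 1, "amount": 10.0}, {"id": 2, "amount": 5.5}]
target_reordered = [{"id": 2, "amount": 5.5}, {"id": 1, "amount": 10.0}]
target_corrupted = [{"id": 1, "amount": 10.0}, {"id": 2, "amount": 55.0}]
print(reconcile(source, target_reordered, ["id", "amount"]))  # no problems
print(reconcile(source, target_corrupted, ["id", "amount"]))  # content mismatch
```

A count alone would miss the corrupted amount, which is why the sketch also compares a digest of the values.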

Let's talk about everyday tasks. What will people who want to become Data Quality engineers do?

There are many different projects at EPAM. Depending on what stage the project is at, you will have different tasks. 

At the beginning of a project, you will hardly need to write any code. You will be doing a lot of reading, communicating, and trying to understand what the customer really needs and what will count as quality data. Your daily routine consists of meetings. Together with the Business Analyst, you work out the data quality requirements, and then you formalize all the information into a test plan or test strategy.

Then comes the second, main phase: quality gates and data quality checks. You already have the requirements for the data. Based on those requirements, you write code that checks how well the data fits them. The code can be in SQL, Python, or Java. It is often important that the code is easy to read, so that it can be understood not only by your peers but also by the customer side.
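As a rough illustration of turning requirements into checks, here is a minimal sketch using Python's built-in `sqlite3`. The table `transactions`, its columns, and the three requirements are hypothetical examples. Each requirement is a short, readable SQL query that counts the rows violating it, so a customer can read the rule directly.

```python
import sqlite3

# Hypothetical requirements, each written as SQL that counts violating rows.
CHECKS = {
    "customer_id must not be NULL":
        "SELECT COUNT(*) FROM transactions WHERE customer_id IS NULL",
    "amount must be non-negative":
        "SELECT COUNT(*) FROM transactions WHERE amount < 0",
    "transaction_id must be unique":
        "SELECT COUNT(*) FROM (SELECT transaction_id FROM transactions "
        "GROUP BY transaction_id HAVING COUNT(*) > 1)",
}

def run_checks(conn: sqlite3.Connection) -> dict[str, int]:
    """Return the number of violating rows (or duplicate keys) for every check."""
    return {name: conn.execute(sql).fetchone()[0] for name, sql in CHECKS.items()}

def build_sample_db() -> sqlite3.Connection:
    """Create an in-memory table with a few deliberately bad rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE transactions "
        "(transaction_id INTEGER, customer_id INTEGER, amount REAL)"
    )
    conn.executemany(
        "INSERT INTO transactions VALUES (?, ?, ?)",
        [
            (1, 100, 25.0),
            (2, None, 10.0),  # violates the NULL check
            (2, 101, -5.0),   # duplicate id and a negative amount
        ],
    )
    return conn

if __name__ == "__main__":
    for name, violations in run_checks(build_sample_db()).items():
        status = "PASS" if violations == 0 else f"FAIL ({violations})"
        print(f"{name}: {status}")
```

Keeping each rule as a separate named query makes a failed check point straight at the requirement it belongs to.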

And then the third stage: support. When everything is already set up, designed, and checked daily, you monitor it and make sure that nothing breaks. If something doesn't work, you find out why and at what stage, and then figure out how to fix it.
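One simple way to narrow down "at what stage" something broke is to compare row counts between consecutive pipeline stages. The sketch below assumes you can collect a count per stage each day. The stage names, the `first_failing_stage` function, and the tolerance parameter are hypothetical illustrations.

```python
from typing import Optional, Sequence

def first_failing_stage(stage_counts: Sequence[tuple[str, int]],
                        max_loss_ratio: float = 0.0) -> Optional[str]:
    """Return the first stage that lost more rows than allowed, or None.

    stage_counts is an ordered list of (stage name, row count) pairs.
    Only losses are flagged; a stage that gains rows is not checked here.
    """
    for (_, prev_count), (name, count) in zip(stage_counts, stage_counts[1:]):
        if prev_count == 0:
            continue  # nothing to lose; avoid division by zero
        loss = (prev_count - count) / prev_count
        if loss > max_loss_ratio:
            return name
    return None

daily_counts = [("raw", 1000), ("cleaned", 990), ("enriched", 600), ("report", 600)]
print(first_failing_stage(daily_counts, max_loss_ratio=0.02))
```

With a 2% tolerance, the 1% drop at "cleaned" is accepted and the sharp drop at "enriched" is reported, which tells you where to start investigating.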

Data Quality Engineering is a complex profession: it combines the roles of Data Engineer, Analyst, and Tester

In which projects at EPAM are Data Quality specialists in demand?

The range of projects at EPAM is large, and so is the demand for DQ specialists. Among the fields, I would single out fintech, telecommunications, oil and gas, internet marketing, and life sciences, including medicine and pharmacology. What unites these projects above all is working with Big Data.

What are the opportunities for a junior at EPAM?  

I suggest looking at it from a global perspective. IT is growing enormously all over the world, especially given the dramatic changes of last year. EPAM is growing at an accelerated pace, with many new projects and many new employees joining the company, so there are many opportunities for growth. In general terms, at EPAM you can become, for example, a Lead Engineer, a Manager, or a Solution Architect. There are competency matrices for each role. If you know what you want to become, your Resource Manager will tell you how to get there. In a fast-growing company in a fast-growing market, you can reach all sorts of heights.

Why are juniors needed in Data Quality?  

Right now, it is very hard to find ready-made DQ specialists in Russia. Candidates are either testers who have worked with data but lack an analytical approach, or Data Engineers who want to work in DQ but have gaps in testing methodology and documentation. If you take the right candidate and train him or her as a DQ engineer, the right person ends up in the right place at the right time.

High-quality data is needed to make the right business decisions 


Tell us about the training at the training center: what does it include, and what will future students do?

The main requirement when selecting candidates for DQ training is their internal motivation to finish the training and become a DQ Engineer. As for hard skills, we will teach them.

The first part of the training covers Data Quality and the basic skills needed in the job. We explain what DQ is, and what DQ, testing, and Quality Assurance (QA) have in common and how they differ (DQ can be considered part of QA). We teach SQL, as well as Python as a basic programming language. Then we teach the basics of working with databases and their architecture, basic Linux administration, working in the cloud, and working with GitHub. Classes mostly consist of self-study, discussion with a mentor, and homework. This part suits those who either have little experience or want to systematize their knowledge.

The second part is the Lab, where mentors teach Big Data technologies. The Data Practice in Russia works on Big Data projects, which involve specific technologies, clusters, applications, and clouds. Naturally, even a person with IT experience will not always have experience with Big Data and clouds. For the Lab, we have chosen the most popular technologies, such as Spark, Hive, ELK, Kafka, and others.

How does the Data Quality Lab differ from other labs? 

Mentoring. Each student in the Lab has their own mentor. The mentor takes the student's current level of knowledge into account and tries to raise it. Even if you are just starting out, you still have a chance to get into our Lab, finish it, and get a job at EPAM. The mentor will do whatever it takes to help you grow and pass the exam. The mentor's main interest is to teach you so that you become their colleague.

What knowledge and skills does a future student need to pass an interview?  

It's not hard skills that are evaluated at the entrance interview. First, candidates must show the interviewers their desire to learn, and ideally an understanding of what they will do in DQ. Second, English. There are hundreds of projects at EPAM that require English, and it is not only about getting your message across. The main goal is to understand people, their speech, and their ideas, and to be understood by others.

Hard skills are tested at the technical interview, where you will need to show knowledge of the basics of Python and SQL. SQL is a universal language for working with data: it is used by Data Engineers, Business Analysts, and clients. It's also good to understand Linux at a basic level, be interested in IT in general, and have a broad outlook in this area. In DQ, you can't learn one tool and use it forever. Every new project brings its own modern technology stack, so knowing the tools currently used in the industry is quite important.

How long does the training last, and when can I count on employment?  

We do not stick to a rigid schedule during training. When you start, you are given a deadline of one and a half to two months, but it is negotiable. If you learn quickly (recently, a student completed a quarter of the course over a weekend and solved all the homework), that's great! If you can't keep up, we will meet you halfway. The main thing is not to disappear and to stay in touch. On average, the first part lasts two months, and after that the Big Data Lab takes another two to three months.

How soon can you get to a work project at EPAM?  

The number of projects at EPAM is growing rapidly now, so you can get a job at any time during training. When a project appears and you have the right skills, you go through interviews and start working. But that doesn't release you from training: you simply have two tasks, working and studying.

Why is it worth entering Data Quality right now?  

The most attractive thing about this domain for me is its versatility. You have a wide range of tasks, and you can find a project you like. For example, if you like data analysis, you can work more closely with a Business Analyst, gather information about the data, build tables, and figure out what's behind the data. If you prefer programming, you can write a good framework that works well and that everybody understands. If you like testing and Quality Assurance, you can write thousands of autotests and exhaustive documentation. You can also try yourself in various roles and areas: fintech, machine learning, and Data Science.

After graduation, you will become someone who is needed everywhere


What advice would you give to those who are just considering this direction?

My recommendation is to be motivated. If you are, you will be able to overcome all the temporary difficulties you encounter during your studies. For a while, you may have to put everything else aside and focus on gaining new knowledge and doing your homework. To keep your goal in sight and not quit halfway, it's good to have strong motivation from the start to finish your studies and come to work with us.