We have prepared a series of interviews with graduates of the EPAM training center who specialized in Data Quality (DQ). They not only trained at the EPAM Lab but also received a long-awaited job offer. They told us what Data Quality is, why they chose this area, and what difficulties they encountered during training, and they recommended useful materials for newcomers.
You can read the first two interviews here and here. Today we talk with Valentin Tarasov, Data Quality Engineer at EPAM.
What is Data Quality?
In a narrow sense, it is quality assurance (QA) on data projects, where traditional QA and testing methods are not always effective. In a broad sense, it is a set of practices at the intersection of QA, software development, and analytics whose purpose is to ensure that data is of sufficient quality for the customer to use.
Why are Data Quality experts needed and what do they do?
DQ engineers make sure that the customer's data, the foundation of any data project, is of sufficient quality. If, for example, a machine learning model is fed garbage, or analytics and forecasting rely on bad input data, the results will be poor as well.
Why did you choose the area of Data Quality?
I have always been interested in working with data, but that work spans many specific areas, from Scala development to business intelligence. For me, DQ is an opportunity to combine different roles on data projects. If you like, you can focus more on development: write frameworks that check data quality or embed your checks in the developers' CI/CD pipelines. Or you can take on more analytics and customer communication: find out what the customer really needs from the data and how it will be used, and propose your own solutions. Data Quality is multifaceted, and it opens up many career paths.
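To give a concrete sense of what a framework that checks data quality might contain, here is a minimal Python sketch. It is our own illustration, not code from Valentin's project: the column names (`order_id`, `quantity`), the 1 to 100 range, and all helper functions are hypothetical. It shows three classic checks: completeness (no missing values), uniqueness (no duplicate keys), and validity (values within an expected range).

```python
from dataclasses import dataclass
from typing import Any

# Each row is a plain dict; the column names used below are hypothetical.
Row = dict[str, Any]


@dataclass
class CheckResult:
    name: str
    failures: int  # number of rows violating the rule

    @property
    def passed(self) -> bool:
        return self.failures == 0


def check_not_null(rows: list[Row], column: str) -> CheckResult:
    """Completeness: count rows where the column is missing or None."""
    failures = sum(1 for row in rows if row.get(column) is None)
    return CheckResult(f"not_null({column})", failures)


def check_unique(rows: list[Row], column: str) -> CheckResult:
    """Uniqueness: count every occurrence of a value after its first one."""
    seen: set = set()
    failures = 0
    for row in rows:
        value = row.get(column)
        if value in seen:
            failures += 1
        else:
            seen.add(value)
    return CheckResult(f"unique({column})", failures)


def check_in_range(rows: list[Row], column: str, low: float, high: float) -> CheckResult:
    """Validity: count non-null values outside [low, high]; nulls are left to check_not_null."""
    failures = sum(
        1
        for row in rows
        if row.get(column) is not None and not (low <= row[column] <= high)
    )
    return CheckResult(f"in_range({column}, {low}, {high})", failures)


def run_checks(rows: list[Row]) -> list[CheckResult]:
    # Hypothetical rules for an imaginary "orders" dataset.
    return [
        check_unique(rows, "order_id"),
        check_not_null(rows, "quantity"),
        check_in_range(rows, "quantity", 1, 100),
    ]


if __name__ == "__main__":
    sample = [
        {"order_id": 1, "quantity": 3},
        {"order_id": 2, "quantity": None},  # missing value
        {"order_id": 2, "quantity": 500},   # duplicate key and out-of-range value
    ]
    for result in run_checks(sample):
        status = "PASS" if result.passed else "FAIL"
        print(f"{status} {result.name}: {result.failures} failing row(s)")
```

On real projects such rules are usually expressed over the actual storage layer (for example as SQL or Spark queries) rather than over in-memory dicts. When the checks run inside a CI/CD pipeline, any failed check would typically be turned into a nonzero exit code so the build stops before bad data reaches downstream consumers.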
Why did you choose the EPAM training center?
I didn’t know anything about the company, but friends had given the training center good reviews. The training center is the gateway to EPAM. Everything positive about the training (care for and interest in students, an individual approach, the active involvement of lecturers and mentors, and their enthusiasm, responsibility, and openness) is even more evident in the company itself.
What knowledge and education did you have before studying at the training center?
I headed a small IT department that handled the web operations of a retail chain. But that background is not what you need to start training in DQ. I think the main things are a responsible attitude toward studying, a willingness to keep learning (even after graduating from the training center), and an interest in working with data.
How does the Data Quality training work?
Students watch video lectures, read additional materials, do homework, and pass interim knowledge assessments. Each student has a mentor at every stage of the training. The mentor's goal is to teach the student and help them grow to the level of a confident junior, who can then easily get a job at the company. I am a mentor myself, and I can say that mentoring is one of the greatest strengths of the DQ training.
What hard and soft skills did you acquire during the training?
Besides DQ itself, the training covers the basics of Linux, Python, SQL, database architecture, and QA. Even if you already know something about these areas, the training helps you systematize that knowledge, consolidate it, and put it into practice. The second, more advanced part of the training is devoted to Big Data technologies. There I learned everything from scratch: the basics of Hadoop and distributed computing, Hive, Spark, Kafka, the ELK stack, and so on. As for soft skills, I would highlight the ability to communicate openly. Making a mistake or asking again when you don't understand something is much better than staying silent and remaining confused.
What difficulties did you face during your studies and how did you cope with them?
The biggest difficulty was the uncertainty about whether I would complete the training successfully and get a job. But it was my desire to become part of EPAM that motivated me to take on more and not give up when I ran into temporary setbacks.
What project are you working on at EPAM now, and what is your role?
I work on a project for a large pharmaceutical company. I have taken part in every stage of the project, from planning to successful weekly releases. The project gave me a great opportunity to lead a small team of QA and DQ engineers and to discover that I enjoy leading despite all the difficulties and the heavy workload.
What are the prospects for a junior Data Quality engineer at EPAM?
The main prospect is finding a project where you work on exactly the aspect of DQ that you enjoy. Machine learning, artificial intelligence, cloud, streaming, IoT: all of these need someone to take care of data quality.
And what's next?
EPAM is growing rapidly, and all roads are open: any goal, even the most ambitious, can be achieved. Most importantly, the growth criteria at EPAM are completely transparent. You tell your manager who you want to become, they tell you which stages you need to go through along the way, and how fast you progress depends only on you.