Data Science and R from Tatjana Kecojevic’s perspective - ENTER Conference

Data Science and R from Tatjana Kecojevic’s perspective

Dr. Tatjana Kecojevic is a longtime R user with a doctorate in Statistics from the
University of Manchester. She has spent many years working in U.K. higher education as a
Senior Lecturer and has a comprehensive research record in the area of quantile regression. She is
a co-founder of DataTeka, a company dedicated to helping people better understand and make
sense of their data through thorough and insightful training strategies. She founded
RLadies-Manchester and co-organises the RLadies Belgrade and Montenegro groups. Her interests lie in
statistical modelling, statistical computing, statistical computer education and everything
related to #rstats. You can find more about Tatjana and what she does on her website:
tanjakec.github.io

 

How long have you been using R? Has it changed since then?

R is a dialect of the S language and first appeared in 1996, when the statistics professors Ross
Ihaka and Robert Gentleman of the University of Auckland in New Zealand released the code
as a free software package. I was introduced to R in 2003 when I started my PhD at Manchester
University. In my first year I had to take several modules from an MS program in Statistics, and we
were required to use R for some of the projects. It was also an experiment for the Maths School,
which hadn’t used it before, and the staff running the course wanted to see how the MS students
were going to respond to it. At the time we had to rely on S-PLUS help in order to learn how to
use R. One of the first handouts I used was Linear Mixed Models: Appendix to An R and S-PLUS
Companion to Applied Regression by John Fox
(http://www.stat.rutgers.edu/home/yhung/Stat586/Mixed%20model/appendix-mixed-models.pdf).
I loved the idea of having to write code in order to do statistical modelling and
produce fancy graphs. I was immediately hooked by how accessible it is to
experiment with and explore data in R, reflecting and mirroring the way in which you think
about a particular statistical problem.
Much has changed since those early days, and today R has matured into one of the most sophisticated
data analysis environments. It has become one of the most popular tools for data science. It is used
by major companies such as Google, Facebook and Amazon, to name a few.
The appearance of RStudio (an integrated development environment for R) has helped make R more
accessible not just to statisticians, for whom it was ultimately created, but also to people who are
interested in other aspects of data science and its application.

 

Why have you decided to use R over SAS or SPSS when it comes to data processing?

R enables you to escape from the restrictive environments and sterile analyses offered by
commonly used statistical software packages. It is unique among programming languages in
that it has statistics and data built into its DNA. The R system has an extensive library of
packages that offer state-of-the-art abilities. Many of the analyses that they offer are not even
available in any of the standard packages. That’s not all: you can use R to build your own
packages that extend the core R system. The functionalities implemented in R, such as
data manipulation, data analysis and visualisation, are incomparable. It enables easy
experimentation and exploration, which improves data analysis. R is the tool behind reporting
modern data analyses in a reproducible manner, making an analysis more useful to others
because the data and code that actually conducted the analysis can be made available.
Last, but not least, the R community is one of its best features. Supported by the R
Foundation for Statistical Computing, and with the strong and open engagement of developers
and users from all backgrounds, from science to commerce, it is hard to imagine that any
commercial corporation will be able to develop a sustainable business model with the same
innovative drive and power as the R community.

The collaboration amongst statisticians and other scientists who are engaged with statistical
computing, together with the growing interest and engagement of large companies, facilitates an
altruistic R community. This is the driving force that enables R to play a leading role in the field
of data analytics and data science in general. As a result, this community-driven development
approach creates a more powerful R resource, making it more usable and attractive to data
scientists and analysts.

 

If someone wants to learn R, is it necessary to have some basic knowledge about
statistics?

 

What you are most likely to learn using R is how to think about data and how to solve problems
using the tools of data science. RStudio makes R easier to use, and it also enables the creation
and rendering of plain-text documents that contain embedded R code and data, fostering
research transparency and replicability of results. However, you might not be interested in data
analysis but would rather use R for building your website or writing a book. The possibilities are
endless. All in all, you do not need to be a statistician, or even interested in learning statistics,
to use R. R is a tool that can be used for many aspects of the data science cycle which don’t
necessarily touch on statistics.

 

What’s your perspective on Data Science and how does it influence business and
society?

 

The vast amount of new data that we have begun to take more and more notice of has given rise
to the new discipline of data science. Growing data volumes and the demand for easily
understandable knowledge and insights extracted from data are the motivating force of data
science. With the explosion of “Big Data” problems, data science has become a very hot field in
many scientific areas as well as in marketing, finance, and other business and social study
disciplines. Hence, there is a growing demand for business and social scientific researchers
with statistical, modelling and computing skills. We can now identify patterns and regularities in
data of all sorts that allow us to advance scholarship, improve the human condition, and create
commercial and social value. Using data science, we can talk today about precision medicine,
which enables medical practitioners to identify which treatment and prevention strategies for a
particular disease will work in which groups of people. The improvement of people’s lives by
utilising the exciting potential of data is a fundamental motivating factor and is why I do what I
do.

 

You lived in Manchester for a long time. Then you founded the company DataTeka. You even
started an R-Ladies community there. What are your professional plans, now that you have come
back to Serbia?

 

After almost twenty years in academia I took one of the best decisions of my life: to leave it.
I find the challenges and pace of running my own company to be a more stimulating
environment, as I have more opportunity to interact with a wide variety of clients and fellow data
science enthusiasts. I very much enjoyed teaching, but I felt like I needed to challenge myself in
a different way.
R-Ladies in Serbia and Montenegro is an ideal platform for me to engage with all R users and
employ all of my skills. To a large degree I am motivated by the need to help develop a
vibrant R user community, and in particular to address and develop more opportunities for women
within R user groups and the data science community.

 

What platforms and channels do you use to stay in touch with trends and novelties in
your profession?

 

It’s always a pleasure attending and presenting at R conferences such as useR! This year I’ll be
presenting at eRum, which will be held in Budapest on 14-16 May. Attending conferences is a
great way of learning about new trends and innovations and exchanging practice and
ideas with others in the field.
As I have mentioned earlier, R has a very strong community. I have found that the best way to
keep myself informed about new trends and developments is through engagement with the R
community.
I’ll name a few ways in which you can get engaged straight away:
– #rstats hashtag: a responsive, welcoming, and inclusive community of
R users to interact with on Twitter;
– R-Ladies: a world-wide organization focused on promoting gender diversity
within the R community, with more than 60 local chapters;
– Local R meetup groups: face-to-face meet-ups for users of all
levels are incredibly valuable;
– RWeekly: an incredible weekly recap of all things R;
– R-bloggers: an awesome resource to find posts from many different
bloggers using R;
– DataCarpentry and Software Carpentry: a
resource of openly available lessons that promote and model reproducible research.

This year you are going to do a workshop “Responding to analysis and communication:
Data science the R way” during the ENTER conference. Can you tell us what
participants will learn during the workshop?

I want people to have a better understanding of the power that data can provide and to help them
make better decisions, and also to remove the taboo about statistics being a complex and complicated
discipline for the privileged few. This is an old-fashioned view that does not belong in the era we
live in. With this in mind, we will introduce the participants to the R tools needed in a typical
data science project, which will be illustrated in a small case study. We’ll exchange ideas on
how we think about exploring data science and the best ways of communicating the results to
the target audience. I hope people will find it informative and fun.