Poltergeist Statistics: Correlation Coefficient with pandas and numpy

As a data scientist in training, I get to do a lot of exploratory analysis these days, examining different variables in data and seeing how they may be related. There is a nice little trick for understanding this relationship, and it comes from the magical field of statistics.

Imagine that as a part of your education at Hogwarts, you need to measure how ghosts and the messiness at Hogwarts are related. You buy some sort of ghost metronome in Diagon Alley to measure the presence of the bodiless undead. As your messiness measure, you break into Filch’s office and steal his meticulously collected records of the messes he had to clean up at Hogwarts.
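Here is a minimal sketch of how that relationship could be measured with a Pearson correlation coefficient in pandas and numpy. The column names (`ghost_activity`, `messes_cleaned`) and every number in the table are made up purely for illustration; they are not real readings from any metronome or from Filch’s records.

```python
import numpy as np
import pandas as pd

# Hypothetical data, invented for illustration only:
# one row per day, a ghost metronome reading and a count of messes cleaned.
df = pd.DataFrame({
    "ghost_activity": [1, 3, 2, 5, 4, 6],
    "messes_cleaned": [2, 4, 3, 7, 5, 8],
})

# pandas: Series.corr computes the Pearson coefficient by default.
r_pandas = df["ghost_activity"].corr(df["messes_cleaned"])

# numpy: corrcoef returns a 2x2 correlation matrix; the off-diagonal
# entry is the coefficient between the two variables.
r_numpy = np.corrcoef(df["ghost_activity"], df["messes_cleaned"])[0, 1]

# By hand: sum of products of deviations, divided by the square root of
# the product of the summed squared deviations.
x = df["ghost_activity"].to_numpy(dtype=float)
y = df["messes_cleaned"].to_numpy(dtype=float)
dx, dy = x - x.mean(), y - y.mean()
r_manual = (dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum())

print(r_pandas, r_numpy, r_manual)
```

All three values should agree up to floating-point error. A coefficient near 1 means the two measures rise together, near -1 means one rises as the other falls, and near 0 means there is no linear relationship. Either way, it says nothing about whether the ghosts actually cause the mess.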


Strata, Mindless tasks and why I am in data science

This week I went to the Strata conference in New York and returned with the warm, confirmed feeling that this is where I belong, just like when I moved from Idaho to Washington a year ago. So far, working on the HoloLens team has been an exciting ride, learning things about software engineering, myself and others. Throughout this journey I realized something very important that made the transition to data science only more meaningful.


Features do not mean use

As software engineers, we build many features daily. We deploy to production, write dozens of tests, prepare reports… Sometimes it is a lot of busy work that does not translate into value. And while we do this, our companies pay us quite a bit of cash. I do not know about you, but I feel an extreme pang of guilt and an imposter syndrome-y feeling when I am doing something at work that I know is unlikely to bring value to the team and the company. Working on a couple of anti-features at all the companies I have worked for made me ask a question. Is there a better way? Can I spend my time on something that will bring the information forward and help the team make a decision?

That is when I began to realize that the answer is in the data. Pure engineering, with little data to justify that this is the right feature to build, is moot.


