TensorFlow

24 Mar 2017

I had the distinct privilege last night of being completely in over my head! I attended a Charleston Data Science meetup at the Iron Yard (my first-ever meetup). Aside from meeting some great local programmers and data scientists, and enjoying delicious pizza and beer, I was completely blown away by how much I have to learn! I was super excited when Taylor, representing the company that bought us said pizza and beer, talked for a few minutes about his interest in CRISPR, because it was always one of my favorite topics with which to try to impress my biology students. Having just jumped into a new field in the last year, and finding myself in a room full of people who know a lot more about data than I do, it was a treat to get to share in a topic that is right up my alley.

Eric’s talk on deep learning and TensorFlow was really interesting. The deep learning material was way over my head, which was a good reminder, and a bit of motivation, to catch up on stats. It’s got to be a good thing to see this stuff as much as possible, though, even if I’m not understanding all of it.

When he moved into his discussion of TensorFlow, I thought I was going to be completely lost, but I actually grabbed onto an interesting point he made that I was somewhat familiar with. Since I’ve spent the last month or so trying to wrap my head around Apache Spark and learn some Scala, I’ve also familiarized myself a bit with the DAG that Spark’s DAGScheduler creates and executes. What I thought I knew about Spark was that Scala’s functional programming is what allows Spark to chain any number of transformations into a DAG so they can be optimized and lazily executed. TensorFlow, whose core is written in C++, also builds a DAG of the desired operations. That made me want to go back and relearn what I thought I knew about lazy execution, because I was under the impression that a functional language like Scala or Haskell was necessary for this, and C++ isn’t a functional language. I asked Eric about it, and he said that DAG creation and optimization doesn’t require functional programming at all; it can be achieved with a certain level of object-oriented design. Now I know!
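To make that concrete for myself afterward, I sketched out roughly what I think Eric meant. This is a minimal, hypothetical toy of my own (not anything he showed): a deferred-execution graph built with nothing but plain object-oriented design. Constructing the graph just wires objects together; no evaluation happens until `run()` is called, and there isn’t a higher-order function in sight.

```scala
// A tiny expression DAG built from plain objects. Constructing it
// evaluates nothing; each node just holds references to its inputs.
abstract class Node {
  def run(): Double // evaluation is deferred until this is called
}

class Constant(value: Double) extends Node {
  def run(): Double = value
}

class Add(a: Node, b: Node) extends Node {
  def run(): Double = a.run() + b.run()
}

class Mul(a: Node, b: Node) extends Node {
  def run(): Double = a.run() * b.run()
}

object Demo extends App {
  // Building the graph is just object construction; no computation yet.
  val graph = new Mul(new Add(new Constant(2), new Constant(3)), new Constant(4))
  // Only now does anything execute: (2 + 3) * 4 = 20.0
  println(graph.run())
}
```

Because the whole graph exists as data before anything runs, an engine is free to walk it and optimize it (fuse nodes, reorder them, distribute them) before executing, which, as I understand it, is what Spark and TensorFlow do at much greater scale.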

And, now that I think more about it, the key contribution of functional programming to Spark isn’t the ability to build a DAG; it’s resilience. Spark’s Resilient Distributed Datasets (RDDs) earn the “Resilient” because every transformation in an RDD’s lineage is a pure, deterministic function over immutable data, so if a node malfunctions and a chunk of data is lost, Spark can recompute it by simply replaying that lineage. So functional programming is less about optimizing the transformations and more about making the distributed data fault-tolerant.
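Here’s a small sketch of what I mean, using the standard Spark API (the app name and values are just made up for illustration):

```scala
import org.apache.spark.{SparkConf, SparkContext}

object LineageDemo extends App {
  val conf = new SparkConf().setAppName("lineage-demo").setMaster("local[*]")
  val sc   = new SparkContext(conf)

  // Each transformation below is a pure function recorded in the RDD's
  // lineage; nothing executes yet. Spark is just extending the DAG.
  val numbers = sc.parallelize(1 to 1000000)
  val evens   = numbers.filter(_ % 2 == 0)
  val squared = evens.map(n => n.toLong * n)

  // count() is an action, so only now does Spark schedule and run the DAG.
  println(squared.count())

  sc.stop()
}
```

If an executor dies and a partition of `squared` disappears, Spark doesn’t need a backup copy; it just replays `filter` and `map` over the corresponding partition of the source data. That only works because those transformations are deterministic functions with no side effects, which is where the functional style really pays off.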

So what I’m happy about is that a talk I was completely unprepared for turned out to help me better understand what I’m working on. I’ll definitely be back.