It was an interesting day: attendance was around 130, with a single track, and I attended most of the talks before having to head off to fly home. Fortunately the final two talks were recorded, so I’ll hopefully get a chance to see them soon.
First up was Richard Conway (@azurecoder). This started off with a discussion about the work Microsoft has put into getting Hadoop to run on Windows (specifically in Azure), but the last quarter covered Spark. The syntax for transformations is a considerable improvement over Hadoop, but he did say that performance is still lagging a bit behind the more highly tuned Hadoop; they’re working on it, though.
The quick-fire approach continued with Javier Ramirez (@supercoco9) on “API Analytics with Redis and BigQuery”. The Redis part fell on the floor somewhere, so this was mostly about being able to parse 1 TB of data a second, and how Google’s BigQuery works. This was probably more interesting to BigData folks than to me.
Next up was Alex Dean (@alexcrdean) with “Why Your Company Needs a Unified Log”. For me at least, this heated things up a bit. Alex founded Snowplow, the folks behind “Snowplow Analytics”, which has https://github.com/snowplow/snowplow as its main project/product.
He covered the history of Business Analytics, from traditional Data-Warehousing via nightly ETL, right up to today’s must-have-more-data-right-now!
Snowplow is an event tracking system that logs events to various locations, but one of the most interesting things from this for me was http://snowplowanalytics.com/blog/2014/03/11/building-an-event-grammar-understanding-context/ and https://github.com/snowplow/snowplow/wiki/snowplow-tracker-protocol, with “standards” for defining fields in logged events. Kafka was discussed at length as a mechanism for recording these historical logs, and the ability to play back segments of logs for event modelling was highlighted. They’re able to process from AWS Kinesis too. Probably worth watching the video.
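To make the “record everything, then play back a segment” idea a bit more concrete, here’s a minimal sketch in Python. It’s only an in-memory stand-in for something like a Kafka topic, and the `Event` fields (`subject`, `verb`, `obj`, `context`) are names I’ve made up for illustration; they are not Snowplow’s actual tracker protocol.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Iterator, List, Optional


@dataclass(frozen=True)
class Event:
    # Hypothetical fields for illustration only; not Snowplow's real schema.
    subject: str
    verb: str
    obj: str
    context: Dict[str, Any] = field(default_factory=dict)


class UnifiedLog:
    """Append-only, in-memory log; an assumed stand-in for a Kafka topic."""

    def __init__(self) -> None:
        self._events: List[Event] = []

    def append(self, event: Event) -> int:
        """Store the event and return its offset in the log."""
        self._events.append(event)
        return len(self._events) - 1

    def replay(self, start: int = 0, end: Optional[int] = None) -> Iterator[Event]:
        """Yield the events at offsets [start, end) in their original order."""
        yield from self._events[start:end]


log = UnifiedLog()
log.append(Event("user-1", "viewed", "page:/home"))
log.append(Event("user-1", "added", "item:42", {"basket": "b-7"}))
log.append(Event("user-2", "viewed", "page:/home"))

# Any consumer can rebuild its own view by replaying all or part of the log.
views = sum(1 for e in log.replay() if e.verb == "viewed")
tail = [e.subject for e in log.replay(start=1)]
```

The point of the sketch is that the log itself is never modified; a new model is just another pass over the same events.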
I took advantage of the conference discount and bought Alex’s book, and I’ve enjoyed what’s there so far…
We continued with Richard Astbury (@richorama), who presented Microsoft’s “Orleans” project, which implements “Silos” and “Grains”. Grains are created per entity to be tracked, and run in a clustered container (a Silo). During the talk, I kept thinking about how similar this sounded to Stateless Session EJBs and Erlang’s processes; after the talk, someone referred to it as Erlang.NET, and indeed, Robert Virding asked why they didn’t use Erlang. Despite this, it was actually an interesting talk for anybody wanting to run a lightweight process per entity (psmgr?) and define a lightweight process lifecycle, including things like “suspend”.
The example cited was the online support for Halo 4 where there was a Grain per player recording the player stats, and unlike previous versions, it didn’t fall over on launch day.
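As a rough illustration of the Grain-per-player idea, here’s a single-process Python sketch: one object per entity, activated on first use, with its state saved on “suspend” and restored on the next activation. The class and method names (`PlayerGrain`, `Silo`, `games_played`) are my own inventions; this is not the project’s actual API, and it has none of the clustering or concurrency the real thing provides.

```python
from typing import Dict


class PlayerGrain:
    """Holds the stats for exactly one player (hypothetical example)."""

    def __init__(self, player_id: str, games_played: int = 0) -> None:
        self.player_id = player_id
        self.games_played = games_played

    def record_game(self) -> int:
        self.games_played += 1
        return self.games_played


class Silo:
    """Hosts grains, activating each one the first time it is asked for."""

    def __init__(self) -> None:
        self._active: Dict[str, PlayerGrain] = {}
        self._storage: Dict[str, int] = {}  # stand-in for durable storage

    def grain(self, player_id: str) -> PlayerGrain:
        if player_id not in self._active:
            # Activate, restoring any state saved by an earlier suspend.
            saved = self._storage.get(player_id, 0)
            self._active[player_id] = PlayerGrain(player_id, saved)
        return self._active[player_id]

    def suspend(self, player_id: str) -> None:
        # Drop the grain from memory but keep its state.
        grain = self._active.pop(player_id, None)
        if grain is not None:
            self._storage[player_id] = grain.games_played

    def active_count(self) -> int:
        return len(self._active)


silo = Silo()
silo.grain("alice").record_game()
silo.grain("alice").record_game()
silo.grain("bob").record_game()
silo.suspend("alice")            # alice's grain leaves memory, state is kept
restored = silo.grain("alice")   # re-activated from the saved state
```

Callers never create or destroy a grain themselves; they just ask the silo for the one belonging to a given player.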
Next up was Stephane Maldini (@smaldini) on “Reactive Micro-services With Reactor”. This was a bit rambling, but covered Reactor, which is a “foundation for asynchronous applications on the JVM” and is an alternative to the Microsoft-supported “Reactive Extensions” and reactive event processing. This was interesting, though perhaps it could’ve been a bit more focussed on practical uses. He touched on reactively generating “Back Pressure”, a topic we’d return to later.
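For anyone who hasn’t met “Back Pressure” before, the general idea is that a slow consumer is able to slow its producer down, rather than letting work pile up without limit. Here’s a minimal sketch of that idea using a bounded queue in Python’s asyncio. It is not Reactor’s API, and the function names and sizes are just assumptions for the example.

```python
import asyncio
from typing import List, Tuple


async def producer(queue: asyncio.Queue, n: int, high_water: List[int]) -> None:
    for i in range(n):
        # put() suspends while the queue is full: that pause is the back pressure.
        await queue.put(i)
        high_water[0] = max(high_water[0], queue.qsize())
    await queue.put(None)  # sentinel: no more items


async def consumer(queue: asyncio.Queue, out: List[int]) -> None:
    while True:
        item = await queue.get()
        if item is None:
            break
        await asyncio.sleep(0)  # yield to the event loop, standing in for real work
        out.append(item)


async def run(n: int = 20, capacity: int = 4) -> Tuple[List[int], int]:
    queue: asyncio.Queue = asyncio.Queue(maxsize=capacity)
    out: List[int] = []
    high_water = [0]  # deepest the queue ever got
    await asyncio.gather(producer(queue, n, high_water), consumer(queue, out))
    return out, high_water[0]


received, max_depth = asyncio.run(run())
```

However fast the producer is, the queue never holds more than `capacity` items, so memory use stays bounded and nothing is dropped.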
We only had a couple of lightning talks after lunch. First up was a chappie from IBM talking about Bluemix. The eye-catching demo combining Node-RED (https://github.com/node-red/node-red) with Cloud translation and Cloud “question answering” services was impressive, though whether the people who would be most comfortable with visual programming techniques like that would actually use it is debatable. It looked good, though.
The second lightning talk was from Steve Alexander (@now_talking), about how organisations should be divided along Dunbar’s Number-lines (http://en.wikipedia.org/wiki/Dunbar’s_number). This was a brief talk, delivered in Steve’s own style, and was an interesting wee diversion. For a better summary see this tweet.
Next up was Phil Wills’ (@philwills) talk “Scaling TheGuardian.com with Scala, Elasticsearch and more”. I’d previously watched Graham Tackley’s talk on this, so Phil’s talk wasn’t quite as new. The main lessons were “separate your services” and “don’t allow a slow-service to bring down the rest of your site”. This was more a look at the evolution of the Guardian site over the past twelve years (while Phil’s been there).
Following on from that was Michael Nitschinger (@daschl) talking about “Building A Reactive Database Driver On The JVM”. At the start, Matt Revell made a point of saying that it wasn’t a “Vendor Talk” (Michael works for Couchbase, who provided a lot of the organisational manpower), and to be honest, that was a disservice to Michael’s talk. It was a really interesting look at how they rebuilt the Couchbase Java driver to support both synchronous and asynchronous styles (synchronous calls to the async layer), how they managed to build “Back Pressure” into the driver, and the underlying architecture of something that most folk would naively think was “just a driver”. The slides are here.
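The “synchronous calls to the async layer” part is easy to picture with a small sketch. This one is in Python rather than Java, and it’s only my own illustration of the layering, not the Couchbase driver’s design: `AsyncClient` and `BlockingClient` are hypothetical names, and a dict stands in for the network.

```python
import asyncio
from typing import Dict, Optional


class AsyncClient:
    """Hypothetical async key-value client; a dict stands in for the server."""

    def __init__(self) -> None:
        self._data: Dict[str, str] = {}

    async def upsert(self, key: str, value: str) -> None:
        await asyncio.sleep(0)  # pretend to do I/O
        self._data[key] = value

    async def get(self, key: str) -> Optional[str]:
        await asyncio.sleep(0)  # pretend to do I/O
        return self._data.get(key)


class BlockingClient:
    """Synchronous facade: each call runs the async call to completion."""

    def __init__(self, inner: AsyncClient) -> None:
        self._inner = inner
        self._loop = asyncio.new_event_loop()

    def upsert(self, key: str, value: str) -> None:
        self._loop.run_until_complete(self._inner.upsert(key, value))

    def get(self, key: str) -> Optional[str]:
        return self._loop.run_until_complete(self._inner.get(key))

    def close(self) -> None:
        self._loop.close()


client = BlockingClient(AsyncClient())
client.upsert("user::1", "alice")
value = client.get("user::1")
missing = client.get("user::2")
client.close()
```

All the real logic lives in the async class; the blocking one adds no behaviour of its own, which means there’s only one implementation to maintain.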
Next up was “Staying Agile In The Face Of The Data Deluge”, from Martin Kleppmann (@martinkl). Coincidentally, I’d purchased his upcoming book a couple of weeks ago, which is still in the early stages, and his talk was fascinating. This was about servicing requests (not necessarily HTTP requests) using Stream Processing, primarily building on top of Kafka (for volume), and using the Apache Samza project (which he’s a committer on) for doing it. Again, familiar topics came up: “Back Pressure”, streaming transformations, event histories.
I can recommend watching the video of this when it’s available. He’s uploaded his slides already (the final slide, with references, is well worth a look).
The final talk I got to see before I left was Elliot Murphy (@sstatik) on “Safeguarding sensitive data in the cloud”. I’m pleased that Heroku cover most of his recommendations, but this was another really interesting talk. He compared data-security, where a lot of people don’t follow the “basics”, to the food industry, where people who don’t follow basic hygiene rules are considered negligent, and where even doing the simple things (washing your hands etc) saves a lot of people from illness. He covered “hand washing” in data-security terms, and some useful guidelines on changing culture around the security of data. The slides are here.
Unfortunately I had to cut and run before the last two talks, including Robert Virding talking about the history of Erlang, but I’ll watch out for the videos.
It was a nice opportunity to catch up with some old Canonical friends, and hopefully, they’ll find the energy to run it again next year.
Apparently I bought ticket #1 back in June