MapR Talk
# MapR Talk, March 22, 2016 - Ted Dunning
Co-occurrence vs. matrix factorization
- result dithering
- anti-flood
Exploration, diversity, speed - not the last fraction of a percent. Exploration means today's recommendation results are tomorrow's training data. So if you make the results slightly worse today but the training data better tomorrow, you get a better overall result.
Dithering is used to reorder recommendation results - the reordering is done randomly. Dithering is guaranteed to make offline performance worse. Dithering also has a near-perfect record of making actual performance much better. Hours to implement - non-intuitive, not academic - very pragmatic.
Score each result as log of rank + Gaussian noise, then reorder by the noisy score. Pick the noise scale to provide the desired level of mixing.
Exploring the second page - there are essentially no click-throughs after about the 20th result (the boundary of the first page). Clicks follow rank because of the order of presentation.
Use floor(t/T) as the random seed - the ordering is stable within each time period, then reshuffles - gives a feeling of novelty.
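A minimal sketch of the dithering recipe above (function and parameter names are mine, not from the talk):

```python
import math
import random

def dither(results, epsilon=0.5, t=None, period=3600):
    # Seed with floor(t/period) so the shuffle is stable within a window.
    rng = random.Random(None if t is None else int(t // period))
    # Noisy score: log of 1-based rank plus Gaussian noise; epsilon
    # controls how far down the list items can be mixed up toward the top.
    scored = sorted(
        (math.log(rank) + rng.gauss(0.0, epsilon), item)
        for rank, item in enumerate(results, start=1))
    return [item for _, item in scored]
```

With epsilon near 0 the original order is preserved; larger epsilon pulls second-page items onto the first page, which is exactly the exploration being argued for.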
Lesson 1: Exploration is good
Bayesian bandits - based on Thompson sampling. A very general sequential test; near-optimal regret trade-off between exploration and exploitation.
Possibly the best-known solution for the exploration/exploitation problem - and incredibly simple.
Thompson sampling: select each alternative according to the probability that it is the best. Instead of estimating a point value from the probability distribution, just sample from it - that picks alternatives with exactly the probabilities we want.
Converges faster than epsilon-greedy (with a Gamma-Normal prior) - his graph showed roughly 4x outperformance. See "An Empirical Evaluation of Thompson Sampling" (Chapelle and Li, 2011). Asymptotically optimal - around 10 lines of code.
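A minimal Thompson sampling sketch for Bernoulli rewards (the talk's comparison used a Gamma-Normal prior; the Beta-Bernoulli case below shows the same idea in roughly the promised ten lines - class and method names are illustrative):

```python
import random

class ThompsonSampler:
    def __init__(self, n_arms):
        self.wins = [1] * n_arms    # Beta(1, 1) uniform prior per arm
        self.losses = [1] * n_arms

    def pick(self):
        # Sample a plausible success rate for each arm from its posterior,
        # then play the arm whose sample is largest. Uncertain arms get
        # occasional wide samples, which is where the exploration comes from.
        samples = [random.betavariate(w, l)
                   for w, l in zip(self.wins, self.losses)]
        return samples.index(max(samples))

    def update(self, arm, reward):
        if reward:
            self.wins[arm] += 1
        else:
            self.losses[arm] += 1
```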
Lesson 2: Exploration is pretty easy to do and pays big benefits.
Thompson sampling is used by most ad-targeting systems - they need to explore to get better training data. It gets treated a bit over-academically because it has an optimality result.
The Problem
- k-means clustering is useful for feature extraction or compression.
- At scale and in high dimension, the desirable number of clusters increases.
- A very large number of clusters may require more passes through the data.
- Super-linear scaling is generally impossible.
The Solution
- Sketch-based algorithms produce a sketch of the data.
- Streaming k-means uses adaptive dp-means to produce this sketch in the form of many weighted centroids which approximate the original distribution (see the toy sketch after this list).
- The size of the sketch grows slowly with increasing data size.
- Many operations such as clustering are well behaved on sketches.
- "Fast and Accurate k-means for Large Datasets" (Shindler, Wong, and Meyerson, 2011)
- "Revisiting k-means: New Algorithms via Bayesian Nonparametrics" (Kulis and Jordan, 2012)
- Lots of clusters are fine.
- Avoids the k-means failure mode of two seeds being placed next to each other.
- A sketch with k log k centroids is claimed to suffice.
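A toy illustration of the weighted-centroid sketch idea (my simplification, not the actual streaming k-means algorithm from the papers above; parameter names and the cost schedule are invented, and a real implementation would recluster the centroids when it raises the facility cost):

```python
import math
import random

def centroid_sketch(points, facility_cost=1.0, growth=1.5, max_centroids=200):
    centroids = []  # (coords, weight) pairs: the sketch
    cost = facility_cost
    for p in points:
        if centroids:
            # Distance to the nearest existing centroid.
            d, i = min((math.dist(p, c), i)
                       for i, (c, _) in enumerate(centroids))
        if not centroids or random.random() < min(1.0, d * d / cost):
            centroids.append((tuple(p), 1))  # open a new weighted centroid
        else:
            c, w = centroids[i]  # fold the point into the nearest centroid
            centroids[i] = (tuple((wc * w + pc) / (w + 1)
                                  for wc, pc in zip(c, p)), w + 1)
        if len(centroids) > max_centroids:
            cost *= growth  # make new centroids form less eagerly
    return centroids

# e.g. centroid_sketch((random.gauss(0, 1), random.gauss(0, 1))
#                      for _ in range(100_000))
```

One pass, bounded memory: the sketch grows slowly with data size, and full k-means can then run cheaply on the weighted centroids.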
Lesson 3: Sketches make big data small
Example 4: Search abuse. Recommendation: data about any one person is sparse, but there is lots of data from lots of people - learn patterns from the crowd and extend them to the person.
E.g., Alice got an apple and a puppy. Charles got a bicycle. Bob got an apple - so we recommend Bob a puppy.
What if everyone, including Amelia, gets a pony - now what do we recommend? Nothing, we can't tell :( - an item everyone has carries no signal.
Co-occurrence analysis is great for this. Unlike plain text search, searching with co-occurrence-derived indicator items gives us much better recommendation results!
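A minimal sketch of co-occurrence analysis using Dunning's log-likelihood ratio test (the G^2 scoring function is the standard one; the indicator-extraction wrapper and its threshold are illustrative):

```python
import math
from collections import Counter
from itertools import combinations

def llr(k11, k12, k21, k22):
    # G^2 log-likelihood ratio for a 2x2 contingency table; large values
    # mean the pair co-occurs far more often than chance predicts.
    def h(*ks):
        n = sum(ks)
        return sum(k * math.log(k / n) for k in ks if k > 0)
    return 2 * (h(k11, k12, k21, k22)
                - h(k11 + k12, k21 + k22)
                - h(k11 + k21, k12 + k22))

def indicators(histories, min_llr=10.0):
    # histories: one item list per user. Pairs scoring above the threshold
    # become indicator fields in a search index; querying that index with a
    # user's history implements the recommender via search.
    item_count, pair_count = Counter(), Counter()
    for items in histories:
        items = set(items)
        item_count.update(items)
        pair_count.update(combinations(sorted(items), 2))
    n = len(histories)
    out = {}
    for (a, b), k11 in pair_count.items():
        k12 = item_count[a] - k11
        k21 = item_count[b] - k11
        k22 = n - k11 - k12 - k21
        if llr(k11, k12, k21, k22) > min_llr:
            out.setdefault(a, []).append(b)
            out.setdefault(b, []).append(a)
    return out
```

On the pony example above this behaves correctly: an item everyone has scores an LLR of exactly zero against everything, so it never becomes an indicator.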
Lesson 4: Recursive search abuse pays. Search can implement recs which can implement search.
Example 5: Something useful? Like something to do with money.
Common Point of Compromise:
Scenario: Merchant 0 is compromised and leaks account data during the compromise; fraud is committed elsewhere during the exploit window; there is a high background level of fraud and a limited detection rate of exploits.
Goal: Find merchant 0.
Metagoal: Screen algos for this task without leaking sensitive data.
Simulation strategy: pick consumer parameters such as transaction rate and preferences - no details of fraud, fraudsters, transactions, or models! Then simulate: accounts that transact at the compromised merchant during the compromise period get a higher probability of fraud during the exploit period. How can something this simple be good? Because it simulates the failure modes of the real system.
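A hedged sketch of that simulation (every name and parameter value here is invented for illustration):

```python
import random

def simulate(n_users=5000, n_merchants=50, days=180,
             compromise=range(30, 60), exploit=range(60, 120),
             base_fraud=0.001, exploit_fraud=0.02, tx_rate=0.2):
    # Returns (day, user, merchant, is_fraud) tuples. Users who shop at
    # merchant 0 during the compromise window carry an elevated fraud
    # probability during the exploit window - the signature a detection
    # algorithm must find - without any real account data being involved.
    log, compromised = [], set()
    for day in range(days):
        for user in range(n_users):
            if random.random() < tx_rate:
                merchant = random.randrange(n_merchants)
                if merchant == 0 and day in compromise:
                    compromised.add(user)
                p = (exploit_fraud
                     if user in compromised and day in exploit
                     else base_fraud)
                log.append((day, user, merchant, random.random() < p))
    return log
```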
Parametric simulation: parametric matching of failure signatures allows emulation of complex data properties. Matching on KPIs and failure modes is pretty good.
E.g., Hive was broken - given just the schema and the skew of the data, they generated synthetic data that reproduced the broken query.
Summary
We live in a golden age of newly achieved scale. Scale has lowered the tree - hard problems are much easier, and there is low-hanging fruit all around us.
Cheap learning has huge value.
Code available at github.com/tdunning