MapR Talk
# MapR Talk, March 22, 2016 - Ted Dunning
Co-occurrence vs. matrix factorization
- result dithering
- anti-flood
Exploration, diversity, speed - not the last fraction of a percent. Exploration means today's recommendation results are tomorrow's training data. So if you make the results slightly worse today but the training data better tomorrow, you get a better overall result.
Dithering is used to reorder recommendation results - the reordering is done randomly. Dithering is guaranteed to make offline performance worse. Dithering also has a near-perfect record of making actual performance much better. Hours to implement - non-intuitive, not academic - very pragmatic.
Score each result as log of rank + Gaussian noise, then reorder by the noisy score. Pick the noise scale to provide the desired level of mixing.
Exploring the second page - there are essentially no click-throughs after about the 20th result (the boundary of the first page). Clicks follow rank because of the order of presentation.
Use floor(t/T) as the random seed - the ordering is stable within each time period, then reshuffles - gives a feeling of novelty.
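A minimal sketch of the dithering recipe above (function and parameter names are mine, not from the talk):

```python
import math
import random

def dither(results, epsilon=0.5, t=None, period=3600):
    # Seed with floor(t/period) so the shuffle is stable within a window.
    rng = random.Random(None if t is None else int(t // period))
    # Noisy score: log of 1-based rank plus Gaussian noise; epsilon
    # controls how far down the list items can be mixed up toward the top.
    scored = sorted(
        (math.log(rank) + rng.gauss(0.0, epsilon), item)
        for rank, item in enumerate(results, start=1))
    return [item for _, item in scored]
```

With epsilon near 0 the original order is preserved; larger epsilon pulls second-page items onto the first page, which is exactly the exploration being argued for.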
Lesson 1: Exploration is good
Bayesian bandits - based on Thompson sampling. A very general sequential test; near-optimal regret trade-off between exploration and exploitation.
Possibly the best-known solution for the exploration/exploitation problem - and incredibly simple.
Thompson sampling: select each alternative according to the probability that it is the best. Instead of estimating a point value from the probability distribution, just sample from it - that picks alternatives with exactly the probabilities we want.
Converges faster than epsilon-greedy (with a Gamma-Normal prior) - his graph showed roughly 4x outperformance. See "An Empirical Evaluation of Thompson Sampling" (Chapelle and Li, 2011). Asymptotically optimal - around 10 lines of code.
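A minimal Thompson sampling sketch for Bernoulli rewards (the talk's comparison used a Gamma-Normal prior; the Beta-Bernoulli case below shows the same idea in roughly the promised ten lines - class and method names are illustrative):

```python
import random

class ThompsonSampler:
    def __init__(self, n_arms):
        self.wins = [1] * n_arms    # Beta(1, 1) uniform prior per arm
        self.losses = [1] * n_arms

    def pick(self):
        # Sample a plausible success rate for each arm from its posterior,
        # then play the arm whose sample is largest. Uncertain arms get
        # occasional wide samples, which is where the exploration comes from.
        samples = [random.betavariate(w, l)
                   for w, l in zip(self.wins, self.losses)]
        return samples.index(max(samples))

    def update(self, arm, reward):
        if reward:
            self.wins[arm] += 1
        else:
            self.losses[arm] += 1
```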
Lesson 2: Exploration is pretty easy to do and pays big benefits.
Thompson sampling is used by most ad-targeting systems - they need to explore to get better training data. It gets treated a bit over-academically because it has an optimality result.
The Problem
- k-means clustering is useful for feature extraction or compression.
- At scale and in high dimension, the desirable number of clusters increases.
- A very large number of clusters may require more passes through the data.
- Super-linear scaling is generally impossible.
The Solution
- Sketch-based algorithms produce a sketch of the data.
- Streaming k-means uses adaptive dp-means to produce this sketch in the form of many weighted centroids which approximate the original distribution (see the toy sketch after this list).
- The size of the sketch grows slowly with increasing data size.
- Many operations such as clustering are well behaved on sketches.
- "Fast and Accurate k-means for Large Datasets" (Shindler, Wong, and Meyerson, 2011)
- "Revisiting k-means: New Algorithms via Bayesian Nonparametrics" (Kulis and Jordan, 2012)
- Lots of clusters are fine.
- Avoids the k-means failure mode of two seeds being placed next to each other.
- A sketch with k log k centroids is claimed to suffice.
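A toy illustration of the weighted-centroid sketch idea (my simplification, not the actual streaming k-means algorithm from the papers above; parameter names and the cost schedule are invented, and a real implementation would recluster the centroids when it raises the facility cost):

```python
import math
import random

def centroid_sketch(points, facility_cost=1.0, growth=1.5, max_centroids=200):
    centroids = []  # (coords, weight) pairs: the sketch
    cost = facility_cost
    for p in points:
        if centroids:
            # Distance to the nearest existing centroid.
            d, i = min((math.dist(p, c), i)
                       for i, (c, _) in enumerate(centroids))
        if not centroids or random.random() < min(1.0, d * d / cost):
            centroids.append((tuple(p), 1))  # open a new weighted centroid
        else:
            c, w = centroids[i]  # fold the point into the nearest centroid
            centroids[i] = (tuple((wc * w + pc) / (w + 1)
                                  for wc, pc in zip(c, p)), w + 1)
        if len(centroids) > max_centroids:
            cost *= growth  # make new centroids form less eagerly
    return centroids

# e.g. centroid_sketch((random.gauss(0, 1), random.gauss(0, 1))
#                      for _ in range(100_000))
```

One pass, bounded memory: the sketch grows slowly with data size, and full k-means can then run cheaply on the weighted centroids.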
Lesson 3: Sketches make big data small
Example 4: Search abuse. Recommendation: data about any one person is sparse, but there is lots of data from lots of people - learn patterns from the crowd and extend them to the person.
E.g., Alice got an apple and a puppy. Charles got a bicycle. Bob got an apple - so we recommend Bob a puppy.
What if everyone, including Amelia, gets a pony - now what do we recommend? Nothing, we can't tell :( - an item everyone has carries no signal.
Co-occurrence analysis is great for this. Unlike plain text search, searching with co-occurrence-derived indicator items gives us much better recommendation results!
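A minimal sketch of co-occurrence analysis using Dunning's log-likelihood ratio test (the G^2 scoring function is the standard one; the indicator-extraction wrapper and its threshold are illustrative):

```python
import math
from collections import Counter
from itertools import combinations

def llr(k11, k12, k21, k22):
    # G^2 log-likelihood ratio for a 2x2 contingency table; large values
    # mean the pair co-occurs far more often than chance predicts.
    def h(*ks):
        n = sum(ks)
        return sum(k * math.log(k / n) for k in ks if k > 0)
    return 2 * (h(k11, k12, k21, k22)
                - h(k11 + k12, k21 + k22)
                - h(k11 + k21, k12 + k22))

def indicators(histories, min_llr=10.0):
    # histories: one item list per user. Pairs scoring above the threshold
    # become indicator fields in a search index; querying that index with a
    # user's history implements the recommender via search.
    item_count, pair_count = Counter(), Counter()
    for items in histories:
        items = set(items)
        item_count.update(items)
        pair_count.update(combinations(sorted(items), 2))
    n = len(histories)
    out = {}
    for (a, b), k11 in pair_count.items():
        k12 = item_count[a] - k11
        k21 = item_count[b] - k11
        k22 = n - k11 - k12 - k21
        if llr(k11, k12, k21, k22) > min_llr:
            out.setdefault(a, []).append(b)
            out.setdefault(b, []).append(a)
    return out
```

On the pony example above this behaves correctly: an item everyone has scores an LLR of exactly zero against everything, so it never becomes an indicator.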
Lesson 4: Recursive search abuse pays. Search can implement recs which can implement search.
Example 5: Something useful? Like something to do with money.
Common Point of Compromise:
Scenario: Merchant 0 is compromised and leaks account data during the compromise; fraud is committed elsewhere during the exploit window; there is a high background level of fraud and a limited detection rate of exploits.
Goal: Find merchant 0.
Metagoal: Screen algos for this task without leaking sensitive data.
Simulation strategy: pick consumer parameters such as transaction rate and preferences - no details of fraud, fraudsters, transactions, or models! Then simulate: accounts that transact at the compromised merchant during the compromise period get a higher probability of fraud during the exploit period. How can something this simple be good? Because it simulates the failure modes of the real system.
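A hedged sketch of that simulation (every name and parameter value here is invented for illustration):

```python
import random

def simulate(n_users=5000, n_merchants=50, days=180,
             compromise=range(30, 60), exploit=range(60, 120),
             base_fraud=0.001, exploit_fraud=0.02, tx_rate=0.2):
    # Returns (day, user, merchant, is_fraud) tuples. Users who shop at
    # merchant 0 during the compromise window carry an elevated fraud
    # probability during the exploit window - the signature a detection
    # algorithm must find - without any real account data being involved.
    log, compromised = [], set()
    for day in range(days):
        for user in range(n_users):
            if random.random() < tx_rate:
                merchant = random.randrange(n_merchants)
                if merchant == 0 and day in compromise:
                    compromised.add(user)
                p = (exploit_fraud
                     if user in compromised and day in exploit
                     else base_fraud)
                log.append((day, user, merchant, random.random() < p))
    return log
```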
Parametric simulation: parametric matching of failure signatures allows emulation of complex data properties. Matching on KPIs and failure modes is pretty good.
E.g., Hive was broken - given just the schema and the skew of the data, they generated synthetic data that reproduced the broken query.
Summary
We live in a golden age of newly achieved scale. Scale has lowered the tree - hard problems are much easier, and there is low-hanging fruit all around us.
Cheap learning has huge value.
Code available at github.com/tdunning