Recommendation systems are among the most popular applications of machine learning. The idea is to predict whether a customer would like a certain item: a product, a movie, or a song. Scale is a key concern for recommendation systems, since computational complexity grows with the size of a company's customer base. Spark MLlib makes it possible to build recommendation models from billions of records in just a few lines of Python code (Scala and Java APIs are also available).
Spark MLlib implements a collaborative filtering algorithm called Alternating Least Squares (ALS), which has been implemented in many machine learning libraries and widely studied and used in both academia and industry.
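To give a feel for what "alternating least squares" means, here is a minimal, single-machine sketch in NumPy (not MLlib's distributed implementation, and it treats every matrix entry as an observed rating, which the real algorithm does not): the ratings matrix R is factored into low-rank user and item factor matrices by alternately fixing one and solving a regularized least-squares problem for the other.

```python
import numpy as np

# Illustrative ALS sketch: factor R (users x items) into
# U (users x rank) and V (items x rank) so that R ~= U @ V.T.
# Unlike MLlib's ALS, this dense toy version treats all entries
# of R as observed ratings.
def als(R, rank=2, iterations=10, reg=0.1):
    n_users, n_items = R.shape
    rng = np.random.default_rng(0)
    U = rng.standard_normal((n_users, rank))
    V = rng.standard_normal((n_items, rank))
    I = reg * np.eye(rank)  # L2 regularization term
    for _ in range(iterations):
        # fix V, solve the regularized least-squares problem for U
        U = np.linalg.solve(V.T @ V + I, V.T @ R.T).T
        # fix U, solve for V
        V = np.linalg.solve(U.T @ U + I, U.T @ R).T
    return U, V

R = np.array([[5.0, 3.0, 0.0],
              [4.0, 0.0, 1.0],
              [1.0, 1.0, 5.0]])
U, V = als(R, rank=2, iterations=20)
error = np.linalg.norm(R - U @ V.T)  # reconstruction error
```

Each half-step has a closed-form solution, which is what makes the "alternating" scheme cheap per iteration and easy to parallelize across users and items.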
from pyspark.mllib.recommendation import ALS

# load training and test data into (user, product, rating) tuples
def parseRating(line):
    fields = line.split()
    return (int(fields[0]), int(fields[1]), float(fields[2]))

training = sc.textFile("...").map(parseRating).cache()
test = sc.textFile("...").map(parseRating)

# train a recommendation model
model = ALS.train(training, rank=10, iterations=5)

# make predictions on (user, product) pairs from the test data
predictions = model.predictAll(test.map(lambda x: (x[0], x[1])))
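A natural next step is to check prediction quality on the held-out set, keying both the predictions and the true ratings on the (user, product) pair and averaging the squared differences. In Spark this is an RDD join; the same logic in plain Python (the helper name mse and the sample data are illustrative, not part of the MLlib API) looks like:

```python
# Compute mean squared error by matching predictions to true ratings
# on the (user, product) key -- in Spark this would be an RDD join.
def mse(test_ratings, predicted_ratings):
    predicted = {(u, p): r for (u, p, r) in predicted_ratings}
    errors = [(r - predicted[(u, p)]) ** 2
              for (u, p, r) in test_ratings
              if (u, p) in predicted]
    return sum(errors) / len(errors)

# toy data: (user, product, rating) triples
test_data = [(1, 10, 4.0), (1, 20, 3.0), (2, 10, 5.0)]
preds = [(1, 10, 3.5), (1, 20, 3.0), (2, 10, 4.0)]
result = mse(test_data, preds)
# squared errors are 0.25, 0.0, and 1.0, so the MSE is 1.25 / 3
```

Lower MSE on held-out data indicates a better fit, and comparing MSE across choices of rank and iterations is a simple way to tune the model.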