Scalable Collaborative Filtering with Spark MLlib

http://databricks.com/blog/2014/07/23/scalable-collaborative-filtering-with-spark-mllib.html

Recommendation systems are among the most popular applications of machine learning. The idea is to predict whether a customer would like a particular item: a product, a movie, or a song. Scale is a key concern for recommendation systems, since the computational cost grows with the number of users and items a company serves. Spark MLlib makes it possible to build recommendation models from billions of records in just a few lines of Python (Scala and Java APIs are also available).

Spark MLlib implements a collaborative filtering algorithm called Alternating Least Squares (ALS). ALS models the rating matrix as the product of two low-rank factor matrices, one for users and one for items, and learns them by alternately fixing one matrix and solving a least-squares problem for the other. The algorithm has been implemented in many machine learning libraries and is widely studied and used in both academia and industry.

from pyspark.mllib.recommendation import ALS

# load training and test data into (user, product, rating) tuples
def parseRating(line):
  fields = line.split()
  return (int(fields[0]), int(fields[1]), float(fields[2]))
training = sc.textFile("...").map(parseRating).cache()
test = sc.textFile("...").map(parseRating)

# train a recommendation model
model = ALS.train(training, rank=10, iterations=5)

# make predictions on (user, product) pairs from the test data
predictions = model.predictAll(test.map(lambda x: (x[0], x[1])))
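
To check how well the model generalizes, the predictions can be compared against the held-out ratings. The following sketch (not part of the snippet above) keys both the predicted and the actual ratings by (user, product), joins them, and computes the root-mean-square error; it assumes, as in the snippet above, that each prediction is a (user, product, rating) record.

from math import sqrt

# join predicted and actual ratings on (user, product)
predictionsAndRatings = predictions.map(lambda r: ((r[0], r[1]), r[2])) \
  .join(test.map(lambda x: ((x[0], x[1]), x[2]))) \
  .values()

# root-mean-square error over the test set
rmse = sqrt(predictionsAndRatings.map(lambda pr: (pr[0] - pr[1]) ** 2).mean())
print("Test RMSE = %.4f" % rmse)

A lower RMSE indicates predictions closer to the actual ratings; it is a common way to compare different choices of rank and iterations on the same test set.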