Showing posts from September, 2017

Convert Spark Dense Vector to Mllib Vector

If you are trying to convert a Spark DataFrame to an RDD of LabeledPoint, you might run into a problem when converting the DataFrame's feature vector into the feature vector the RDD expects. When you use VectorAssembler (org.apache.spark.ml.feature.VectorAssembler) to create a feature vector, you get a vector from Spark's "ml" package (org.apache.spark.ml.linalg). LabeledPoint, however, expects a vector from the "mllib" package (org.apache.spark.mllib.linalg), so passing the "ml" vector to it directly produces a type mismatch error.

Solution (row is a Row of the DataFrame):

{ row => LabeledPoint(row.getAs[Int]("label"), org.apache.spark.mllib.linalg.Vectors.fromML(row.getAs[org.apache.spark.ml.linalg.Vector]("features"))) }

Here, row.getAs[org.apache.spark.ml.linalg.Vector]("features") gives us the "ml" vector (for example an org.apache.spark.ml.linalg.DenseVector), and org.apache.spark.mllib.linalg.Vectors.fromML converts it to the equivalent vector of the "mllib" package.
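
For context, here is a minimal end-to-end sketch of the conversion described above, assuming Spark 2.x or later running in local mode. The object name MlToMllibExample, the helper toLabeledPoints, the input column names x1 and x2, and the sample rows are all made up for illustration; only the "label" and "features" column names come from the snippet above.

```scala
import org.apache.spark.ml.feature.VectorAssembler
import org.apache.spark.ml.linalg.{Vector => MLVector}
import org.apache.spark.mllib.linalg.{Vectors => MLlibVectors}
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{DataFrame, SparkSession}

// Hypothetical example object; not part of Spark.
object MlToMllibExample {

  // Assumes df has an integer "label" column and an "ml" vector "features" column.
  def toLabeledPoints(df: DataFrame): RDD[LabeledPoint] =
    df.rdd.map { row =>
      LabeledPoint(
        row.getAs[Int]("label").toDouble,                      // LabeledPoint labels are Doubles
        MLlibVectors.fromML(row.getAs[MLVector]("features"))   // "ml" vector -> "mllib" vector
      )
    }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("ml-to-mllib")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Made-up sample data: one integer label and two numeric features per row.
    val raw = Seq((1, 0.5, 2.0), (0, 1.5, 3.0)).toDF("label", "x1", "x2")

    // VectorAssembler produces a "features" column holding "ml" vectors.
    val assembled = new VectorAssembler()
      .setInputCols(Array("x1", "x2"))
      .setOutputCol("features")
      .transform(raw)

    toLabeledPoints(assembled).collect().foreach(println)

    spark.stop()
  }
}
```

Reading the column as the general org.apache.spark.ml.linalg.Vector type rather than DenseVector keeps the helper working whether the column holds dense or sparse vectors, since Vectors.fromML accepts either.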