Convert Spark Dense Vector to Mllib Vector
If you are trying to convert a Spark DataFrame to an RDD of LabeledPoint, you might run into a problem when converting the DataFrame's feature vector to the feature vector that the RDD expects.
When you use VectorAssembler ("org.apache.spark.ml.feature.VectorAssembler") to create a feature vector, the result is a vector from Spark's "ml" package (org.apache.spark.ml.linalg). If you pass that vector directly to an mllib LabeledPoint while building the RDD, you get a type mismatch error, because an RDD of LabeledPoint (org.apache.spark.mllib.regression.LabeledPoint) expects a vector from the "mllib" package (org.apache.spark.mllib.linalg.Vector).
Solution:
- row is a Row of a dataframe
{ row => LabeledPoint(row.getAs[Int]("label").toDouble, org.apache.spark.mllib.linalg.Vectors.fromML(row.getAs[org.apache.spark.ml.linalg.Vector]("features"))) }
Here, calling row.getAs on the "features" column gives us a vector from the "ml" package (org.apache.spark.ml.linalg.Vector, typically a DenseVector), and org.apache.spark.mllib.linalg.Vectors.fromML converts that vector to the equivalent vector of the "mllib" package.
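For context, the conversion above could be wired into a small end-to-end program along these lines. This is only a sketch under assumptions: the column names "x1" and "x2", the sample rows, the object name DataFrameToLabeledPoints and the local SparkSession settings are all made up for illustration, and it needs Spark 2.0 or later, where Vectors.fromML is available.

```scala
import org.apache.spark.ml.feature.VectorAssembler
import org.apache.spark.ml.linalg.{Vector => MLVector}
import org.apache.spark.mllib.linalg.{Vectors => MLlibVectors}
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{Row, SparkSession}

// Hypothetical object name, used only for this illustration.
object DataFrameToLabeledPoints {

  // Convert one Row (with an Int "label" and an ml "features" vector)
  // into an mllib LabeledPoint.
  def toLabeledPoint(row: Row): LabeledPoint = {
    val label = row.getAs[Int]("label").toDouble
    // Read the column as the general ml Vector type rather than DenseVector,
    // since VectorAssembler may also emit sparse vectors.
    val features = row.getAs[MLVector]("features")
    LabeledPoint(label, MLlibVectors.fromML(features))
  }

  def main(args: Array[String]): Unit = {
    // Assumed local session, for demonstration only.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("ml-to-mllib")
      .getOrCreate()
    import spark.implicits._

    // Made-up sample data: an Int label and two numeric feature columns.
    val raw = Seq((1, 0.5, 2.0), (0, 1.5, 0.0)).toDF("label", "x1", "x2")

    // VectorAssembler produces a "features" column of ml vectors.
    val assembled = new VectorAssembler()
      .setInputCols(Array("x1", "x2"))
      .setOutputCol("features")
      .transform(raw)

    // Map every Row to a LabeledPoint that holds an mllib vector.
    val points: RDD[LabeledPoint] = assembled.rdd.map(toLabeledPoint)
    points.collect().foreach(println)

    spark.stop()
  }
}
```

Reading the column as the general ml Vector type, and not as DenseVector, is a deliberate choice in this sketch: Vectors.fromML accepts both dense and sparse ml vectors, so the same mapping keeps working if the assembled features turn out to be sparse.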