Convert Spark Dense Vector to Mllib Vector

If you are converting a Spark DataFrame to an RDD of LabeledPoint, you may run into a problem when turning the DataFrame's feature vector into the feature vector that LabeledPoint expects.

When you use VectorAssembler (org.apache.spark.ml.feature.VectorAssembler) to create a feature vector, it produces a vector from Spark's "ml" package (org.apache.spark.ml.linalg). LabeledPoint, however, expects a vector from the "mllib" package (org.apache.spark.mllib.linalg), so passing the assembled vector straight into a LabeledPoint fails with a type mismatch error.
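To make the mismatch concrete, here is a minimal sketch, assuming Spark 2.0 or later with spark-mllib on the classpath (no SparkSession is needed). The object VectorPackages and its helper toLabeledPoint are hypothetical names used only for illustration; the vector is built by hand with org.apache.spark.ml.linalg.Vectors.dense, the same "ml" package that VectorAssembler writes to.

```scala
import org.apache.spark.ml.linalg.{Vector => MLVector, Vectors => MLVectors}
import org.apache.spark.mllib.linalg.{Vectors => MLlibVectors}
import org.apache.spark.mllib.regression.LabeledPoint

object VectorPackages {
  // LabeledPoint's features parameter is an org.apache.spark.mllib.linalg.Vector,
  // so LabeledPoint(label, features) with an ml vector would not compile.
  // Vectors.fromML (mllib package) bridges the two packages.
  // Hypothetical helper name, for illustration only.
  def toLabeledPoint(label: Double, features: MLVector): LabeledPoint =
    LabeledPoint(label, MLlibVectors.fromML(features))

  def main(args: Array[String]): Unit = {
    // Same package as the vectors VectorAssembler produces (ml.linalg)
    val mlVec: MLVector = MLVectors.dense(1.0, 2.0, 3.0)
    val point = toLabeledPoint(1.0, mlVec)
    println(point)

    // The reverse direction, mllib -> ml, is asML on the mllib vector
    println(point.features.asML == mlVec)
  }
}
```

Renaming the imports (MLVector, MLVectors, MLlibVectors) keeps the two Vector types from clashing with each other and with scala.Vector.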

Solution:
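  • row is a Row of the DataFrame, with a "label" column and a "features" column produced by VectorAssembler
{ row => LabeledPoint(row.getAs[Int]("label"), org.apache.spark.mllib.linalg.Vectors.fromML(row.getAs[org.apache.spark.ml.linalg.Vector]("features"))) }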

Here, row.getAs("features") returns an org.apache.spark.ml.linalg.DenseVector (from the "ml" package, not org.apache.spark.mllib.linalg.DenseVector), and org.apache.spark.mllib.linalg.Vectors.fromML, available since Spark 2.0, converts it into the equivalent vector from the "mllib" package.
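Putting the pieces together, an end-to-end version might look like the following minimal sketch, assuming Spark 2.0+ running in local mode. The object name DataFrameToLabeledPoints, the run helper and the toy columns label, x1 and x2 are hypothetical. Reading the column as the generic ml Vector type rather than DenseVector also covers rows where VectorAssembler stores a sparse vector, since fromML accepts both.

```scala
import org.apache.spark.ml.feature.VectorAssembler
import org.apache.spark.ml.linalg.{Vector => MLVector}
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.sql.SparkSession

object DataFrameToLabeledPoints {
  // Hypothetical helper: builds a toy DataFrame, assembles the features and
  // returns the resulting LabeledPoints collected to the driver.
  def run(spark: SparkSession): Array[LabeledPoint] = {
    import spark.implicits._

    // Toy data: an integer label and two numeric feature columns
    val df = Seq((1, 0.5, 1.5), (0, 2.0, 3.0)).toDF("label", "x1", "x2")

    // VectorAssembler writes an org.apache.spark.ml.linalg.Vector into "features"
    val assembled = new VectorAssembler()
      .setInputCols(Array("x1", "x2"))
      .setOutputCol("features")
      .transform(df)

    // Convert each Row: fromML turns the ml vector into an mllib vector
    val points = assembled.rdd.map { row =>
      LabeledPoint(
        row.getAs[Int]("label").toDouble,
        Vectors.fromML(row.getAs[MLVector]("features")))
    }
    points.collect()
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("df-to-labeledpoint")
      .master("local[*]")
      .getOrCreate()
    try run(spark).foreach(println)
    finally spark.stop()
  }
}
```

If your label column is stored as a double rather than an integer, read it with row.getAs[Double]("label") instead, since getAs does not convert between numeric types.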
