Apache Spark is one of the fastest-growing big data projects in the history of the Apache Software Foundation.
With its memory-oriented architecture, flexible processing libraries and ease-of-use, Spark has emerged as a leading distributed computing framework for real-time analytics.
* Source MongoDB Whitepaper
Here is a comparison of the Spark connectors for MongoDB and Hadoop:
| Feature | MongoDB Connector for Hadoop | Stratio Spark-MongoDB Connector |
|---|---|---|
| Machine Learning | Yes | Yes |
| SQL | Not currently | Yes |
| DataFrames | Not currently | Yes |
| Streaming | Not currently | Not currently |
| Python | Yes | Yes (using SparkSQL syntax) |
| Use MongoDB secondary indexes to filter input data | Yes | Yes |
| Compatibility with MongoDB replica sets and sharding | Yes | Yes |
| MongoDB Support | Yes (read and write) | Yes (read and write) |
| HDFS Support | Yes (read and write) | Partial (write only) |
| Support for MongoDB BSON Files | Yes | No |
| Commercial Support | Yes (with MongoDB Enterprise Advanced) | Yes (provided by Stratio) |
* Source MongoDB Whitepaper