Apache Spark is one of the fastest-growing big data projects in the history of the Apache Software Foundation.
With its memory-oriented architecture, flexible processing libraries, and ease of use, Spark has emerged as a leading distributed computing framework for real-time analytics.
* Source: MongoDB Whitepaper
Here is how the two connectors compare:
Comparing Spark Connectors for MongoDB and Hadoop
| Feature | MongoDB Connector for Hadoop | Stratio Spark-MongoDB Connector |
|---|---|---|
| Machine Learning | Yes | Yes |
| SQL | Not currently | Yes |
| DataFrames | Not currently | Yes |
| Streaming | Not currently | Not currently |
| Python | Yes | Yes (using Spark SQL syntax) |
| Use MongoDB secondary indexes to filter input data | Yes | Yes |
| Compatibility with MongoDB replica sets and sharding | Yes | Yes |
| MongoDB Support | Yes (read and write) | Yes (read and write) |
| HDFS Support | Yes (read and write) | Partial (write only) |
| Support for MongoDB BSON Files | Yes | No |
| Commercial Support | Yes (with MongoDB Enterprise Advanced) | Yes (provided by Stratio) |
* Source: MongoDB Whitepaper
