Tech Blog on Software Engineering

MongoDB string split on update with mongosh

mongodb split update

Nobody doubts that MongoDB can split string for the update function. There is a common Software Ops task when you need to transform a string field in a datastore to an array split by delimiter symbol or vice versa. It could be helpful when you transform a string of comma/space/semicolon-separated items like tags, aliases, and categories into an array or combine an array of items into a string. For example, when you have written a full name in space-separated string format. Then during system evolvement, you want to introduce a new field first_name. In PostgreSQL it could look like this:

ALTER TABLE users ADD COLUMN first_name text;
UPDATE users SET first_name = split_part(name, ' ', 1);

MongoDB update operators

Working in MongoDB realm you jump into mongosh console and starting to construct your update statement:

db.users.updateMany(
  {
    "name": {$exists: true}
  }, 
  {
    $set: {
      // Suddenly you find yourself without required function in the update operators list
      "first_name": {$SPLIT_OPERATOR_DOES_NOT_EXIST!??: ["$name", " "]} 
    }
  })

Yes, this is it, those split and string manipulation functions are not in the mongodb update operators list

But wait, you can say, there are $split, $concat and many other functions alike, I can just write it like this:

db.users.updateMany(
{
  "name": {$exists: true}
}, 
{
  $set: {
    "first_name": {
      $arrayElemAt: [{$split: ["$name", " "]}, 0]
    }
  }
})

Sure, you can but then you get the command in the value of the field:

db.users.findOne({"_id": ObjectId('some_id')})
{
  _id: ObjectId('some_id'),
  name: 'John Smith',

  // wait, what? command written as Object value!
  first_name: {'$arrayElemAt': [{'$split': ['$name', ' ']}, '0']} 
}

Update split is possible

Facilitation comes from the fact that the update is accepting aggregation pipelines of the aggregation framework. Its stages are performed as update actions. The order in which things are built in Mongo is surprising sometimes. Having two sets of acceptable operators for the update function is a little bit confusing. But let’s avoid stumbling on the slippery slope of the MongoDB API. 😉

Applying an aggregation pipeline you can rewrite it like so:

db.users.updateMany(
  {
    "name": {$exists: true}
  }, 
  [
    {
       $set: {
         "first_name": {
           $arrayElemAt: [{$split: ["$name", " "]}, 0]
         }
       }
    }
  ]
)

And the subtle difference is in additional square brackets, which transform the statement into an aggregation pipeline and it starts to make sense and work.

Also, you can completely rely on the aggregation framework with the capability to merge new values into the collection:

db.users.aggregate([
{
  $match: {
    "name": {$exists: true}
  }
},
{
  $project: {
    "first_name": {
      $first: {$split: ["$name", " "]} // simplification of $arrayElemAt 0 to $first 
    }
  }
},
{
  $merge: {
    into: "users",
    whenNotMatched: "discard"
  }
}
])

Summary

Yeah, the task is solved! You passed that obstacle of operator absence! But what about these two sets of operators for update? Well, it is a reflection of the system’s development. The system had one collection of update operators in the early stages and during evolvement included a more powerful set of aggregation framework capabilities. Also, it was built possibly keeping in mind the desire to exclude code duplication and apply reuse of aggregation framework codebase when extending features to other commands. I think small clutters are forgivable for the wonderful overall convenience of developer interactions with MongoDB.