Sitecore MediaIndexing Folder

On one of our production CM servers, we started getting frequent alerts about high disk space utilization because the drive on which Sitecore was installed was filling up. With the help of our DevOps team, we had configured a free disk space threshold below which these alerts are triggered. This lets us act proactively and make sure the disk does not run out of space, which could in turn stop the Sitecore instance from working.
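The alerting rule itself is simple arithmetic: compare the free space on the drive against a percentage of its total capacity. Below is a minimal Python sketch of such a check, assuming a hypothetical 15% minimum-free threshold. The helper names free_percent, should_alert and check_drive are illustrative only and are not part of Sitecore or any monitoring tool.

```python
import shutil


def free_percent(free_bytes: int, total_bytes: int) -> float:
    """Return free space as a percentage of total capacity."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return 100.0 * free_bytes / total_bytes


def should_alert(free_bytes: int, total_bytes: int, min_free_percent: float = 15.0) -> bool:
    """True when free space has dropped below the configured threshold."""
    return free_percent(free_bytes, total_bytes) < min_free_percent


def check_drive(path: str, min_free_percent: float = 15.0) -> bool:
    """Check the drive hosting `path` (e.g. the Sitecore install drive)."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    return should_alert(usage.free, usage.total, min_free_percent)


if __name__ == "__main__":
    # Hypothetical 15% threshold; use whatever value your DevOps team agrees on.
    print("ALERT" if check_drive(".") else "OK")
```

In practice the monitoring platform owns this logic; the sketch only shows why a threshold gives early warning before the drive is actually full.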

Best practices such as cleaning up log files and residual Sitecore package installation files were already in place through the cleanup agent, so why these alerts? Triaging further to find which files were taking up such a huge amount of disk space (nearly 143 GB), we discovered that Websites\App_Data\mediaIndexing was the folder occupying it.
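Finding the culprit is essentially a size-by-folder walk of App_Data. Here is a minimal Python sketch of that triage, assuming a hypothetical install path of C:\inetpub\wwwroot\Website. The functions folder_size and largest_subfolders are illustrative names, not Sitecore tooling.

```python
import os
from pathlib import Path


def folder_size(path: Path) -> int:
    """Total size in bytes of all regular files below `path`."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(path):
        for name in filenames:
            file_path = Path(dirpath) / name
            try:
                if not file_path.is_symlink():
                    total += file_path.stat().st_size
            except OSError:
                # File vanished or is locked; skip it rather than abort the scan.
                pass
    return total


def largest_subfolders(root: Path, top_n: int = 5) -> list[tuple[str, int]]:
    """Return the top_n immediate subfolders of `root` by size, largest first."""
    sizes = [(entry.name, folder_size(entry)) for entry in root.iterdir() if entry.is_dir()]
    sizes.sort(key=lambda item: item[1], reverse=True)
    return sizes[:top_n]


if __name__ == "__main__":
    # Hypothetical install path; adjust to your own Website folder.
    app_data = Path(r"C:\inetpub\wwwroot\Website\App_Data")
    if app_data.is_dir():
        for name, size in largest_subfolders(app_data):
            print(f"{size / 1024**3:8.2f} GB  {name}")
    else:
        print(f"{app_data} not found")
```

On a server like ours, mediaIndexing would be expected to surface at the top of this list.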

We knew that from Sitecore 9 onwards, media indexing for PDFs and other formats is built into Sitecore, which is why the new MediaIndexing folder appears. This is where our curiosity kicked in, with questions like:

Q1) Is the content in the folder being used by Sitecore?

Q2) What happens if we empty this folder completely? What would the impact be on the CM server and on Sitecore's functioning?

Q3) Why is it not getting cleared by itself?

So here’s what we found about the mediaIndexing folder on Sitecore CM servers and what can be done with it.

  • The mediaIndexing folder is used as temporary storage for the media files associated with media items.
  • These files are created so that iFilters can extract text content from them.
  • The files should be automatically removed once the content is retrieved.
  • It is safe to manually clean up this folder. Necessary files will be re-created by Sitecore if needed.

We created a config patch file, “Cleanup-MediaIndexing.config”, so that the mediaIndexing folder is cleaned up every 30 days; the same patch can be reused in any Sitecore implementation that needs it. As this was needed only on the CM server, the patch takes care of that through role:require="ContentManagement".
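As a hedged illustration of the 30-day rule described above (this is neither the config patch itself nor Sitecore's own cleanup code), here is a minimal Python sketch that removes files older than a maximum age from a folder, recursively. It could serve as an out-of-band script, for example in a scheduled task. The function remove_stale_files, its parameters, and the dry_run option are hypothetical.

```python
import os
import time
from pathlib import Path
from typing import Optional

THIRTY_DAYS = 30 * 24 * 60 * 60  # seconds, matching the 30-day retention above


def remove_stale_files(folder: Path, max_age_seconds: float = THIRTY_DAYS,
                       now: Optional[float] = None, dry_run: bool = False) -> list[Path]:
    """Delete files under `folder` (recursively) whose mtime is older than max_age.

    Returns the files removed (or that would be removed when dry_run=True).
    Folders themselves are left in place; Sitecore re-creates files as needed.
    """
    current = time.time() if now is None else now
    removed: list[Path] = []
    for dirpath, _dirnames, filenames in os.walk(folder):
        for name in filenames:
            path = Path(dirpath) / name
            try:
                if current - path.stat().st_mtime > max_age_seconds:
                    if not dry_run:
                        path.unlink()
                    removed.append(path)
            except OSError:
                # File may be in use by an iFilter or already gone; retry next run.
                continue
    return removed
```

Running it once with dry_run=True lists what would be deleted without touching anything, which is a cautious first step on a production CM server.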

  • Lastly, media indexing can be disabled if it is not required, although this is probably the last option anyone would want to go with.

A couple of us worked together to troubleshoot and find a solution to this problem. Bhavesh Rana is the one I paired up with to get this resolved.

I'm sharing the problem and the solution in this blog post so that others can save the time we spent fixing it.