Find and replace (sed) inside multiple files and directories WITHOUT SSH (shell) access

In the interest of up-front honesty, the title of this post is a little misleading. Instead of the Linux commands “find” and “sed”, I will describe a process using WinSCP and PowerShell. If your web host offers SSH/shell access, finding and replacing a word or phrase across many files and directories IS generally accomplished with sed (or find + sed). If your web hosting provider DOES NOT offer SSH access (only FTP/SFTP/FTPS), performing a recursive find and replace inside multiple files and directories requires a little more creativity.

The problem

I was tasked with migrating a website’s analytics solution from Webalizer to Google Analytics or StatCounter. Webalizer and similar log analyzers such as GoAccess and AWStats require no JavaScript code - they parse the web server’s access logs after the fact and generate traffic reports from the log contents.

Implementing Google Analytics or StatCounter requires adding tracking code (JavaScript) to each page of the site to be analyzed. In this case, there were thousands of htm/html pages and the total size of the site was around 15 GB (many images, videos, PDFs, etc.).

The site was hosted at FreeHostia (shared tenancy offering) so SSH access was not available.

WinSCP and PowerShell to the rescue!

Using WinSCP to synchronize only the htm/html files to update locally

WinSCP’s “Synchronize” functionality is fabulous and fit the bill for this task. I did not want to download all 15 GB of data (time and resource constraints), so I opted to use WinSCP to synchronize only the HTM and HTML files locally - this approach reduced the size of the data transfer to a few MB instead of many GB. The htm and html files were the only ones that needed the JS tracking code applied.

Main Synchronize settings:

Main synchronization options

  • Specified the Local and Remote directories accordingly
  • Changed “Direction” to Local
  • Changed “Mode” to Mirror files
  • Under “Synchronize options”, unchecked Delete files (for safety) and, optionally, Preview changes (to save time)
  • Selected “Transfer settings…”

Advanced Transfer Settings:

Transfer settings/advanced options

  • Set “Transfer mode” to Text (plain text, html, scripts…) - I was only concerned with html/htm files
  • Unchecked Calculate total size - saved a little time
  • Selected Edit… in the “File Mask” section - this was an easier way to specify filters

File Mask settings:

Transfer settings/file and folder mask options

  • Added only the extensions needing the tracking code (htm and html) to Include files, in both lower and upper case - I was unsure of case sensitivity and wanted to cover the bases
  • Added an exclusion to the Exclude directories list (_vti_cnf, etc.) - the site was maintained in FrontPage at some point and the usual _vti_XXXX directories were everywhere in the structure

Once configured, I initiated the “Synchronize” task in WinSCP and within a few minutes, had local copies of all the HTM and HTML files requiring updates.
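
For anyone who would rather script this step than click through the dialogs, the same settings can be approximated with WinSCP's scripting interface (winscp.com) driven from PowerShell. This is only a minimal sketch, not the GUI procedure described above: the function name New-WinScpSyncScript, the session URL, and the paths are hypothetical placeholders, and the synchronize switches should be checked against the WinSCP documentation for your version before relying on them.

```powershell
# Build a WinSCP script that approximates the GUI settings above (a sketch).
# The function name, session URL, and paths are hypothetical placeholders.
function New-WinScpSyncScript {
    param(
        [string]$SessionUrl,   # e.g. 'ftp://user:password@example.com/' or a stored site name
        [string]$LocalDir,
        [string]$RemoteDir,
        [ValidateSet('local', 'remote')]
        [string]$Direction = 'local'   # 'local' downloads, 'remote' uploads
    )

    # Include only htm/html files; exclude FrontPage _vti_cnf directories.
    # In a WinSCP mask, "|" separates includes from excludes and a
    # trailing "/" marks a directory mask.
    $mask = '*.htm; *.html | _vti_cnf/'

    # -mirror also transfers older files; -delete is deliberately left out.
    @(
        "open $SessionUrl"
        "synchronize $Direction -mirror -transfer=ascii -filemask=`"$mask`" `"$LocalDir`" `"$RemoteDir`""
        'exit'
    ) -join "`r`n"
}

$syncScript = New-WinScpSyncScript -SessionUrl 'ftp://user@example.com/' `
    -LocalDir 'C:\Work\site' -RemoteDir '/'
$syncScript

# To run it (assumes winscp.com is on the PATH):
#   Set-Content -Path sync.txt -Value $syncScript
#   winscp.com /script=sync.txt
```

Passing -Direction remote produces the matching script for pushing the edited files back to the host later on.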

Using PowerShell to clean up and perform the find/replace inside all files

I used a script similar to the following to perform my find and replace inside all downloaded HTM and HTML files.

# Delete empty folders/directories to tidy things up
# Thanks to @Bogdan Calmac, via https://stackoverflow.com/questions/1575493/how-to-delete-empty-subfolders-with-powershell/4234297

cd C:\Users\Neil\Desktop\Work
ls -recurse | where {!@(ls -force $_.fullname)} | rm


# Perform the actual find and replace against every htm/html file in the directory structure
# Thanks to @Robben_Ford_Fan_boy and @Daniel Liuzzi, via https://stackoverflow.com/questions/2837785/powershell-script-to-find-and-replace-for-all-files-with-a-specific-extension

# -File skips directories (PowerShell 3.0 or later); -Include limits the run to htm/html files
$htmlFiles = Get-ChildItem . -Recurse -File -Include *.htm, *.html
foreach ($file in $htmlFiles) { (Get-Content $file.PSPath) | ForEach-Object { $_ -replace 'FIND', 'REPLACE' } | Set-Content $file.PSPath }

This ran in 15-20 seconds and accomplished the task flawlessly.

As a side note, my synchronization job in WinSCP ended up downloading several empty directories (likely a result of my settings, but I did not dig into it). The first lines of the PowerShell code cleaned things up by removing all empty directories from the local path.

The rest of the lines in the script perform the actual find/replace.
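
One detail worth knowing: PowerShell's -replace operator treats the search text as a regular expression and ignores case by default. If the text to find contains characters such as . ( ) or ?, escaping it first avoids surprises, and counting the matching files beforehand gives a cheap dry run. The following is a minimal sketch under those assumptions; the helper name Update-HtmlText and the sample strings are hypothetical.

```powershell
# Literal (non-regex) find/replace with an optional dry run - a sketch.
# Update-HtmlText is a hypothetical helper name, not an existing cmdlet.
function Update-HtmlText {
    param(
        [string]$Root = '.',
        [string]$Find,
        [string]$Replace,    # note: a literal $ here must be written as $$
        [switch]$DryRun
    )

    # Escape the search text so -replace matches it literally.
    $pattern = [regex]::Escape($Find)

    # Only htm/html files; -File skips directories (PowerShell 3.0 or later).
    $files = @(Get-ChildItem -Path $Root -Recurse -File -Include *.htm, *.html)

    # One hit per file that contains the text (-SimpleMatch = no regex).
    $hits = @($files | Select-String -Pattern $Find -SimpleMatch -List)

    if (-not $DryRun) {
        # Rewrite only the files that actually contain the text.
        foreach ($hit in $hits) {
            (Get-Content -LiteralPath $hit.Path) |
                ForEach-Object { $_ -replace $pattern, $Replace } |
                Set-Content -LiteralPath $hit.Path
        }
    }

    $hits.Count   # number of files containing the search text
}

# Example: preview first, then run for real.
# Update-HtmlText -Root . -Find 'index.htm?page=1' -Replace 'index.html' -DryRun
# Update-HtmlText -Root . -Find 'index.htm?page=1' -Replace 'index.html'
```

Rewriting only the files that contain the search text leaves every other file untouched on disk.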

I opted to incorporate the JavaScript tracking code using a server-side include (SSI). The actual JS code was placed in a separate file (trackingcode.inc) and the PowerShell script simply added an include statement before the closing </body> tag on each page. This kept the find/replace tidier and will allow for future changes to the analytics JS code without updating each htm/html page. The exact code looked like this:

foreach ($file in $htmlFiles) { (Get-Content $file.PSPath) | Foreach-Object { $_ -replace '</body>', '<!--#include virtual="/trackingcode.inc" --></body>' } | Set-Content $file.PSPath }
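
Because the replacement text itself contains </body>, running that line a second time would insert the include twice. A hedged sketch of a re-runnable variant follows; the helper name Add-TrackingInclude and its parameters are hypothetical, and it also warns about pages that have no </body> tag so they can be fixed by hand.

```powershell
# Re-runnable version of the include insertion - a sketch.
# Add-TrackingInclude is a hypothetical helper, not part of the original script.
function Add-TrackingInclude {
    param(
        [string]$Root = '.',
        [string]$Include = '<!--#include virtual="/trackingcode.inc" -->'
    )

    $marker = [regex]::Escape($Include)

    foreach ($file in Get-ChildItem -Path $Root -Recurse -File -Include *.htm, *.html) {
        $lines = @(Get-Content -LiteralPath $file.FullName)

        # Skip pages that already carry the include.
        if ($lines -match $marker) { continue }

        # Report pages without a closing body tag instead of silently skipping them.
        if (-not ($lines -match '</body>')) {
            Write-Warning "No </body> tag in $($file.FullName)"
            continue
        }

        # '$0' re-inserts the matched tag as found, so </BODY> keeps its case.
        $lines |
            ForEach-Object { $_ -replace '</body>', ($Include + '$0') } |
            Set-Content -LiteralPath $file.FullName

        $file.FullName   # emit the path of each modified file
    }
}

# Example: Add-TrackingInclude -Root C:\Users\Neil\Desktop\Work
```

The function returns the paths it modified, so a second run that prints nothing confirms every page is already tagged.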

Using WinSCP to synchronize (reverse) the local changed files to the hosting provider

This step was basically the reverse of the original sync. I took care to ensure the Delete Files option was unchecked and that the mode was Mirror. Aside from that, I specified the paths accordingly and grabbed coffee while WinSCP uploaded the changed htm and html files (asymmetric DSL - much slower on upload).

WinSCP sync settings to push changed files back to web host

Final thoughts

I hope this helps someone in a similar situation (e.g. another FreeHostia customer). I am sure other web hosting providers also do not offer SSH access on their shared tenancy servers.

If others have addressed this sort of requirement in a different way, I would be very interested to hear about it.