Mind Dump, Tech And Life Blog
written by Ivan Alenko
published under license Attribution-ShareAlike 4.0 International (CC BY-SA 4.0)
posted at 21. Apr '18

Scrub Invalid UTF-8 Characters

#!/usr/bin/env ruby

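# Renames every entry in the current directory whose name contains invalid
# UTF-8 byte sequences; the invalid bytes are dropped, not replaced.
# Use case: a zip file created on Windows with non-ASCII punctuation in its
# file names (UTF-16 or cp1250), extracted on Linux where names are UTF-8.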

require 'fileutils'

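Dir['*'].each do |file|
  # String#scrub (Ruby 2.1+) drops byte sequences that are invalid in the name's encoding
  scrubbed = file.scrub('')
  puts scrubbed.inspect
  next if scrubbed == file
  # do not rename to an empty name or over an existing entry
  if scrubbed.empty? || File.exist?(scrubbed)
    warn "skipping #{file.inspect}"
    next
  end
  FileUtils.move(file, scrubbed)
end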
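
To see what the scrubbing does to a single name, here is a minimal sketch that works on plain strings and touches no files. The helpers `scrub_name` and `recode_name` are hypothetical names invented for this illustration. `recode_name` assumes the broken names really are cp1250; if that guess is wrong, it will produce wrong characters without raising an error.

```ruby
# Hypothetical helper: drop bytes that are invalid in the string's encoding,
# inserting nothing in their place.
def scrub_name(name)
  name.scrub('')
end

# Hypothetical alternative, ASSUMING the original bytes are cp1250:
# reinterpret the bytes as Windows-1250 and transcode them to UTF-8,
# which keeps the character instead of dropping it.
def recode_name(name)
  name.dup.force_encoding('Windows-1250').encode('UTF-8')
end

# 0xE9 is "é" in cp1250, but a lone 0xE9 is not valid UTF-8.
broken = "caf\xE9.txt"

puts broken.valid_encoding?       # false
puts scrub_name(broken).inspect   # "caf.txt"
puts recode_name(broken).inspect  # "café.txt"
```

Scrubbing is the safe choice when the source encoding is unknown, at the cost of losing the affected characters. Recoding only makes sense when you are sure which encoding the archive used.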
