published under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license
posted in category Software Development & Programming / Ruby
posted on 21 Apr 2018
Scrub Invalid UTF-8 Characters
#!/usr/bin/env ruby
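# renames every entry in the current directory, removing invalid UTF-8 byte
# sequences from its name without inserting a replacement character
# use case: zip file created on Windows, with punctuation in file names stored
# as UTF-16 or cp1250, unpacked on Linux, where names are expected to be UTF-8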
require 'fileutils'
Dir['*'].each do |file|
  scrubbed = file.encode('UTF-8', :invalid => :replace, :undef => :replace, :replace => '')
  puts scrubbed.inspect
  FileUtils.move(file, scrubbed) if file != scrubbed
end
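The same clean-up can also be written with `String#scrub`, available since Ruby 2.1, which states the intent more directly than `encode` with replacement options. The sketch below is only an illustration, not the script above: `scrubbed_name` is a hypothetical helper name, and it works on plain strings so it can be tried without touching any files.

```ruby
# Hypothetical helper (not part of the original script): returns a copy of
# the name with every invalid UTF-8 byte sequence removed.
def scrubbed_name(name)
  # dup + force_encoding so the bytes are interpreted as UTF-8 no matter how
  # the string was tagged; scrub('') then drops the invalid sequences.
  name.dup.force_encoding(Encoding::UTF_8).scrub('')
end

# 0xE9 is "é" in cp1250, but a lone 0xE9 byte is not valid UTF-8.
broken = "caf\xE9.txt".b

puts scrubbed_name(broken).inspect       # prints "caf.txt"
puts scrubbed_name('plain.txt').inspect  # prints "plain.txt"
```

Note that scrubbing discards the offending bytes, so the accented character is lost and two different names may end up identical. It may be worth checking `File.exist?` on the target before moving anything.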