[ffmpeg](https://ffmpeg.org/) is a Swiss Army knife for everything audio/video. It can do practically every task under the sun, and in fact powers most major dedicated "video players" (VLC, MPC-HC, built-in players in Chrome and Firefox...).[^1]
If you're on Windows, it's technically possible to install `ffmpeg` and use it directly[^2], but since the Windows Command Prompt sucks ass comfort-wise and scripting-wise, it's recommended to just [install Ubuntu as part of the Windows Subsystem for Linux](https://docs.microsoft.com/en-us/windows/wsl/install-win10), and then `apt-get install ffmpeg`.
`ffmpeg` is pretty clever: it can correctly guess the codecs and reasonable default settings from the file extension, so all of the following will work as expected (and retain metadata[^3]!):
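A minimal sketch of what such conversions look like, wrapped in a hypothetical `transcode` helper so it's easy to reuse in scripts. The helper name and all file names are placeholders of mine:

```shell
# transcode SRC DST: hypothetical helper. ffmpeg picks the container and
# the default codecs from DST's extension.
transcode() {
  ffmpeg -i "$1" "$2"
}

# Example calls (placeholder file names):
#   transcode input.avi output.mp4    # video to video
#   transcode input.mp4 output.mp3    # keep only the audio track
#   transcode input.wav output.flac   # audio to audio
```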
While it can reasonably be assumed that `mp4` ≅ `h264`, `avi` is a bit more complex. You can list all the supported codecs with `ffmpeg -codecs`[^5], but since there are several hundred, you'd better have an idea of what you want to do in the first place.
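When the extension alone doesn't pin down the codec, you can name one explicitly with `-c:v` (video) and `-c:a` (audio). A rough sketch follows. The helper names are made up, and the `mpeg4` + `libmp3lame` pairing is just an assumed "commonly playable AVI" combination; `libmp3lame` is only there if your `ffmpeg` build includes it:

```shell
# list_codecs PATTERN: hypothetical helper to search the (very long)
# `ffmpeg -codecs` listing, case-insensitively.
list_codecs() {
  ffmpeg -hide_banner -codecs | grep -i -- "$1"
}

# to_avi SRC DST: hypothetical helper that picks the codecs explicitly
# instead of relying on the defaults for .avi.
to_avi() {
  ffmpeg -i "$1" -c:v mpeg4 -c:a libmp3lame "$2"
}

# Example calls (placeholder file names):
#   list_codecs mpeg4
#   to_avi input.mp4 output.avi
```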
To turn an arbitrary set of images into a video, `ffmpeg`'s concat demuxer wants them listed in a text file in a [specific format](https://trac.ffmpeg.org/wiki/Concatenate#demuxer). There are a couple of ways to do this:
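One possible way, sketched with two hypothetical helpers: generate the list with a shell loop, then hand it to the concat demuxer. This assumes PNG files whose names contain no single quotes; how long each image stays on screen can be tuned with `duration` lines in the list, which this sketch leaves out.

```shell
# make_list DIR: hypothetical helper. Prints one concat-demuxer entry per
# PNG in DIR, in glob (alphabetical) order.
make_list() {
  for f in "$1"/*.png; do
    [ -e "$f" ] || continue   # skip the literal pattern if nothing matches
    printf "file '%s'\n" "$f"
  done
}

# images_to_video LIST OUT: hypothetical helper. Feeds the list to the
# concat demuxer; -safe 0 lets the list contain absolute paths.
images_to_video() {
  ffmpeg -f concat -safe 0 -i "$1" "$2"
}

# Example calls (placeholder paths):
#   make_list /path/to/frames > list.txt
#   images_to_video list.txt out.mp4
```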
Where `FILE` is the video file, and `image%05d.png` is the format string for image filenames; this will create `image00001.png`, `image00002.png`, `image00123.png`, etc. (`%05d` means zero-pad the number to `5` digits; use `%010d` to pad to `10` digits...)
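For reference, a sketch of the kind of command being described, as a hypothetical `extract_frames` helper. Since the pattern follows `printf` rules, the shell's own `printf` is a handy way to preview the names:

```shell
# extract_frames FILE: hypothetical helper. Dumps every frame of FILE as a
# numbered PNG in the current directory.
extract_frames() {
  ffmpeg -i "$1" "image%05d.png"
}

# Preview how the pattern expands for frames 1 and 123:
printf 'image%05d.png\n' 1 123
```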
`-vframes 1` is the option that tells `ffmpeg` to just capture one (i.e. the first) frame of the video - in the case of streams, this means the latest one anyway.
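A sketch of a snapshot command built around that option; the helper name and the stream URL are placeholders. Note that the `ffmpeg` documentation describes `-vframes` as an older alias of `-frames:v`, so the latter spelling is the safer bet on recent builds.

```shell
# snapshot SRC OUT: hypothetical helper. Saves the first decoded video frame
# of SRC (a file or a stream URL) as an image.
snapshot() {
  ffmpeg -i "$1" -vframes 1 "$2"
}

# Example call (placeholder URL):
#   snapshot rtsp://camera.example/stream snapshot.jpg
```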
`ffmpeg` also has a [rich set of filters](https://ffmpeg.org/ffmpeg-filters.html), two of which are of interest for us now:
- [mpdecimate](https://ffmpeg.org/ffmpeg-filters.html#mpdecimate) - *Drop frames that do not differ greatly from the previous frame in order to reduce frame rate.*
- [minterpolate](https://ffmpeg.org/ffmpeg-filters.html#minterpolate) - *Convert the video to specified frame rate using motion interpolation.*
The idea is that `mpdecimate` drops all near-duplicate frames, and `minterpolate` re-calculates them using non-duplicate frames that were left.
`mpdecimate`'s defaults are pretty okay, but the result may not look too good if the frame drops are frequent and long. I've had pretty good results using its `max` parameter, which limits the number of frames dropped in a single stretch of video. For example, `-vf mpdecimate=max=15` drops at most 15 frames in a row (i.e. half a second assuming 30 FPS), meaning interpolation won't happen everywhere and the video will remain faithfully choppy.
`minterpolate`, on the other hand, defaults to semi-smart motion-compensated interpolation, and that *might* just be what you want, but it generally gives pretty funky results. Fortunately, it also has a "blend" mode, which just averages the start and end frames and crossfades them, which gives much more agreeable outputs for simple frame drop situations. It is also generally much faster: I was getting near or above real-time speeds using "blend", whereas motion compensation dropped the processing speed to 0.01x.
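Putting the two filters together might look like the sketch below. The helper name is made up, and the 30 FPS target is an assumption; set `fps` to whatever your source is supposed to run at.

```shell
# smooth_drops SRC DST: hypothetical helper. Drops near-duplicate frames
# (at most 15 in a row), then rebuilds a 30 FPS video by crossfading
# between the frames that remain.
smooth_drops() {
  ffmpeg -i "$1" -vf "mpdecimate=max=15,minterpolate=fps=30:mi_mode=blend" "$2"
}

# Example call (placeholder file names):
#   smooth_drops choppy.mp4 smooth.mp4
```

Leaving out `mi_mode=blend` falls back to the motion-compensated default, if you want to compare the two.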
h264 also has "profiles", basically [sets of features](https://en.wikipedia.org/wiki/Advanced_Video_Coding#Profiles) - and it turns out this can make the difference between a file working and not working on some crappy embedded media players, like TVs or pico projectors.
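As a sketch, assuming your `ffmpeg` build has the `libx264` encoder: the profile is chosen with `-profile:v`. The `baseline` profile and level `3.0` below are only an assumed conservative starting point, not values known to work on any particular device, and the helper name is made up.

```shell
# compat_encode SRC DST: hypothetical helper. Re-encodes the video with a
# conservative h264 profile and level for picky players.
compat_encode() {
  ffmpeg -i "$1" -c:v libx264 -profile:v baseline -level 3.0 "$2"
}

# Example call (placeholder file names):
#   compat_encode input.mp4 tv-friendly.mp4
```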
And apparently, some players are also sensitive to the pixel format[^7], i.e. they can't handle anything other than YUV with 4:2:0 chroma subsampling. To fix this, use the `-pix_fmt` option as follows:
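A minimal sketch, with a made-up helper name and placeholder file names; `yuv420p` is `ffmpeg`'s name for planar YUV with 4:2:0 subsampling:

```shell
# force_yuv420p SRC DST: hypothetical helper. Re-encodes SRC with the
# yuv420p pixel format.
force_yuv420p() {
  ffmpeg -i "$1" -pix_fmt yuv420p "$2"
}

# Example call:
#   force_yuv420p input.mp4 output.mp4
```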
No silver bullet, you'll just have to try different things for different devices. A database of crappy players and appropriate `ffmpeg` settings would be great.