Introduction to File MIME Types
Last modified: May 8, 2020
1. Overview
The MIME type is an important topic in web technology.
In this tutorial, we are going to discuss what a MIME type is and learn how to get the MIME type of a file using Linux command-line utilities.
2. MIME Types
The abbreviation MIME stands for Multipurpose Internet Mail Extensions. MIME types form a standard way of classifying file types on the internet.
First, let’s look at a common MIME type as an example:
text/html
A MIME type consists of two parts, a type and a subtype, separated by a slash.
In this example, the type is “text”, and the subtype is “html”.
At the time of writing, there are ten registered top-level types: application, audio, example, font, image, message, model, multipart, text, and video.
Let’s see some other common MIME types:
In MIME types, the type and subtype are case-insensitive.
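Because of this, it is safer to normalize the case before comparing two MIME types. The following is a minimal sketch; mime_equal is a hypothetical helper name, and it assumes the strings carry no parameters such as a charset:

```shell
# mime_equal is a hypothetical helper: it succeeds when two MIME types
# are equal once letter case is ignored.
mime_equal() {
    a=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
    b=$(printf '%s' "$2" | tr '[:upper:]' '[:lower:]')
    [ "$a" = "$b" ]
}

if mime_equal "Text/HTML" "text/html"; then
    echo "same MIME type"
fi
```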
A subtype usually consists of a media format, such as “xml” in application/xml or “pdf” in application/pdf. However, it can contain other content as well, such as a tree prefix or a suffix, depending on the rules of the registration tree it belongs to.
A complete MIME type has the following general format:
type "/" [tree "."] subtype ["+" suffix] *[";" parameter]
Let’s see another MIME type example:
application/vnd.api+json
This is an API-specific MIME type, and it refers to JSON API.
In this example, we have “application” as the type and “api” as the subtype. The “vnd.” is the vendor prefix while the “+json” is the suffix, indicating that it can be parsed as JSON.
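To make the structure concrete, here is a small sketch that splits a MIME type into the parts described above. The function name parse_mime is hypothetical, and the sketch assumes the tree prefix is one of “vnd.”, “prs.”, or “x.”; any parameters after a semicolon are simply dropped:

```shell
# parse_mime is a hypothetical helper that prints the parts of a MIME type.
parse_mime() {
    mime=${1%%;*}             # drop parameters such as "; charset=UTF-8"
    major=${mime%%/*}         # text before the first "/"
    rest=${mime#*/}           # text after the first "/"

    suffix=
    case $rest in
        *+*) suffix=${rest##*+}; rest=${rest%+*} ;;   # e.g. "+json"
    esac

    tree=
    case $rest in
        vnd.*|prs.*|x.*) tree=${rest%%.*}; rest=${rest#*.} ;;  # e.g. "vnd."
    esac

    printf 'type=%s tree=%s subtype=%s suffix=%s\n' \
        "$major" "$tree" "$rest" "$suffix"
}

parse_mime "application/vnd.api+json"
parse_mime "text/html"
```

For the first call, the sketch reports “application” as the type, “vnd” as the tree, “api” as the subtype, and “json” as the suffix, matching the breakdown above.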
3. Determine the MIME Type of a File
The MIME type provides a standard way to name a type. However, the MIME type of a file is not stored on the Linux filesystem.
There are two ways to determine the MIME type of a file:
- Looking at the file extension
- Looking at the file content
Let’s take a closer look at each of them.
3.1. By File Extension
A MIME type can sometimes be determined by the extension, but not always.
If a file doesn’t have an extension or has an incorrect extension, we cannot determine the MIME type by the file extension. For example, if we rename a JPG image file so that it has a ZIP file extension, an extension-based lookup will report a ZIP archive instead of an image.
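The weakness is easy to see in a sketch. The lookup below only inspects the file name; mime_by_extension is a hypothetical helper, and its tiny extension table is an illustrative assumption rather than a real MIME database:

```shell
# mime_by_extension is a hypothetical helper with a deliberately tiny,
# hard-coded table. It never opens the file.
mime_by_extension() {
    name=${1##*/}                      # strip any directory part
    case $name in
        *.*) ext=${name##*.} ;;        # text after the last "."
        *)   echo "unknown"; return 1 ;;
    esac
    ext=$(printf '%s' "$ext" | tr '[:upper:]' '[:lower:]')
    case $ext in
        jpg|jpeg) echo "image/jpeg" ;;
        png)      echo "image/png" ;;
        pdf)      echo "application/pdf" ;;
        zip)      echo "application/zip" ;;
        html|htm) echo "text/html" ;;
        *)        echo "unknown"; return 1 ;;
    esac
}

mime_by_extension "onePicture.jpg"
mime_by_extension "onePicture.zip"      # trusts the name, whatever the content
mime_by_extension "onePicture" || true  # no extension: nothing to go on
```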
3.2. By File Content
Another way to get the MIME type of a file is by reading its content.
We can determine the MIME type according to specific characteristics of the file content. For example, a JPG starts with the hex signature FF D8 and ends with FF D9.
This is slower than the file extension approach because the file has to be opened and read. However, it is usually more reliable.
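As a rough illustration of a content check, the sketch below tests only the two signatures mentioned above. The helper name is_jpeg and the sample bytes are assumptions made for the demo; real detectors apply many more rules:

```shell
# is_jpeg is a hypothetical helper: it succeeds when the file starts
# with FF D8 and ends with FF D9.
is_jpeg() {
    first=$(od -An -tx1 -N2 "$1" | tr -d ' \n')        # first two bytes as hex
    last=$(tail -c 2 "$1" | od -An -tx1 | tr -d ' \n')  # last two bytes as hex
    [ "$first" = "ffd8" ] && [ "$last" = "ffd9" ]
}

# Build a tiny stand-in file carrying only the two signatures (not a real image).
sample=$(mktemp)
printf '\377\330\377\340sample\377\331' > "$sample"

if is_jpeg "$sample"; then
    echo "looks like a JPEG"
else
    echo "not a JPEG"
fi
rm -f "$sample"
```

Note that the check ignores the file name completely, which is why renaming the file cannot fool it.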
3.3. Combining the Two Ways
In the real world, programs often use a combination of the two ways to determine the MIME type of a file. For example, the shared-mime-info by freedesktop.org maintains a MIME-type database and allows other programs, such as GNOME, KDE, and Xfce, to use this database to find the corresponding MIME types by file extensions or contents.
Let’s see an example of the MIME type “image/png” defined in shared-mime-info:
In the above “image/png” example, the <magic> tag defines the rule to recognize PNG files by their contents, while the <glob> tag defines the file name patterns (extensions) that map to the MIME type.
4. Linux Command-Line Tools
Now, let’s see how we can get the MIME type of a file using Linux command-line tools. In this section, we’ll see two utilities: the file command and the xdg-mime command.
4.1. The xdg-mime Command
The xdg-mime command is a member of the xdg-utils package from freedesktop.org. This package is preinstalled in almost all Linux distros with a desktop environment.
The xdg-mime command uses the shared-mime-info database to determine MIME types. It will first try to recognize the MIME type by file extension. If it fails, it will look at the content of the file.
The syntax of using the xdg-mime command to get the MIME type of a file is:
xdg-mime query filetype FILE
Let’s prepare a JPG image file (onePicture.jpg) and see if the xdg-mime command can get the MIME type:
Next, let’s play a little trick with the xdg-mime command. Let’s change the file extension and see what result the xdg-mime command will give us:
Oops! The xdg-mime command tells us a wrong MIME type. This is because the xdg-mime command first attempts to find a MIME type by file extension in the database.
Now, let’s remove the file extension entirely and see what happens:
We get the correct result again. This is because if the xdg-mime command cannot find a MIME type by file extension, it will then try to find the MIME type by the file content.
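The three checks can be repeated with a short script. This is only a sketch: make_variants is a hypothetical helper, the sample bytes stand in for a real image, and the results depend on the installed xdg-utils and desktop environment, so no output is shown here:

```shell
# make_variants is a hypothetical helper: it writes the same JPEG-like
# bytes under three different names.
make_variants() {
    dir=$1
    printf '\377\330\377\340sample\377\331' > "$dir/onePicture.jpg"
    cp "$dir/onePicture.jpg" "$dir/onePicture.zip"   # misleading extension
    cp "$dir/onePicture.jpg" "$dir/onePicture"       # no extension at all
}

workdir=$(mktemp -d)
make_variants "$workdir"

if command -v xdg-mime >/dev/null 2>&1; then
    for f in onePicture.jpg onePicture.zip onePicture; do
        printf '%s: ' "$f"
        xdg-mime query filetype "$workdir/$f" || true
    done
else
    echo "xdg-mime is not installed; skipping"
fi
rm -rf "$workdir"
```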
4.2. The file Command
Most free operating systems, such as FreeBSD and Linux, ship with the file command by default. We’ll use the command with the option --mime-type to get the MIME type of a file.
Let’s see if the file command can get the MIME type of the same JPG file:
Now, let’s make the same change to the file extension and see if the file command can still report the right result:
Great! Even if we try to trick the file command by changing the file extension, it can still tell the correct MIME type. This is because the file command doesn’t rely on file extensions to determine file MIME types. Instead, it looks at the actual file contents. Therefore, it is more reliable in this case.
Finally, we delete the file extension and hope the file command can still work correctly:
As we expected, it gives the right result again.
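A short script can confirm that the name plays no role. In this sketch, mime_of is a hypothetical wrapper and the sample bytes are an assumption standing in for a real image; the point is only that all three names yield the same answer:

```shell
# mime_of is a hypothetical wrapper around file(1).
mime_of() {
    # -b omits the file name, --mime-type prints only the type/subtype pair
    file -b --mime-type "$1"
}

workdir=$(mktemp -d)
printf '\377\330\377\340sample\377\331' > "$workdir/onePicture.jpg"
cp "$workdir/onePicture.jpg" "$workdir/onePicture.zip"
cp "$workdir/onePicture.jpg" "$workdir/onePicture"

if command -v file >/dev/null 2>&1; then
    for f in onePicture.jpg onePicture.zip onePicture; do
        printf '%s: %s\n' "$f" "$(mime_of "$workdir/$f")"
    done
else
    echo "file is not installed; skipping"
fi
rm -rf "$workdir"
```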
5. Conclusion
In this article, we talked about what a MIME type is and how MIME types are named. Then, we discussed the common approaches to determine the MIME type of a file in Linux.
Finally, we learned two Linux commands to get the MIME type of a file: the file and xdg-mime commands. Through some examples, we discussed why the two commands could behave differently on the same file.
How do you get the icon, MIME type, and application associated with a file in the Linux Desktop?
Using C++ on the Linux desktop, what is the best way to get the icon, the document description and the application "associated" with an arbitrary file/file path?
I’d like to use the most "canonical" way to find icons, mime-type/file type descriptions and associated applications on both KDE and GNOME, and I’d like to avoid any "shelling out" to the command line and "low-level" routines, as well as avoid re-inventing the wheel myself (no parsing the mime-types file and such).
Edits and Notes:
Hey, I originally asked this question about the Qt file info object, and the answer that "there is no clear answer" seems to be correct as far as it goes. BUT this is such a screwed-up situation that I am opening the question looking for more information.
I don’t care about Qt in particular any more, I’m just looking for the most canonical way to find the mime type via C++/C function calls on both KDE and GNOME (especially GNOME, since that’s where things confuse me most). I want to be able to show icons and descriptions matching Nautilus in GNOME and Konqueror/whatever on KDE, as well as open files appropriately, etc.
I suppose it’s OK that I get this separately for KDE and GNOME. The big question is what’s the most common/best/canonical way to get all this information for the Linux desktop? GNOME documentation is especially opaque. gnome-vfs has mime routines but it’s deprecated, and I can’t find a mime routine for GIO/GVfs, gnome-vfs’s replacement. There’s a vague implication that one should use the open desktop applications, but which one to use is obscure. And where do libmagic and xdg fit in?
Pointers to an essay summarizing the issues gladly accepted. Again, I know the three line answer is "no such animal" but I’m looking for the long answer.
How do I determine the MIME type of a file?
I recently switched to GNOME 3 and was surprised to find that there was no way by which we could set the default applications. I am writing a Nautilus extension using Bash. I want to find the MIME type of a file.
Initially I used to extract the extension of the file using sed and build the code on it. Then I realized that there is this command called file. When I try to find the mime-type of a mkv file, the command file --mime-type -b outputs application/octet-stream, but when I look at the Nautilus properties window it shows the correct video/x-matroska mime-type.
Am I missing anything here? If not, is there a better way in which I can find the mime-type of a file?
There are different ways to get a MIME type on Linux, and they often lead to different results. Use
to get the same MIME type as Nautilus gets.
mimetype(1) - Linux man page
mimetype - Determine file type
Synopsis
mimetype [options] [-] files
Description
This script tries to determine the mime type of a file using the Shared MIME-info database. It is intended as a kind of file(1) work-alike, but uses mimetypes instead of descriptions.
If one symlinks the file command to mimetype it will behave a little more compatible, see "--file-compat". Commandline options to specify alternative magic files are not implemented the same because of the conflicting data formats. Also the wording of the descriptions will differ.
For naming switches I followed the manpage of file(1) version 4.02 when possible. They seem to differ completely from the spec in the ‘utilities’ chapter of IEEE Std 1003.1-2001 (POSIX).
Options
-a, --all
    Show output of all rules that match the file.
    TODO: this method now just returns one match for each method (globs, magic, etc.).
-b, --brief
    Do not prepend filenames to output lines (brief mode).
--database=mimedir:mimedir.
    Force the program to look in these directories for the shared mime-info database. The directories specified by the basedir specification are ignored.
-d, --describe
    Print file descriptions instead of mime types, this is the default when using "--file-compat".
-D, --debug
    Print debug information about how the mimetype was determined.
-f namefile, --namefile=namefile
    Read the names of the files to be examined from the file 'namefile' (one per line) before the argument list.
--file-compat
    Make mimetype behave a little more file(1) compatible. This is turned on automatically when you call mimetype by a link called 'file'.
    A single '-' won't be considered a separator between options and filenames anymore, but becomes identical to "--stdin". (You can still use '--' as separator, but that is not backward compatible with the original file command.) Also the default becomes to print descriptions instead of mimetypes.
-F string, --separator=string
    Use string as custom separator between the file name and its mimetype or description, defaults to ':'.
-h, --help
-u, --usage
    Print a help message and exit.
-i, --mimetype
    Use mime types, opposite to "--describe", this is the default when _not_ using "--file-compat".
-L, --dereference
    Follow symbolic links.
-l code, --language=code
    The language attribute specifies a two letter language code, this makes descriptions being outputted in the specified language.
-M, --magic-only
    Do not check for extensions, globs or inode type, only look at the content of the file. This is particularly useful if for some reason you don't trust the name or the extension a file has.
-N, --noalign
    Do not align output fields.
--output-format
    If you want an alternative output format, you can specify a format string containing the following escapes:
    Alignment is not available when using this, you need to post-process the output to do that.
--stdin
    Determine type of content from STDIN, less powerful than normal file checking because it only uses magic typing. This will happen also if the STDIN filehandle is a pipe.
    To use this option IO::Scalar needs to be installed.
-v, --version
    Print the version of the program and exit.
Environment
XDG_DATA_HOME
XDG_DATA_DIRS
    These variables can list base directories to search for data files. The shared mime-info will be expected in the "mime" sub directory of one of these directories. If these are not set, the following directories are searched:
    See also the "XDG Base Directory Specification".
Files
The base dir for all data files is determined by two environment variables, see "ENVIRONMENT".
BASE/mime/packages/SOURCE.xml
    All other files are compiled from these source files. To re-compile them use update-mime-database(1).
BASE/mime/globs
    Compiled information about globs.
BASE/mime/magic
    Compiled information about magic numbers.
BASE/mime/MEDIA/SUBTYPE.xml
    Descriptions of a mimetype in multiple languages, used for the "--describe" switch.
Diagnostics
If a file has an empty mimetype or an empty description, most probably the file doesn’t exist and the given name doesn’t match any globs. An empty description can also mean that there is no description available in the language you specified.
The program exits with a non-zero exit value if either the commandline arguments failed, a module it depends on wasn’t found or the shared mime-info database wasn’t accessible. See File::MimeInfo for more details.
The ‘--all’ switch doesn’t really show all matches, but only one per mime-typing method. This needs to be implemented in the modules first.
No known bugs, please mail the author if you find one.
mimetype doesn’t provide a switch for looking inside compressed files because it seems to me that this can only be done by un-compressing the file, something that defeats the purpose. On the other hand the option should exist for strict compatibility with file(1). Possibly a subclass should be made for this one day.
Author
Copyright
Copyright © 2003, 2008 Jaap G Karssenberg. All rights reserved. This program is free software; you can redistribute it and/or modify it under the same terms as Perl.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.