File I/O

Files and Streams

To use files for input/output, you need to include <stdio.h>.


Simple file output examples:

FILE *fp; /* For reading/writing to a file */

fp = fopen("myfile", "w");       /* Open the file for write */
  fputs("Line number 1\n", fp);  /* Write text to file      */
  fputs("Line number 2\n", fp);  /* Write more text to file */
  fputs("Line number 3\n", fp);  /* Write even more text    */
fclose(fp);                      /* Close the file          */
Description: Reading from a file and writing to a file generally follow these steps:
  1. Open the file (fopen)
  2. Read/Write the file (fgetc, fgets, fputc, fputs, etc.)
  3. Close the file (fclose)
There is actually a very important piece missing from the code above. Corrected code:
FILE *fp; /* For reading/writing to a file */

fp = fopen("myfile", "w");       /* Open the file for write */
if (fp != NULL)                  /* Check for success/fail  */
{
  fputs("Line number 1\n", fp);  /* Write text to file      */
  fputs("Line number 2\n", fp);  /* Write more text to file */
  fputs("Line number 3\n", fp);  /* Write even more text    */
  fclose(fp);                    /* Close the file          */
}
else
  puts("Failed to open 'myfile' for write.");  /* Give some error message */
You must always check the return value from fopen. If it fails, a NULL pointer is returned.

Writing to a file (first example below), and writing to both a file and the screen (second example):
FILE *fp;

fp = fopen("myfile.txt", "w");
if (fp)
{
  int i;
  for (i = 0; i < 10; i++)
    fprintf(fp, "Line number %i\n", i);
  fclose(fp);
}




Output (in the file):
Line number 0
Line number 1
Line number 2
Line number 3
Line number 4
Line number 5
Line number 6
Line number 7
Line number 8
Line number 9
FILE *fp;

fp = fopen("myfile.txt", "w");
if (fp)
{
  int i;
  for (i = 0; i < 10; i++)
  {
    fprintf(fp, "Line number %i\n", i); /* To file   */
    printf("Line number %i\n", i);      /* To screen */
  }
  fclose(fp);
}

Output (on screen and in the file):
Line number 0
Line number 1
Line number 2
Line number 3
Line number 4
Line number 5
Line number 6
Line number 7
Line number 8
Line number 9
What's wrong with the code below?

  /* Open the file for write */  
FILE *fp = fopen("myfile", "w");

if (fp == NULL)                  
{
  printf("Failed to open the file: myfile.txt\n");  /* Print error message */
}
else
{
  /* Write some stuff to the file */
}

fclose(fp);
Note: You must always check the return value from fopen to make sure the file was opened successfully. Also, you must always close the file when you're finished with it. Failure to do either of these tasks will certainly negatively affect your grade on assignments/labs.
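
One likely answer to the question above (offered as a reading of the code, not as the author's official answer): fclose(fp) runs even when fopen failed, and passing a NULL pointer to fclose is undefined behavior. The error message also names myfile.txt although the code opens myfile. A minimal sketch of a fix follows. The helper name WriteGreeting and the text it writes are made up for illustration. The file is closed only on the path where it was actually opened:

```c
#include <stdio.h>

/* Hypothetical helper: returns 0 on success, 1 if the file can't be opened */
int WriteGreeting(const char *filename)
{
  FILE *fp = fopen(filename, "w"); /* Open the file for write */

  if (fp == NULL)
  {
    fprintf(stderr, "Failed to open the file: %s\n", filename);
    return 1;                      /* Nothing was opened, nothing to close */
  }

  fputs("Hello\n", fp);            /* Write some stuff to the file         */
  fclose(fp);                      /* Close only a file that was opened    */
  return 0;
}
```

Printing the same filename variable that was passed to fopen also keeps the error message from drifting out of sync with the code.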

Basic Input

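The simplest facilities for unformatted input in C are getchar and gets. Be aware that gets cannot limit how many characters it stores in the buffer and was removed from the standard in C11, so prefer fgets in new code. The prototypes: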
int getchar(void);
char *gets(char *buffer);
File versions of the above:
int fgetc(FILE *stream);
char *fgets(char *string, int count, FILE *stream);
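
A small sketch of reading with fgetc may help here. Note that fgetc returns an int, not a char, so that the special value EOF can be distinguished from every valid character. The helper name CountLines is made up for this example, and the stream is assumed to be already open for reading:

```c
#include <stdio.h>

/* Hypothetical helper: counts the newline characters in an open stream */
long CountLines(FILE *fp)
{
  int c;          /* Must be int, not char, so EOF can be told apart from data */
  long lines = 0;

  while ((c = fgetc(fp)) != EOF) /* Read one character at a time */
    if (c == '\n')
      lines++;

  return lines;
}
```

Storing the result of fgetc in a char is a common mistake: depending on whether char is signed, the loop may stop early on a legitimate byte or never see EOF at all.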
Simple input examples: Suppose we have a text file named poem.txt and it contains this:
Roses are red.<NL>
Violets are blue.<NL>
Some poems rhyme.<NL>
But not this one.<NL>
The <NL> indicates the (invisible) newline in the file. Looking at the file with the od program (octal dump) on Linux or Mac OS X:

mmead@sabrina:~/data/digipen/cs120> od -a poem.txt
0000000   R   o   s   e   s  sp   a   r   e  sp   r   e   d   .  nl   V
0000020   i   o   l   e   t   s  sp   a   r   e  sp   b   l   u   e   .
0000040  nl   S   o   m   e  sp   p   o   e   m   s  sp   r   h   y   m
0000060   e   .  nl   B   u   t  sp   n   o   t  sp   t   h   i   s  sp
0000100   o   n   e   .  nl
0000105
The same file under Windows:
E:\Data\Courses\Notes\CS120\Code\FileIO>od -a poem.txt
0000000   R   o   s   e   s  sp   a   r   e  sp   r   e   d   .  cr  nl
0000020   V   i   o   l   e   t   s  sp   a   r   e  sp   b   l   u   e
0000040   .  cr  nl   S   o   m   e  sp   p   o   e   m   s  sp   r   h
0000060   y   m   e   .  cr  nl   B   u   t  sp   n   o   t  sp   t   h
0000100   i   s  sp   o   n   e   .  cr  nl
0000111
This example program reads in the text file line-by-line and prints out each line to the screen:
#define BUFFER_SIZE 50    /* How big our buffer will be         */
char buffer[BUFFER_SIZE]; /* To hold each line from the file    */
FILE *fp;                 /* The pointer to manipulate the file */

fp = fopen("poem.txt", "r");        /* Try to open the file for reading */
if (fp)                             /* The file was opened successfully */
{
  while (!feof(fp))                 /* While not at the end of the file */
  {
    fgets(buffer, BUFFER_SIZE, fp); /* Read in the next line            */
    printf("%s", buffer);           /* Print it out on the screen       */
  }
  fclose(fp);                       /* Close the file                   */
}

Output:
Roses are red.
Violets are blue.
Some poems rhyme.
But not this one.
But not this one.
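The last line is printed twice because feof only returns true after a read has already failed. After the fourth line is read, feof is still false, so the loop runs once more; that call to fgets fails and leaves buffer unchanged, and the old contents are printed again. The corrected code: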
void f5(void)
{
  #define BUFFER_SIZE 50    /* How big our buffer will be         */
  char buffer[BUFFER_SIZE]; /* To hold each line from the file    */
  FILE *fp;                 /* The pointer to manipulate the file */

  fp = fopen("poem.txt", "r");  /* Try to open the file for reading */
  if (fp)                       /* The file was opened successfully */
  {
    while (!feof(fp))           /* While not at the end of the file */
    {
      if (fgets(buffer, BUFFER_SIZE, fp)) /* If there is another line,  */
        printf("%s", buffer);             /* Print it out on the screen */
    }
    fclose(fp);                           /* Close the file             */
  }
}

Output:
Roses are red.
Violets are blue.
Some poems rhyme.
But not this one.

Another way to solve the problem:
fp = fopen("poem.txt", "r");  /* Try to open the file for reading */
if (fp)                       /* The file was opened successfully */
{
  while (fgets(buffer, BUFFER_SIZE, fp)) /* Read in the next line      */
    printf("%s", buffer);                /* Print it out on the screen */

  fclose(fp);                            /* Close the file             */
}

Note: When you are reading in a text file and the last line in the file is duplicated (as above), remember these techniques so you can fix your code.

Default open streams: stdin, stdout, and stderr are already open when your program starts. They are part of the standard I/O library in stdio.h so you don't need to declare them.

You will probably never have to deal with the internal structure of a FILE and can just assume that the standard I/O devices are declared like this:

FILE *stdin;
FILE *stdout;
FILE *stderr;

The definition of a FILE used by Microsoft's compiler:

struct _iobuf {
        char *_ptr;
        int   _cnt;
        char *_base;
        int   _flag;
        int   _file;
        int   _charbuf;
        int   _bufsiz;
        char *_tmpfname;
        };
typedef struct _iobuf FILE;

_CRTIMP extern FILE _iob[];

#define stdin  (&_iob[0])
#define stdout (&_iob[1])
#define stderr (&_iob[2])

The example program above, modified to write to stdout. We don't have to open it; it's always open:
Code:
int i;
for (i = 0; i < 10; i++)
  fprintf(stdout, "Line number %i\n", i);
Output to screen/stdout:
Line number 0
Line number 1
Line number 2
Line number 3
Line number 4
Line number 5
Line number 6
Line number 7
Line number 8
Line number 9

In fact, these two lines are essentially equivalent. (printf probably just calls fprintf, which could just call sprintf then puts, which could just call fwrite, which probably calls write, which calls... You get the idea.)

printf("Line number %i\n", i);          /* Write to stdout/screen */
fprintf(stdout, "Line number %i\n", i); /* Write to stdout/screen */

Redirecting stdout and stderr

Suppose we had a function that did this:

void ShowOutErr(void)
{
  fputs("This is going to stdout\n", stdout);
  fputs("This is going to stderr\n", stderr);
}
By default, running this program would produce this output on the console (display):
This is going to stdout
This is going to stderr
We can redirect these messages to a file by using the output redirection operator at the console. (Assume the function above compiles to a program called myprog.)
myprog > out.txt
When we run the program, we see this printed to the screen:
This is going to stderr
What happened to the line "This is going to stdout"? Well, it was redirected to a file named out.txt. If we look at the contents of this file, using the cat command:
cat out.txt
we see:
This is going to stdout
To redirect stderr, we need to do this:
myprog 2> err.txt
This produces the output:
This is going to stdout
The redirection also created a file named err.txt that contains the other line of text.

To redirect both, we do this:

myprog > out.txt 2> err.txt
which produces no output on the screen. Both lines of text have been redirected to their respective files (out.txt and err.txt).

If we want both stdout and stderr redirected to the same file (both.txt), we would do this:

myprog > both.txt 2>&1
The syntax: 2>&1 means that stderr should be redirected to stdout so they are both going to the same place. There is also a shorter way that some shells will accept:
myprog &> both.txt
Be careful not to do it this way:
myprog > both.txt 2> both.txt      # This is wrong!
You may get a corrupted file or something else. On Windows, this is the error you get:
The process cannot access the file because it is being used by another process.


More Details for Input/Output

Details on fopen. From the link:

Text files are files containing sequences of lines of text. Depending on the environment where the application runs, some special character conversion may occur in input/output operations in text mode to adapt them to a system-specific text file format. Although on some environments no conversions occur and both text files and binary files are treated the same way, using the appropriate mode improves portability.

Text streams have certain attributes that may vary among different systems. There are two cases where text vs. binary mode makes a difference. The first one is mentioned above: the line endings. The second case is how some systems deal with the EOF (End Of File) character in a file. This character is ASCII 26 (0x1A). Some systems, when they encounter the EOF character in a file opened as text, will simply stop reading the file as if the end has been reached.

Let's look at a couple of examples. There are two different files below. One has line-feed endings (macOS and Linux), and the other has carriage-return line-feed endings (Windows).

poem-lf.txt (69 bytes)     poem-crlf.txt (73 bytes)

You won't see any difference between the files when viewing with a browser. You must download them to your computer to see the differences. Here are the hex dumps:

poem-lf.txt:
       00 01 02 03 04 05 06 07  08 09 0A 0B 0C 0D 0E 0F
--------------------------------------------------------------------------
000000 52 6F 73 65 73 20 61 72  65 20 72 65 64 2E 0A 56   Roses are red..V
000010 69 6F 6C 65 74 73 20 61  72 65 20 62 6C 75 65 2E   iolets are blue.
000020 0A 53 6F 6D 65 20 70 6F  65 6D 73 20 72 68 79 6D   .Some poems rhym
000030 65 2E 0A 42 75 74 20 6E  6F 74 20 74 68 69 73 20   e..But not this 
000040 6F 6E 65 2E 0A                                     one..
poem-crlf.txt:
       00 01 02 03 04 05 06 07  08 09 0A 0B 0C 0D 0E 0F
--------------------------------------------------------------------------
000000 52 6F 73 65 73 20 61 72  65 20 72 65 64 2E 0D 0A   Roses are red...
000010 56 69 6F 6C 65 74 73 20  61 72 65 20 62 6C 75 65   Violets are blue
000020 2E 0D 0A 53 6F 6D 65 20  70 6F 65 6D 73 20 72 68   ...Some poems rh
000030 79 6D 65 2E 0D 0A 42 75  74 20 6E 6F 74 20 74 68   yme...But not th
000040 69 73 20 6F 6E 65 2E 0D  0A                        is one...

Here's a simple program that will read a given file with a given mode (binvstxt.c):

#include <stdio.h>  /* printf, fopen, fclose, fgets */
#include <string.h> /* strlen                       */

int main(int argc, char **argv)
{
  FILE *fp;         /* File pointer to read */
  char buffer[100]; /* Maximum line length  */

    /* Must be given a filename and mode on the command line */
  if (argc < 3)
  {
    printf("Must specify file to read and mode (r/rb/rt)\n");
    return 1;
  }

    /* Try to open the file in the given mode */
  fp = fopen(argv[1], argv[2]);
  if (!fp)
  {
    printf("Can't open %s for %s\n", argv[1], argv[2]);
    return 2;
  }

    /* Read each line in the file and print it out with its length */
  while (!feof(fp))
  {
    if (fgets(buffer, sizeof(buffer), fp))
    {
      printf("%s", buffer);
      printf("Length is %li\n", (long)strlen(buffer));
    }
  }

    /* Done reading, clean up */
  fclose(fp);

  return 0;
}
Reading the files under Linux and macOS using the specified mode:

Mode:
fp = fopen("poem-lf.txt", "r");
fp = fopen("poem-lf.txt", "rb");
Output (the same for both modes):
Roses are red.
Length is 15
Violets are blue.
Length is 18
Some poems rhyme.
Length is 18
But not this one.
Length is 18

Mode:
fp = fopen("poem-crlf.txt", "r");
fp = fopen("poem-crlf.txt", "rb");
Output (the same for both modes):
Roses are red.
Length is 16
Violets are blue.
Length is 19
Some poems rhyme.
Length is 19
But not this one.
Length is 19

Reading the files under Windows using the specified mode:

Mode:
fp = fopen("poem-lf.txt", "r");
fp = fopen("poem-lf.txt", "rb");
Output (the same for both modes):
Roses are red.
Length is 15
Violets are blue.
Length is 18
Some poems rhyme.
Length is 18
But not this one.
Length is 18

Mode:
fp = fopen("poem-crlf.txt", "r");
Output:
Roses are red.
Length is 15
Violets are blue.
Length is 18
Some poems rhyme.
Length is 18
But not this one.
Length is 18

Mode:
fp = fopen("poem-crlf.txt", "rb");
Output:
Roses are red.
Length is 16
Violets are blue.
Length is 19
Some poems rhyme.
Length is 19
But not this one.
Length is 19

This "text" file has an embedded EOF character (ASCII 26, hex 0x1A). It replaces the newline character at the end of the first line: eof.txt. Here's the hex dump:
eof.txt:
       00 01 02 03 04 05 06 07  08 09 0A 0B 0C 0D 0E 0F
--------------------------------------------------------------------------
000000 52 6F 73 65 73 20 61 72  65 20 72 65 64 2E 1A 56   Roses are red..V
000010 69 6F 6C 65 74 73 20 61  72 65 20 62 6C 75 65 2E   iolets are blue.
000020 0A 53 6F 6D 65 20 70 6F  65 6D 73 20 72 68 79 6D   .Some poems rhym
000030 65 2E 0A 42 75 74 20 6E  6F 74 20 74 68 69 73 20   e..But not this 
000040 6F 6E 65 2E 0A                                     one..
On Linux and macOS using the cat command:
cat eof.txt

Roses are red.Violets are blue.
Some poems rhyme.
But not this one.
It may not show correctly because these "unprintable" characters vary from system to system.
[Screenshot: output of cat eof.txt on macOS or Linux]

Running the equivalent of the cat command on Windows (the type command):
type eof.txt

Roses are red.
This clearly shows that Windows treats files opened in text mode differently than files opened in binary mode. From the man page for fopen:

The mode string can also include the letter 'b' either as a last character or as a character between the characters in any of the two- character strings described above. This is strictly for compatibility with C89 and has no effect; the 'b' is ignored on all POSIX conforming systems, including Linux. (Other systems may treat text files and binary files differently, and adding the 'b' may be a good idea if you do I/O to a binary file and expect that your program may be ported to non-UNIX environments.)

The "Other systems" referred to include Windows.

Tip: Keep this difference in mind when reading binary files on Windows. If you are attempting to read from a file and the file reports that there is no more data, but you know there should be more, this may be the problem. This is a classic tell-tale sign that you're reading a binary file in text mode. I've lost count of how many times I've seen students do this.

The Microsoft C Bible goes on to say this:

[The letter 'b'] opens file in untranslated or binary mode. Every character in the file is read as is without the changes described below.

[The letter 't'] opens file in translated mode. This is a Microsoft C extension and not an ANSI standard mode. Its purpose is to accommodate MS-DOS file conventions. In this mode, the following interpretations will be in effect: (1) Carriage Return-Line Feed (CR-LF) combinations on input are translated to single linefeeds. During output, single linefeed characters are translated to CR-LF pairs. (2) During input, the Control-Z character is interpreted as the end-of-file character.

Binary files have no restrictions or limitations. It is up to the programmer to decide when to interpret a file as a text file, and when to interpret it as a binary file.
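
To see which line endings a file really contains, one option is to open it in binary mode, where no byte is translated, and count the CR and LF bytes. This is only a sketch; the helper name CountLineEndings and its return convention are assumptions made for this example:

```c
#include <stdio.h>

/* Hypothetical helper: counts CR and LF bytes in a file.
   Returns 0 on success, 1 if the file can't be opened. */
int CountLineEndings(const char *filename, long *cr, long *lf)
{
  int c;
  FILE *fp = fopen(filename, "rb"); /* Binary: every byte is read as is */

  if (!fp)
    return 1;

  *cr = *lf = 0;
  while ((c = fgetc(fp)) != EOF)
  {
    if (c == '\r')      /* Carriage return */
      (*cr)++;
    else if (c == '\n') /* Line feed       */
      (*lf)++;
  }

  fclose(fp);
  return 0;
}
```

Going by the hex dumps above, poem-lf.txt should give 0 CR and 4 LF bytes, and poem-crlf.txt should give 4 of each, on any system, because binary mode never translates line endings.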

Like most languages, reading from a file and writing to a file follow these steps:

  1. Open the file (fopen)
  2. Read/Write the file (fgetc, fgets, fputc, fputs, fprintf, fscanf, etc.)
  3. Close the file (fclose)
These two functions are required in all cases:
FILE *fopen(const char *filename, const char *mode);
int fclose(FILE *stream);
Here's a partial page from the ANSI C Standard (7.9.5.3) regarding fopen:
Description

       The fopen function opens the file whose name is the string pointed to by filename and associates a stream with it.

       The argument mode points to a string beginning with one of the following sequences: [1]

r              open text file for reading
w              truncate to zero length or create text file for writing
a              append; open or create text file for writing at end-of-file
rb             open binary file for reading
wb             truncate to zero length or create binary file for writing
ab             append; open or create binary file for writing at end-of-file
r+             open text file for update (reading and writing)
w+             truncate to zero length or create text file for update
a+             append; open or create text file for update, writing at end-of-file
r+b or rb+     open binary file for update (reading and writing)
w+b or wb+     truncate to zero length or create binary file for update
a+b or ab+     append; open or create binary file for update, writing at end-of-file

       Opening a file with read mode ('r' as the first character in the mode argument) fails if the file does not exist or cannot be read.
__________________
[1] Additional characters may follow these sequences.

From the textbook:

Q: I've seen programs that call fopen and put the letter t in the mode string. What does it mean?

A: The C Standard allows additional characters to appear in the mode string, provided that they follow r, w, a, b, or +. Some compilers allow the use of t to indicate that a file is to be opened in text mode instead of binary mode. Of course, text mode is the default anyway, so t adds nothing. Whenever possible, it's best to avoid using t and other nonportable features.
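
As a small illustration of how the mode changes behavior, here is a sketch that uses "a" (append). Unlike "w", it does not truncate an existing file; every write goes to the end. The helper name AppendLine is made up for this example:

```c
#include <stdio.h>

/* Hypothetical helper: adds one line of text to the end of a file.
   Returns 0 on success, 1 if the file can't be opened. */
int AppendLine(const char *filename, const char *text)
{
  FILE *fp = fopen(filename, "a"); /* Create if missing, else write at end */

  if (!fp)
    return 1;

  fprintf(fp, "%s\n", text);       /* Existing contents are left untouched */
  fclose(fp);
  return 0;
}
```

Calling AppendLine twice on the same file leaves two lines in it. Opening with "w" instead would leave only the second line, because "w" truncates the file to zero length on every open.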

Example

This example displays the contents of a given text file. (We're assuming we can interpret it as text.)

From 64-bit Linux: stdio.h  bits/stdio_lim.h

#include <errno.h>  /* errno                               */
#include <stdio.h>  /* printf, fgets, fputs, fopen, perror */
#include <string.h> /* strcspn                             */

#define MAX_LINE 1024

void DisplayTextFile(void)
{
  char filename[FILENAME_MAX]; /* Name of the file on disk. */
  FILE *infile;                /* The opened file pointer.  */

    /* Prompt the user for a filename */
  printf("Enter the filename to display: ");

    /* Get the user's input; give up if nothing could be read */
  if (!fgets(filename, FILENAME_MAX, stdin))
    return;

    /* Remove newline IMPORTANT! (safe even if there is no newline) */
  filename[strcspn(filename, "\n")] = 0;

    /* Open the text file */
  infile = fopen(filename, "r");

    /* If successful, read each line and display it */
  if (infile)
  {
    char buffer[MAX_LINE]; /* Store lines in this array. */

      /* Get each line and display it until fgets fails (end of file or error) */
    while (fgets(buffer, MAX_LINE, infile))
      fputs(buffer, stdout);

      /* Close the file */
    fclose(infile);
  }
  else
  {
      /* Couldn't open the file */
    int error = errno; /* Save errno because perror may change it */
    perror(filename);
    printf("Error number: %i\n", error);
  }
}
If the fopen fails, we call perror to display the reason. If we try to open a non-existent file named foo.txt, we'll see this:
foo.txt: No such file or directory
Error number: 2
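
If you want the reason as a string, so you can build your own message, strerror from string.h returns the text that perror would print. The sketch below assumes a system where fopen sets errno on failure (POSIX systems do; the C standard itself does not promise it). The helper name OpenOrExplain is made up for this example:

```c
#include <errno.h>  /* errno            */
#include <stdio.h>  /* fopen, fprintf   */
#include <string.h> /* strerror         */

/* Hypothetical helper: tries to open a file for reading and reports why
   it failed. Returns the opened file, or NULL on failure. */
FILE *OpenOrExplain(const char *filename)
{
  FILE *fp = fopen(filename, "r");

  if (!fp)
  {
    int error = errno; /* Save it: later library calls may change errno */
    fprintf(stderr, "%s: %s (error %i)\n", filename, strerror(error), error);
  }

  return fp;
}
```

On Linux, a missing foo.txt would produce something like "foo.txt: No such file or directory (error 2)", but the exact wording and numbers vary between systems.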

The printf Family

The most common function for displaying output on the screen is the printf function. It is kind of a Swiss Army Knife for outputting.
int printf( const char *format [, argument]... );
All format specifications are constructed as such:
%[flags] [width] [.precision] [{h | l | L}]type
Simple example:
void TestPrintf(void)
{
  int i = 227;
  long a = 123L;
  char c = 'M';
  double d = 3.1415926;
  char *s = "Digipen";

  printf("i=%i, i=%x, i=%X, a=%li, c=%c, c=%4i, d=%5.3f, s=%10s\n", 
         i, i, i, a, c, c, d, s);
}
Output:
i=227, i=e3, i=E3, a=123, c=M, c=  77, d=3.142, s=   Digipen

Exercise: Write a function that displays the following table using printf in a loop:

Value   Squared    Sqrt
-----------------------
   0         0    0.000
  10       100    3.162
  20       400    4.472
  30       900    5.477
  40      1600    6.325
  50      2500    7.071
  60      3600    7.746
  70      4900    8.367
  80      6400    8.944
  90      8100    9.487
 100     10000   10.000
 110     12100   10.488
 120     14400   10.954
 130     16900   11.402
The family of printf functions:
int printf( const char *format [, argument]... );
int fprintf( FILE *stream, const char *format [, argument ]...);
int sprintf( char *buffer, const char *format [, argument] ... );
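
The three functions differ only in where the formatted text ends up: the screen, a stream, or a character array. A minimal sketch of sprintf, with a made-up helper name MakeLabel, follows. The caller is assumed to provide a buffer big enough for the result, because sprintf itself does no bounds checking:

```c
#include <stdio.h>

/* Hypothetical helper: formats a label such as "Line number 7" into buffer.
   Returns the number of characters stored, not counting the terminating 0.
   The caller must supply a buffer large enough for the result. */
int MakeLabel(char *buffer, int number)
{
  return sprintf(buffer, "Line number %i", number);
}
```

For example, with char label[32], the call MakeLabel(label, 7) stores the string "Line number 7" in label and returns 13.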

The scanf Family

scanf is analogous to printf, only used for input instead of output. The family of functions is this:
int scanf( const char *format [,argument]... );
int fscanf( FILE *stream, const char *format [, argument ]... );
int sscanf( const char *buffer, const char *format [, argument ] ... );
All format specifications are constructed as such:
%[*] [width] [{h | l | L}]type
Example:
void Testscanf(void)
{
  int i;
  char c;
  float f;
  double d;
  char s[20];

  scanf("%i %c %f %lf %19s", &i, &c, &f, &d, s); /* %19s: never overflow s[20] */
  printf("i=%i, c=%c, f=%f, d=%f, s=%s\n", i, c, f, d, s);
}
Given this input:
123 Z 4.56 7.8 Seattle
the function displays:
i=123, c=Z, f=4.560000, d=7.800000, s=Seattle
Because whitespace is ignored, the input could have been on separate lines:
123 
Z 
4.56 
7.8 
Seattle
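
All of the scanf functions return the number of items they successfully converted, and checking that value is the usual way to detect bad input. Here is a minimal sketch using sscanf; the helper name ParsePoint and the "two integers" format are assumptions made for this example:

```c
#include <stdio.h>

/* Hypothetical helper: parses two integers, such as "10 20", from a string.
   Returns 1 if both values were converted, 0 otherwise. */
int ParsePoint(const char *text, int *x, int *y)
{
    /* sscanf returns the number of items successfully converted */
  return sscanf(text, "%d %d", x, y) == 2;
}
```

With the input "10 20" this returns 1 and stores 10 and 20. With "10 abc" only the first conversion succeeds, so it returns 0 and *y is left unchanged.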

Binary Input/Output

For binary I/O, use fread and fwrite:
size_t fread(void *buffer, size_t size, size_t count, FILE *stream);
size_t fwrite(const void *buffer, size_t size, size_t count, FILE *stream);
Note that the return value is not the number of bytes read or written, but the number of elements read or written. The number of actual bytes will be the number of elements multiplied by the size of each element. Examples using fread and fwrite. (You can ignore the structures for now.)
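
A minimal sketch of a binary round trip: write five integers with fwrite, then read them back with fread, checking the element counts both times. The helper name RoundTrip and the particular values are assumptions made for this example. With a 4-byte int, the file ends up 20 bytes long:

```c
#include <stdio.h>

#define COUNT 5

/* Hypothetical helper: writes COUNT ints to a file in binary, then reads
   them back into 'out'. Returns 0 on success, 1 on any failure. */
int RoundTrip(const char *filename, int out[COUNT])
{
  int values[COUNT] = {0x12345678, 0x12345679, 0x1234567A,
                       0x1234567B, 0x1234567C};
  size_t n;
  FILE *fp = fopen(filename, "wb");   /* Binary mode for writing */

  if (!fp)
    return 1;
  n = fwrite(values, sizeof(int), COUNT, fp); /* n is elements, not bytes */
  fclose(fp);
  if (n != COUNT)
    return 1;

  fp = fopen(filename, "rb");         /* Binary mode for reading */
  if (!fp)
    return 1;
  n = fread(out, sizeof(int), COUNT, fp);
  fclose(fp);

  return n == COUNT ? 0 : 1;
}
```

Because the bytes are written exactly as they sit in memory, a file produced this way on a little-endian machine will not read back correctly on a big-endian one without swapping the bytes.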

Contents of a file after writing 5 integers to the file (from previous example):

Big endian
000000  12 34 56 78 12 34 56 79 12 34 56 7A 12 34 56 7B  4Vx4Vy4Vz4V{
000010  12 34 56 7C                                      4V|
Little endian
000000  78 56 34 12 79 56 34 12 7A 56 34 12 7B 56 34 12  xV4yV4zV4{V4
000010  7C 56 34 12                                      |V4
Compare those files to a text file containing those integers as strings.
12345678<NL>
12345679<NL>
1234567A<NL>
1234567B<NL>
1234567C<NL>
The binary files contain 20 bytes of data. The text file contains 45 or 50 bytes (depending on the system's line endings).

Self-check - Write a function called copyfile that makes an exact copy of a file on the disk. This is how the copy program in Windows or the cp program in Linux/Mac works.

Endian Refresher

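More technically: a big-endian system stores the most significant byte of a multi-byte value first (at the lowest address), and a little-endian system stores the least significant byte first.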

Some processors and their "endianness":

Processor Family           Endian
---------------------------------
Pentium (Intel)            Little
Athlon (AMD)               Little
680x0 (Motorola)            Big
Java Virtual Machine        Big
Alpha (DEC/Compaq)          Both
IA-64 (Intel)               Both
PowerPC (Motorola & IBM)    Both
SPARC (Sun)                 Both
MIPS (SGI)                  Both
ARM                         Both
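Network issues: Network byte order is big endian, so little-endian hosts must swap bytes when sending or receiving multi-byte values. Macros to swap 16-bit and 32-bit values: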

#ifdef BIG_ENDIAN /* no-ops */

  #define htons(x) (x)  /* host-to-network short (16-bit) */
  #define htonl(x) (x)  /* host-to-network long (32-bit)  */
  #define ntohs(x) (x)  /* network-to-host short (16-bit) */
  #define ntohl(x) (x)  /* network-to-host long (32-bit)  */

#else /* assume little endian and swap bytes */

  #define htons(x) ((((x) & 0xff00) >> 8) | (((x) & 0x00ff) << 8))
  #define htonl(x) ((((x) & 0xff000000) >> 24) | (((x) & 0x00ff0000) >> 8) | \
                   (((x) & 0x0000ff00) << 8) | (((x) & 0x000000ff) << 24))
  #define ntohs htons
  #define ntohl htonl

#endif
On some systems, you can include byteswap.h which contains macros to swap bytes.
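
The macros above rely on BIG_ENDIAN being defined correctly at compile time. As a sketch under stated assumptions, the same ideas can be written as ordinary functions: one that checks the byte order at run time, and one that swaps the four bytes of a 32-bit value. The names IsLittleEndian and Swap32 are made up for this example:

```c
/* Hypothetical helper: returns 1 on a little-endian machine, 0 otherwise */
int IsLittleEndian(void)
{
  unsigned int value = 1;

    /* Look at the byte stored at the lowest address of value */
  return *(unsigned char *)&value == 1;
}

/* Hypothetical helper: reverses the four low-order bytes of x */
unsigned long Swap32(unsigned long x)
{
  return ((x & 0xff000000UL) >> 24) | ((x & 0x00ff0000UL) >> 8) |
         ((x & 0x0000ff00UL) << 8)  | ((x & 0x000000ffUL) << 24);
}
```

For example, Swap32(0x12345678UL) gives 0x78563412, and swapping twice gives back the original value.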


Other Input/Output Functions

int ungetc( int c, FILE *stream );
int fflush( FILE *stream );
long ftell( FILE *stream );
int fseek( FILE *stream, long offset, int origin );
int feof( FILE *stream );
int rename( const char *oldname, const char *newname );
int remove( const char *path );
FILE *tmpfile(void);
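
As one example of how these fit together, fseek and ftell are often combined to find the size of a file. This is a sketch, not a guaranteed-portable technique: it assumes a stream opened in binary mode on a system where seeking to the end of a binary stream is supported, and a file small enough for its size to fit in a long. The helper name FileSize is made up for this example:

```c
#include <stdio.h>

/* Hypothetical helper: returns the size of an open binary stream in bytes,
   or -1 on failure. The file position is restored before returning. */
long FileSize(FILE *fp)
{
  long size;
  long start = ftell(fp);              /* Remember where we are        */

  if (start < 0 || fseek(fp, 0L, SEEK_END) != 0)
    return -1;

  size = ftell(fp);                    /* Offset of the end is the size */

  if (fseek(fp, start, SEEK_SET) != 0) /* Go back to where we were      */
    return -1;

  return size;
}
```

Restoring the original position means the caller can keep reading or writing where it left off after asking for the size.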