  Dec 21 2014 7:53AM     Micheal
This short tutorial by Micheal shows, with an example, how to read the text of a PDF file and convert it to a Stream using C# or VB.NET.
To do this, we use a library called iTextSharp. Download the iTextSharp DLL file before you start.
With iTextSharp, we can read the text of a PDF efficiently.
Adding the DLL to the Project
Steps:
  1. Right-click the project.
  2. Select Add Reference from the options.
  3. In the Add Reference pop-up window, select the Browse tab and select the iTextSharp DLL file.
  4. Click OK. The iTextSharp DLL is now referenced by the project.
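You will need to import the following namespaces. System.Text is included because the code below uses StringBuilder and Encoding.

// C#
using iTextSharp.text.pdf;
using iTextSharp.text.pdf.parser;
using System.IO;
using System.Text;


' VB.NET
Imports iTextSharp.text.pdf
Imports iTextSharp.text.pdf.parser
Imports System.IO
Imports System.Text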

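In the Page_Load event handler, the ReadPdfFile() method is called with the path of the PDF file. The returned text is converted to a byte array and wrapped in a Stream, and then the Stream is converted back to text.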

protected void Page_Load(object sender, EventArgs e)
{
    // Read the PDF text, then wrap its UTF-8 bytes in a stream
    string PdfData = ReadPdfFile(@"C:\Test.pdf");

    byte[] pdfBytes = ConvertStringToByte(PdfData);
    Stream stream = new MemoryStream(pdfBytes);

    // VICE VERSA ** Stream To Text **
    if (stream != null)
    {
        // STREAM TO BYTE ARRAY
        Stream InputStream = stream;
        byte[] result;
        using (var buffer = new MemoryStream())
        {
            // Copy the input stream into the buffer before reading its bytes
            InputStream.CopyTo(buffer);
            result = buffer.ToArray();
        }
        // BYTE ARRAY TO STRING
        string strPdfText = ConvertByteArrayToString(result);
    }
}


Protected Sub Page_Load(ByVal sender As Object, ByVal e As EventArgs)
    ' Read the PDF text, then wrap its UTF-8 bytes in a stream
    Dim PdfData As String = ReadPdfFile("C:\Test.pdf")
    Dim pdfBytes() As Byte = ConvertStringToByte(PdfData)
    Dim stream As Stream = New MemoryStream(pdfBytes)
    'VICE VERSA ** Stream To Text **
    If stream IsNot Nothing Then
        ' STREAM TO BYTE ARRAY
        Dim InputStream As Stream = stream
        Dim result() As Byte
        Using buffer As New MemoryStream()
            ' Copy the input stream into the buffer before reading its bytes
            InputStream.CopyTo(buffer)
            result = buffer.ToArray()
        End Using
        'BYTE ARRAY TO STRING
        Dim strPdfText As String = ConvertByteArrayToString(result)
    End If
End Sub

Code for Reading a PDF File
Below is the sample code for reading a PDF file in ASP.NET with C# and VB.NET.
The ReadPdfFile() method takes the file name as its parameter, reads the content page by page, and returns the text of the PDF document as a string.

public string ReadPdfFile(string fileName)
{
    StringBuilder PDFText = new StringBuilder();
    PdfReader pdfReader = new PdfReader(fileName);
    try
    {
        // iTextSharp page numbers start at 1
        for (int page = 1; page <= pdfReader.NumberOfPages; page++)
        {
            // Use a fresh extraction strategy for each page
            ITextExtractionStrategy strategy = new SimpleTextExtractionStrategy();
            // GetTextFromPage already returns a .NET string, so no re-encoding is needed
            string currentText = PdfTextExtractor.GetTextFromPage(pdfReader, page, strategy);
            PDFText.Append(currentText);
        }
    }
    finally
    {
        // Release the file handle even if extraction fails
        pdfReader.Close();
    }
    return PDFText.ToString();
}


Public Function ReadPdfFile(ByVal fileName As String) As String
    Dim PDFText As New StringBuilder()
    Dim pdfReader As New PdfReader(fileName)
    Try
        ' iTextSharp page numbers start at 1
        For page As Integer = 1 To pdfReader.NumberOfPages
            ' Use a fresh extraction strategy for each page
            Dim strategy As ITextExtractionStrategy = New SimpleTextExtractionStrategy()
            ' GetTextFromPage already returns a .NET string, so no re-encoding is needed
            Dim currentText As String = PdfTextExtractor.GetTextFromPage(pdfReader, page, strategy)
            PDFText.Append(currentText)
        Next
    Finally
        ' Release the file handle even if extraction fails
        pdfReader.Close()
    End Try
    Return PDFText.ToString()
End Function

The ReadPdfFile() method reads every page of a multi-page PDF document.
Methods for Converting Byte[] to String and Vice Versa

//Method to Convert Byte[] to string
private static string ConvertByteArrayToString(Byte[] ByteOutput)
{
    string StringOutput = System.Text.Encoding.UTF8.GetString(ByteOutput);
    return StringOutput;
}

//Method to Convert String to Byte[]
public static byte[] ConvertStringToByte(string Input)
{
    return System.Text.Encoding.UTF8.GetBytes(Input);
}


'Method to Convert Byte[] to string
Private Shared Function ConvertByteArrayToString(ByVal ByteOutput() As Byte) As String
    Dim StringOutput As String = System.Text.Encoding.UTF8.GetString(ByteOutput)
    Return StringOutput
End Function
'Method to Convert String to Byte[]
Public Shared Function ConvertStringToByte(ByVal Input As String) As Byte()
    Return System.Text.Encoding.UTF8.GetBytes(Input)
End Function
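
The string-to-stream round trip described above can be tried on its own, without iTextSharp or a PDF file. The following is only a minimal sketch: the class name StreamRoundTrip and the methods ToStream and FromStream are hypothetical names chosen for this illustration and are not part of the article's code or of any library. It assumes the text is encoded as UTF-8, as in the conversion methods above.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper class for illustration only
public static class StreamRoundTrip
{
    // Encode the text as UTF-8 and wrap the bytes in a MemoryStream
    public static Stream ToStream(string text)
    {
        return new MemoryStream(Encoding.UTF8.GetBytes(text));
    }

    // Copy the stream into a buffer and decode the bytes as UTF-8
    public static string FromStream(Stream stream)
    {
        using (var buffer = new MemoryStream())
        {
            stream.CopyTo(buffer);
            return Encoding.UTF8.GetString(buffer.ToArray());
        }
    }

    public static void Main()
    {
        // Stand-in for the text that ReadPdfFile() would return
        string original = "Sample PDF text: café, page 1";
        using (Stream stream = ToStream(original))
        {
            string restored = FromStream(stream);
            Console.WriteLine(restored == original); // prints True
        }
    }
}
```

Using the same encoding in both directions is what makes the round trip lossless. Note that Stream.CopyTo reads from the stream's current position, so a stream that has already been read must be rewound (for example by setting Position to 0 on a seekable stream) before calling FromStream.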

I hope this page helps you read the text of a PDF document and convert it to a Stream, and vice versa, in an ASP.NET application. Thanks.

