Vision: a DirectShow machine vision engine

For the last few weeks, I’ve been working sporadically on a DirectShow reimplementation of the simple machine vision engine I wrote for my Robotics class. The original version used the old Video For Windows API, which is relatively simple, but unfortunately does not seem to work with a lot of newer cameras, especially on more recent versions of Windows. Anyway, it’s still a bit of a work in progress, but I at least have something working now.

The full code is below (a build script is also included at the bottom). However, building it for the first time is a bit of a pain because you first need to compile the DirectShow base classes sample in the Windows SDK in order to produce “strmbase.lib” (“strmiids.lib” ships with the SDK itself) – don’t ask me why they make this so complicated!

This is a link to download the executable version:

When you run vision.exe, it should open a window showing live video from the first video capture device. Here’s what it looked like for me:

vision_screenshot

To demonstrate some simple image processing, I’ve created 6 different video effect modes:

  • Press ‘n’ for normal video.
  • Press ‘i’ for inverted video.
  • Press ‘p’ for posterized video.
  • Press ‘r’ to display red component only.
  • Press ‘g’ to display green component only.
  • Press ‘b’ to display blue component only.

For example, here’s what it looks like when you press ‘i’:
vision_invert_screenshot

Here’s the C++ source code. A build script is also included below.

//
// vision.cpp - A stripped down DirectShow machine vision engine
// Written by Ted Burke - last modified 10-12-2012
//

// Windows and DirectShow header files
#include <stdio.h>
#include <windows.h>
#include <dshow.h>
#include <streams.h>
#include <initguid.h>
#include <d3d9.h>
#include <Vmr9.h>

// mode flag
int mode = 0;

//
// This is the function that actually processes the pixel
// data of each new frame. p1 is the original frame and
// p2 is the modified version. w and h are the width and
// height of the frame. The BYTE data type is just an
// unsigned 8-bit integer. Each pixel is 3 bytes, one
// byte for each colour component (24-bit RGB). The byte
// order actually seems to be blue, green, red.
//
void frame(BYTE *p1, BYTE *p2, int w, int h)
{
	int y, x, n;
	for (y=0 ; y<h ; ++y)
	{
		for (x=0 ; x<w ; ++x)
		{
			// Mode 1: Invert each pixel
			if (mode == 1) for (n=0;n<3;++n) p2[y*w*3+x*3+n] = 255 - p1[y*w*3+x*3+n];
			// Mode 2: Posterize each pixel (threshold each colour component)
			else if (mode == 2) for (n=0;n<3;++n) p2[y*w*3+x*3+n] = (p1[y*w*3+x*3+n] > 127) ? 255 : 0;
			// Mode 3: Red component only
			else if (mode == 3) {p2[y*w*3+x*3+0]=0; p2[y*w*3+x*3+1]=0 ; p2[y*w*3+x*3+2]=p1[y*w*3+x*3+2];}
			// Mode 4: Green component only
			else if (mode == 4) {p2[y*w*3+x*3+0]=0; p2[y*w*3+x*3+1]=p1[y*w*3+x*3+1] ; p2[y*w*3+x*3+2]=0;}
			// Mode 5: Blue component only
			else if (mode == 5) {p2[y*w*3+x*3+0]=p1[y*w*3+x*3+0]; p2[y*w*3+x*3+1]=0 ; p2[y*w*3+x*3+2]=0;}
			// Mode 0: Unmodified
			else for (n=0;n<3;++n) p2[y*w*3 + x*3 + n] = p1[y*w*3 + x*3 + n]; // unmodified
		}
	}	
}
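Adding a new effect is just a matter of adding another mode to frame(). As a hypothetical example (the function name and the extra mode are my own invention, not part of the program above), here’s a standalone sketch of a greyscale conversion over the same blue-green-red byte layout, using an integer Rec. 601 luma approximation:

```cpp
#include <cassert>

typedef unsigned char BYTE;

// Hypothetical extra mode (not in the program above): greyscale
// conversion using the integer Rec. 601 luma approximation
// (77*R + 150*G + 29*B) / 256. The buffer layout matches frame():
// 3 bytes per pixel, in blue-green-red order.
void frame_greyscale(const BYTE *p1, BYTE *p2, int w, int h)
{
	for (int y = 0; y < h; ++y)
	{
		for (int x = 0; x < w; ++x)
		{
			int i = y*w*3 + x*3;              // offset of this pixel's blue byte
			BYTE b = p1[i], g = p1[i+1], r = p1[i+2];
			BYTE grey = (BYTE)((77*r + 150*g + 29*b) >> 8);
			p2[i] = p2[i+1] = p2[i+2] = grey; // same value in all three channels
		}
	}
}
```

To wire this in, you would add another `else if (mode == 6)` branch in frame() and another key in WndProc.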

// I generated the following GUID for this filter using the
// online GUID generator at http://www.guidgen.com/
// {d6ece2e3-72aa-4157-b489-52c3fd693ce9}
DEFINE_GUID(CLSID_FrameTransformFilter, 
0xd6ece2e3, 0x72aa, 0x4157, 0xb4, 0x89, 0x52, 0xc3, 0xfd, 0x69, 0x3c, 0xe9);

// DirectShow objects
HRESULT hr;
ICreateDevEnum *pDevEnum = NULL;
IEnumMoniker *pEnum = NULL;
IMoniker *pMoniker = NULL;
IPropertyBag *pPropBag = NULL;
IGraphBuilder *pGraph = NULL;
ICaptureGraphBuilder2 *pBuilder = NULL;
IBaseFilter *pCap = NULL;
IBaseFilter *pVMR = NULL;
IVMRFilterConfig9 *pVMRFilterConfig = NULL;
IVMRWindowlessControl9 *pVMRWindowlessControl = NULL;
IMediaControl *pMediaControl = NULL;
IAMStreamConfig *pStreamConfig = NULL;

// My frame transforming filter class
class FrameTransformFilter : public CTransformFilter
{
public:
	// Constructor
	FrameTransformFilter(int w, int h);
	
	// Methods required for filters derived from CTransformFilter
	HRESULT CheckInputType(const CMediaType *mtIn);
	HRESULT GetMediaType(int iPosition, CMediaType *pMediaType);
	HRESULT CheckTransform(const CMediaType *mtIn, const CMediaType *mtOut);
	HRESULT DecideBufferSize(IMemAllocator *pAlloc, ALLOCATOR_PROPERTIES *pProp);
	HRESULT Transform(IMediaSample *pSource, IMediaSample *pDest);
	
private:
	int w, h; // video frame width and height in pixels
};

// Frame transform filter objects
IBaseFilter *pTransform = NULL;
FrameTransformFilter *pFrameTransformFilter = NULL;

FrameTransformFilter::FrameTransformFilter(int width, int height)
  : CTransformFilter(NAME("My Frame Transform Filter"), 0, CLSID_FrameTransformFilter)
{
	// Initialize any private variables here
	w = width;
	h = height;
}

//
// This function is used during DirectShow graph building
// to limit the type of input connection that the filter
// will accept.
// Here, the connection must be 24-bit RGB at the dimensions
// currently stored in the filter's width and height member
// variables (as passed as arguments to the constructor).
//
HRESULT FrameTransformFilter::CheckInputType(const CMediaType *mtIn)
{
	// Check the media and format types before dereferencing
	// the format block
	if ((mtIn->majortype != MEDIATYPE_Video) ||
		(mtIn->subtype != MEDIASUBTYPE_RGB24) ||
		(mtIn->formattype != FORMAT_VideoInfo) || 
		(mtIn->cbFormat < sizeof(VIDEOINFOHEADER)))
	{
		return VFW_E_TYPE_NOT_ACCEPTED;
	}
	
	VIDEOINFOHEADER *pVih = 
		reinterpret_cast<VIDEOINFOHEADER*>(mtIn->pbFormat);
	
	if ((pVih->bmiHeader.biPlanes != 1) ||
		(pVih->bmiHeader.biWidth != w) ||
		(pVih->bmiHeader.biHeight != h) ||
		(pVih->bmiHeader.biBitCount != 24) ||
		(pVih->bmiHeader.biCompression != BI_RGB))
	{
		return VFW_E_TYPE_NOT_ACCEPTED;
	}
	
	return S_OK;
}

//
// This function is called to find out what this filter's
// preferred output format is. Here, the output type is
// specified at 24-bit RGB at the dimensions stored in
// the width and height member variables (passed as
// arguments to the constructor).
//
HRESULT FrameTransformFilter::GetMediaType(int iPosition, CMediaType *pMediaType)
{
	HRESULT hr;

	ASSERT(m_pInput->IsConnected());

	if (iPosition < 0) return E_INVALIDARG;
	if (iPosition > 0) return VFW_S_NO_MORE_ITEMS;

	if (FAILED(hr = m_pInput->ConnectionMediaType(pMediaType))) return hr;

	ASSERT(pMediaType->formattype == FORMAT_VideoInfo);
	VIDEOINFOHEADER *pVih =
		reinterpret_cast<VIDEOINFOHEADER*>(pMediaType->pbFormat);
	pVih->bmiHeader.biPlanes = 1;
	pVih->bmiHeader.biBitCount = 24;
	pVih->bmiHeader.biCompression = BI_RGB;
	pVih->bmiHeader.biWidth = w;
	pVih->bmiHeader.biHeight = h;
	pVih->bmiHeader.biSizeImage = DIBSIZE(pVih->bmiHeader);

	return S_OK;
}

//
// This function is used to verify that the proposed
// connections into and out of the filter are acceptable
// before the capture graph is run.
//
HRESULT FrameTransformFilter::CheckTransform(
	const CMediaType *mtIn, const CMediaType *mtOut)
{
	// Check the major type.
	if ((mtOut->majortype != MEDIATYPE_Video) ||
		(mtOut->formattype != FORMAT_VideoInfo) || 
		(mtOut->cbFormat < sizeof(VIDEOINFOHEADER)))
	{
		return VFW_E_TYPE_NOT_ACCEPTED;
	}
	
	// Compare the bitmap information against the input type.
	ASSERT(mtIn->formattype == FORMAT_VideoInfo);
	BITMAPINFOHEADER *pBmiOut = HEADER(mtOut->pbFormat);
	BITMAPINFOHEADER *pBmiIn = HEADER(mtIn->pbFormat);
	if ((pBmiOut->biPlanes != 1) ||
		(pBmiOut->biBitCount != 24) ||
		(pBmiOut->biCompression != BI_RGB) ||
		(pBmiOut->biWidth != pBmiIn->biWidth) ||
		(pBmiOut->biHeight != pBmiIn->biHeight))
	{
		return VFW_E_TYPE_NOT_ACCEPTED;
	}
	
	// Compare source and target rectangles.
	RECT rcImg;
	SetRect(&rcImg, 0, 0, pBmiIn->biWidth, pBmiIn->biHeight);
	RECT *prcSrc = &((VIDEOINFOHEADER*)(mtIn->pbFormat))->rcSource;
	RECT *prcTarget = &((VIDEOINFOHEADER*)(mtOut->pbFormat))->rcTarget;
	if ((!IsRectEmpty(prcSrc) && !EqualRect(prcSrc, &rcImg)) ||
		(!IsRectEmpty(prcTarget) && !EqualRect(prcTarget, &rcImg)))
	{
		return VFW_E_INVALIDMEDIATYPE;
	}
	
	// Everything is good.
	return S_OK;
}

//
// This function tells the output pin's allocator how big its
// buffers need to be (and how many to create). It's modified
// very little (if at all) from the original one in the MSDN
// tutorial.
//
HRESULT FrameTransformFilter::DecideBufferSize(
	IMemAllocator *pAlloc, ALLOCATOR_PROPERTIES *pProp)
{
	AM_MEDIA_TYPE mt;
	HRESULT hr = m_pOutput->ConnectionMediaType(&mt);
	if (FAILED(hr))
	{
		return hr;
	}
	
	ASSERT(mt.formattype == FORMAT_VideoInfo);
	BITMAPINFOHEADER *pbmi = HEADER(mt.pbFormat);
	pProp->cbBuffer = DIBSIZE(*pbmi) * 2; 
	if (pProp->cbAlign == 0) pProp->cbAlign = 1;
	if (pProp->cBuffers == 0) pProp->cBuffers = 1;
	
	// Release the format block.
	FreeMediaType(mt);
	
	// Set allocator properties.
	ALLOCATOR_PROPERTIES Actual;
	hr = pAlloc->SetProperties(pProp, &Actual);
	if (FAILED(hr)) return hr;
	
	// Even when it succeeds, check the actual result.
	if (pProp->cbBuffer > Actual.cbBuffer) return E_FAIL;
	
	return S_OK;
}

//
// This function is called to process the image data
// each time a new frame is received
//
HRESULT FrameTransformFilter::Transform(
	IMediaSample *pSource, IMediaSample *pDest)
{
	HRESULT hr;
	BYTE *pBufferIn, *pBufferOut;
	
	// Get pointers to the underlying buffers.
	if (FAILED(hr = pSource->GetPointer(&pBufferIn))) return hr;
	if (FAILED(hr = pDest->GetPointer(&pBufferOut))) return hr;

	// Call frame processing function
	frame(pBufferIn, pBufferOut, w, h);
	
	pDest->SetActualDataLength(pSource->GetActualDataLength());
	pDest->SetSyncPoint(TRUE);
		
	return S_OK;
}
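Transform() is the obvious place to hook in real vision processing, since it sees the raw pixels of every frame. As a hypothetical sketch (the function name and interface are my own invention, not part of the program above), here’s a standalone routine that finds the centroid of bright pixels in a BGR buffer of the kind frame() receives – a typical first step when tracking a bright object. Note that DirectShow RGB frames are normally stored bottom-up, so row 0 here is the bottom of the image:

```cpp
#include <cassert>

typedef unsigned char BYTE;

// Hypothetical helper: centroid (cx, cy) of all pixels whose green
// channel exceeds a threshold, in a w-by-h 24-bit BGR buffer.
// Returns false if no pixel passed the threshold.
bool bright_centroid(const BYTE *p, int w, int h, BYTE threshold,
                     double *cx, double *cy)
{
	long sx = 0, sy = 0, count = 0;
	for (int y = 0; y < h; ++y)
		for (int x = 0; x < w; ++x)
			if (p[y*w*3 + x*3 + 1] > threshold)  // green byte of pixel (x, y)
			{
				sx += x; sy += y; ++count;
			}
	if (count == 0) return false;
	*cx = (double)sx / count;
	*cy = (double)sy / count;
	return true;
}
```

Calling something like this from Transform() (using pBufferIn, w and h) would let the engine report object positions while still displaying live video.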

LRESULT CALLBACK WndProc(HWND hWnd, UINT msg, WPARAM wParam, LPARAM lParam)
{
	switch(msg)
	{
	case WM_KEYDOWN:
		// Select an effect mode based on the key pressed
		if (wParam == 'N') mode = 0;
		if (wParam == 'I') mode = 1;
		if (wParam == 'P') mode = 2;
		if (wParam == 'R') mode = 3;
		if (wParam == 'G') mode = 4;
		if (wParam == 'B') mode = 5;
		break;
	case WM_CLOSE:
		DestroyWindow(hWnd);
		break;
	case WM_DESTROY:
		PostQuitMessage(0);
		break;
	default:
		return DefWindowProc(hWnd, msg, wParam, lParam);
	}
	return 0;
}

int WINAPI WinMain(HINSTANCE hInstance, HINSTANCE hPrevInstance,
					LPSTR lpCmdLine, int nCmdShow)
{
	WNDCLASSEX wc;
	HWND hWnd;
	MSG msg;

	// Default capture size
	int w = 640; int h = 480;
	
	// Register window class
	wc.cbSize        = sizeof(WNDCLASSEX);
	wc.style         = 0;
	wc.lpfnWndProc   = WndProc;
	wc.cbClsExtra    = 0;
	wc.cbWndExtra    = 0;
	wc.hInstance     = hInstance;
	wc.hIcon         = LoadIcon(NULL, IDI_APPLICATION);
	wc.hCursor       = LoadCursor(NULL, IDC_ARROW);
	wc.hbrBackground = (HBRUSH)(COLOR_WINDOW+1);
	wc.lpszMenuName  = NULL;
	wc.lpszClassName = "myWindowClass";
	wc.hIconSm       = LoadIcon(NULL, IDI_APPLICATION);
	RegisterClassEx(&wc);

	// Create window
	hWnd = CreateWindowEx(
		WS_EX_CLIENTEDGE,
		"myWindowClass",
		"Machine Vision",
		WS_OVERLAPPEDWINDOW,
		CW_USEDEFAULT, CW_USEDEFAULT,
		w, h,
		NULL, NULL,
		hInstance, NULL);
	
	// Initialise COM
	fprintf(stderr, "Initialising COM...");
	hr = CoInitializeEx(NULL, COINIT_MULTITHREADED);
	
	// Create the filter graph and capture graph builder
	if (hr == S_OK) hr = CoCreateInstance(CLSID_FilterGraph, NULL,
			CLSCTX_INPROC_SERVER, IID_IGraphBuilder, (void**)&pGraph);
	if (hr == S_OK) hr = CoCreateInstance(CLSID_CaptureGraphBuilder2, NULL, 
			CLSCTX_INPROC_SERVER, IID_ICaptureGraphBuilder2, (void **)&pBuilder);
	if (hr == S_OK) hr = pBuilder->SetFiltergraph(pGraph);
	
	// Get the first video capture device and add it to the graph
	if (hr == S_OK) hr = CoCreateInstance(CLSID_SystemDeviceEnum, NULL,
			CLSCTX_INPROC_SERVER, IID_PPV_ARGS(&pDevEnum));
	if (hr == S_OK) hr = pDevEnum->CreateClassEnumerator(
			CLSID_VideoInputDeviceCategory, &pEnum, 0);
	if (hr == S_OK) hr = pEnum->Next(1, &pMoniker, NULL);
	if (hr == S_OK) hr = pMoniker->BindToObject(0, 0, IID_IBaseFilter, (void**)&pCap);
	if (hr == S_OK) hr = pGraph->AddFilter(pCap, L"Capture Filter");
	
	// Create frame transform filter and add it to the graph (NB This
	// object will be automatically deleted when pTransform is released)
	if (hr == S_OK) pFrameTransformFilter = new FrameTransformFilter(w, h);
	if (hr == S_OK) hr = pFrameTransformFilter->QueryInterface(
			IID_IBaseFilter, reinterpret_cast<void**>(&pTransform));
	if (hr == S_OK) hr = pGraph->AddFilter(pTransform, L"FrameTransform");
	
	// Create and configure VMR9 (the video mixing renderer)
	if (hr == S_OK) hr = CoCreateInstance(CLSID_VideoMixingRenderer9,
			NULL, CLSCTX_INPROC_SERVER, IID_IBaseFilter, (void**)&pVMR);			
	if (hr == S_OK) hr = pGraph->AddFilter(pVMR, L"Video Renderer");
	if (hr == S_OK) hr = pVMR->QueryInterface(IID_IVMRFilterConfig9, (void**)&pVMRFilterConfig);
	if (hr == S_OK) hr = pVMRFilterConfig->SetRenderingMode(VMRMode_Windowless);
	if (hr == S_OK) hr = pVMR->QueryInterface(IID_IVMRWindowlessControl9,
			(void**)&pVMRWindowlessControl);
	if (hr == S_OK) hr = pVMRWindowlessControl->SetVideoClippingWindow(hWnd);
	
	// Set the destination rectangle to fill the window's client area
	RECT rcDest;
	GetClientRect(hWnd, &rcDest);
	if (hr == S_OK) hr = pVMRWindowlessControl->SetVideoPosition(NULL, &rcDest);
	
	// Get stream configuration interface to capture graph builder
	// then use it to set the video format
	if (hr == S_OK) hr = pBuilder->FindInterface(
			&PIN_CATEGORY_CAPTURE, &MEDIATYPE_Video,
			pCap, IID_IAMStreamConfig, (void**)&pStreamConfig);
	AM_MEDIA_TYPE *pmt = 0;
	if (hr == S_OK) hr = pStreamConfig->GetFormat(&pmt);
	if (hr == S_OK && pmt->formattype == FORMAT_VideoInfo)
	{
		VIDEOINFOHEADER *pvi = (VIDEOINFOHEADER *)pmt->pbFormat;
		pvi->bmiHeader.biWidth = w;
		pvi->bmiHeader.biHeight = h;
		hr = pStreamConfig->SetFormat(pmt);
		DeleteMediaType(pmt);
	}
	
	// Render video stream
	if (hr == S_OK) hr = pBuilder->RenderStream(
			&PIN_CATEGORY_CAPTURE, &MEDIATYPE_Video, pCap, pTransform, pVMR);
	
	// Start capture graph
	if (hr == S_OK) hr = pGraph->QueryInterface(
			IID_IMediaControl, (void**)&pMediaControl);
	if (hr == S_OK) hr = pMediaControl->Run();
	// Run() returns S_FALSE if the graph hasn't finished switching
	// to the running state yet; it completes the transition on its
	// own, so treat that as success.
	if (hr == S_FALSE) hr = S_OK;

	// If setup was successful, show window and enter message loop
	if (hr == S_OK)
	{
		ShowWindow(hWnd, nCmdShow);
		while(GetMessage(&msg, NULL, 0, 0) > 0)
		{
			TranslateMessage(&msg);
			DispatchMessage(&msg);
		}
	}
	else
	{
		// Alert user if an error occurred
		MessageBox(NULL, "An error occurred", "Error", MB_OK | MB_ICONERROR);
	}
	
	// Clean up DirectShow / COM stuff
	if (pMediaControl != NULL) pMediaControl->Stop();
	if (pMediaControl != NULL) pMediaControl->Release();
	if (pStreamConfig != NULL) pStreamConfig->Release();
	if (pVMRWindowlessControl != NULL) pVMRWindowlessControl->Release();
	if (pVMRFilterConfig != NULL) pVMRFilterConfig->Release();
	if (pVMR != NULL) pVMR->Release();
	if (pTransform != NULL) pTransform->Release();
	// NB pFrameTransformFilter is deleted automatically
	// when pTransform is released
	if (pCap != NULL) pCap->Release();
	if (pBuilder != NULL) pBuilder->Release();
	if (pGraph != NULL) pGraph->Release();
	if (pPropBag != NULL) pPropBag->Release();
	if (pMoniker != NULL) pMoniker->Release();
	if (pEnum != NULL) pEnum->Release();
	if (pDevEnum != NULL) pDevEnum->Release();
	CoUninitialize();
	
	// msg is only filled in if the message loop actually ran
	return (hr == S_OK) ? (int)msg.wParam : -1;
}

Here’s a simple build script for Microsoft’s cl.exe compiler (I can’t get DirectShow programs to compile with gcc). Just copy this into a text file and save it as “build.bat” in the same folder as “vision.cpp”. Then, to build the program, open a Visual Studio command window and type “build”.

cl vision.cpp /I"C:\Program Files\Microsoft SDKs\Windows\v7.1\Samples\multimedia\directshow\baseclasses" /MD -link /LIBPATH:"C:\Program Files\Microsoft SDKs\Windows\v7.1\Samples\multimedia\directshow\baseclasses\Release" user32.lib ole32.lib strmiids.lib oleaut32.lib strmbase.lib winmm.lib

You may need to adjust the paths in that command if your copy of the Windows SDK is installed in a different location, or if you have a different version installed (mine is version 7.1).
