Serial Flash Memory Programmer Schematic
Designing with discrete flash is 1/10th the cost, uses a much smaller form factor, and requires significantly less specialized hardware than using SD flash cards. This Instructable will show you how to add 1MB of discrete external flash memory to your microcontroller project with what I believe to be the least amount of effort possible. It is also a follow-on to my other two data-logging Instructables, and it explains how to download the data from the logger's flash memory using age-old TTY command line applications found in Linux.
Motivation
Whenever I'm building an Atmel ATmega or Arduino project and I need to record data, I almost always reach for a single discrete flash chip rather than an SD flash subsystem. Many reasons exist to choose a discrete flash chip over an SD subsystem, and vice versa, and you'll need to consider these tradeoffs for your design.
The list below contains a few tradeoffs I think about when I need to decide if I want to use a single 8-pin DIP chip or a full-on SD solution:
Hardware Complexity (Choose: Discrete)
One way to add SD flash to an Arduino system is to use a shield, such as the one I bought at my local Radio Shack for $15. While shields provide convenience for prototyping, the final production assembly might not have the budget or the space to include SD hardware. An 8-pin DIP package of a discrete flash chip is much easier to drop on a protoboard than an SD shield, assuming your development board even supports a shield.
Software Complexity (Choose: Discrete)
The SD flash subsystem commonly relies on a FAT file system library. While the cards use an SPI interface, it makes sense to use FAT since any PC/Mac can then read the card. These libraries are large and can take up precious program memory on smaller embedded controllers.
Compatibility and integration into your build environment may require significant debug. The software required to drive a discrete flash chip with an SPI interface is trivial and very small, as you will soon see.
Maybe this says more about me than the SDFat libraries, but I find them cumbersome to work with.
Capacity & Portability (Choose: SD)
SD flash wins big here: simply pop a larger capacity SD card into the existing design with no modifications.
Discrete SPI flash has lower density limits in the 8-pin DIP format. The SDFat library means any PC/Mac can read the files on the card.
Cost (Choose: Discrete)
SD cards range in price dramatically, and with an SD flash shield, can set you back $20-$30. WinBond 1MB chips cost about $2 from Mouser or Digikey.
Power (Choose: Discrete)
Energy requirements of flash depend on the manufacturer, production lot, device density, and process technology.
SD cards typically have higher leakage power due to the higher densities, and higher dynamic power due to the higher access speeds. The WinBond chips I focus on in this Instructable require very little power: 6uW standby, 60mW page program, and 60mW chip erase. I wasn't able to find power data on the high-end super-fast SD cards, but the write speed is about 100x that of the WinBond. Since dynamic power is proportional to frequency, I can't imagine power would be less.
Speed (Choose: SD)
I haven't had any need for very fast flash memory write performance, but SD flash comes in many different product SKUs based on speed (mostly due to the demands of digital photography and the use of raw image formats).
The WinBond SPI chips can't really compare: page program speed is 0.7ms for 256 bytes, which translates to about 0.37MB/s, roughly 100x slower than Team Corp.’s fastest Micro SD cards at 40MB/s. I suspect they have multiple devices or arrays writing in parallel to achieve those speeds. While this analysis most likely represents my own lazy biases, I find my brand of laziness to be rather prolific. That being said, any one of these vectors may be more important for your project, but my goal here is to call out the tradeoffs, and then illustrate the simplicity of this wonderful flash chip. (And I haven't even discussed using larger capacity parallel flash chips.) I'm going to explain this next part painfully fast.
Flash Programmer 2
My first job at Intel was in the flash memory group in 1993, and a lot has changed with the technology in the 20 years since then, but some concepts remain consistent. Flash memory is a type of nonvolatile storage memory based on MOSFET technology. Nonvolatile means the device retains its value when it isn't powered up.
MOSFET
If you aren't familiar with MOSFETs, I'll try to explain them in one sentence: a slab of silicon with two terminals on either end doesn't conduct electricity if you place a potential difference between them, but if you stick another piece of metal on top of that slab, sandwich a dielectric between them, and then apply a voltage to that piece of metal, it creates a field and current can flow between the two terminals.
The terminals are called the source and drain, and the metal is called the gate. That's a super simple explanation that bulldozes 50 years of quantum physics, but from a Michael Faraday point of view, it is reasonably workable.
FLASH TRANSISTOR
A flash transistor operates by blasting a bunch of charge carriers onto the dielectric between the gate and the substrate. This is called programming, and is typically done with a much higher voltage.
It actually damages the material, and after 100k program cycles, the gate will fail. To remove the charge carriers from the dielectric, an equally high voltage, but of reverse potential, pulls the carriers off the gate. This is called erasing. A programmed flash bit has value 0 and an erased bit has value 1, so an erased flash byte is 0xFF in hex. (Nowadays, flash memory can store multiple bits per cell using multiple voltage levels, but that gets really complicated.)
FLASH ARCHITECTURE
Typically, a flash memory contains a giant array of transistors that can be individually programmed, but only erased in groups (sectors, blocks, or the entire chip).
This is simply a side effect of how the erase circuitry works: per-bit erase would require too much metal density, and isn't all that useful (in practice, erasing in larger chunks works just fine). Since programming a single transistor is slow due to ramping up that high voltage and all of the control that goes along with that, flash is usually programmed in pages. Typically a flash device will have a small SRAM page buffer (256 bytes) which the host will first rapidly fill with data, and then the host issues a page write command, and the flash chip writes all the page bytes out in a large batch job. This batch circuitry amortizes the startup write latency across a larger number of bits. Offering two or more page buffers allows the host to use a double-buffer technique to hide the write latency of the flash device.
SPI
The Serial Peripheral Interface (SPI) is a brilliant invention. It is a simple serial interface that uses a chip select, a clock, a data IN and a data OUT.
There are many kinds of SPI devices, as it is a very popular interface, and all SPI devices use a common library: once you know how to talk to one SPI device, you can talk to any SPI device. The advantage of SPI is its software simplicity: the code basically shifts data in and out of the DI and DO pins respectively, on the rising edge of a clock. And because the clock is controlled by the host, it doesn't require a fancy clock circuit: the phases can be as asymmetric as you want, as long as you adhere to the minimum cycle width requirements of the device.
FLASH SPI
Flash SPI memory simply combines the best of both worlds. Note that SD cards use SPI, just as this discrete chip does.
The programming interface isn't very different, but the actual instructions and timings differ. The pinout shown above is taken from the datasheet.
Pin 1: Chip Select (/CS, sometimes called /SS, for 'slave select')
CS is the 'Chip Select' pin. You assert the CS pin when you want to talk to that device, because you could have a dozen SPI devices all sharing the same bus, and you identify each one uniquely via its CS pin.
The slash in front of CS means 'active low': to talk to this device, pull this pin to logic level zero; to remove it from the shared bus, drive logic level one. Pin 2: Data Out (DO) Serial data is read from this pin.
It will connect to the MISO (Master In / Slave Out) wire of the bus. Typically you write a command to the SPI device in a pre-determined sequence. After that sequence completes, and depending on the instruction in the sequence, data is then read off the DO pin. Pin 3: Write Protect (/WP) This pin disables writing. Sometimes you'll see a jumper attached to this pin in order to provide very strict control over the program/erase mechanism: if set low, the device cannot be programmed or erased. I usually hardwire it to Vdd and allow my software to control write enable/disable through serial commands (we'll talk about this later). EDIT (2016-12-16) Thanks to user for catching a typo: I had the polarity mixed up.
Pin 4: Ground This is simply the ground pin. Pin 5: Data In (DI) This is the input serial pin. It will connect to the MOSI (Master Out / Slave In) wire of the bus. Commands and data are written to this pin by the host system. Pin 6: Clock (CLK) The clock pin determines how data bits are transmitted on the DI and DO pins. The DI/DO pins are sampled on the rise of the clock pin.
Pin 7: Hold (/HD) I've never used this pin, but it allows a host device to pause whatever transaction is in flight. You'll probably never have to use this pin so I leave it wired to VCC (active low).
Pin 8: VCC This is simply the source voltage. Now that I've explained flash, SPI, and a specific implementation of an SPI flash device, the next things you need to understand are communication timing diagrams. Timing diagrams explain the sequencing of the data across the pins to issue instructions to the device. Each SPI device responds to its own set of instructions (e.g., a flash device will have a read or erase instruction) and the timing diagram is the link between the conceptual behavior of the instruction and the actual hardware protocol to execute that instruction. In the diagram for this section I copied the chip erase timing diagram from the datasheet because it is the easiest to understand.
The bottom axis is time, and the vertical axes represent four SPI pins and the sequence of data that should appear on them over time to execute an instruction. Note: 'High impedance' means you can ignore that signal (it is driven to neither 0 nor 1, but presents an extremely high resistance, so it is effectively an open circuit). Where two lines appear (like DI), that simply represents that some kind of transitions are happening but are unknown; a single line means a specific high or low value is present. Let's look at the diagram from left to right and top to bottom. In order to talk to any SPI device, its chip-select must be brought high and then driven low (remember /CS means active low). When /CS is brought low, note that the clock in the diagram is very explicitly drawn to show eight phases. This means you must pulse the clock eight times, once per bit.
At the time the clock is strobing, data in goes from high to low to high. I think the DI diagram is erroneous, because if you draw a vertical line down the rising edge of each clock and calculate the binary values of DI at those points, you should get value 11000111, or 0xC7. This is the instruction that tells the chip to erase itself.
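To make the 'draw a vertical line down each rising edge' exercise concrete, here is a minimal host-side sketch in Python (not Arduino code, and not taken from my sketch). The function names are hypothetical; it only models the logic levels on DI, most significant bit first, and rebuilds the byte the chip would latch.

```python
CHIP_ERASE = 0xC7  # opcode value read off the timing diagram, per the text above


def shift_out_msb_first(value):
    """Return the 8 DI levels a host presents, one per rising clock edge."""
    return [(value >> bit) & 1 for bit in range(7, -1, -1)]


def sample_on_rising_edges(di_levels):
    """Rebuild the byte the device latches by sampling DI at each rising edge."""
    byte = 0
    for level in di_levels:
        byte = (byte << 1) | level
    return byte


di = shift_out_msb_first(CHIP_ERASE)
print(di)
print(hex(sample_on_rising_edges(di)))
```

Reading the printed list left to right corresponds to reading the DI trace at each of the eight clock pulses.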
Once chip select is brought high, the internal circuitry will begin executing the 0xC7/Chip Erase function. This instruction takes about 12 seconds to complete. Keep in mind, you don't need to actually toggle the clock pin 8 times to send out the 8 bits of a byte; the SPI library does this for you when you use the function SPI.transfer. You will still need to manually drive /CS with digitalWrite, but SCK, MOSI and MISO are all handled by the SPI functions. You will notice in my source code a function called 'notbusy'.
This function continually issues a 'read status register #1' instruction and checks bit 0, which indicates whether the internal operation has completed and the flash is no longer busy. The timing of this operation matches diagram 9.2.8 of the datasheet.
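The polling idea behind 'notbusy' can be sketched on the host side as follows. This is a simplified Python model, not my Arduino source: the FakeFlash class is a hypothetical stand-in for the chip, and the sketch assumes the usual Winbond values of opcode 0x05 for reading status register 1 and bit 0 as the busy flag. The max_polls timeout is an addition for illustration.

```python
READ_STATUS_REG_1 = 0x05  # assumed opcode for "read status register 1"
BUSY_BIT = 0x01           # bit 0 of the status byte: 1 while an operation runs


class FakeFlash:
    """Hypothetical stand-in that reports busy for a fixed number of reads."""

    def __init__(self, busy_reads):
        self.busy_reads = busy_reads
        self.selected = False
        self.in_status_read = False

    def select(self, active):
        """Model /CS: True means /CS driven low (device selected)."""
        self.selected = active
        if not active:
            self.in_status_read = False

    def transfer(self, byte_out):
        """Exchange one byte, like SPI.transfer on the Arduino side."""
        if not self.selected:
            return 0xFF
        if not self.in_status_read:
            self.in_status_read = byte_out == READ_STATUS_REG_1
            return 0xFF  # nothing meaningful comes back while the opcode shifts in
        if self.busy_reads > 0:
            self.busy_reads -= 1
            return BUSY_BIT
        return 0x00


def not_busy(flash, max_polls=1000):
    """Poll the status register until the busy bit clears; return the poll count."""
    flash.select(True)
    flash.transfer(READ_STATUS_REG_1)
    polls = 0
    while flash.transfer(0x00) & BUSY_BIT:
        polls += 1
        if polls >= max_polls:
            flash.select(False)
            raise TimeoutError("flash still busy")
    flash.select(False)
    return polls


print(not_busy(FakeFlash(busy_reads=3)))
```

The status byte keeps streaming out for as long as /CS stays low, which is why the loop only re-sends a filler byte rather than the opcode.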
Note I am not referring to the electrical timing diagrams, which explain to the nanosecond the setup and hold times for the internal digital logic; the diagrams I'm referring to are the logical diagrams that ignore nanoseconds and describe the sequence of logical events. The actual electrical timing of the SPI interface is handled by the Arduino SPI library.
And to be honest, that code isn't very complex, and could be further simplified if you are designing to one specific device.
The Arduino Uno's digital outputs transmit 0V and 5V as logic levels low and high, respectively. The WinBond flash chip only operates between 2.7V and 3.6V. Whenever logic circuits on different voltage planes need to communicate, we have to use a level-shifter. The easiest form of level-shifter is a Zener diode clamp. There are many other types of level shifters in the world; some are faster, some use less power, but the Zener clamp method is quick and easy. All diodes have a reverse breakdown voltage at which point they begin to conduct.
Zener diodes are specifically designed to break down at finely tuned voltages. In my case, I connected a 3.3V Zener diode in parallel with each of the chip's digital inputs (see the schematic).
(As for the other four pins, ground is 0V, and the Uno board has a 3.3V supply for VCC, so these pins don't need a diode, and I hardwired /WP and /HOLD to 3.3V Vcc.) UPDATE: I forgot to add the 330 Ohm resistors in series with the output of the Uno drivers. Normally, if you were connecting the digital output of the Uno to a digital input of another device, a simple wire would suffice (since you are connecting one digital logic signal to another, see the ATmega328 datasheet, section 13.1 'I/O Pin Equivalent Schematic'). But since the output path now branches through the Zener, you need a resistor to limit the maximum current driven by the logic output of the Uno/ATmega chip. Without the resistor, this path to ground may exceed the max output current of the device. Now, whenever the Uno drives a 5V logic-high into, say, the /CS pin, the Zener diode switches to breakdown mode, clamping the voltage to 3.3V, thus protecting the input logic of the flash chip. Using these clamps, I connected the Arduino Uno's digital output pin 10 (SS) to /CS, pin 11 (MOSI) to DI, pin 12 (MISO) to DO, and pin 13 (SCK) to CLK.
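As a rough check of the series-resistor reasoning above, the clamp current can be estimated with Ohm's law. This is only a back-of-the-envelope Python sketch under idealized assumptions: the Zener is treated as holding exactly 3.3V, the Uno output as an ideal 5V source, and the 40mA figure is the per-pin absolute maximum I recall from the ATmega328 datasheet, so treat it as an assumption.

```python
def clamp_current_ma(v_drive=5.0, v_zener=3.3, r_series_ohm=330.0):
    """Current through the series resistor while the Zener clamps, in mA.

    Idealized: assumes the Zener holds exactly v_zener and the driver
    holds exactly v_drive.
    """
    return (v_drive - v_zener) / r_series_ohm * 1000.0


PIN_ABS_MAX_MA = 40.0  # assumed per-pin absolute maximum for the ATmega328

current = clamp_current_ma()
print(round(current, 2), "mA")
print("within limit:", current < PIN_ABS_MAX_MA)
```

With 330 Ohm in series the estimate comes out at a few milliamps, comfortably below the assumed limit.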
(Note that the pins of the Atmega328 are NOT the same pins as the Uno, e.g., the Atmega pin #19 is Uno pin #13.) The SPI software library assumes pin 10 = SS, etc. I wrote a sketch that allows me to communicate with the Uno over serial TTY via the Serial Monitor (or even a Unix prompt, as you will see).
This is a helpful method for debugging new hardware, as I can issue commands interactively. The 'serialEvent' function is a built-in callback, called whenever something happens on the default Serial object. I use this callback to construct a command string and set a boolean flag (the byte-by-byte construction of the string completes when the callback reads a semicolon ';' from the stream; I use this instead of a newline since there's no way to issue a newline from the serial monitor). When the callback has constructed the string and set the flag, the 'loop' function executes a decoder. The decoder determines which function to call based on the command string, parses any additional parameters from the command string, and calls that function.
Each function is essentially a wrapper around a low-level implementation of a WinBond SPI functional timing diagram. I used a wrapper so that the low-level functions remain generic: I can use them again in other sketches with a simple cut-and-paste. Plus, the wrapper prints some feedback to the user, which is very useful for debug.
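The buffer-until-semicolon and decode steps described above can be illustrated with a small host-side Python model. It is not the Arduino sketch itself: the CommandReader class, the decode function and the handlers table are hypothetical names, and the handlers just echo their arguments instead of touching hardware.

```python
class CommandReader:
    """Accumulates characters and emits a command each time ';' arrives."""

    def __init__(self):
        self._buffer = ""
        self.ready = []

    def feed(self, chars):
        """Consume incoming characters, as serialEvent would byte by byte."""
        for ch in chars:
            if ch == ";":
                if self._buffer.strip():
                    self.ready.append(self._buffer.strip())
                self._buffer = ""
            else:
                self._buffer += ch


def decode(command, handlers):
    """Split 'name arg arg ...' and call the matching handler with int args."""
    name, *args = command.split()
    return handlers[name](*(int(a) for a in args))


# Hypothetical handlers that only report what they were asked to do.
handlers = {
    "readpage": lambda page: ("readpage", page),
    "writebyte": lambda page, offset, value: ("writebyte", page, offset, value),
}

reader = CommandReader()
reader.feed("readpa")                  # a command may arrive in pieces
reader.feed("ge 0;writebyte 0 2 8;")
results = [decode(cmd, handlers) for cmd in reader.ready]
print(results)
```

Keeping the decoder separate from the handlers mirrors the wrapper arrangement: the low-level routines stay generic, and only the thin layer on top knows about command strings.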
The screenshot above shows an interactive session with the Serial Monitor. I have issued four commands, 'getjedecid;', 'readpage 0;', 'writebyte 0 2 8;', and 'readpage 0;' You don't actually see the commands (the serial monitor doesn't have an echo, and I didn't print the exact command. I probably should have), but you do see the response. It should be most clear when I read/write/read page 0. The 'readpage;' command simply dumps the specified page (in decimal).
The 'writebyte;' function is a little weird, as the parameters specify a page number, an offset into that page, and then the byte. Since there is no native 32-bit register in the 8-bit Atmega, I didn't bother doing logical to physical translation, but you'll need to consider this translation at some point. Anyway, notice that the third byte of page zero is now '08h'. I could have also issued 'chiperase;' and then 'readpage 0;' to illustrate an erase cycle, but hopefully you get the picture. The low level functions start with ', and are named 'readpage' or 'writepage' or 'erasechip'. These functions explicitly sequence out the SPI commands found in the datasheet timing diagrams.
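The page/offset to linear address translation mentioned above is simple arithmetic. Here is a minimal Python sketch, assuming 256-byte pages and a 1MB part with three address bytes sent most significant byte first; the function names are hypothetical.

```python
PAGE_SIZE = 256            # bytes per page
CHIP_BYTES = 1024 * 1024   # 1MB part, i.e. 4096 pages


def to_address(page, offset):
    """Translate (page, offset) into a linear byte address."""
    if not 0 <= offset < PAGE_SIZE:
        raise ValueError("offset must be 0..255")
    address = page * PAGE_SIZE + offset
    if not 0 <= address < CHIP_BYTES:
        raise ValueError("address outside the chip")
    return address


def address_bytes(address):
    """Split an address into the three bytes sent after the opcode, MSB first."""
    return [(address >> 16) & 0xFF, (address >> 8) & 0xFF, address & 0xFF]


print(to_address(0, 2))                                     # 'writebyte 0 2 8' target
print([hex(b) for b in address_bytes(to_address(4095, 255))])  # last byte of the chip
```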
Each function ends with a call to 'notbusy' to prevent execution from proceeding before the chip has completed its internal operation. EDIT (11-MAR-2014): There was an issue with the readpage low-level function: I had forgotten to pull CS HIGH before pulling it LOW at the start of the function, like the other functions. This means that if readpage is the first function you call, CS may not already be high, so without a valid /CS 1-0 transition readpage will not function properly the first time it is called. The second time it would work fine because it leaves /CS as 1. Small but annoying bug.
The real reason for this Instructable is to demonstrate how to download the entire flash memory to a single file.
To do this, I used a Unix function, 'tail -f' and a redirect. The Unix function 'tail' prints the last 10 lines of a text file. When given the parameter '-f', 'tail' will remain connected to the redirect until it catches a SIGINT (e.g., Ctrl-C).
There are three windows open in this screenshot: the Arduino IDE on the left, the Serial Monitor on the upper-right, and an OSX POSIX terminal in the lower-right. In OSX/POSIX land, the USB controller of the Uno shows up as a /dev/tty device, in this case '/dev/tty.usbmodem1411'. I connect 'tail -f' to this device and redirect the output to a file. I then issued a 'readpage 0;' command in the serial monitor, and the output is sent through 'tail' since it is connected to the output of the TTY, and then sent to the file. I then 'cat' the file to prove the serial stream was captured.
Now all I need to do to dump the ENTIRE flash chip is to type this in the terminal prompt:
% tail -f /dev/tty.usbmodem1411 > 1MBofflash.txt
And then type this in the Serial Monitor window:
readallpages;
Then type CTRL-C in the terminal window to stop the 'tail' process. Done and done! This is why Unix is so vastly superior to any other operating system, IMHO.

I would not say that SD cards beat embedded flash for speed. At least, if you're using SPI to talk to it, you're not going to get good results at all. The SDIO protocol is the only one for which the advertised speeds of 40 or 100 MB/s are actually guaranteed. Over SPI, the controller behaves differently and those speeds are out the window.
In between sector writes, if the SD card's controller decides to flash a block to make room for the next sector, you will be stuck in a waiting mode for usually at least 10 ms, and I've seen it take 130 ms. If that's not a problem, then great, it does OK on average. But if you are logging data in real time and don't have the RAM to buffer it all, watch out.
I think you made your point backwards, because you say SD doesn't beat a discrete embedded device, but then you offer points that support SD! (e.g., SDIO vs. SPI and FFS impact).
Block cycling typically happens behind the scenes and the intent of the flash file system is to hide the impact. While it can happen during a read or write, it is purely a function of the flash file system policy and not the memory itself: it is all the same flash!
Also, SPI maximum speed is only limited by the MCU (compare Silicon Labs, PIC, STMicro and TI; they all offer different SPI speeds). An SD card using the SDIO protocol will access the devices in parallel, not serial/SPI, which is why you will notice the write speed is a multiple of that of a single flash device, and the interface may be up to 64 bits wide depending on the number of parallel-accessible bits available on the die/package and chips in the card. This parallelism can be done with discrete flash chips, but once you start going down that path, SDIO solves a lot of problems.
Thanks for the feedback.
If you are simply looking for a way to program the Winbond SPI flash with 'pre-loaded' data that your microcontroller would read for use when it is running, then what you will want to look into is a programmer that can do in-circuit programming of the SPI Flash chip. This is also known as in-system programming (ISP). One choice is a programmer from DediProg. This USB-connected device can program in circuit if you design your board correctly. They even sell an adapter clip that can attach onto the SOW-16 package without having to design in a separate programming header on your board.
DediProg has application information bulletins available to help with correct design for in circuit use. The main strategy for the design is to find a simple way to isolate the SPI interface drivers in your MCU system so that they do not interfere with the drivers in the SPI programming pod. The simplest way to do this is to put series resistors in the MCU driven lines between the MCU and the SPI Flash. The programmer would connect on the SPI flash side of the series resistors. Alternate methods could include adding a MUX or analog switches in the driven interface lines.
An even more clever scheme is to add a 'programming enable' input to the MCU that makes the software disconnect all the SPI I/Os from the SPI Flash chip (i.e., make all those GPIOs inputs). A second choice to consider is the Presto. The Presto is able to do various types of SPI and I2C devices including SPI Flash devices. I have one of these devices specifically for programming Atmel MCUs and various types of SPI Flash devices. It is a more cost effective solution than the above unit but not quite as flexible.
Their more expensive device called the Forte is able to do more things because it has more target interface pins. Sometimes it can be beneficial to be able to connect a programmer to a target board without having to add a programming header. One nice solution for this is to place a small set of pads in a special footprint defined.
They manufacture and sell a series of quick connect programming cables that have pogo pins that engage the special footprint on the board. There are 6-pin, 10-pin and 14-pin versions of the cable available to suit a range of applications. The cost of the cables is very reasonable.

I have never heard of any other tools talking SPI directly to such a chip, and I think it is impossible since 'all' chips require different calls for different operations.
The chip needs SPI calls for write, read, change sector, data size etc. Under 7.2 Instructions chapter in the datasheet you can see all the SPI commands you can send to it.
Hence, since all external flash memories do not have the same instruction set, you need to write a customized application for this one. EDIT: As a follow-up, I would really recommend one of Atmel's own SPI flash memories, since most of them already have openly available code written for them. Looking at from will provide you with code for some of Atmel's AT45xxxx serial flash chips.

I purchased a FlashCAT programmer from Embedded Computers for about $30 US. It was surprisingly easy to connect to the PC via USB and write files to the Winbond flash memory.
The methods and programmers in other answers are probably just as good, some more expensive or DIY, but this is a cheap and simple way that fits what I was seeking. Here's a picture of the setup: The FlashCAT programmer is at left, connected to USB. It's running the SPI programming firmware (as opposed to JTAG) and supplying power to the flash memory. The supplied power is selectable (3.3V or 5V) with a jumper. I have a SOIC to DIP socket on the breadboard to make it easy to program multiple chips. (You can see another flash memory IC sitting on the breadboard as well.) I haven't yet converted my audio file to the proper binary format, but I wrote a 211KB WAV file to memory just to test, pictured above.
I then read it back and saved it as a new file, renamed it to .wav, and it plays correctly on the PC. The next step will be to properly encode the file, and write the AVR software to read the data and send it through a DAC.
Disclaimer: I am not affiliated with Embedded Computers, I'm just a customer who picked something inexpensive and am sharing information about the experience with the product.

Kind of late to the discussion, but for anyone reading it after a search. One thing I did not see mentioned, which is absolutely critical when programming SPI Flash chips, is control of the Chip Select (CS) pin.
The Chip Select pin is used to punctuate commands to the SPI Flash. In particular, a transition from CS high to CS low must immediately precede the issuance of any Write operation op code (WREN, BE, SE, PP). If there is activity between the CS transition (i.e., after CS has gone low) and before the write op code is transmitted, the write op code will usually be ignored. Also, what's not commonly explained in SPI Flash datasheets, because it's an inherent part of the SPI protocol, and which is also critical, is that for every byte one transmits on the SPI bus, one receives a byte in return. Also, one cannot receive any bytes unless one transmits a byte.
Typically, the SPI Master that the user is commanding, has a Transmit Buffer, which sends bytes out on the MOSI line of the SPI bus and a Receive Buffer, which receives bytes in from the MISO line of the SPI bus. In order for any data to appear in the Receive buffer, some data must have been sent out the Transmit Buffer. Similarly, any time one sends data out of the Transmit buffer, data will appear in the Receive Buffer. If one is not careful about balancing Transmit writes and Receive reads, one will not know what to expect in the Receive buffer.
If the Receive buffer overflows, data is usually just spilled and lost. So, when one sends a read command, which is a one byte op code and three address bytes, one will first receive four bytes of 'garbage' in the SPI Master Receive buffer. These four bytes of garbage correspond to the op code and three address bytes. While those are being transmitted, the Flash does not yet know what to Read, so it just returns four words of garbage. After those four words of garbage are returned, in order to get anything else in the Receive Buffer, you must Transmit an amount of data equal to the amount that you want to Read.
After the op code and address, it doesn't matter what you transmit; it's just filler to push the Read Data from the SPI Flash to the Receive Buffer. If you didn't keep careful track of those first four returned garbage words, you might think that one or more of them is part of your returned Read Data.
So, in order to know what you are actually getting from the receive buffer, it's important to know the size of your buffer, know how to tell whether it's empty or full (there's usually a status register bit to report this) and keep track of how much stuff you've transmitted and how much you've received. Before starting any SPI Flash operation, it's a good idea to 'drain' the Receive FIFO.
This means check the status of the receive buffer and empty it (usually done by performing a 'read' of the Receive Buffer) if it is not already empty. Usually, emptying (reading) an already empty Receive Buffer does no harm.
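The one-byte-out, one-byte-in bookkeeping described above can be modelled in a few lines of host-side Python. This is a sketch only: FakeSpiFlash is a hypothetical stand-in for the chip, 0x03 is assumed to be the plain Read op code, and the model treats the first four bytes of every frame as op code plus address without checking the op code.

```python
READ_DATA = 0x03  # assumed op code for a plain (slow) read


class FakeSpiFlash:
    """Hypothetical device model: returns garbage until op code + address arrive."""

    def __init__(self, memory):
        self.memory = memory
        self.header = []
        self.address = 0

    def cs_low(self):
        """A high-to-low CS transition starts a fresh command frame."""
        self.header = []

    def transfer(self, byte_out):
        """Full duplex: every byte sent yields exactly one byte received."""
        if len(self.header) < 4:
            self.header.append(byte_out)
            if len(self.header) == 4:
                self.address = int.from_bytes(bytes(self.header[1:]), "big")
            return 0xFF  # 'garbage' clocked back during op code and address
        data = self.memory[self.address % len(self.memory)]
        self.address += 1
        return data


def exchange_read(flash, address, count):
    """Send op code, 3 address bytes and `count` filler bytes; return all RX bytes."""
    tx = [READ_DATA] + list(address.to_bytes(3, "big")) + [0x00] * count
    flash.cs_low()
    return [flash.transfer(b) for b in tx]


flash = FakeSpiFlash(bytes(range(16)))
rx = exchange_read(flash, address=2, count=3)
garbage, data = rx[:4], rx[4:]
print(garbage, data)
```

Discarding exactly the first four received bytes is what keeps the returned read data aligned with the requested address.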
The following information is available from the timing diagrams in datasheets of SPI Flashes, but sometimes folks overlook bits. All commands and data are issued to the SPI flash using the SPI bus.
The sequence to read a SPI Flash is:
1) Start with CS high.
2) Bring CS low.
3) Issue 'Read' op code to SPI Flash.
4) Issue three address bytes to SPI Flash.
5) 'Receive' four garbage words in Receive Buffer.
6) Transmit as many arbitrary bytes (don't cares) as you wish to receive. Number of transmitted bytes after address equals size of desired read.
7) Receive read data in the Receive Buffer.
8) When you've read the desired amount of data, set CS high to end the Read command. If you skip this step, any additional transmissions will be interpreted as a request for more data from (a continuation of) this Read.
Note that steps 6 and 7 must be interleaved and repeated depending on the size of the read and the size of your Receive and Transmit Buffers. If you Transmit a larger number of words in one go than your Receive Buffer can hold, you'll spill some data.
In order to perform a Page Program or Write command, perform these steps.
Page Size (typically 256 bytes) and Sector Size (typically 64K) and associated boundaries are properties of the SPI Flash you are using. This information should be in the datasheet for the Flash. I will omit the details of balancing the Transmit and Receive buffers.
1) Start with CS high.
2) Change CS to low.
3) Transmit the Write Enable (WREN) op code.
4) Switch CS to high for at least one SPI Bus clock cycle. This may be tens or hundreds of host clock cycles. No write operation starts until CS goes high. The preceding two notes apply to all the following 'CS to high' steps.
5) Switch CS to low.
6) Gadfly loop: Transmit the 'Read from Status Register' (RDSR) op code and one more byte. Receive two bytes. First byte is garbage. Second byte is status. Check status byte. If 'Write in Progress' (WIP) bit is set, repeat loop. (NOTE: May also check 'Write Enable Latch' bit is set (WEL) after WIP is clear.)
7) Switch CS to high.
8) Switch CS to low.
9) Transmit Sector Erase (SE) or Bulk Erase (BE) op code. If sending SE, then follow it with three byte address.
10) Switch CS to high.
11) Switch CS to low.
12) Gadfly loop: Spin on WIP in Status Register as above in step 6. WEL will be unset at end.
13) Switch CS to high.
14) Switch CS to low.
15) Transmit Write Enable op code (again).
16) Switch CS to high.
17) Switch CS to low.
18) Gadfly loop: Wait on WIP bit in Status Register to clear. (WEL will be set.)
19) Switch CS to high and then back to low, so that the next op code immediately follows a high-to-low CS transition. Transmit Page Program (PP = Write) op code followed by three address bytes.
20) Transmit up to Page Size (typically 256 bytes) of data to write. (You may allow Receive data to simply spill over during this operation, unless your host hardware has a problem with that.)
21) Switch CS to high.
22) Switch CS to low.
23) Gadfly loop: Spin on WIP in the Status Register.
24) Drain Receive FIFO so that it's ready for the next user.
25) Optional: Repeat steps 13 to 24 as needed to write additional pages or page segments.
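The write-enable, program, and poll portion of the steps above (roughly steps 13 to 23) can be sketched against a simulated device. This Python model is an illustration under stated assumptions, not a driver: FakeFlash is hypothetical, each call to transaction() stands for one CS-low to CS-high frame, the op codes are the commonly used values (WREN 0x06, RDSR 0x05, PP 0x02), the erase steps are skipped on the assumption that the page is already erased, and the fake never reports WIP because it programs instantly.

```python
WREN, RDSR, PP = 0x06, 0x05, 0x02   # assumed standard op codes
WIP, WEL = 0x01, 0x02               # status bits: write in progress, write enable latch
PAGE_SIZE = 256


class FakeFlash:
    """Hypothetical device model; one transaction() call is one CS-framed command."""

    def __init__(self, size=4096):
        self.memory = bytearray([0xFF]) * size   # erased flash reads as 0xFF
        self.status = 0

    def transaction(self, tx):
        opcode, payload = tx[0], tx[1:]
        if opcode == WREN:
            self.status |= WEL
        elif opcode == RDSR:
            return [0xFF, self.status]           # garbage byte, then status byte
        elif opcode == PP and self.status & WEL:
            address = int.from_bytes(bytes(payload[:3]), "big")
            for i, value in enumerate(payload[3:]):
                self.memory[address + i] &= value   # programming only clears bits
            self.status &= ~WEL                  # latch drops after the write
        return [0xFF] * len(tx)


def wait_while_busy(flash):
    """Gadfly loop on the WIP bit."""
    while flash.transaction([RDSR, 0x00])[1] & WIP:
        pass


def page_program(flash, address, data):
    """Write enable, page program, then wait; refuses writes that would wrap."""
    if not 0 < len(data) <= PAGE_SIZE:
        raise ValueError("data must be 1..256 bytes")
    if address % PAGE_SIZE + len(data) > PAGE_SIZE:
        raise ValueError("write would cross a page boundary")
    flash.transaction([WREN])
    wait_while_busy(flash)
    flash.transaction([PP] + list(address.to_bytes(3, "big")) + list(data))
    wait_while_busy(flash)


flash = FakeFlash()
flash.transaction([PP, 0x00, 0x00, 0x00, 0x00])   # no WREN first: ignored
page_program(flash, 0x100, b"\x12\x34")
print(flash.memory[0], bytes(flash.memory[0x100:0x102]))
```

Framing each command as its own transaction is one way to guarantee that every op code directly follows a high-to-low CS transition.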
Finally, if your write address is not on a page boundary (typically a multiple of 256 bytes) and you write enough data to cross the following page boundary, the data that should cross the boundary will be written to the beginning of the page in which your program address falls. So, if you attempt to write three bytes to address 0x0FE, the first two bytes will be written to 0x0FE and 0x0FF. The third byte will be written to address 0x000. If you transmit a number of data bytes larger than a page size, the earliest bytes will be discarded and only the final 256 (or page size) bytes will be used to program the page. As always, not responsible for consequences of any errors, typos, oversights, or derangement in the above, nor in how you put it to use.
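The page wrap-around behaviour described above is just modular arithmetic within the page. A minimal Python sketch, assuming a 256-byte page and a hypothetical helper name:

```python
def programmed_addresses(start, count, page_size=256):
    """Addresses actually written when `count` bytes are sent starting at `start`.

    The address counter wraps inside the page that contains `start`.
    """
    page_base = start - start % page_size
    return [page_base + (start + i) % page_size for i in range(count)]


print([hex(a) for a in programmed_addresses(0x0FE, 3)])
print([hex(a) for a in programmed_addresses(0x1FE, 3)])
```

A host that wants linear writes has to split the data at each page boundary and issue a separate page program for each piece.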

Contrary to some of the statements here, while there are some quirky SPI PROMs out there, there are also some standard instructions used by a large variety of SPI PROMs, including the one you've chosen. As vicatcu already mentioned, there are good 'bit-bash' cables available that can directly program SPI. Signal-wise, SPI looks a lot like JTAG, so any bit-bash type of cable should be usable provided the interface is open source. The internal protocol of the flash is fairly simple.
We use the big brother of the part you're looking at to boot our FPGA boards (256M - 2G). The addressing has an extra byte to handle the storage volume, but otherwise the commands are basically identical. The type of PROM you're using has to be erased by sector, then programmed by page. Reading is significantly faster than writing (in the case of the ones we use, programming can take half an hour, but reading the whole PROM takes under a second at 108MHz). Now for the commands: There are way more commands available in these devices than are actually required to program them. You actually only need the following:
RDID (read ID) - just to verify the PROM and signalling before you do anything more complex.
WREN (write enable) - needed before every write.
PP (0x02 - page program) - needed to program a page.
SE (0x20 - sector erase) - returns bits in sector to '1'.
RDSR (0x05 - read status register) - needed to monitor erase / write cycle.
FREAD (0x0B - fast read) - read PROM data and verify write.
If you want more information, look at answer notes on SPI programming for Xilinx FPGAs on their website. (They implement a reduced subset of commands so their FPGAs can boot from these devices.) I designed my own programmer to do this based on what I have available and wrote a programmer script in Python, but you can do the same using a cable. In your case, I would seriously consider doing everything indirectly through the MCU as Michael Karas suggests. You don't need to program the whole PROM from the MCU in one go - you can do it by sector.

You should be able to re-purpose the USBtiny to program a flash memory instead of a target MCU if you are comfortable changing its programming. However, there may not be enough memory on that to make it versatile enough to program both the MCU and the flash. Somewhere I have a board from a project which has both an ATTINY and an SPI flash, and uses an Arduino as a readily available 'programmer'. A slight modification of the ISP sketch is used to program the MCU with avrdude, then a custom utility sends a sequence which puts the sketch in a special mode and writes blocks of data to the SPI flash.