## Feeding data to TensorFlow
TensorFlow is designed to work efficiently with large amounts of data, so it's important not to starve your TensorFlow model of input if you want to get the best performance out of it. There are several ways to feed data to TensorFlow.
### Constants
The simplest approach is to embed the data in your graph as a constant:
```python
import tensorflow as tf
import numpy as np
actual_data = np.random.normal(size=[100])
data = tf.constant(actual_data)
```
This approach can be very efficient, but it's not very flexible. One problem is that in order to use your model with another dataset you have to rewrite the graph. Another is that you have to load all of your data at once and keep it in memory, which only works for small datasets.
### Placeholders
Using placeholders solves both of these problems:
```python
import tensorflow as tf
import numpy as np
data = tf.placeholder(tf.float32)
prediction = tf.square(data) + 1
actual_data = np.random.normal(size=[100])
with tf.Session() as sess:
    sess.run(prediction, feed_dict={data: actual_data})
```
The placeholder operator returns a tensor whose value is supplied through the feed_dict argument of Session.run. Note that calling Session.run without feeding a value for data in this case will raise an error.
### Python ops
Another way to feed data to TensorFlow is through Python ops:
```python
import numpy as np
import tensorflow as tf

def py_input_fn():
    # The returned NumPy dtype must match the Tout argument below.
    return np.random.normal(size=[100]).astype(np.float32)

data = tf.py_func(py_input_fn, [], tf.float32)
```
Python ops let you wrap a regular Python function as a TensorFlow operation; the wrapped function is executed by the Python interpreter every time the op is evaluated.
### Dataset API
The recommended way of reading data in TensorFlow, however, is through the Dataset API:
```python
import numpy as np
import tensorflow as tf

actual_data = np.random.normal(size=[100])
dataset = tf.data.Dataset.from_tensor_slices(actual_data)
data = dataset.make_one_shot_iterator().get_next()
```
If you need to read your data from files, it may be more efficient to write it in TFRecord format and use TFRecordDataset to read it:
```python
dataset = tf.data.TFRecordDataset(path_to_data)
```
See the [official docs](https://www.tensorflow.org/api_guides/python/reading_data#Reading_from_files) for an example of how to write your dataset in TFRecord format.
The Dataset API makes it easy to build efficient data-processing pipelines. For example, this is how we process our data in the accompanying framework (see
[trainer.py](https://github.com/vahidk/TensorflowFramework/blob/master/trainer.py)):
```python
dataset = ...
dataset = dataset.cache()
if mode == tf.estimator.ModeKeys.TRAIN:
    dataset = dataset.repeat()
    dataset = dataset.shuffle(batch_size * 5)
dataset = dataset.map(parse, num_parallel_calls=8)
dataset = dataset.batch(batch_size)
```
After reading the data, we use the Dataset.cache method to cache it in memory for improved efficiency. In training mode we repeat the dataset indefinitely, which lets us process the whole dataset many times, and we shuffle it so that batches are drawn with different sample distributions. Next, we use Dataset.map to preprocess the raw records and convert the data to a format the model can use. Finally, we create batches of samples by calling Dataset.batch.