Hello everybody.
I have a problem with a "Flat file" connection, which I cannot understand at present. Here is the issue: I've got an ASCII file containing 233898 lines. I try to read this file in two different packages using two different connections. In the first connection I used the default data type - DT_STR of length 50; in the second one I used the "Propose types..." feature (with 2000 samples) to detect types that better match the reality. And when I try to load my data, the first connection reads exactly 233898 lines from the file, while the second one reads only 203898. Somehow it leaves 30000 lines unloaded.
I tried to observe the error output for the second connection - everything goes smoothly, without problems. But somehow those 30000 lines are missing.
Has anybody experienced such a situation? Is the issue known?
Thanks in advance,
A.G.
Is it possible that the missing rows contain data that does not match the data types detected by the "Propose Types" feature? If the 2000 sample rows were not representative of the whole file, this could cause a problem.
With this said, I don't know if this would cause the behavior you're describing here, with the package running successfully end-to-end and no rows being directed to the error output, but it is the first thing I'd look at.
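One way to test the sampling hypothesis outside of SSIS is to compare the maximum field widths in the sampled portion of the file against the whole file. The sketch below is illustrative only - the semicolon delimiter and the sample data are assumptions, not details from the original post:

```python
# Hypothetical check: compare the column widths a 2000-row sampler
# would see against the widths present in the whole file.
# The semicolon delimiter and the sample rows are assumptions.

def max_field_widths(lines, delimiter=";", limit=None):
    """Return the maximum width seen for each column, optionally
    restricted to the first `limit` rows (as a type sampler would be)."""
    widths = []
    for i, line in enumerate(lines):
        if limit is not None and i >= limit:
            break
        fields = line.rstrip("\r\n").split(delimiter)
        while len(widths) < len(fields):
            widths.append(0)
        for col, field in enumerate(fields):
            widths[col] = max(widths[col], len(field))
    return widths

# Illustrative data: the long value in the last row would be missed
# by a sampler that only looks at the first two rows.
rows = ["1;Smith", "2;Jones", "3;a-very-long-customer-name"]
print(max_field_widths(rows, limit=2))  # widths seen by the "sample": [1, 5]
print(max_field_widths(rows))           # widths in the whole file:    [1, 25]
```

In a real run you would pass an open file object instead of the inline list and use 2000 as the limit; any column whose full-file width exceeds its sampled width is a candidate for a type/length mismatch.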
|||I think this could happen. But in that case I would expect my "Flat file" source component to report an error, not just skip lines that do not match its expectations.
At present I am trying to identify the source of this issue by tuning the data types manually. I hope to be able to report more later... But in any case, any further comments are welcome!
A.
|||
Andrey Grigorev wrote:
I think this could happen. But in that case I would expect my "Flat file" source component to report an error, not just skip lines that do not match its expectations.
Yeah, I'd think the same thing too.
|||This can happen if you have an inconsistent number of column delimiters in your rows. Is the connection manager using the same column delimiters in both cases?|||Could you tell us how many columns there are in both connections and what data types/lengths got assigned by the "Suggest Types" tool?
Thanks.
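The inconsistent-delimiter hypothesis raised above is easy to check with a quick script that counts delimiters per row and reports how many rows have each count. This is a sketch under assumptions - the semicolon delimiter and the sample rows are invented for illustration:

```python
# Count how many column delimiters each row contains; any row whose
# count deviates from the most common one is a candidate for being
# parsed across a row boundary. The semicolon delimiter is an assumption.
from collections import Counter

def delimiter_profile(lines, delimiter=";"):
    """Map delimiter-count per row -> number of rows with that count."""
    return Counter(line.rstrip("\r\n").count(delimiter) for line in lines)

rows = ["a;b;c", "d;e;f", "g;h"]  # last row is missing a field
print(delimiter_profile(rows))    # two rows with 2 delimiters, one with 1
```

If the profile of the real file shows more than one delimiter count, the "tuned" connection manager may be merging or dropping rows that the default DT_STR(50) setup happened to tolerate.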
|||Hello.
Sorry for being silent these last days - there were holidays in Germany. Today is the first working day...
Coming back to the flat file: I haven't checked the data row by row, but I would say that the data is clean. It was exported from the customer's operational system (database) and is used in other processes. Besides, if the data had an inconsistent number of delimiters, I would not be able to read all rows with the "untuned" connection.
Thanks for your attention,
Andrey
|||What do you mean by "how many columns"? The number of columns in the ASCII file / table, or the number of columns read from the file? And if I post it here, how would that be helpful? Perhaps if you just give me an idea, I will be able to find the reason myself.
Regards,
Andrey.
|||Well, I do not have a good idea; that is why I am trying to collect more information. If your flat file connection managers have a lot of columns, it could be harder to see the differences between them that cause this behavior. If one has one column fewer, that might cause some rows to be eaten, etc.
The more information we have about the column metadata, the better idea we can get of how the data is actually parsed and where a mismatch or a bug might be.
Thanks.
|||Hello. Sorry for being silent last week. I was busy with another project whose deadline was closer...
So, coming back to the "FlatFile" story. I tried to reproduce the behaviour last week, but somehow I failed. I will continue this week and report the results here. Additionally, we are going to perform some extra tests with this connector - it (the connector) is needed for one of our projects.
Regards,
Andrey.